Item559: GenPDF doesn't output all images
Priority: Normal
Current State: Closed
Released In:
Target Release:
Update: Further work was required on GenPDF. I'm attaching a complete .zip file (also posted on the dark side).
It has had only limited testing and has not been ported to the Foswiki namespace, but I hope it's useful. I did include .patch files in case you have ported a copy.
GenPDF doesn't output images - sometimes.
It turns out that it doesn't tolerate src and href attributes with single quotes. Other things don't tolerate them with double quotes.
The fix is in GenPDF.pm. Also reported on TWiki.org.
--- GenPDF.pm.base 2008-10-31 15:53:43.000000000 -0400
+++ GenPDF.pm 2008-12-23 17:41:49.000000000 -0500
@@ -288,14 +288,14 @@
# certificates.
# Fully qualify any unqualified URLs (to make it portable to another host)
my $url = TWiki::Func::getUrlHost();
my $pdir = TWiki::Func::getPubDir();
my $purlp = TWiki::Func::getPubUrlPath();
- $text =~ s!<img(.*?) src="($url)?$purlp!<img$1 src="$pdir\/!sgi;
- $text =~ s/<a(.*?) href="(?!#)\//<a$1 href="$url\//sgi;
+ $text =~ s!<img(.*?) src=(["'])($url)?$purlp!<img$1 src=$2$pdir\/!sgi;
+ $text =~ s/<a(.*?) href=(['"])(?!#)\//<a$1 href=$2$url\//sgi;
# Save it to a file
my ($fh, $name) = tempfile('GenPDFAddOnXXXXXXXXXX',
DIR => File::Spec->tmpdir(),
SUFFIX => '.html');
open $fh, ">$name";
@@ -411,14 +411,14 @@
# images. Needed if wiki requires authentication like SSL client certifcates.
# Fully qualify any unqualified URLs (to make it portable to another host)
my $url = TWiki::Func::getUrlHost();
my $pdir = TWiki::Func::getPubDir();
my $purlp = TWiki::Func::getPubUrlPath();
- $html =~ s!<img(.*?) src="($url)?$purlp!<img$1 src="$pdir\/!gi;
- $html =~ s/<a(.*?) href="\//<a$1 href="$url\//gi;
+ $html =~ s!<img(.*?) src=(["'])($url)?$purlp!<img$1 src=$2$pdir\/!gi;
+ $html =~ s/<a(.*?) href=(['"])\//<a$1 href=$2$url\//gi;
# link internally if we include the topic
for my $wikiword (@$refTopics) {
$url = TWiki::Func::getScriptUrl($webName, $wikiword, 'view');
$html =~ s/([\'\"])$url/$1#$wikiword/g; # not anchored
$html =~ s/$url(#\w*)/$1/g; # anchored
}
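
The effect of the patch can be tried outside GenPDF with a minimal sketch like the one below. The three values standing in for TWiki::Func::getUrlHost(), getPubDir() and getPubUrlPath() are made up, and the sub names are hypothetical; only the substitutions themselves are taken from the diff.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for TWiki::Func::getUrlHost(), getPubDir() and
# getPubUrlPath(); real values depend on the installation.
my $url   = 'http://wiki.example.com';
my $pdir  = '/var/www/twiki/pub';
my $purlp = '/twiki/pub';

# Pre-patch form: only matches src="..." written with double quotes.
sub rewrite_img_old {
    my ($text) = @_;
    $text =~ s!<img(.*?) src="($url)?$purlp!<img$1 src="$pdir\/!sgi;
    return $text;
}

# Patched form: (["']) captures whichever quote was used, $2 puts it back.
sub rewrite_img_new {
    my ($text) = @_;
    $text =~ s!<img(.*?) src=(["'])($url)?$purlp!<img$1 src=$2$pdir\/!sgi;
    return $text;
}

# Patched href form: site-relative links get the host prepended.
sub rewrite_href_new {
    my ($text) = @_;
    $text =~ s/<a(.*?) href=(['"])(?!#)\//<a$1 href=$2$url\//sgi;
    return $text;
}

# Compare both forms on a single-quoted and a double-quoted tag.
for my $tag (q{<img alt='logo' src='/twiki/pub/Main/logo.png'>},
             q{<img alt="logo" src="/twiki/pub/Main/logo.png">}) {
    print "old: ", rewrite_img_old($tag), "\n";
    print "new: ", rewrite_img_new($tag), "\n";
}
print rewrite_href_new(q{<a href='/twiki/bin/view/Main/WebHome'>home</a>}), "\n";
```

With these sample values the old form leaves the single-quoted tag untouched, which is the "sometimes" in the report. Because the replacement appends a slash after $pdir, as in the patch, rewritten paths contain a doubled slash after the pub directory.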
--
TWiki:Main/TimotheLitt - 23 Dec 2008
The regular expressions used to extract / replace <img tags have been completely rewritten as part of Item892. This issue should be resolved. If it is still an issue, please provide a test case that reproduces it. Thanks.
--
GeorgeClark - 15 Feb 2009
Foswiki:Main.IsaacLin pointed out that the href= case was missed in the changes. Reopening this until I can release the changes.
The following regex from Isaac was used to match the src= operand of <img tags, and was also reworked a bit to match the href= operand. This will hopefully support single-quoted, double-quoted, or unquoted operands, with the operands on the <img tag in any order.
my $reSqString = qr{
\'
[^\']*
\'
}x;
my $reDqString = qr{
\"
[^\"]*
\"
}x;
my $reAttrValue = qr{
(?: $reSqString | $reDqString | [^\'\"\s]+ )
}x;
## regex for matching <img tags
my $reImgSrc = qr{
<[iI][mM][gG] # <img
\s+ # space
(?: \w+ \s*=\s* $reAttrValue \s+ )* # 0 or more word = value
[sS][rR][cC] \s*=\s* # src=
(?: (?:\'([^\']+)\') | (?:\"([^\"]+)\") | ([^\'\"\s]+) ) # delimited value
(?: \s+ \w+ \s*=\s* $reAttrValue )* # 0 or more word = value
\s*/?> # ending bracket
}x;
## regex for qualifying <a tags
$text =~ s{
<a\s+ # starting <a tag plus space
( (?: \w+ \s*=\s* $reAttrValue \s+ )* ) # 0 or more word = value - Assign to $1
[hH][rR][eE][fF]\s* # href, with or without trailing spaces
( =\s*[\"\']? # = and optional opening quote - Assign to $2
)/ # leading slash of a site-relative URL
}{<a $1 href$2$url/}sgx;
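
A small harness for trying the patterns above against the three quoting styles might look like this. It is only a sketch: the sample tags and host name are made up, and img_src and qualify_links are hypothetical helper names, not functions from GenPDF.

```perl
use strict;
use warnings;

# Building blocks copied from the regexes above.
my $reSqString  = qr{ \' [^\']* \' }x;
my $reDqString  = qr{ \" [^\"]* \" }x;
my $reAttrValue = qr{ (?: $reSqString | $reDqString | [^\'\"\s]+ ) }x;

my $reImgSrc = qr{
    <[iI][mM][gG] \s+                          # <img plus space
    (?: \w+ \s*=\s* $reAttrValue \s+ )*        # attributes before src
    [sS][rR][cC] \s*=\s*                       # src=
    (?: (?:\'([^\']+)\') | (?:\"([^\"]+)\") | ([^\'\"\s]+) )
    (?: \s+ \w+ \s*=\s* $reAttrValue )*        # attributes after src
    \s*/?>                                     # ending bracket
}x;

# Hypothetical helper: return the src value of one <img tag, or undef.
sub img_src {
    my ($tag) = @_;
    return undef unless $tag =~ $reImgSrc;
    # Exactly one of the three captures is set, depending on the quoting.
    return defined $1 ? $1 : defined $2 ? $2 : $3;
}

# Hypothetical helper: prefix site-relative href values with $url.
sub qualify_links {
    my ($text, $url) = @_;
    $text =~ s{
        <a\s+
        ( (?: \w+ \s*=\s* $reAttrValue \s+ )* )
        [hH][rR][eE][fF]\s*
        ( =\s*[\"\']? )
        /
    }{<a $1 href$2$url/}sgx;
    return $text;
}

for my $tag (q{<img src='/pub/a.png' alt="A">},
             q{<IMG alt=logo SRC="/pub/b.png" />},
             q{<img width=10 src=/pub/c.png>}) {
    my $src = img_src($tag);
    print defined $src ? $src : '(no match)', "\n";
}
print qualify_links(q{<a class=x href="/bin/view/Main">Main</a>},
                    'http://wiki.example.com'), "\n";
```

Attribute names are matched with \w+, so a tag carrying a hyphenated attribute name would presumably not match; that is a property of the pattern as posted, not something added here.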
Note that the .zip file also included code that implements TITLEDOC, FIRSTPAGE, DESTINATION, PAGELAYOUT and PAGEMODE and several other fixes besides the quoting problem that started the work.
--
TimotheLitt - 23 Feb 2009