
Item559: GenPDF doesn't output all images

Priority: Normal
Current State: Closed
Released In:
Target Release:
Applies To: Extension
Component: GenPDFAddOn
Branches:
Reported By: TimotheLitt
Waiting For:
Last Change By: GeorgeClark
Update: Further work was required on GenPDF. I'm attaching a complete .zip file (also posted on the dark side).

It has had only limited testing and is not ported to the Foswiki namespace, but I hope it's useful. I included .patch files in case you have already ported a copy.

GenPDF doesn't output images - sometimes.

It turns out that GenPDF doesn't tolerate src and href values in single quotes, while other things don't tolerate them in double quotes.
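To make the failure concrete, here is a minimal sketch comparing the original double-quote-only pattern with the patched one on a single-quoted tag. The host, pub URL path, and pub directory are hypothetical stand-ins for what TWiki::Func would return, and the helper names rewrite_old and rewrite_patched are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for TWiki::Func::getUrlHost(), getPubUrlPath()
# and getPubDir(); real values depend on the installation.
my $url   = 'http://wiki.example.com';
my $purlp = '/pub';
my $pdir  = '/var/www/twiki/pub';

# Original pattern: only recognises src="..." (double quotes).
sub rewrite_old {
    my ($text) = @_;
    $text =~ s!<img(.*?) src="($url)?$purlp!<img$1 src="$pdir/!sgi;
    return $text;
}

# Patched pattern: captures whichever quote character was used and
# writes the same character back in the replacement.
sub rewrite_patched {
    my ($text) = @_;
    $text =~ s!<img(.*?) src=(["'])($url)?$purlp!<img$1 src=$2$pdir/!sgi;
    return $text;
}

my $dq = '<img alt="x" src="/pub/Main/a.png" />';
my $sq = "<img alt='x' src='/pub/Main/a.png' />";

for my $tag ($dq, $sq) {
    print "input:   $tag\n";
    print "old:     ", rewrite_old($tag), "\n";
    print "patched: ", rewrite_patched($tag), "\n\n";
}
```

With the old pattern the single-quoted tag comes back unchanged, so its src still points at the pub URL rather than at the pub directory on disk.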

The fix is in GenPDF.pm. Also reported on TWiki.org.
--- GenPDF.pm.base      2008-10-31 15:53:43.000000000 -0400
+++ GenPDF.pm   2008-12-23 17:41:49.000000000 -0500
@@ -288,14 +288,14 @@
    # certificates.
    # Fully qualify any unqualified URLs (to make it portable to another host)
    my $url = TWiki::Func::getUrlHost();
    my $pdir = TWiki::Func::getPubDir();
    my $purlp = TWiki::Func::getPubUrlPath();

-   $text =~ s!<img(.*?) src="($url)?$purlp!<img$1 src="$pdir\/!sgi;
-   $text =~ s/<a(.*?) href="(?!#)\//<a$1 href="$url\//sgi;
+   $text =~ s!<img(.*?) src=(["'])($url)?$purlp!<img$1 src=$2$pdir\/!sgi;
+   $text =~ s/<a(.*?) href=(['"])(?!#)\//<a$1 href=$2$url\//sgi;

    # Save it to a file
    my ($fh, $name) = tempfile('GenPDFAddOnXXXXXXXXXX',
                                DIR => File::Spec->tmpdir(),
                               SUFFIX => '.html');
    open $fh, ">$name";
@@ -411,14 +411,14 @@
    # images.  Needed if wiki requires authentication like SSL client certifcates.
    # Fully qualify any unqualified URLs (to make it portable to another host)
    my $url = TWiki::Func::getUrlHost();
    my $pdir = TWiki::Func::getPubDir();
    my $purlp = TWiki::Func::getPubUrlPath();

-   $html =~ s!<img(.*?) src="($url)?$purlp!<img$1 src="$pdir\/!gi;
-   $html =~ s/<a(.*?) href="\//<a$1 href="$url\//gi;
+   $html =~ s!<img(.*?) src=(["'])($url)?$purlp!<img$1 src=$2$pdir\/!gi;
+   $html =~ s/<a(.*?) href=(['"])\//<a$1 href=$2$url\//gi;
    # link internally if we include the topic
    for my $wikiword (@$refTopics) {
       $url = TWiki::Func::getScriptUrl($webName, $wikiword, 'view');
       $html =~ s/([\'\"])$url/$1#$wikiword/g; # not anchored
       $html =~ s/$url(#\w*)/$1/g; # anchored
    }

-- TWiki:Main/TimotheLitt - 23 Dec 2008

The regular expressions used to extract and replace <img tags have been completely rewritten as part of Item892, so this issue should be resolved. If it is still a problem, please provide a test case that reproduces it. Thanks.

-- GeorgeClark - 15 Feb 2009

Foswiki:Main.IsaacLin pointed out that the href= case was missed in the changes. Reopening this until I can release the changes.

The following regex from Isaac is used to match the src= attribute of <img tags, and was reworked a bit to also match the href= attribute. This will hopefully support single-quoted, double-quoted, and unquoted values, with the attributes on the <img tag in any order.

my $reSqString = qr{
  \'
  [^\']*
  \'
}x;

my $reDqString = qr{
  \"
  [^\"]*
  \"
}x;

my $reAttrValue = qr{
  (?: $reSqString | $reDqString | [^\'\"\s]+ )
}x;

## regex for matching <img tags

    my $reImgSrc = qr{
      <[iI][mM][gG]                                             # <img
        \s+                                                     # space
      (?: \w+ \s*=\s* $reAttrValue \s+ )*                       # 0 or more word = value
      [sS][rR][cC] \s*=\s*                                      # src=
      (?: (?:\'([^\']+)\') | (?:\"([^\"]+)\") | ([^\'\"\s]+) )  # delimited value
      (?: \s+ \w+ \s*=\s* $reAttrValue )*                       # 0 or more word = value
      \s*/?>                                                    # ending bracket
    }x;
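As a rough check of the pattern above, the sketch below repeats the definitions so that it runs on its own and pulls the src value out of a few sample tags. The helper name img_src and the sample tags are hypothetical. The value lands in capture group 1, 2, or 3 depending on the quoting style.

```perl
use strict;
use warnings;

my $reSqString  = qr{ \' [^\']* \' }x;
my $reDqString  = qr{ \" [^\"]* \" }x;
my $reAttrValue = qr{ (?: $reSqString | $reDqString | [^\'\"\s]+ ) }x;

my $reImgSrc = qr{
  <[iI][mM][gG]
    \s+
  (?: \w+ \s*=\s* $reAttrValue \s+ )*
  [sS][rR][cC] \s*=\s*
  (?: (?:\'([^\']+)\') | (?:\"([^\"]+)\") | ([^\'\"\s]+) )
  (?: \s+ \w+ \s*=\s* $reAttrValue )*
  \s*/?>
}x;

# Hypothetical helper: return the src value of an <img tag, or undef
# when the tag does not match. Exactly one of the three capture groups
# is set, depending on whether the value was single-quoted,
# double-quoted, or unquoted.
sub img_src {
    my ($tag) = @_;
    return undef unless $tag =~ $reImgSrc;
    return $1 // $2 // $3;
}

for my $tag (
    q{<img alt="logo" src='/pub/a.png' width=10 />},
    q{<IMG SRC="/pub/b.png">},
    q{<img src=/pub/c.png />},
    q{<img alt="no source">},
) {
    my $src = img_src($tag);
    printf "%-50s => %s\n", $tag, defined $src ? $src : '(no match)';
}
```

Attribute names are matched with \w+, so a tag carrying hyphenated attribute names would not match this pattern as written.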

## regex for qualifying <a tags

    $text =~ s{
          <a\s+                                   # starting <a tag plus space
          ( (?: \w+ \s*=\s* $reAttrValue \s+ )* ) # 0 or more word = value, captured as group 1
          [hH][rR][eE][fF]\s*                     # href, with or without trailing spaces
          (  =\s*[\"\']?                          # = and optional opening quote, captured as group 2
          )/                                      # leading slash of a host-relative URL
         }{<a $1 href$2$url/}sgx;
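The substitution above can be tried in isolation with a sketch like the following. The host in $url is a hypothetical stand-in for TWiki::Func::getUrlHost(), and qualify_links is a made-up wrapper name. Only href values that begin with a slash are rewritten; anchors and already-qualified URLs are left alone.

```perl
use strict;
use warnings;

my $reSqString  = qr{ \' [^\']* \' }x;
my $reDqString  = qr{ \" [^\"]* \" }x;
my $reAttrValue = qr{ (?: $reSqString | $reDqString | [^\'\"\s]+ ) }x;

# Hypothetical host; GenPDF obtains this from TWiki::Func::getUrlHost().
my $url = 'http://wiki.example.com';

# Prefix host-relative href values with the wiki host, preserving
# whichever quoting style (single, double, or none) the tag used.
sub qualify_links {
    my ($text) = @_;
    $text =~ s{
          <a\s+
          ( (?: \w+ \s*=\s* $reAttrValue \s+ )* )
          [hH][rR][eE][fF]\s*
          (  =\s*[\"\']?
          )/
         }{<a $1 href$2$url/}sgx;
    return $text;
}

for my $tag (
    q{<a href="/bin/view/Main/WebHome">Home</a>},
    q{<a class='x' href='/pub/Main/f.txt'>file</a>},
    q{<a href=/bin/view/Main/WebHome>Home</a>},
    q{<a href="#top">top</a>},
    q{<a href="http://other.example.org/x">elsewhere</a>},
) {
    print qualify_links($tag), "\n";
}
```

Because the replacement always emits a space after `<a` and again before `href`, the rewritten tags contain a doubled space, which is harmless in HTML.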

Note that the .zip file also included code that implements TITLEDOC, FIRSTPAGE, DESTINATION, PAGELAYOUT and PAGEMODE and several other fixes besides the quoting problem that started the work.

-- TimotheLitt - 23 Feb 2009

ItemTemplate

Summary GenPDF doesn't output all images
ReportedBy TimotheLitt
Codebase
SVN Range TWiki-4.2.3, Wed, 06 Aug 2008, build 17396
AppliesTo Extension
Component GenPDFAddOn
Priority Normal
CurrentState Closed
WaitingFor
Checkins GenPDFAddOn:149481881f69 GenPDFAddOn:f94662ace478 GenPDFAddOn:888197d2f35a GenPDFAddOn:820848ecf90a
ReleasedIn
Attachment Size Date Who Comment
GenPDFAddOnUpdate.zip 37 K 24 Dec 2008 - 18:43 TimotheLitt Fixes for described problems; full kit with patch files (but not ported to Foswiki)
Topic revision: r14 - 09 Mar 2010, GeorgeClark