
Item559: GenPDF doesn't output all images

Priority: Normal
Current State: Closed
Released In:
Target Release:
Applies To: Extension
Component: GenPDFAddOn
Branches:
Reported By: TimotheLitt
Waiting For:
Last Change By: GeorgeClark
Update: Further work was required on GenPDF. I'm attaching a complete .zip file (also posted on the dark side.)

It has had only limited testing and is not ported to the Foswiki namespace, but I hope it's useful. I did include .patch files in case you have already ported a copy.

GenPDF doesn't output images - sometimes.

Turns out that it doesn't tolerate src and href values in single quotes. Other things don't tolerate them in double quotes.

Fix is in GenPDF.pm. Also reported on TWiki.org.
--- GenPDF.pm.base      2008-10-31 15:53:43.000000000 -0400
+++ GenPDF.pm   2008-12-23 17:41:49.000000000 -0500
@@ -288,14 +288,14 @@
    # certificates.
    # Fully qualify any unqualified URLs (to make it portable to another host)
    my $url = TWiki::Func::getUrlHost();
    my $pdir = TWiki::Func::getPubDir();
    my $purlp = TWiki::Func::getPubUrlPath();

-   $text =~ s!<img(.*?) src="($url)?$purlp!<img$1 src="$pdir\/!sgi;
-   $text =~ s/<a(.*?) href="(?!#)\//<a$1 href="$url\//sgi;
+   $text =~ s!<img(.*?) src=(["'])($url)?$purlp!<img$1 src=$2$pdir\/!sgi;
+   $text =~ s/<a(.*?) href=(['"])(?!#)\//<a$1 href=$2$url\//sgi;

    # Save it to a file
    my ($fh, $name) = tempfile('GenPDFAddOnXXXXXXXXXX',
                                DIR => File::Spec->tmpdir(),
                               SUFFIX => '.html');
    open $fh, ">$name";
@@ -411,14 +411,14 @@
    # images.  Needed if wiki requires authentication like SSL client certifcates.
    # Fully qualify any unqualified URLs (to make it portable to another host)
    my $url = TWiki::Func::getUrlHost();
    my $pdir = TWiki::Func::getPubDir();
    my $purlp = TWiki::Func::getPubUrlPath();

-   $html =~ s!<img(.*?) src="($url)?$purlp!<img$1 src="$pdir\/!gi;
-   $html =~ s/<a(.*?) href="\//<a$1 href="$url\//gi;
+   $html =~ s!<img(.*?) src=(["'])($url)?$purlp!<img$1 src=$2$pdir\/!gi;
+   $html =~ s/<a(.*?) href=(['"])\//<a$1 href=$2$url\//gi;
    # link internally if we include the topic
    for my $wikiword (@$refTopics) {
       $url = TWiki::Func::getScriptUrl($webName, $wikiword, 'view');
       $html =~ s/([\'\"])$url/$1#$wikiword/g; # not anchored
       $html =~ s/$url(#\w*)/$1/g; # anchored
    }

-- TWiki:Main/TimotheLitt - 23 Dec 2008
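
As a rough illustration of what the patch changes, the sketch below runs the old and the patched `<img` substitutions side by side on a double-quoted and a single-quoted tag. The host name, pub directory and pub URL path are made-up stand-ins for the values the TWiki::Func calls would return, and the `rewrite_old` / `rewrite_new` wrappers are hypothetical helpers, not part of GenPDF.pm.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for TWiki::Func::getUrlHost(), getPubDir() and
# getPubUrlPath(); real values depend on the installation.
my $url   = 'http://wiki.example.com';
my $pdir  = '/var/www/twiki/pub';
my $purlp = '/twiki/pub';

# Substitution as it was before the patch: only src="..." is recognised.
sub rewrite_old {
    my ($text) = @_;
    $text =~ s!<img(.*?) src="($url)?$purlp!<img$1 src="$pdir\/!sgi;
    return $text;
}

# Patched substitution: the opening quote (either kind) is captured as
# group 2 and written back unchanged in the replacement.
sub rewrite_new {
    my ($text) = @_;
    $text =~ s!<img(.*?) src=(["'])($url)?$purlp!<img$1 src=$2$pdir\/!sgi;
    return $text;
}

my $dq = q{<img alt="x" src="/twiki/pub/Main/a.png">};
my $sq = q{<img alt='x' src='/twiki/pub/Main/a.png'>};

for my $tag ( $dq, $sq ) {
    print "input:   $tag\n";
    print "old:     ", rewrite_old($tag), "\n";
    print "patched: ", rewrite_new($tag), "\n\n";
}
```

With these sample values the old substitution leaves the single-quoted tag untouched, which would explain why only some images went missing from the PDF.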

The regular expressions used to extract and replace <img tags have been completely rewritten as part of Item892, so this issue should be resolved. If it is still a problem, please provide a test case that reproduces it. Thanks.

-- GeorgeClark - 15 Feb 2009

Foswiki:Main.IsaacLin pointed out that the href= case was missed in the changes. Reopening this until I can release the changes.

The following regex from Isaac was used to match the value of the src= attribute for images, and was reworked a bit to also match the href= attribute. This will hopefully support single-quoted, double-quoted, or unquoted values, with the attributes of the <img tag in any order.

my $reSqString = qr{
  \'
  [^\']*
  \'
}x;

my $reDqString = qr{
  \"
  [^\"]*
  \"
}x;

my $reAttrValue = qr{
  (?: $reSqString | $reDqString | [^\'\"\s]+ )
}x;

## regex for matching <img tags

    my $reImgSrc = qr{
      <[iI][mM][gG]                                             # <img
        \s+                                                     # space
      (?: \w+ \s*=\s* $reAttrValue \s+ )*                       # 0 or more word = value
      [sS][rR][cC] \s*=\s*                                      # src=
      (?: (?:\'([^\']+)\') | (?:\"([^\"]+)\") | ([^\'\"\s]+) )  # delimited value
      (?: \s+ \w+ \s*=\s* $reAttrValue )*                       # 0 or more word = value
      \s*/?>                                                    # ending bracket
    }x;

## regex for qualifying <a tags

    $text =~ s{
          <a\s+                                   # starting <a tag plus space
          ( (?: \w+ \s*=\s* $reAttrValue \s+ )* ) # 0 or more word = value, captured as group 1
          [hH][rR][eE][fF]\s*                     # href, with or without trailing spaces
          (  =\s*[\"\']?                          # = plus optional opening quote, captured as group 2
          )/
         }{<a $1 href$2$url/}sgx;
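
To see how these pieces might be exercised, here is a small self-contained sketch that reuses the regexes above on a few sample tags. The `img_sources` and `qualify_links` wrappers, the sample HTML and the host name are assumptions made for illustration; they are not taken from GenPDF.pm.

```perl
use strict;
use warnings;

my $reSqString  = qr{ \' [^\']* \' }x;
my $reDqString  = qr{ \" [^\"]* \" }x;
my $reAttrValue = qr{ (?: $reSqString | $reDqString | [^\'\"\s]+ ) }x;

my $reImgSrc = qr{
  <[iI][mM][gG] \s+
  (?: \w+ \s*=\s* $reAttrValue \s+ )*
  [sS][rR][cC] \s*=\s*
  (?: (?:\'([^\']+)\') | (?:\"([^\"]+)\") | ([^\'\"\s]+) )
  (?: \s+ \w+ \s*=\s* $reAttrValue )*
  \s*/?>
}x;

# Hypothetical helper: collect the src value of every <img tag.
sub img_sources {
    my ($html) = @_;
    my @src;
    while ( $html =~ /$reImgSrc/g ) {
        # Only one of the three groups is set, depending on the quoting style.
        push @src, defined $1 ? $1 : defined $2 ? $2 : $3;
    }
    return @src;
}

# Hypothetical helper: prefix host-relative href values with the given host.
sub qualify_links {
    my ( $text, $url ) = @_;
    $text =~ s{
          <a\s+
          ( (?: \w+ \s*=\s* $reAttrValue \s+ )* )
          [hH][rR][eE][fF]\s*
          (  =\s*[\"\']?
          )/
         }{<a $1 href$2$url/}sgx;
    return $text;
}

my $html = join "\n",
    q{<img alt='one' src='/pub/Main/a.png' />},
    q{<IMG src="/pub/Main/b.png" width="10">},
    q{<img width=10 src=/pub/Main/c.png>};

print "$_\n" for img_sources($html);

my $host = 'http://wiki.example.com';    # made-up host name
print qualify_links( q{<a class='x' href='/bin/view/Main/WebHome'>home</a>}, $host ), "\n";
print qualify_links( q{<a href="#top">top</a>}, $host ), "\n";
```

In this sketch, in-page anchors such as href="#top" are left alone because the pattern requires a slash right after the optional opening quote.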

Note that the .zip file also included code that implements TITLEDOC, FIRSTPAGE, DESTINATION, PAGELAYOUT and PAGEMODE and several other fixes besides the quoting problem that started the work.

-- TimotheLitt - 23 Feb 2009

ItemTemplate

Summary GenPDF doesn't output all images
ReportedBy TimotheLitt
Codebase
SVN Range TWiki-4.2.3, Wed, 06 Aug 2008, build 17396
AppliesTo Extension
Component GenPDFAddOn
Priority Normal
CurrentState Closed
WaitingFor
Checkins GenPDFAddOn:149481881f69 GenPDFAddOn:f94662ace478 GenPDFAddOn:888197d2f35a GenPDFAddOn:820848ecf90a
ReleasedIn
Topic attachments
| Attachment | Size | Date | Who | Comment |
| GenPDFAddOnUpdate.zip | 37 K | 24 Dec 2008 - 18:43 | TimotheLitt | Fixes for described problems; full kit with patch files (but not ported to foswiki) |
Topic revision: r14 - 09 Mar 2010, GeorgeClark