This question about Using an extension: Asked

abiword images

I am using abiword to convert Word attachments into HTML because wvHtml cannot handle MS Word 2007 docx files.

Unfortunately, this plugin relies on the exact way wvHtml extracts images from the original Word file, so it is not compatible with abiword, at least as far as images are concerned.

After experimenting a bit (with help from the plugin's developer), I now know what needs to be done to fix the problem, but I do not have the time or skill to do it myself and get it into the official Foswiki distribution. That is why I hope someone out there will do it.

Note: all the tests below use the same doc file, which contains a few plain lines and one image.

Analysis

This is how abiword generates and references the images out of the box "from the console" ==>

~/tests/abiword # /usr/bin/abiword --to=html Test.doc
Filesystem ==>
~/tests/abiword/Test.html
~/tests/abiword/Test.html_files/0.png

URL reference ==>

<p dir="ltr" style="text-align:left;margin-bottom:10pt"><img style="width:63.5mm" alt="" src="Test.html_files/0.png" /></p>

This is how abiword generates and references the images out of the box "through Foswiki" (with the default {MsOfficeAttachmentsAsHTMLPlugin}{filters}) ==>

Filesystem ==>

/var/lib/foswiki/pub/Sandbox/TestTest6/Test.html
/var/lib/foswiki/pub/Sandbox/TestTest6/Test.html_files/0.png
/var/lib/foswiki/pub/Sandbox/TestTest6/Test.html_files/image0.png

0.png identical to image0.png
URL reference ==>

<p style="text-align:left"><img style="width:63.5mm" title="" alt="" src="Test.html_files/image0.png" /></p>

==> 0.png is not used at all. Where does image0.png come from?

This is what Foswiki expects ==>

Filesystem ==>

/var/lib/foswiki/pub/Sandbox/TestTest6/Test.html
/var/lib/foswiki/pub/Sandbox/TestTest6/image0.png

URL reference ==>

<p style="text-align:left"><img style="width:63.5mm" title="" alt="" src="http://hostname.domain.de/foswiki/pub/Sandbox/TestTest6/image0.png" /></p>


This is how wvHtml generates and references the images out of the box "from the console" ==>

~/tests/wvHtml# /usr/bin/wvHtml Test.doc Test.html
Filesystem ==>
~/tests/wvHtml/Test.html
~/tests/wvHtml/Test0.jpg
URL reference ==>

<img width="250" height="163" alt="0x01 graphic" src="Test0.jpg"><br>



This is how wvHtml generates and references the images out of the box "through Foswiki" (with the default {MsOfficeAttachmentsAsHTMLPlugin}{filters}) ==>

Filesystem ==>
/var/lib/foswiki/pub/Sandbox/TestTest8/Test.html
/var/lib/foswiki/pub/Sandbox/TestTest8/Test0.jpg
URL reference ==>


<img width="250" height="163" alt="0x01 graphic" src="http://host.domain.de/foswiki/pub/Sandbox/TestTest8/Test0.jpg">

Solution "Theory"

This means

1) I do not understand where the transformation 0.png => image0.png happens (compare the command-line abiword run with the Foswiki abiword run).
2) the default {MsOfficeAttachmentsAsHTMLPlugin}{filters}

[ 
's#(<img[^>]*\\bsrc=)(["\'])([^/]+?)\\2#$1$2%ATTACHURL%/$3$2#sgi' 
] 

does not match in the abiword case and leaves the img tag unmodified: the src value is captured with [^/]+?, which cannot contain a slash, whereas abiword writes src="Test.html_files/image0.png". The regex therefore has to be changed

so that

<p style="text-align:left"><img style="width:63.5mm" title="" alt="" src="Test.html_files/image0.png" /></p> 

==>

<p style="text-align:left"><img style="width:63.5mm" title="" alt="" src="http://hostname.domain.de/foswiki/pub/Sandbox/TestTest6/image0.png" /></p> 


3) Even if I were able to change the regex properly, I would still need to do the following at the filesystem level:
a) cp /var/lib/foswiki/pub/Sandbox/TestTest6/Test.html_files/* /var/lib/foswiki/pub/Sandbox/TestTest6/
b) rm -r /var/lib/foswiki/pub/Sandbox/TestTest6/Test.html_files (to clean up)

Obviously, all of this should be written in terms of Foswiki variables rather than hard-coded paths.

4) Actually, this is not enough: a topic can have several attachments, and each one is converted to HTML, so I also need to rename the images. Otherwise the "copy step" will overwrite some of them, and I will lose track of which image belongs to which attachment!

i.e. it should actually be:

1') regex:

<p style="text-align:left"><img style="width:63.5mm" title="" alt="" src="Test.html_files/image0.png" /></p> 

==>

<p style="text-align:left"><img style="width:63.5mm" title="" alt="" src="http://hostname.domain.de/foswiki/pub/Sandbox/TestTest6/Test0.png" /></p> 



2') file system

a) mv /var/lib/foswiki/pub/Sandbox/TestTest6/Test.html_files/image0.png /var/lib/foswiki/pub/Sandbox/TestTest6/Test0.png
b) rm -r /var/lib/foswiki/pub/Sandbox/TestTest6/Test.html_files
Again, all of this should be written in terms of Foswiki variables.
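
To make point 2) and the rewrite in 1') concrete, here is a rough Python sketch. It is an illustration only: the plugin and its filters are Perl, so the real fix would have to be written as a Perl filter. DEFAULT_FILTER is my transcription of the default filter pattern; ABIWORD_FILTER, rewrite_abiword_src and the attach_url argument are names I made up for the example and are not part of the plugin.

```python
import re

# Transcription of the default filter pattern: the src value is captured
# with [^/]+?, so it can never contain a slash.
DEFAULT_FILTER = re.compile(r'''(<img[^>]*\bsrc=)(["'])([^/]+?)\2''', re.I | re.S)

# Hypothetical abiword-aware pattern (an assumption, not the plugin's code):
# captures "<base>.html_files/image<N>.<ext>" or "<base>.html_files/<N>.<ext>".
ABIWORD_FILTER = re.compile(
    r'''(<img[^>]*\bsrc=)(["'])([^/"']+?)\.html_files/(?:image)?(\d+\.\w+)\2''',
    re.I | re.S,
)


def rewrite_abiword_src(html, attach_url):
    """Point abiword image references at <attach_url>/<base><N>.<ext>."""
    def repl(match):
        prefix, quote, base, image = match.groups()
        return f"{prefix}{quote}{attach_url}/{base}{image}{quote}"
    return ABIWORD_FILTER.sub(repl, html)


# The two img tags from the analysis above.
abiword_img = '<img style="width:63.5mm" title="" alt="" src="Test.html_files/image0.png" />'
wvhtml_img = '<img width="250" height="163" alt="0x01 graphic" src="Test0.jpg">'

# The default pattern finds the wvHtml src but not the abiword one.
print(DEFAULT_FILTER.search(wvhtml_img) is not None)
print(DEFAULT_FILTER.search(abiword_img) is not None)

# The hypothetical pattern yields the Test0.png form wanted in 1').
print(rewrite_abiword_src(
    abiword_img, "http://hostname.domain.de/foswiki/pub/Sandbox/TestTest6"))
```

In a real filter the replacement would presumably use %ATTACHURL% instead of a literal URL, as the default filter does. The sketch leaves wvHtml output untouched, so the default filter could still handle that case.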

Solution "Praxis"

  1. I do not understand where the transformation 0.png ==> image0.png happens (compare the command-line abiword run with the Foswiki abiword run).
  2. the regex in {MsOfficeAttachmentsAsHTMLPlugin}{filters} should be fixed
  3. The perl code part:
    1. rewriting "the concept" into foswiki variables
    2. keep wvHtml compatibility with a case flag (case wvHtml , case abiword)
Obviously, it would be easier if there were a way to make abiword behave more like wvHtml, but I did not find one.
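
The filesystem part of the list above (move the images up, give them attachment-specific names, remove the _files directory) could look roughly like the Python sketch below. Again this is only an illustration, since the plugin is Perl. The function name flatten_abiword_images and its arguments are hypothetical, and instead of an explicit wvHtml/abiword flag it simply treats "a <name>_files directory exists" as the abiword case, which is an assumption on my side.

```python
import shutil
from pathlib import Path


def flatten_abiword_images(attach_dir, html_name):
    """Move <html_name>_files/image<N>.<ext> to <attach_dir>/<base><N>.<ext>.

    Returns the new file names. If there is no <html_name>_files directory
    (the wvHtml case), nothing is touched and an empty list is returned.
    """
    attach_dir = Path(attach_dir)
    files_dir = attach_dir / f"{html_name}_files"
    base = Path(html_name).stem  # "Test.html" -> "Test"
    moved = []
    if not files_dir.is_dir():
        return moved
    for image in sorted(files_dir.glob("image*")):
        # "image0.png" -> "Test0.png": the attachment name keeps images
        # from different attachments in the same topic apart.
        target = attach_dir / (base + image.name[len("image"):])
        shutil.move(str(image), str(target))
        moved.append(target.name)
    # Clean up; this also removes the unused 0.png duplicates.
    shutil.rmtree(files_dir)
    return moved
```

For the test topic this would be called with the attachment directory (/var/lib/foswiki/pub/Sandbox/TestTest6 in my examples) and "Test.html". In the plugin the directory would of course have to come from Foswiki's own settings, not from a hard-coded path.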

Does anyone want to play with it?

-- CatiaLavalle - 27 Aug 2010

QuestionForm

Subject Using an extension
Extension MsOfficeAttachmentsAsHTMLPlugin
Version Foswiki 1.0.0
Status Asked
Topic revision: r2 - 07 Sep 2010, CatiaLavalle