MsOfficeAttachmentsAsHTMLPlugin

This is the support hub for MsOfficeAttachmentsAsHTMLPlugin.

If you have a question, please check whether it has already been answered below.

Posted but unanswered questions

Ask a new question

Can't find your problem here, or would you like to post a request? Ask a question on MsOfficeAttachmentsAsHTMLPlugin.

Active Tasks

| Id | Summary | Priority | Current State |
| Item11587 | Calls to deprecated readTopicText and saveTopicText are in several public extensions | Normal | New |
| Item9172 | After topic content is converted from .doc, topic parent changes to none | Low | New |

MS Word 2007 .docx Support

Tested on Debian Lenny (5.0.5) with wv 1.2.4-2 and AbiWord 2.6.4-5.

The default Word-to-HTML converter (wvHtml) does not appear to be compatible with MS Word 2007 .docx files.

Running the basic conversion from the console (/usr/bin/wvHtml test.docx test.html) produces an empty HTML file (0 bytes).

Moreover, as soon as MsOfficeAttachmentsAsHTMLPlugin is enabled, no .docx file can be attached at all (an error message appears and the topic cannot be saved).

To solve it, use AbiWord instead of wvHtml: /usr/bin/abiword --to=html %SRC|F% %ATTACHDIR|F%/%DEST|F%

Both .doc and .docx attachments now work. The formatting of converted .doc files is almost 100% correct; that of .docx files is about 95% correct.
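A minimal bash sketch of the console check described above, useful for confirming that a converter really produces output before wiring it into the plugin. It assumes, as in the abiword listing under "abiword & images", that abiword --to=html Test.docx writes Test.html next to the input; the function name convert_with_abiword is hypothetical and not part of the plugin.

```shell
#!/usr/bin/env bash
# Sketch only: convert one Word file with AbiWord and fail if the result is
# missing or empty (the symptom wvHtml shows with .docx input).
# Assumption: "abiword --to=html Test.docx" writes Test.html into the
# current directory, as in the abiword listing on this page.
convert_with_abiword() {
  local src=$1 dir name out
  dir=$(dirname "$src")
  name=$(basename "$src")
  out="$dir/${name%.*}.html"                # Test.docx -> Test.html
  # Run inside the source directory so the output lands next to the input;
  # AbiWord's own console messages go to stderr to keep stdout clean.
  ( cd -- "$dir" && abiword --to=html "$name" >&2 ) || return 1
  if [ ! -s "$out" ]; then                  # -s: exists and is non-empty
    echo "empty or missing output: $out" >&2
    return 1
  fi
  echo "$out"                               # path of the converted file
}

# Usage: convert_with_abiword ~/tests/abiword/Test.docx
```

Checking with -s rather than plain existence is what catches the 0-byte file that wvHtml leaves behind for .docx input.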

abiword & special characters

AbiWord generates HTML files with encoding="UTF-8". As a result, if your original .doc/.docx attachment contains special characters (umlauts, etc.), those characters will not be displayed correctly in Foswiki through the converted HTML file unless you set:

{Site}{CharSet} = 'utf-8';

in configure under localization.
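To see which encoding a converted file actually declares before changing {Site}{CharSet}, a small bash sketch like the following could help. It assumes the declaration appears as encoding="..." (the XML prolog mentioned above) or charset=... (a meta tag) within the first 20 lines; declared_encoding and matches_site_charset are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: print the encoding a converted HTML file declares, lower-cased.
# Assumption: it is declared as encoding="..." (XML prolog, as AbiWord does)
# or charset=... (meta tag) within the first 20 lines.
declared_encoding() {
  head -n 20 "$1" \
    | grep -o -i -E 'encoding="[^"]+"|charset=[A-Za-z0-9_-]+' \
    | head -n 1 \
    | sed -e 's/^[^=]*=//' -e 's/"//g' \
    | tr '[:upper:]' '[:lower:]'
}

# Succeeds if the declared encoding equals the given {Site}{CharSet} value
# (compared case-insensitively).
matches_site_charset() {
  local want
  want=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ "$(declared_encoding "$1")" = "$want" ]
}

# Usage: matches_site_charset Test.html 'utf-8' || echo "check {Site}{CharSet}"
```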

abiword & images ==> OPEN PROBLEM

Images are embedded in a Word file. When a Word file is converted to HTML, the images are extracted and referenced via URL links in the HTML file. How the image files are extracted (the names they get, the directory they go to in the file system, and how they are referenced in the HTML file) differs from one converter program to another.

In particular:
  • wvHtml extracts them as follows:
/usr/bin/wvHtml Test.doc Test.html
Filesystem ==>
~/tests/wvHtml/Test.html
~/tests/wvHtml/Test0.jpg

URL reference ==>
<img width="250" height="163" alt="0x01 graphic" src="Test0.jpg"><br>

  • abiword extracts them as follows:
~/tests/abiword # /usr/bin/abiword --to=html Test.doc
Filesystem ==>
~/tests/abiword/Test.html
~/tests/abiword/Test.html_files/0.png

URL reference ==>
<p dir="ltr" style="text-align:left;margin-bottom:10pt"><img style="width:63.5mm" alt="" src="Test.html_files/0.png" /></p>

Unfortunately, this leads to an out-of-the-box incompatibility with Foswiki:
  1. The {Plugins}{MsOfficeAttachmentsAsHTMLPlugin}{filters} substitution string (which transforms the image URL into the form Foswiki expects: http://hostname.domain/foswiki/pub/<Web>/<Topic>/image) does not work as is with AbiWord output, so no substitution takes place.
  2. The subdirectory structure that AbiWord extracts the images into is not Foswiki-compatible: even when pointing to http://hostname.domain/foswiki/pub/<Web>/<Topic>/<Attach>.html_files/image, the image is not loaded/found. The plugin should therefore, at the file system level, move the images from /foswiki/pub/<Web>/<Topic>/<Attach>.html_files/image to /foswiki/pub/<Web>/<Topic>/image.
  3. Since one topic can have more than one attachment, the image files must be renamed when they are copied into the topic's pub directory, so that it is clear which attachment each image belongs to and none of them is overwritten.
  4. Steps 2 and 3 imply a change to the plugin's Perl code; in order not to break compatibility with wvHtml, a "case" option should be coded (case wvHtml ..., case abiword ...).
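Steps 2 to 4 could be prototyped outside the plugin before touching its Perl code. The bash sketch below is hypothetical: relocate_images and its URL-prefix argument are not part of the plugin. It assumes the converted HTML lives in the topic's attachment directory (as the %ATTACHDIR|F%/%DEST|F% command above suggests) and that abiword names the image folder <Attach>.html_files, as in the listing.

```shell
#!/usr/bin/env bash
# Hypothetical post-processing step, NOT the plugin's code: move the images a
# converter extracted for one attachment into the topic's attachment
# directory, rename them per attachment, and point the HTML at them.
#   $1  converter: wvHtml | abiword
#   $2  converted HTML, e.g. .../pub/<Web>/<Topic>/Test.html
#   $3  URL prefix of the topic's attachments (assumed), e.g.
#       http://hostname.domain/foswiki/pub/<Web>/<Topic>
relocate_images() {
  local converter=$1 html=$2 prefix=${3%/}
  local topic_dir name base files_dir img content old_ref new_ref
  topic_dir=$(dirname "$html")
  name=${html##*/}                    # Test.html
  base=${name%.html}                  # Test

  case $converter in
    wvHtml)
      # Step 4: keep wvHtml behaviour unchanged (Test0.jpg already sits next
      # to Test.html and is handled by the existing filters setting).
      return 0 ;;
    abiword)
      files_dir="$topic_dir/${name}_files"   # Test.html_files
      [ -d "$files_dir" ] || return 0        # nothing was extracted
      # Steps 2 and 3: flatten into the topic directory, prefixing each
      # image with the attachment name so attachments cannot collide.
      for img in "$files_dir"/*; do
        [ -f "$img" ] || continue
        mv -- "$img" "$topic_dir/${base}_${img##*/}"   # 0.png -> Test_0.png
      done
      rmdir -- "$files_dir" 2>/dev/null || true
      # Literal replacement of every src="Test.html_files/<img>" with
      # src="<prefix>/Test_<img>" (quoted patterns are not globs in bash).
      old_ref="src=\"${name}_files/"
      new_ref="src=\"${prefix}/${base}_"
      content=$(<"$html")
      content=${content//"$old_ref"/"$new_ref"}
      printf '%s\n' "$content" > "$html"
      ;;
    *)
      echo "unsupported converter: $converter" >&2
      return 1 ;;
  esac
}

# Usage (paths are examples):
#   relocate_images abiword /var/www/foswiki/pub/Main/WebHome/Test.html \
#     http://hostname.domain/foswiki/pub/Main/WebHome
```

Rewriting src directly here stands in for the {filters} substitution of point 1; whether the plugin should do this itself or through an adjusted filters setting is left open. The wvHtml branch deliberately does nothing, so existing wvHtml installations keep their current behaviour.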
Topic revision: r3 - 27 Aug 2010, CatiaLavalle
 
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. see CopyrightStatement. Creative Commons License