
Solr search doesn't show contents if the attachment is .doc or .pdf

I'm not quite sure whether I should ask about this on the Foswiki side or the Solr side.

We use the Solr search engine with Foswiki. In the search results, if an attachment is a .doc or .pdf file, its contents are shown as unrecognisable characters (see below), but if the attachment is a .txt or .xlsx file, it is displayed fine. Any advice is appreciated.

Mysearch.doc in Mysearch

?? ?? ? ?? ? ????????????????????????????????????????????????????????????????????????????????????????????????????????????????¿? ????? ????? ? ? ? ·? ?? ? ? ? ?? ??? ? ? ????? ????? ? ? ? ????????? ? ? ? ? ???????? ? ? ? ? ? ? ? ? ? ? ????? ? ? ??? ? ? ? ? ?

P.S. We don't use the NatSkin plugin.

-- YangShen - 30 Nov 2014

This happens when the helper programs you configured in StringifierContrib are unable to read the files correctly. The best results are achieved using

$Foswiki::cfg{StringifierContrib}{WordIndexer} = 'soffice';
$Foswiki::cfg{StringifierContrib}{PowerpointIndexer} = 'soffice';

or

$Foswiki::cfg{StringifierContrib}{WordIndexer} = 'wv';
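Since unreadable output usually means a helper is missing or broken, it can help to confirm the helper programs are actually installed before changing these settings. A minimal sketch in shell; the function name check_helpers is hypothetical, and the binary names you pass to it (for example soffice for LibreOffice, antiword, wvHtml for the wv package, pdftotext for PDFs) are assumptions that you should match to what your StringifierContrib configuration actually calls.

```shell
#!/bin/sh
# Hypothetical helper: report which stringifier helper programs are on PATH.
# The binary names passed in are assumptions; match them to your LocalSite.cfg.
check_helpers() {
  missing=0
  for helper in "$@"; do
    if command -v "$helper" >/dev/null 2>&1; then
      echo "found:   $helper"
    else
      echo "missing: $helper"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero when at least one helper is missing.
  [ "$missing" -eq 0 ]
}
```

For example, running check_helpers soffice antiword wvHtml pdftotext prints one line per helper and returns a non-zero status if any of them cannot be found.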

Password-protected Office files may fail as well. They can only be read with user interaction (entering the password), so they cannot be full-text indexed.

Once you have changed these settings, go to <foswiki-dir>/working/work_areas/SolrPlugin/ and delete all subdirectories in there. They contain the cached stringified versions of your office documents. Once you have deleted them, your next indexing run will extract the office documents again using the new stringifier settings.
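The cleanup step can be scripted so that only the subdirectories are removed. A minimal sketch, assuming the cache layout described above; clear_stringifier_cache is a hypothetical name and its argument is your <foswiki-dir>. It uses the -mindepth and -maxdepth options of find, which GNU and BSD find support but which are not strictly POSIX.

```shell
#!/bin/sh
# Hypothetical helper: delete the cached stringified copies kept by SolrPlugin
# so the next indexing run re-extracts attachments with the new settings.
# Assumption: $1 is the Foswiki installation root (<foswiki-dir>).
clear_stringifier_cache() {
  cache_dir="$1/working/work_areas/SolrPlugin"
  if [ ! -d "$cache_dir" ]; then
    echo "no such directory: $cache_dir" >&2
    return 1
  fi
  # Remove only the subdirectories; plain files in cache_dir are left alone.
  find "$cache_dir" -mindepth 1 -maxdepth 1 -type d -exec rm -rf {} +
}
```

Restricting the deletion to subdirectories mirrors the instructions above rather than wiping the whole work area.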

There is also a testing tool, <foswiki-dir>/tools/stringify <file-name>, that you can use to run the stringifier on a single file from the command line.
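To compare helper settings on several sample files at once, the testing tool can be wrapped in a small loop that flags output dominated by question marks, like the garbled result in the question. This is a sketch under assumptions: STRINGIFY points at <foswiki-dir>/tools/stringify (the default below assumes you run it from <foswiki-dir>), the tool writes the extracted text to standard output, check_extraction is a hypothetical name, and the 30% threshold is an arbitrary heuristic rather than anything StringifierContrib defines.

```shell
#!/bin/sh
# Hypothetical wrapper around the stringify testing tool.
# Assumption: STRINGIFY is the path to <foswiki-dir>/tools/stringify and the
# tool writes the extracted text to standard output.
STRINGIFY=${STRINGIFY:-./tools/stringify}

check_extraction() {
  for file in "$@"; do
    text=$("$STRINGIFY" "$file" 2>/dev/null)
    # Byte counts; tr strips the padding some wc implementations add.
    total=$(printf '%s' "$text" | wc -c | tr -d ' ')
    qmarks=$(printf '%s' "$text" | tr -cd '?' | wc -c | tr -d ' ')
    if [ "$total" -eq 0 ]; then
      echo "EMPTY   $file"
    elif [ $((qmarks * 100 / total)) -ge 30 ]; then
      # Arbitrary heuristic: 30% or more '?' looks like a failed extraction.
      echo "GARBLED $file"
    else
      echo "OK      $file"
    fi
  done
}
```

For example, check_extraction Mysearch.doc prints OK, GARBLED or EMPTY for the file; rerunning it after each configuration change shows whether the new helper reads the document.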

-- MichaelDaum - 01 Dec 2014

'wv' did not work when I tried it; 'antiword' seems to be the right one. All the other settings work well. Thanks a lot.

-- YangShen - 14 Jan 2015
 

QuestionForm

Subject:
Extension: SolrPlugin, StringifierContrib
Version: Foswiki 1.1.9
Status: Answered
Related Topics:
Topic revision: r3 - 14 Jan 2015, YangShen

The copyright of the content on this website is held by the contributing authors, except where stated elsewhere.