StringifierContrib Development

This is the topic for discussing development of StringifierContrib.

If you need support, go to Support.StringifierContrib, where you can ask questions and find answers to previously asked questions.

If you want to report a bug or request a feature, go to Tasks.StringifierContrib, where you can see issues that have already been submitted and file a new bug report or feature request.

Active Items

Discussion

I use Apache Tika to stringify files - it's an actively maintained package that extracts text from many file formats.

Currently my setup runs on a TWiki site, not a Foswiki site, but I wanted to let folks know it's an option.
I'm happy to provide more detail if there is interest.

-- ClifKussmaul - 21 Dec 2011

Clif, yes please - it makes sense for us to support different options to cater for different setups.

-- SvenDowideit - 22 Dec 2011

Steps:
  1. Download tika.jar from tika.apache.org and save it (e.g. in /usr/local/tika).
  2. Download tika2txt (attached below), adjust the paths and Java options, save it (e.g. in /usr/local/bin), and test it.
  3. Download Tika.pm (attached below), adjust the paths and the file types it handles, and save it alongside the other stringifiers.
  4. Disable the other stringifiers to avoid conflicts, and test.
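
The idea behind steps 2 and 3 can be sketched in Python: run the Tika app jar with its --text option on one file and capture the plain text, but only for the extensions you choose to route to Tika. This is not the attached tika2txt or Tika.pm (Tika.pm is a Perl module for StringifierContrib), just an illustration. It assumes tika.jar is the tika-app jar, which accepts --text; the jar path, Java options, extension set and the names handles, build_command and tika2txt are hypothetical and should be adjusted for your installation.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Hypothetical install location (step 1 suggests /usr/local/tika); adjust as needed.
TIKA_JAR = "/usr/local/tika/tika.jar"
# Hypothetical JVM options: cap the heap and run without a display.
JAVA_OPTS = ["-Xmx512m", "-Djava.awt.headless=true"]
# Hypothetical set of extensions routed to Tika (step 3: "file types handled").
TIKA_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}


def handles(path: str) -> bool:
    """Return True if this file's extension is one we route to Tika."""
    return Path(path).suffix.lower() in TIKA_EXTENSIONS


def build_command(path: str, jar: str = TIKA_JAR, java_opts=None) -> list[str]:
    """Build the java command line that asks tika-app for plain text."""
    opts = JAVA_OPTS if java_opts is None else list(java_opts)
    return ["java", *opts, "-jar", jar, "--text", path]


def tika2txt(path: str, timeout: float = 120.0) -> str:
    """Extract plain text from one file.

    Returns "" for unhandled extensions, a non-zero exit, or a timeout.
    Raises RuntimeError if java is not on PATH.
    """
    if not handles(path):
        return ""
    if shutil.which("java") is None:
        raise RuntimeError("java not found on PATH")
    try:
        result = subprocess.run(build_command(path), capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return ""
    if result.returncode != 0:
        return ""
    # Decode leniently: the output encoding depends on the JVM's defaults.
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Print the extracted text of each file named on the command line.
    for name in sys.argv[1:]:
        print(tika2txt(name))
```

Keeping the extension set explicit makes it easy to avoid overlap with the other stringifiers, which is the point of step 4.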
-- ClifKussmaul - 22 Dec 2011

I've been evaluating Tika in depth and found that it has lower coverage of various office formats than StringifierContrib.

-- MichaelDaum - 23 Dec 2011

It might have lower coverage, but it's useful to have alternatives that either give different results or suit the admin better.

-- SvenDowideit - 23 Dec 2011

On a client site with PDF, DOC/DOCX, PPT/PPTX and XLS/XLSX files, results for the largest files of each type seemed similar, and sometimes favored Tika. We had some problems with PPTX in Tika 0.9 that were fixed in Tika 1.0. YMMV, of course.

-- ClifKussmaul - 23 Dec 2011
 
Topic attachments
Attachment | Size | Date | Who | Comment
Tika.pm | 2 K | 22 Dec 2011 - 18:09 | ClifKussmaul | Tika stringifier, which uses tika2txt
tika2txt.txt | 429 bytes | 22 Dec 2011 - 18:04 | ClifKussmaul | simple script that uses Tika to stringify files
Topic revision: r7 - 23 Dec 2011, ClifKussmaul

The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement.