
Item8629: refactor KinoSearchContrib to use StringifierContrib

Priority: Urgent
Current State: Closed
Released In: n/a
Target Release: n/a
Applies To: Extension
Component: KinoSearchContrib
Branches:
Reported By: WillNorris
Waiting For: Main.AndrewJones
Last Change By: AndrewJones
Or consider deprecating KinoSearchContrib when the SolrPlugin is released. We should be able to keep KinoSearchPlugin (but plugged into the Solr backend) so that existing topic contents continue to work. If we're really ambitious, we can ship kinosearch script shims so that KinoSearch upgraders wouldn't even have to change their system scripts.

-- WillNorris - 26 Feb 2010

I would much rather keep the variety. There may be other reasons why Solr isn't suitable for some users.

-- SvenDowideit - 27 Feb 2010

but plugged into the Solr backend ... does not compute.

Basically, kino as well as good old plucene are embedded frontends to lucene. Solr, on the other hand, is a "serverization" of lucene ... and a lot more. Feature-wise, it is a true superset of kino or plucene.

Main disadvantage: lots of perl dependencies.

StringifierContrib is meant to be a shared base between both, as they both need serialization of binary data. This is comparable with Tika, which I tried as well. Tika does have some advantages over StringifierContrib, e.g. it can also return metadata in an XML format, like the EXIF data of a photo. However, I abandoned Tika, as basic tests on various office files showed that StringifierContrib had higher coverage while being more robust against weird office files.

StringifierContrib has been externalized and improved over time. For one, it no longer crashes on password-protected xls files etc.

Coming soon: caching - don't re-stringify the same binary file unless it has changed in the meantime. That's part of SolrPlugin for no good reason.
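To make the caching idea concrete, here is a minimal sketch in Python (not the actual StringifierContrib or SolrPlugin code, which is Perl). It assumes a hypothetical `stringify` callable that turns a file path into text, and it keys the cache on a SHA-256 digest of the file contents, so a file is only re-stringified when its bytes have changed. The class name `StringifyCache` and the in-memory dict are illustrative assumptions; a real implementation would probably persist the cache on disk.

```python
import hashlib


class StringifyCache:
    """Cache stringified text, keyed by a digest of the file's bytes."""

    def __init__(self, stringify):
        # `stringify` is a hypothetical callable: path -> extracted text.
        self._stringify = stringify
        self._entries = {}  # path -> (digest, text)
        self.misses = 0     # how often the expensive stringifier ran

    @staticmethod
    def _digest(path):
        # Hash the file in chunks so large attachments are not read at once.
        h = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def get(self, path):
        digest = self._digest(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]  # unchanged file: reuse the cached text
        # New or changed file: stringify again and remember the result.
        self.misses += 1
        text = self._stringify(path)
        self._entries[path] = (digest, text)
        return text
```

Hashing the whole file is more expensive than comparing modification times, but it is robust against attachments that are re-uploaded with identical content; which trade-off suits a given site is an open choice.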

There are still some more fundamental issues with StringifierContrib inherited from its origins. Foremost, it recodes all strings to iso-8859-15, which might very well clash with the rest of the site's encoding. Next, its per-line way of trying to detect a document's encoding might be overkill, so there's room for speed improvements here.
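The encoding clash can be illustrated with a small Python sketch (again, not the contrib's Perl code). The helper name `recode_latin9` and the sample string are made up for illustration. It shows two effects: characters outside iso-8859-15 are lost on recoding, and the resulting bytes are not valid UTF-8, so a UTF-8 site would misread them.

```python
def recode_latin9(text):
    # Hypothetical stand-in for "recode everything to iso-8859-15".
    # Characters that do not exist in that charset are replaced with "?".
    return text.encode("iso-8859-15", errors="replace")


sample = "5 € for 日本語"
recoded = recode_latin9(sample)

# Reading the bytes back with the same charset: the euro sign survives
# (it exists in iso-8859-15), the three CJK characters do not.
roundtrip = recoded.decode("iso-8859-15")

# A UTF-8 site reading the same bytes: the single byte used for the euro
# sign is not valid UTF-8, so it becomes U+FFFD.
misread = recoded.decode("utf-8", errors="replace")

print(repr(roundtrip))  # '5 € for ???'
print(repr(misread))    # '5 \ufffd for ???'
```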

Other than that, it is a solid piece of work.

-- MichaelDaum - 27 Feb 2010

Done.

-- AndrewJones - 06 Jun 2010

There may be something wrong with this work?

* AndreU has quit (Quit: AndreU)
<alice|wl_> any help with kinosearch here?
<alice|wl_> accoring to ks_test I miss a Foswiki/Contrib/KinoSearchContrib/StringifyBase.pm
<alice|wl_> it is not in the package

-- SvenDowideit - 12 Jul 2010

I have removed the ks_test script from the contrib, as the functionality is covered with the stringify script in StringifierContrib.

-- AndrewJones - 14 Jul 2010

Forgot to remove config items.

-- AndrewJones - 25 Mar 2011

 
Topic revision: r11 - 25 Mar 2011, AndrewJones
