Feature Proposal: Delegate More Processing To Search Algorithm
Motivation
During my development of the Kino Search Algorithm in KinoSearchContrib, it became obvious that the Foswiki core needs to delegate more choices to the Search Algorithm.
This work may be interwoven with some of the ResultSets and ExtractAndCentralizeFormattingRefactor work.
Description and Documentation
In TWiki 4.2.2, when a SEARCH happens, we call a very naive pluggable function once per web:
SearchAlgorithm::search ( $searchString, $topics, $options, $sDir, $sandbox, $web )
where $options contains only scope, type, casesensitive, and wordboundaries, and $topics is a (painfully) created list of topics.
This function then returns a hash mapping topic name to 'extract', which the Search rendering then throws away, keeping only the list of topic names.
KinoSearchContrib (like the Xapian engine I'm working on) can return (incredibly quickly) all the meta information for the topic, including a contextual extract. On top of that, it can return non-topics (attachments and other external data), which I would love to use.
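To make the loss concrete, here is a minimal sketch of the flow described above. It is written in Python purely for illustration and is not the TWiki/Foswiki Perl code; the function names, the sample web, and the 10-character extract window are all assumptions made up for this example.

```python
def naive_search(search_string, topics):
    """Hypothetical stand-in for a per-web search algorithm.

    Returns a dict mapping topic name to a short extract around the match,
    mirroring the 'hash of topic name to extract' described above.
    """
    results = {}
    for name, text in topics.items():
        pos = text.find(search_string)
        if pos >= 0:
            start = max(0, pos - 10)
            results[name] = text[start:pos + len(search_string) + 10]
    return results


def render_current(results):
    # Current behaviour as described: the extracts are discarded and
    # only the topic names survive into rendering.
    return sorted(results)


def render_proposed(results):
    # Proposed behaviour: rendering reuses what the algorithm already returned.
    return [f"{name}: {extract}" for name, extract in sorted(results.items())]


# Hypothetical sample web: topic name -> topic text.
web = {
    "WebHome": "Welcome to the Sandbox web",
    "TestTopic": "nothing relevant here",
}

hits = naive_search("Sandbox", web)
print(render_current(hits))
print(render_proposed(hits))
```

The only difference between the two renderers is whether the extract is kept, which is the information an indexed engine can supply at no extra cost.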
Impact
Implementation
So: I propose to refactor the TWiki::Store::SearchAlgorithms and TWiki::Store::QueryAlgorithms APIs (which I understand only Crawford and I have worked with; please pipe up if I've missed you) to:
- bring them into one API, where multiple SearchAlgorithms can register themselves as capable of processing a search type (or list of types)
- create the UI elements to dynamically add support for enabled 'types' in the WebSearch topic (so we can have attachment, external doc, and Google search checkboxes)
- pass the SearchAlgorithms all the known settings that might allow it to optimise a query (including the format string)
- use any information that SearchAlgorithms return in the output rendering, thus leveraging advanced improvements
For backwards compatibility, the existing search types and scopes will be required to return the same results as in previous versions of TWiki. This implies that scope=all will not in fact search all data types, but only topic name and topic text.
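The registration idea in the list above could be sketched roughly as follows. This is Python used only as illustration, not a proposed Perl API: SearchRegistry, the two sample algorithms, the in-memory data, and the shape of the returned hit records are all hypothetical. The sketch assumes each algorithm receives the full options (including the format string) and may ignore what it does not need.

```python
class SearchRegistry:
    """Hypothetical single entry point where algorithms register search types."""

    def __init__(self):
        self._by_type = {}

    def register(self, algorithm, types):
        # One algorithm may declare itself capable of several types.
        for search_type in types:
            self._by_type.setdefault(search_type, []).append(algorithm)

    def supported_types(self):
        # A search UI could build its checkboxes from this list.
        return sorted(self._by_type)

    def search(self, search_type, query, options):
        if search_type not in self._by_type:
            raise ValueError(f"no algorithm registered for type {search_type!r}")
        hits = []
        for algorithm in self._by_type[search_type]:
            # Every known setting is passed on; the algorithm decides what to use.
            hits.extend(algorithm(query, options))
        return hits


# Hypothetical in-memory data standing in for a store or an index.
TOPICS = {"WebHome": "Welcome to the wiki", "SearchNotes": "Notes on indexing"}
ATTACHMENTS = {"SearchNotes/search-design.pdf": "indexing design document"}


def topic_search(query, options):
    # Backwards-compatible scopes: 'all' means topic name plus topic text only.
    scope = options.get("scope", "text")
    hits = []
    for name, text in TOPICS.items():
        in_name = scope in ("topic", "all") and query in name
        in_text = scope in ("text", "all") and query in text
        if in_name or in_text:
            hits.append({"kind": "topic", "name": name, "extract": text})
    return hits


def attachment_search(query, options):
    return [
        {"kind": "attachment", "name": name, "extract": text}
        for name, text in ATTACHMENTS.items()
        if query in text
    ]


registry = SearchRegistry()
registry.register(topic_search, ["keyword"])
registry.register(attachment_search, ["attachment"])

print(registry.supported_types())
print(registry.search("keyword", "indexing", {"scope": "all", "format": "$topic"}))
print(registry.search("attachment", "indexing", {}))
```

In this sketch a keyword search with scope=all never returns attachments; they only appear when the attachment type is requested explicitly, which is one way to keep existing searches returning the same results.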
--
Contributors: SvenDowideit - 19 Aug 2008
Discussion
Great Initiative, Sven!!!
From my studies of TWiki performance, I realized that search and store are the worst bottlenecks. I was planning to try out Xapian (it seems to be very fast).
TWiki-5 will fly
--
GilmarSantosJr - 19 Aug 2008
Sounds excellent, Sven. The devil is in the detail; it sounds like you will be doing a lot of refactoring in Search.pm (to get rid of those topic lists, for a start).
Ideally I'd like the API fixes to climb higher up the tree so that I can perform multiple-web searches with one call; though that may be a refactoring too far.
--
CrawfordCurrie - 19 Aug 2008
It would be so cool to make it a modern interface using iterators over result sets. I can imagine that most of the current Search.pm simply goes over the fence.
--
MichaelDaum - 19 Aug 2008
Please remember to enter a date in the date of commitment field so the proposal app can work. Added today's date.
--
KennethLavrsen - 11 Sep 2008
The options are now passed on to the Search Algorithm, which can ignore them as it needs. The MongoDBContrib work validated parts of this, and when Foswiki 1.1 is released I'll continue work on that.
--
SvenDowideit - 14 May 2010
Any documentation for the new API?
What is the MongoDBContrib all about?
--
JulianLevens - 14 Jun 2010
Docco: sorry, like the store API on the whole, the source is still moving. I should really write something asap.
MongoDBContrib is the latest in my attempts to provide a modern backend for Foswiki. It's in svn, but I think it's broken right now. You should see some commits to update it to the current state of 1.1 very soon.

Sorry, I've dropped working on 'future' things to focus on getting 1.1 out.
--
SvenDowideit - 16 Jun 2010