Feature Proposal: Next steps in improving the data store abstraction and search support

Motivation

Searching is the major bottleneck for any large wiki application. Many times we've said "WIBNIF we could use SQL", the idea being that you could then search over a database store (or a database cache of the store) using SQL queries.

Description and Documentation

Impact

Core changes are required to introduce the concept of a "Store cache". This can be done entirely within VC.

Need to add cache update triggers in the core.

%WHATDOESITAFFECT%
edit

Implementation

I have implemented an algorithm that analyses Foswiki %SEARCH statements and hoists SQL statements out of them. It then runs the queries over the database, narrowing down the set of topics to which the un-hoistable parts of the query have to be applied. This is very similar in approach to the existing BruteForce query algorithm (unsurprising, since I wrote that as well).

The implementation - call it a "reference implementation" - uses sqlite, which does not scale well to large data, but the performance is pretty good on relatively small data. Moving to another SQL DB should be fairly trivial - it's just a case of mapping SQL syntax differences.

The schema used in the DB is extracted directly from Foswiki::Meta::VALIDATE, with the addition of a field tid to every generated table, and the following table:
CREATE TABLE 'topic' (
    'tid' TEXT,
    'web' TEXT,
    'name' TEXT,
    'text' TEXT,
    UNIQUE (tid)
);

Note that the extraction is done before any plugin or extension has a chance to call registerMETA, and the tables currently do not auto-extend (scope for further work there).

-- Contributors: CrawfordCurrie - 15 Sep 2010

Discussion

Out of Off topic. Why does rev 1 of this topic only have 4 fields visible?

-- KennethLavrsen - 15 Sep 2010

Looks like empty fields are omitted.

-- ArthurClemens - 15 Sep 2010

I look forward to seeing what you've done - I managed something similar in the MongoDBPlugin - but using the core code as it is in 1.1 - I'm curious what more improvements you've made, and how I can leverage them smile

wrt cache triggers - I'm not sure you need to - foswiki already has them in spades - see MongoDBPlugin - the Plugins API....

mmm, mind you, I suspect it'd be better to have the cache triggers in Store / Meta - as VC can be bypassed already? (i'm reaching into deep memory atm the girls are screaming)

-- SvenDowideit - 16 Sep 2010

Yes, I read, the MongoDBPlugin code before starting, thanks ;-). I also started out trying to use the plugin triggers - specifically the afterSaveHandler - but stopped because the afterSaveHandler is only called from save, not from saveAs - this is a legacy problem - so not all save paths are covered by the handler. It may be possible to connect to the cache via the afterSaveHandler, but I haven't tried it (yet; that's my strategy for providing this on Foswiki 1.1 and older). And yes, I wanted the hooks as close to the store as possible. There's really no need for the core to know anything about this cache, in any way; it should be entirely behind the Store interface.

Note that I decided not to attempt to implement any part of the store except the search. My thinking was KISS. I've also been using sqlite, which I acknowledge is nowhere near as efficient as other DB impls, but has the advantage of being simple(r) and easy to use at this early stage.

One thing I'm acutely conscious of is that the update step I currently have is monstrous. Basically I'm doing a "remove" of the existing object and "insert" of the new object. When the DB has indexes (there are none there at the moment) this could come to be a pain.

-- CrawfordCurrie - 16 Sep 2010

A number of pre-store2 prep steps, and the already-deprecated deep store listeners were implemented under this proposal. I'm switching it to UnderConstruction and slotting it into 1.2. Please open a "Phase 2" proposal if another one is needed, and doesn't already exist.

-- GeorgeClark - 23 Feb 2012

The early work I did on this has since been subsumed by other efforts, such as MongoDB. I don't think there's anything more useful that can be done here. So I'm rejecting it.

-- CrawfordCurrie - 09 Mar 2012
Topic revision: r11 - 01 Nov 2012, SvenDowideit
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy