You are here: Foswiki>Development Web>DatabaseStore (18 Jan 2016, CrawfordCurrie)Edit Attach

I've just made a semi-functioning Database Store for TWiki

It can create a new web stored in a DB using a file-based web (ie ManagingWebs), create and edit topics, topic, text and regex search. All on MySQL. Its based on cpan's DBIx, so there are loads of dependancies, and as i'm using very simplistic SQL at the moment, should work on many of the DBD Drivers that exist.

However. There is a lot of work remaining before it can be released to testing, including

(as usual) alot of Core changes,
more generic ways to migrate webs between Stores,
dealing with attachments

and the biggest: making the SQL-Store worthwhile

What I mean by this, is that while TWiki's main searching facility is using regex's, there will be alot less scope for speeding up queries, through the use of decent schema's. I'm currently resourting to replacing most open($dataDir/$web/$topic.txt) code with somthing similar to SELECT text FROM Topics WHERE web=$web AND topic=$topic and then piping that line by line into the PurePerl search (ie, slow as). (ok, its alot more complex than that, but you get the idea).

Clearly, the DatabaseStore will now be simple for users to see a benifit from - the query search will be able to be converted directly into an SQL query.

as I work on this I will be able to judge if I will make this a Contrib, as it will require quite a few more configuration options, or if it'll be in the core - unfortuanatly, alot of generic Core refactoring (see MultiStoreRefactor) will be required to make it all happen.

Questions, with answers that may change, but indicate current directions

How will the overall table layout be in the database?
- from memory, the current non-schema'd topic data table has the following columns (web, topic, metafield, data) where any individual topic is made up of muliplte rows
- I hope and expect to generate tables for each defined Form, but not in stage 1
How will the current know meta be represented in the database?
- see above
How does plugins that use meta work with this implementation?
- either using the TWiki::Meta object, or if it requires a raw txt representation, via a temporary raw topic file
How will regex searching work and how effective will this be? Will the implementation run flat text in parallel for compatibility?
- I am currently creating a raw topic text file that is then able to be processed using the pureperl search implementation
How will we resolve the compatibility issue with SEARCHes in meta data?
- by fixing whatever bugs caused that incompatibility - I intend for the old regex SEARCH to continue working - it is simply going to be a slower way to get the same result.
Which tables are indexed?
- none, until profiling (or implementation) shows which should be - it does not pay to optimise before you measure.
How are access rights handled in the DB design?
- same as now, in the layer above the store
How are revisions stored in the database?
- currently, there is a topic history table that contains all topic versions - thus being a slower access history that can later be moved to a seperate server.
Which database is this going to be implemented in
- None/All. Perl has several strong database abstraction layers, that will allow the user to make the choice. I have however begun this investigation using MySQL on Linux - though this may change due to the work I'm currently doing on DTrace.

Please add any other questions that you haver here

-- Contributors: SvenDowideit - 05 Apr 2007

Discussion

Don't they have built-in field regex search in MySql? Something like SELECT text FROM Topics WHERE web=$web AND topic=$topic AND text=myregex. That would be nice

-- StephaneLenclud - 05 Apr 2007

in mysql yes, but part of the point is to be portable, not tied to only mysql. Also, you don't get any of the indexing benefit, compared to being able to SELECT author, timestame FROM Topic WHERE ...

-- SvenDowideit - 05 Apr 2007

Interesting, so no fast regex for a generic DB store but a MySQL store would benefit from that and potentially provide better performance than generic DB and Perl regex search. Good to know.

-- StephaneLenclud - 06 Apr 2007

You're missing the third and more important option. DON'T use regex for most queries - because TWiki uses regex SEARCHs mostly innapropriatly, and get an even bigger speedup.

any regex search is going to be horribly slow when compared to doing data-wiki queries on a properly schema'd database. And as TWikiForms define a schema, (as does the absence of a TWikiForm), the most efficient way to query would be SELECT TopicSummary from ChangeProposalForms where CurrentState = 'Legacy.UnderConstruction'.

I have to be honest, and say that this is the design for TWikiForms that we were working on when I was at UNISYS, but unfortunatly we were still spec-ing the data inheritance model for nesting TWikiForms when John implemented what we have today.

-- SvenDowideit - 06 Apr 2007

Even more interesting. That's what we want. That should make TWiki application scalable and TWiki just provide an incredibly flexible rendering front-end for database application. If you need help or feedback on design or implementation feel free to ask.

-- StephaneLenclud - 08 Apr 2007

Surely this proposal - seen as the big picture - is not going to be accepted without being broken down to a much more detailed spec.

So my proposal as put forward at FreetownReleaseMeeting2007x04x09 is for this to be part of a TWiki 5.0 release theme. I think this kind of work is an integral part of a solution to make TWiki more scalable and I support very much the work proposed by Sven. My name in the concern field is only to flag to our release twiki application that the 14-rule is not applicable because spec is incomplete.

-- KennethLavrsen - 23 Apr 2007

dunno Kenneth, its one of those, all or nothing things. If you don't accept the big picture, there's not much hope of someone devoting the time to do all the fiddly little things needed to make it work.

The Spec is complete. It is: do everything needed to make TWiki Core capable of working like it does now, plus optionally to store its topics in a database.

-- SvenDowideit - 23 Apr 2007

That sounds like a reasonable spec. I'll assume some similar configuration to the DatabasePlugin for specifying the database. All existing test cases must of course pass, and additional test cases for the DB store be added. I for one can wait for the checkins to see the changes to the core.

-- CrawfordCurrie - 23 Apr 2007

The spec I miss is.

(Sven moved specific Q's above 06 Jan 2007)

These are just the open spec questions I can think of in two minutes.

"The big picture" seen as - the general acceptance of a better storage model based on a database design is what I tried to propose as a 5.0 release theme. So if this is the acceptance you are looking for Sven - then you have my vote already. But before we start coding and coding and coding we need to get structure to the basic architecture. Otherwise we end up with something that does not work or works in a very limited fashion. This is too important to get started wrong.

-- KennethLavrsen - 23 Apr 2007

I agree with Kenneth. That's such a big chunk and such an important move for TWiki that the CommittedDeveloper (that's you Sven

) should really get the community involved with the architecture and design of that solution. Although the functional specification is simply "Make a database store" the final product will most certainly gain in quality and acceptance by having proper technical specifications written down and carefully discussed over with peers.

-- StephaneLenclud - 24 Apr 2007

There are two ways to look at this, I guess. You can either write up a technical / architecture spec and invite comments, or you can focus on actually achieving something. My personal experience is that while I have often had valuable user feedback on specs, open forum discussion of architectural detail has rarely worked, because you end up spending so much time teaching people about the basic architecture and functionality of TWiki, or arguing about the irrelevancies that always seem to get brought up when people don't research their comments. Code, and especially tests, are the best possible spec. So, my approach would be to:

write up a "user spec", invite comments on that,
make a branch and write code. Those with enough existing architectural knowledge to contribute will review and comment,
comments in the code and tests will educate those without that background and help bring them up the curve.

Something that might change your approach would be if you had one or more committed developers who were actually prepared to help code and test; in that case, a technical spec might be a good idea.

-- CrawfordCurrie - 24 Apr 2007

Given that a database store would have several dependencies, why not add another one and substitute the regexp queries using a query language from a text-indexing library like DBIx::FullTextSearch?

This particular library has the "advantage" that the index can be stored in the same database as the text documents. I don't know how this may affects performance, though.

-- RafaelAlvarez - 13 May 2007

yep, and this will hopefully be able to be driven by users from SearchWithTWikiQueryLanguage

-- SvenDowideit - 14 May 2007

Target changed to GeorgetownRelease

-- KennethLavrsen - 03 Jun 2007

This is just something to think about...

I would like to add to the roadmap that multiple webserver instances running on separate hardware to work off the same database, as well as support for cluster databases (MySQL cluster and Oracle RAC). Also possibly set it up so that reads can be read from database replicas, and writes are committed to the master database.

This would address concerns regarding RAS for mission critical deployments. With a redundant pair of loadbalancer appliances in front of the web/app servers, this would address two issues. 1) HA and 2) Scaling.

Oh yeah memcached support as well.

-- BrianGupta - 07 Jun 2007

One thing you are going to find is that mysql, and sybase support case insensitive search, where Oracle does not. (I am guessing MsoftSQL does support it since it is based off the Sybase codebase)

What else. The advantage of MySQL is that you don't require a database.

Finally I would probably start with MySQL and Postgres. Adding commercial database support should come later.

In the admin interface for TWikiDBPlugin you should be able to do simple database administration. This would include dumps, as well as creating indexes.

One thing it may help to share your schema diagram, as it would allow others with some database experience to assist.

-- BrianGupta - 07 Jun 2007

As I'm writing this using DBIx, a Perl abstraction layer to talk to many database backends, and was using a Query generation layer too, I'm avoiding (as far as possible) hardcoding it to MySQL, or any other religion.

I also hope to avoid making database management tools - databases have them already - I'm not sure that having the few TWiki developers duplicate that is worth the loss of focus.

As far as schema - take a look at the raw twiki topic - That is whats determining the schema more than anything - especially as TWikiForms means dynamically defined schema's.

I'm not aiming for a fixed schema at the moment - I'm into code and database generation

But then thats one area I have some experience.

Though its true that whoever sponsors the work will get to determine what we focus on first.

-- SvenDowideit - 07 Jun 2007

Just a question from a non-dev: will this include or prepare for ORM, or is my question not relevant (I dont know too much about ORM - object relational mapping). But please share some thoughts if anyone can educate a simple mind.

I had a thought (probably silly) that with TWikiStandAlone with SQLite we could have a very capable TWiki and those with special needs or requirements could use another RDBMS. As TWikiStandAlone has ideas from Catalyst it also came the thoughts about ORM, which Catalyst has functionality for.

-- LarsEik - 02 Mar 2008

no, I don't think ORM is worth it in this context - though I too have spent time in Catalyst, its optimised for code development of applications, rather than structured document work like TWiki. It did cross my mind to re-implement TWiki using Catalyst, but

-- SvenDowideit - 18 Mar 2008

Sven, is there any chance we can get visibility on your work? I'm about to release two Plugins that may overlap with what you're doing, and I bet they are crude version of what you have.

-- RafaelAlvarez - 04 Aug 2008

I have to port the work to trunk, and last time i looked, trunk has unit test failures in the SEARCH code - but in the next week or so I'll be able to plan out some work in this area.

-- SvenDowideit - 05 Aug 2008

I would like this topic discussed at the Summit but since Sven is not physically there I would like to arrange that he can be there via phone.

I think this topic is so huge and to potentially live/die matter for TWiki that it should be well planed and defined so several developers can work in parallel on the different aspects.

This is for sure not an ignored feature request. On the contrary

-- KennethLavrsen - 17 Aug 2008

I would prefer a database storage from a management perspective. I simply sounds mor professional. I guess that most managers will think this way.

-- MartinSeibert - 18 Aug 2008

What does "a database storage from a management perspective" mean?

I'm very interested on this feature and IMHO there is space for a lot of experiments. We can have great performance improvements

-- GilmarSantosJr - 18 Aug 2008

Database storage can mean anything which is why we need more Codev work on this. Sven has for sure some important studies and code already but the point is not to add DB storage because it sounds cool but because done right we can gain a lot of performance for applications that uses searches and for TWikis with many topics and/or many users registered.

-- KennethLavrsen - 18 Aug 2008

This hasn't stopped; Sven and I have been continuing to experiment. The latest work is represented by the DBIStoreContrib and MongoDBPlugin, each representing two subtly different approaches; the first using an SQL store, and mapping text queries to SQL, and the second using a text store, and mapping queries to text searches. FavioFlamingo has also done some initial work on a normalised SQL schema.

I class all of this work as still experimental.

-- CrawfordCurrie - 23 Oct 2010

I am now taking a bold step and committing this.

The DBIStoreContrib is right on the edge of being able to implement a full store. In its role as a cache it works pretty well, and just need to be extended to support history.

-- CrawfordCurrie - 07 Dec 2010

Set to Accepted.

-- CrawfordCurrie - 24 Feb 2012

Item9715 is closed and included in 2.0, is this proposal complete? Please update the status, and push out the Planned For release if more work remains.

-- GeorgeClark - 19 Nov 2015

Mostly implemented as an extension.

-- Main.CrawfordCurrie - 18 Jan 2016 - 18:28

ChangeProposalForm edit

TopicSummary	Database store, or database cache, for high performance
CurrentState	ImplementedAsExtension
CommittedDeveloper	CrawfordCurrie
ReasonForDecision	AcceptedBy14DayRule
DateOfCommitment	7 Dec 2010
ConcernRaisedBy
BugTracking	Tasks.Item9715
RelatedTopics
PlannedFor	2.0