
Item9717: Foswiki cache doesn't always update after save so SEARCH for topics will almost always be out of date.

Priority: Normal
Current State: Closed
Released In: n/a
Target Release: n/a
Applies To: Engine
Component: FoswikiCache, SEARCH
Branches:
Reported By: LarsEik
Waiting For:
Last Change By: MichaelDaum
On two separate servers I have a problem with the cache not showing the updated text of a topic after save. One server is freshly installed and one is snapshotted and upgraded (Ubuntu 10.04 and 8.04). I also turned off mod_fcgid and deleted tmp/cache on the server. I tried default and BerkeleyDB, with and without httpcompress. I haven't established a definite pattern, but I can say that the page is actually stored correctly because it displays correctly with topicurl?refresh=on. So I hope it's not only me experiencing this.

-- LarsEik - 19 Sep 2010

For what it's worth, I seem to recall MichaelDaum talking about MemoryLRU, but I can't recall if it was because it had problems or because it worked better.

To be honest I don't think the cache has had much testing. Michael, what are your feelings on the status of the cache feature?

-- PaulHarvey - 20 Sep 2010

Thanks for the hint. After a little testing on my laptop svn install (with lighttpd) and my test server with mod_fcgid, it looks like the svn install works better with MemoryLRU. It is still not OK on the test server. This cache is very important imho; I hope we can get it working.

-- LarsEik - 20 Sep 2010

Serious release blocker. This needs to be fixed or we have to hide and disable the cache feature in 1.1.0.

-- KennethLavrsen - 20 Sep 2010

Any more detailed information on the config being used and how to reproduce?

I've set up a fresh foswiki + fastcgi + foswiki cache ... can't see any problem with it.

For now I can't repro the error you are seeing.

Kenneth, did you experience the same problem? Anybody else?

-- MichaelDaum - 20 Sep 2010

I am on training in the UK so I cannot test at the moment. Only have access via iPhone and only rarely. When I return I will naturally test. Knowing the entire environment that triggers the problem is important.

-- KennethLavrsen - 20 Sep 2010

I can't reproduce with pseudo trunk + fcgid. Will try the beta1 at work.

-- PaulHarvey - 20 Sep 2010

I believe it works on my trunk svn install running lighttpd with fastcgi. Not so easy to set up, but there is good help in the FastCGIEngineContrib topic and on Google of course. Thanks for hangin' in with me Paul :) A bit worried since Michael couldn't reproduce with the same setup. Well, I've done stuff wrong before so...

-- LarsEik - 20 Sep 2010

On my svn trunk pseudo install I cannot reproduce either with apache and fcgid. So from where I sit, it is an issue on beta1. Have to think about what to test next. If I get time I might reinstall a server from scratch again...

-- LarsEik - 20 Sep 2010

There was a significant number of missing files in the beta due to MANIFEST issues. I should have them solved on the Release01x01 branch now. This may be worth checking first.

-- KennethLavrsen - 20 Sep 2010

As this bug is not reproducible, I lower it to normal so that it doesn't block the release. If there's no reliable way to reproduce it, I will no-action it later.

-- MichaelDaum - 22 Sep 2010

I managed to check out and set up from Release01x01 and that specific issue is gone. One thing to mention is that Sandbox/WebHome should perhaps be in WEBDEPENDENCIES by default. After creating a new topic in Sandbox you don't easily find it the next time you visit Sandbox, unless you do ?refresh=on.

-- LarsEik - 22 Sep 2010

That's what dirty areas are for. The SEARCH needs to be wrapped into <dirtyarea>...</dirtyarea> to force this area to be rendered on each request.

-- MichaelDaum - 23 Sep 2010

The only place the cache really gives something is with large searches. I am not sure I understand the dirty area thing yet.

If you wrap a search into a dirtyarea does this mean that the cache is not used for displaying this area?

I have not noticed much performance improvement using the cache for plain simple topics with just a few hundred words and maybe a table.

When the cache is used for a search among a large population of topics that returns a lot of hits, the cache is a significant improvement. But if the cache is not updated when you save a topic, these searches will often be out of date. How do we use this cache in an efficient way then?

I am a bit confused about this feature now.

-- KennethLavrsen - 23 Sep 2010

Dirty areas specify fragments of a page to be excluded from caching while the rest of the page is already pre-computed.

The content of all dirty areas will be rendered freshly on each page hit and inserted into the rest of the page at the proper location on each request.
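
To make the dirty-area idea concrete, here is a minimal sketch in Python (not Foswiki's Perl, and not the actual implementation). The `%COUNT%` placeholder, the `expand` function and the fragment list are all hypothetical stand-ins for macro expansion and the stored cache entry; only the `<dirtyarea>` tag name comes from this thread.

```python
import re

# Marker syntax modelled on the <dirtyarea> tag mentioned in this thread.
DIRTY = re.compile(r"<dirtyarea>(.*?)</dirtyarea>", re.DOTALL)

def expand(text, state):
    """Hypothetical stand-in for macro expansion: fills in a live value."""
    return text.replace("%COUNT%", str(state["count"]))

def render_to_cache(source, state):
    """First hit: expand everything outside dirty areas, keep dirty areas raw."""
    parts = DIRTY.split(source)  # odd indexes hold the dirty-area bodies
    cached = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            cached.append(("static", expand(part, state)))
        else:
            cached.append(("dirty", part))
    return cached

def serve_from_cache(cached, state):
    """Later hits: reuse static fragments, re-expand only the dirty ones."""
    return "".join(
        text if kind == "static" else expand(text, state)
        for kind, text in cached
    )

state = {"count": 1}
page = "Total at cache time: %COUNT%. <dirtyarea>Live total: %COUNT%.</dirtyarea>"
cached = render_to_cache(page, state)
state["count"] = 2  # something changes after the page was cached
result = serve_from_cache(cached, state)
print(result)
```

The static fragment keeps the value from cache time while the dirty fragment picks up the new one, which is the trade-off being discussed: only the dirty part costs anything on later requests.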

-- MichaelDaum - 23 Sep 2010

What happens to content that is not in a dirty area? When is this refreshed? It is not completely clear from http://trunk.foswiki.org/System/PageCaching.

-- ArthurClemens - 25 Sep 2010

Content outside of a dirty area is cached. It is refreshed when one of the dependencies of a topic fires. The page cache keeps track of a dependency graph among all topics by remembering the read() operations performed during the process of rendering a page. This kind of deep dependency tracking is only possible leveraging the internal knowledge of a cms and thus is outside the scope of normal reverse proxies.
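
As a rough illustration of tracking dependencies by remembering reads, here is a small Python sketch. It is an assumption-laden toy, not the Foswiki code: the `PageCache` class, its method names and the one-line `INCLUDE name` syntax are invented for the example, and includes are not expanded recursively.

```python
from collections import defaultdict

class PageCache:
    """Toy page cache that records which topics were read to render a page."""

    def __init__(self, store):
        self.store = store                  # topic name -> raw text
        self.pages = {}                     # topic name -> rendered page
        self.dependents = defaultdict(set)  # topic read -> pages that read it
        self.renders = 0                    # counts real (uncached) renders

    def _render(self, topic):
        reads = set()

        def read(name):
            reads.add(name)                 # remember every read operation
            return self.store[name]

        lines = []
        for line in read(topic).splitlines():
            if line.startswith("INCLUDE "):  # hypothetical include syntax
                lines.append(read(line[len("INCLUDE "):]))
            else:
                lines.append(line)
        for name in reads:
            self.dependents[name].add(topic)
        self.renders += 1
        return "\n".join(lines)

    def view(self, topic):
        if topic not in self.pages:
            self.pages[topic] = self._render(topic)
        return self.pages[topic]

    def save(self, topic, text):
        self.store[topic] = text
        # Fire the dependency: drop every cached page that read this topic.
        for page in self.dependents.pop(topic, set()):
            self.pages.pop(page, None)

cache = PageCache({"Footer": "v1", "Home": "Welcome\nINCLUDE Footer"})
first = cache.view("Home")    # rendered and cached
cache.view("Home")            # served from cache, no new render
cache.save("Footer", "v2")    # Home read Footer, so Home is invalidated
second = cache.view("Home")   # rendered again with the new Footer
```

A reverse proxy only sees requests and responses, so it has no way to know that Home read Footer; the application can record that because it performs the reads itself.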

-- MichaelDaum - 26 Sep 2010

Michael. Please try that explanation again. I did not understand a word and I do not understand where reverse proxies and CMS come in. I do not have a CMS or a reverse proxy. If I do, I do not know.

What do we tell the users? When is the cache refreshed in plain language?

-- KennethLavrsen - 26 Sep 2010

This isn't so hard to understand.

Edit a page and it is refreshed so that you will always see the latest content.

Visit the page again and it will be fetched from cache instead of computing it yet again.

Any other page that INCLUDEs the newly changed page (or uses this page in some other way) will be removed from the cache so that it is refreshed next time somebody requests it.

This is done by tracking dependencies between all pages.

That's all.

Sorry for not being clear. I assumed that you were all familiar with reverse proxying as a caching strategy for a CMS.

-- MichaelDaum - 26 Sep 2010

So any page that contains a SEARCH will show old content unless I put all SEARCHes inside dirty areas!?!

That basically means the cache has little practical use, in my opinion.

If the cache is not updated when you save topics in general and these topics are later shown in searches, then it practically means that you have to put all SEARCHes in dirty areas unless they are truly static in nature. It also means that all topics with a SEARCH need some user interface to force a refresh.

A Foswiki site is full of SEARCHes everywhere and people make them everywhere and they have a good reason to believe the search is returning valid data by default.

For me this means the cache is very close to useless. And I am not even sure it should be shipped in 1.1 unless it is marked as experimental and EXPERT.

The old VarCachePlugin was much more useful. It did not cache content unless you marked a topic to be cached. For those pages where you wanted large slow searches to be cached, you could add some user interface to tell people to refresh the cache.

But if the cache can only be enabled globally or completely disabled and all pages with a SEARCH are cached and out of date, then I cannot see many situations where this cache can be used in practical life. We saw Lars getting confused about the function. We will be bombed with bug reports and support questions if we ship the cache working like this.

Raising to release blocker.

-- KennethLavrsen - 26 Sep 2010

When caching a SEARCH, the page will establish a dependency on every hit in the result set at the time the SEARCH was performed. When one of these changes, the page containing the SEARCH will be invalidated in the cache as a consequence.

Caching a SEARCH will indeed not show up-to-date results when the search criterion suddenly matches a new topic. Similarly, a page with an %SQL or a %HEADLINES will show the results cached the first time it was rendered.

This is a known drawback that can't be prevented by any caching whatsoever in the world.

The means provided in the current implementation to work around this is to cache a page partially. This is achieved by excluding certain areas of it as being "dirty". These so-called dirty areas are recomputed on every request, while the rest of the page remains static as long as possible. Compare this with a kind of dynamic templating where variables are inserted into a static frame. Our solution here is of course much more flexible, as the non-dirty areas can be edited on the wiki as well and are cached and refreshed transparently.

The implementation that finally made it into Foswiki-1.1.0 dates back to 2001, when I was working at the University of Hamburg. It was hacked into a TWiki Beijing release and has been running there successfully ever since. I have constantly maintained this patch for all TWiki and Foswiki engines since then. It is running on a series of major public sites in a SpeedyCGI or FastCGI environment. This together with page caching gives nearly the same performance as static HTML files.
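
The behaviour described here can be reproduced with a few lines of Python. This is only a sketch under simplifying assumptions: a "search" is a plain substring match, and the names `topics`, `cache`, `view_search_page` and `save` are made up for the example rather than taken from Foswiki.

```python
topics = {"BugOne": "status: open", "BugTwo": "status: closed"}
cache = {}  # page name -> (rendered hits, set of topics the page depends on)

def search(term):
    """Hypothetical search: names of all topics whose text contains term."""
    return sorted(name for name, text in topics.items() if term in text)

def view_search_page(page, term):
    if page not in cache:
        hits = search(term)
        cache[page] = (hits, set(hits))  # dependencies = hits at render time
    return cache[page][0]

def save(name, text):
    topics[name] = text
    # Invalidate only pages that listed this topic as a dependency.
    for page in [p for p, (_, deps) in cache.items() if name in deps]:
        del cache[page]

before = view_search_page("OpenBugs", "open")  # one hit, now cached
save("BugThree", "status: open")               # new topic, nobody depends on it
stale = view_search_page("OpenBugs", "open")   # cached result, BugThree missing
save("BugOne", "status: open, assigned")       # a tracked hit changes
fresh = view_search_page("OpenBugs", "open")   # recomputed, BugThree appears
```

Saving the new topic leaves the cached page untouched because no cache entry depends on it yet; the page only catches up once one of its recorded hits changes.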

This kind of performance is not available otherwise for Foswiki or TWiki.

See the now two-year-old feature proposal.

-- MichaelDaum - 26 Sep 2010

QUOTE This is a known drawback that can't be prevented by any caching whatsoever in the world. UNQUOTE

Not true. There are several possibilities

  • Invalidate all caches each time a topic is saved. Could be per web. This means that the cache will be less efficient but still much better than none. In practical use it would mean that 3-10 times per web the cache needs to be reloaded by a user looking at a topic. (We've got that. See the docu. - MD)
  • Change the way the cache works in the first place so that it is only enabled on a page if you ask for it. Right now it caches everything and you have to turn off the cache by listing exceptions or adding dirty areas.
  • Make SEARCH an automatic dirty area - and add a new option to SEARCH where you can set cache="on" for SEARCHes that could very well work fine cached.

The whole point of a cache is to speed things up. And what slows down a Foswiki site the most are the applications that do formatted searches among many topics and return many hits. If these are cached the way it works now, the searches are too often not up to date. And if you make them dirty areas, the whole point of the cache is lost.

In 1.1.0 context all we can do is document how the cache works and warn against the SEARCH issue.

In 1.2 context I want to see this feature enhanced to address the SEARCH issue. This will be done through some feature proposals. I looked at the code and I have some ideas.

-- KennethLavrsen - 26 Sep 2010

It's pretty trivial to flag a SEARCH as dirty when a new topic is created that will affect the result. What you do is evaluate all SEARCH expressions that are in the cache against that new topic.
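
A minimal Python sketch of that idea follows, with the caveat that it assumes each cache entry keeps its search expression and that an expression can be tested against a single topic. Here the expression is just a substring, and `matches`, `view_search_page` and `create_topic` are hypothetical names.

```python
topics = {"BugOne": "status: open"}
cache = {}  # page name -> (search term, rendered hits)

def matches(term, text):
    """Hypothetical single-topic test for a search expression."""
    return term in text

def view_search_page(page, term):
    if page not in cache:
        hits = sorted(n for n, t in topics.items() if matches(term, t))
        cache[page] = (term, hits)
    return cache[page][1]

def create_topic(name, text):
    topics[name] = text
    # Evaluate every cached search expression against the new topic only,
    # and evict the pages whose result set it would change.
    for page in [p for p, (term, _) in cache.items() if matches(term, text)]:
        del cache[page]

view_search_page("OpenBugs", "open")
view_search_page("ClosedBugs", "closed")
create_topic("BugTwo", "status: open")
remaining = sorted(cache)                          # pages still cached
refreshed = view_search_page("OpenBugs", "open")   # recomputed with BugTwo
```

The cost is one match per cached search per new topic rather than a full search. An edit that makes an existing, previously non-matching topic start to match would need the same check on save, which this sketch leaves out.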

This is essentially a simple thing to implement once ResultSets are implemented - but I am surprised that Micha's implementation (that he indicates is basically 9 years old!!) doesn't already do this.

-- SvenDowideit - 27 Sep 2010

It is really important that as a Foswiki developer you get into caching business a bit more.

For normal end-users (people editing a wiki page here and there) any caching must be transparent, not requiring any knowledge about the things going on behind the scene.

People able to write a SEARCH need to be aware of caching effects.

People developing plugins that pull in external content from another database (like HeadlinesPlugin, SqlPlugin, SoapPlugin, LdapNgPlugin or the like) will also need some knowledge about caching, or at least be aware of potential caching effects when they see content that is not up-to-the-second fresh.

So let me first make this clear: there is no issue with SEARCH or the page cache we have. This is a logical problem of caching!!!

Either you cache something or not. There's nothing in between. You will only get a caching effect when you deliver the same page at least twice. The saved effort only pays off when the first hit costs a lot, like complicated SEARCHes.

So two things can be done

(1) Don't cache a SEARCH, nor any SQL, LDAP or HEADLINES macro whatsoever.

(2) Sacrifice correctness, that is, delay recomputation for a certain time by not delivering up-to-the-second fresh content.

Number (1) is a bad sledgehammer approach, as you avoid caching for those candidates that benefit the most.

Nearly all reverse proxies go with option (2). It is a very important strategy, e.g. for high-end sites. They also try to auto-invalidate cache entries based on certain properties of the request. They can't look into the intra-dependencies of content bits, things that go on in the backend. Only the CMS itself can do that, by tracking the ingredients needed to compute a page. That's why we do it inside Foswiki's core.

For intranet wikis this is different. You edit something and you want immediate results. These sites don't suffer from all sorts of traffic, so page caching doesn't pay off that much anyway.

Besides, wikis being used as, well wikis, they see a rather frequent rate of edits. So a deep dependency tracking like in our page cache implementation will fire a lot of dependencies very often. The probability to hit a page that has been computed already will go down significantly.

As you see there is a fine balance here between the requirements of a wiki, which by nature allows vast changes of content, and caching in a transparent way. Basically this is not solvable 100%, but nearly, by giving users the means to work around unwanted caching effects. What we have right now is the concept of dirty areas and manual WEBDEPENDENCIES to list those pages to be invalidated on every edit in a web. There are potentially more things that we can give people: the most obvious one is limiting the time a page in cache is considered still valid. So adding a timestamp when the cache entry is considered outdated makes a lot of sense. Note that this too only works around caching effects. It does not solve them, as that's impossible without switching off caching altogether for this page.
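
For illustration, limiting the time a cached page is considered valid could look roughly like the Python sketch below. Nothing here reflects an existing Foswiki setting: `ExpiringCache`, `max_age` and the injectable `clock` are assumptions made for the example, and the fake clock only exists so the behaviour can be shown without waiting.

```python
import time

class ExpiringCache:
    """Toy cache whose entries are considered outdated after max_age seconds."""

    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self.entries = {}  # page -> (rendered text, time it was stored)

    def get(self, page, render):
        entry = self.entries.get(page)
        now = self.clock()
        if entry is not None and now - entry[1] < self.max_age:
            return entry[0]           # still fresh enough: serve from cache
        text = render()               # missing or outdated: recompute
        self.entries[page] = (text, now)
        return text

now = [0.0]                           # fake clock for the demonstration
cache = ExpiringCache(max_age=60, clock=lambda: now[0])
calls = []

def render():
    calls.append(1)
    return f"render #{len(calls)}"

a = cache.get("Home", render)         # first hit, rendered
now[0] = 30
b = cache.get("Home", render)         # 30s old, served from cache
now[0] = 61
c = cache.get("Home", render)         # 61s old, rendered again
```

Between the first and third request the page may be stale for up to `max_age` seconds, which is the point made above: an expiry bounds the caching effect but does not remove it.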

I really hope that those people commenting on caching SEARCH will start to see the nature of the problem a bit clearer step by step.

-- MichaelDaum - 27 Sep 2010

I think we see it very clearly. But we obviously interpret the seriousness differently. Probably because we have smart users that use SEARCH everywhere.

I have now documented the way things work based on Michael's explanation, and added a warning for the admin so he can make an informed decision about enabling the cache, weighing the pros and cons.

But this does not close the bug. But it removes the urgency for 1.1.0.

For 1.1.1 or 1.2 at least we need additional modes to refresh the cache. And we need to think carefully HOW we add the additional modes.

It is indeed a balance. And we need to give the admin and the users more handles to choose the balance between accuracy and performance.

Since the bug remains open I also reassign it to major and 2.0 so it can get picked up there.

-- KennethLavrsen - 27 Sep 2010

Possible areas to improve transparency and to give the user (back) control:
  • show that a page or search result is cached, show the expected time/date when the cache will be cleared, and offer a link to refresh now
  • after editing a page, do the same with dependent topics (and offer a link to refresh those caches)

Not entirely foolproof, because an edit (or new topic!) might make the topic appear in one of the searches, so how to know which caches would need to be updated?

-- ArthurClemens - 28 Sep 2010

For now there isn't a timeout for cache entries, though we plan to add one using a CACHEXPTIME preference variable. So this information can't be provided, or only for a few pages. We shouldn't add a timer to the bottom for a second reason: it needs an update to the page fetched from cache; it can't be sent over as is.

A bottom banner saying:

%IF{"{Cache}{Enabled} and context view and $CACHEABLE != 'off'" 
  then="<div class='foswikiPageCacheBanner'>
          %MAKETEXT{"This page has been cached at [_1]." args="%SERVERTIME%"}%
          %MAKETEXT{"Get a fresh version <a href='%SCRIPTURLPATH{view}%/%BASEWEB%/%BASETOPIC%?refresh=cache'>here</a>."}%
        </div>"
}%

makes more sense as it doesn't depend on an expiry date and doesn't need a rewrite on every request. That's an easy mod of the skin templates.

-- MichaelDaum - 29 Sep 2010

Documented use of preference variable CACHEABLE

-- MichaelDaum - 30 Sep 2010

I have tested on an svn checkout and it seems to work very well now. The SEARCH cache updates correctly, and the INCLUDE cache updates correctly. So only new topics need a refresh, and for us that is no problem. I think topics like Sandbox should have dirtyarea or cacheable set correctly by default, so it doesn't confuse anyone enabling the cache. It just feels unnecessary that new test topics don't show on the front page after you just created them.

-- LarsEik - 25 Oct 2010

You are right. In the future we will ship default topics like Sandbox.WebHome in a way that they behave as expected even with caching enabled, that is, either flag them non-cacheable, add a dirty area or give the cache entry an expiry timer.

-- MichaelDaum - 26 Oct 2010

With Sandbox WebHome it is probably best with a hidden setting so beginners editing the topic to add static topic names do not get confused by this.

-- KennethLavrsen - 26 Oct 2010

All of the Checkins for this task are committed prior to 1.1.2 and are in the release branch... Except for the last one - distro:9868935f9e03. Is this task a 1.1.2 task that should have been closed with a new task for Rev 9976, or should the changes be synced over and released in 1.1.3?

-- GeorgeClark - 12 Mar 2011

All of these checkins (except one as far as I can see) are documentation. The last checkin implements a new caching feature that is only implemented on trunk, not on the release branch. This is on purpose.

-- MichaelDaum - 12 Mar 2011

Closing this one. Any further development will be done using extra task items.

-- MichaelDaum - 19 Mar 2011
 
Topic revision: r50 - 19 Mar 2011, MichaelDaum