Priority: Normal
Current State: Closed
Released In: 2.0.0
Target Release: major
Applies To: Engine
Component: FoswikiStore, Performance
Branches: master
Background
As of Item11091 (specifically, distro:5a79947c9bfd), Foswiki 1.1.4 is less trusting of the TOPICINFO line in .txt files when the .txt is newer than the .txt,v.
In this situation, Foswiki 1.1.4 transparently adjusts the TOPICINFO line 'on the fly' as follows:
- info.date is obtained from the filesystem last-modified datestamp of the .txt
- info.version is obtained by digging into the ,v file and adding 1
- info.author becomes 'UnknownUser'
The reason for this re-writing is that the .txt file must have been 'mauled' by an external process, and generally these do not correctly populate TOPICINFO with accurate information. This causes problems, see Item11091 for details.
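The three adjustments listed above can be sketched roughly as follows. This is an illustration only, not Foswiki's own code: the function name adjusted_topicinfo is hypothetical, GNU stat is assumed, and the sketch assumes the ,v file starts with an RCS "head 1.N;" line where N maps to the Foswiki version number.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Foswiki: print the TOPICINFO fields that
# would be substituted when TOPIC.txt is newer than TOPIC.txt,v.
# Assumptions: GNU stat (-c %Y), and a ,v file whose first line is "head 1.N;".
adjusted_topicinfo() {
    local txt=$1 rcs="$1,v"
    # TOPICINFO is left alone unless the .txt is newer than its history file
    [ -f "$rcs" ] && [ "$txt" -nt "$rcs" ] || return 1
    local head_rev date
    # The head line of the ,v names the latest checked-in revision
    head_rev=$(sed -n '1s/^head[[:space:]]\{1,\}1\.\([0-9]\{1,\}\);.*/\1/p' "$rcs")
    [ -n "$head_rev" ] || return 2
    # info.date comes from the filesystem, not from the TOPICINFO line
    date=$(stat -c %Y "$txt")
    printf 'date=%s version=%s author=%s\n' "$date" "$((head_rev + 1))" UnknownUser
}
```

Calling adjusted_topicinfo /path/to/foswiki/data/Main/WebHome.txt prints the replacement fields for a mauled topic and returns non-zero when the .txt and .txt,v are in sync.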
Problems
- Foswiki performance now suffers significantly, especially if there are many 'mauled' .txt files. For each one, Foswiki must spawn an RCS rlog command, if using RcsWrap. Using RcsLite can mitigate the problem somewhat. See Item11476 for caveats.
- Item11473 (merged with this task) is a complaint about the new, surprising info.author being set to UnknownUser. This new behaviour may be unacceptable to some installations.
Work-arounds
- Set {Store}{Implementation} in configure to RcsLite. See Item11476 for caveats.
- If you are happy to emulate the Foswiki 1.1.3 behaviour (i.e. accept the TOPICINFO line of the mauled .txt files), use the touch command to force the relevant .txt,v file to have a later datestamp.
- To update the last-modified datestamp of all .txt,v files in your installation, use something like: find /path/to/foswiki/data -type d -exec bash -c 'cd {} && touch *.txt,v' \;
- To update the last-modified datestamp of only those .txt,v files which aren't in sync with their .txt cache, use something like: perl -MFile::Find=find -wle'find(sub{/^(.*\.txt),v\z/&&-f&&system("echo touch -f $1 $_")},@ARGV)' /path/to/foswiki/data (courtesy OlivierRaginel)
- If your .txt files are mauled by an external script which you are able to change, you may wish to call the touch command as an extra step at the end of your script, or even better: ensure that it leaves .txt files with an accurate TOPICINFO line (increment version number, update its date epoch), then do an RCS checkin to update the .txt,v file properly.
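As a rough sketch of the touch work-around, limited to history files that are actually older than their .txt, something like the following could be used. The function name touch_stale_rcs and the DRY_RUN variable are hypothetical and not part of Foswiki. It defaults to a dry run that only prints what it would touch.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Foswiki: touch only those .txt,v files
# that are older than their .txt, so the existing TOPICINFO line is
# accepted again (emulating the 1.1.3 behaviour).
# DRY_RUN is an assumed convention: 1 (default) prints, 0 really touches.
touch_stale_rcs() {
    local data=$1 rcs txt
    find "$data" -type f -name '*.txt,v' -print0 |
    while IFS= read -r -d '' rcs; do
        txt=${rcs%,v}                      # TOPIC.txt,v -> TOPIC.txt
        if [ -f "$txt" ] && [ "$txt" -nt "$rcs" ]; then
            echo "touch $rcs"
            [ "${DRY_RUN:-1}" = 1 ] || touch "$rcs"
        fi
    done
}
```

Run touch_stale_rcs /path/to/foswiki/data first to review the list, then repeat with DRY_RUN=0 to apply it. Note that this discards the signal that the .txt was changed externally, so it only suits installations that trust the TOPICINFO written by the external process.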
(Extraneous comments removed and may be found at revision 10).
--
PaulHarvey - 26 Jan 2012
And switching the storage implementation to RcsLite gives even more speed improvements in the average case.
--
MichaelDaum - 25 Jan 2012
True; but it's not that simple. RcsLite is faster when Foswiki must process many ,v files in a given request, BUT it can have disastrously poor worst-case performance when ,v files get large, e.g. on large attachment files, for two reasons: firstly, RcsLite loads entire ,v files into memory, and secondly, the external RCS binaries are written in C, so their raw throughput is much greater than any PurePerl solution.
We should really work on a hybrid VC store to get the best of both worlds (especially the rlog case to get current version number, which should only require reading the first few lines of a ,v file).
--
PaulHarvey - 25 Jan 2012
I don't see a reason why RcsLite must load all revisions at once, not even when this thing was hybrid. That's a bug.
As most serious foswikis are running in a persistent perl environment (and will even more once foswiki has converted to PSGI), there shall be no more forking of an external rcs helper tool any more at all, even for whatever large histories there are.
Instead, RcsLite needs fixing.
Only if it turns out that RcsLite cannot be fixed to avoid the worst-case inefficiency it seems to show right now should we think about complicating things even further and making the code hybrid, with whatever unknown performance behaviour that entails in itself.
For now I can't confirm any performance problems using RcsLite. Much more on the contrary.
A normal foswiki has got - let me guess - approximately 5 revs per topic and 1.5 revs for attachments on average. These normal foswikis will only profit from switching to RcsLite right now. That's a low hanging fruit and a GoodThing™ to do as people don't have to wait for us hackers to come along with even better code.
And therefore RcsLite should be the default.
--
MichaelDaum - 25 Jan 2012
Except I've talked to an IRC user or two who had tried RcsLite and reverted back again because they had one single important file with massive history that would cause an fcgid timeout.
WebStatistics is a good example of where wrap is faster than lite.
I agree though, we can fix RcsLite.
--
PaulHarvey - 25 Jan 2012
Created Item11476 for RcsLite concerns. This task needs to focus on problems & solutions involving performance when .txt is mauled.
--
PaulHarvey - 26 Jan 2012
Made Item11476 as urgent as this one.
--
MichaelDaum - 26 Jan 2012
I have re-written and re-titled this task so we can merge & close Item11473.
--
PaulHarvey - 26 Jan 2012
The current behaviour is correct. If an external process damages .txt, then it is the UnknownUser who performed that edit.
There are adequate solutions to this problem (touching ,v files, making the external process check in, etc.), so I feel this should neither be a 1.1.5 release blocker, nor even a report, except insofar as the performance of RcsLite is poor. So I changed the title from "1.1.4 is slower and shows info.author as 'UnknownUser' when .txt is mauled by an external process" to what it is now, and re-assigned to 1.2.
--
CrawfordCurrie - 09 Mar 2012
RcsLite performance was being addressed in Item11476. Will you close that as duplicate? Or this one?
To say that this doesn't even deserve a report ignores the fact that this has been a support problem. Many users have been impacted by this. The new behaviour may be correct, but the new behaviour is new and we need to educate people about this better. At the very least we need to ship a dedicated System FAQ item.
--
PaulHarvey - 10 Mar 2012
There is a comment / SMELL in RcsLite:
# SMELL: This code uses the log field for the checkin comment. This field is alongside the actual text
# of the revision, and is not recorded in the history. This is a PITA because it means the comment field
# can't be retrieved without reading up to the text change for the version requested - even though foswiki
# doesn't actually use that part of the info record for anything much. We could rework the store API to
# separate the log info, but it would be a lot of work. Using this constant you can ignore the log info in
# getInfo calls. The tests will fail, but the core will run a lot faster.
use constant CAN_IGNORE_COMMENT => 0; # 1
Should we document this, or maybe even run this way by default? If we can stop reading the rcs file before getting into the body of the diff, it would seem that would be a huge boost.
--
GeorgeClark - 17 Jun 2014