Foswiki on GitHub is open for business! Next release meeting: Monday September 29, 1300Z

Item11091: RCS VC needs to handle mauled .txt better

Priority: CurrentState: AppliesTo: Component: WaitingFor:
Urgent Closed Engine FoswikiStore  
This is specific to the RcsWrap and RcsLite store modules.

When an external agency (e.g. vi) modifies a .txt file, Foswiki has a hard time recovering. There is a high risk of lost data, lost history, and corruption, because the core has far too much knowledge of the cache. The goal of this work is to push all knowledge of the .txt / .txt,v dichotomy below the store horizon by using unit tests to verify correct behaviour of the store.

(Note that where we refer to .txt below, that equally applies to an attachment, though obviously TOPICINFO is not an issue there)

An RCS-controlled file may be in one of four possible states:
  1. up to date means the .txt and .txt,v both exist, and are consistent
  2. inconsistent means the .txt is newer than the .txt,v
  3. no history means the .txt exists but there is no .txt,v
  4. no txt means the .txt,v exists but there is no .txt

up to date is the state the code currently assumes for everything. After discussion on IRC we have decided no text is a non-topic, as without a .txt the topic is not searchable and therefore to all intents and purposes does not exist.

Note that if there is no .txt but a .txt,v already exists when a topic is edited/created, the edit proceeds on empty content, but when the topic is saved the original history is intact. I think on balance that this is desirable behaviour, as it allows clean recovery when a .txt is accidentally deleted on the server.

So the states we need to focus on for testing are no history and inconsistent.

The following Foswiki::Store methods require special handling.

Method no history inconsistent
readTopic correct TOPICINFO correct TOPICINFO
getRevisionHistory return [1] load rev history and add [N +1]
getNextRevision return 1 return N+2
getRevisionDiff checkin checkin
getVersionInfo TOPICINFO for author, rev 1 and current date unknown user, rev N+1, current date
getRevisionAtTime checkin
saveAttachment checkin
saveTopic checkin
repRev abort and handle as saveTopic

All other existing tests need to be run on the inconsistent and no history cases to ensure correct functioning.

Notes on the tests:
  1. A topic with no .txt,v will always be rev 1, irrespective of the TOPICINFO. However the author and date fields will be respected for getRevisionInfo.
  2. 'checkin' of a topic means
    • META:TOPICINFO in the mauled .txt is rewritten/added before save; the version is set to 1, and the user to the unknown user (as defined by the BaseUserMapping)
  3. revision info is requested using getVersionInfo
    • If state is no history and .txt has TOPICINFO, that is used to get author and date. Otherwise author is set to the unknown user, date is set to the current date and rev to 1
    • if state is inconsistent then TOPICINFO is ignored and author is set to the unknown user, date to the current date and version to N+1
  4. Existing code should fail these tests.
  5. Tests are being added to RcsTests.

-- CrawfordCurrie

Discussion moved here from Item10993

Question: when checking in a topic that has had it's .txt modified, what user should be used? Choices are:
  1. The user from the TOPICINFO
    • If there is one
    • And if they are a real user (how can you check?)
  2. The unknown user (or some other user known to the BaseUserMapping)
  3. A CUID nominated in a configuration setting (defaulting to the unknown user)

IRC conclusion - some "OfflineUser"

-- CrawfordCurrie - 30 Aug 2011

See also:

  • Item10427: rev details lost if .txt file is edited externally
  • Item10390: Meta documented as supporting missing revs ... doesn't
  • Item9969: REVINFO and $meta->getRevisionInfo() has issues when the ,v file is re-built after it was lost.
  • Item10563: Task updates by svn commits breaks compare

-- GeorgeClark - 30 Aug 2011

Many thanks, George. I have trawled these other reports and updated the test set above to reflect the requirements.

Note that I'm very concerned about the definition of "newer" because it requires a stat()[9] to evaluate it (two, in fact)/ IME stat is not fully supported on all platforms, and some file systems get it wrong. If anyone with more experience can advise, I'd be grateful.

-- CrawfordCurrie - 31 Aug 2011

Did the above cover the case of the first change of an existing topic without history? It's really important to checkin version 1 (before change) and version 2 (after edit) or we go back to being unable to revert any changes to shipped System topics.

As you mentioned in one of the other tasks, rev 0 was wrong and and a pretty bad kludge, but given the current implementation it was the only way I could find to trick store into saving both rev 1 & 2. Otherwise the "current" rev 1 - without history - ends up accumulating the initial changes. This needs to be covered in the tests. The shipped AdminGroup, any of the WebNotify topics, WebHome, etc. are all particularly at risk. Another solution would be to go back to shipping rcs files to side-step the issue.

-- GeorgeClark - 31 Aug 2011

Yes, I know. The two cases, topic with no history and changed topic with an inconsistent history, both need to be tested and both require some delicate handling.

-- CrawfordCurrie - 04 Sep 2011

I have re-coded the VC store to handle the cases where a .txt has no corresponding .txt,v (no history) and a .txt is newer than the corresponding .txt,v (inconsistent history). I have added unit tests to cover these cases.

Note that there is a potential race condition where a .txt is not created by RCS until the clock has ticked over after creating the .txt,v. I'm not sure if this is possible, but it's a risk.

In the end the changes were almost entirely localised in the VC store; no other APIs or core code were touched, so I decided to merge the changes across to the patch as well.

-- CrawfordCurrie - 11 Sep 2011

Crawford, re your note about the fixes not working if file system doesn't set modification times, Should we have a checker in configure that verifies modification times are being set correctly and reporting an error if not? (Though not sure what the action should be if times are not set).

-- GeorgeClark - 11 Sep 2011

Quite - I don't know what action to take either, except to advise the user to use a different file system. On reviewing my notes, the problems with stat() are not so much with stat()[9] (modification time) as with creation and access times not being updated. So I think we'll be OK.

-- CrawfordCurrie - 20 Sep 2011
 
Topic revision: r29 - 17 Dec 2011, GeorgeClark
 
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. see CopyrightStatement. Creative Commons License