Item12477: Spurious entries being added to .changes

pencil
Priority: Urgent
Current State: Closed
Released In: 2.0.0
Target Release: major
Applies To: Engine
Component: FoswikiStore
Branches: Item12477 master
Reported By: GeorgeClark
Waiting For:
Last Change By: GeorgeClark
mailnotify occasionally jumps way back in history, aggregating years of revisions into a notification.

For ex:

This is an automated e-mail from Foswiki.

New or changed topics in Foswiki.Extensions, since 21 Mar 2013 - 16:52:

Topics in Extensions web: Changed: (now 16:15) Changed by:
NatSkin 28 Mar 2013 - 15:48 - r3->r43 MichaelDaum
The Natural Skin


Description
NatSkin improves your Foswiki experience by bringing together some of the most useful

 

However in the topic history,

43         28 Mar 2013 - 16:12   MichaelDaum
...
38         08 Jan 2011 - 00:57   PaulHarvey
...
3         04 Nov 2008 - 16:46   MichaelDaum

See also Support.Question1281

-- GeorgeClark - 22 Apr 2013

Crawford, do you have any idea on why this might happen?

-- GeorgeClark - 20 Dec 2013

I did a low level trace and evetually determined that something is adding records to .changes that have spurious revision numbers. This is throwing off MailerContrib. The problem is not with the contrib, but the core, so I changed the assignation of this task and the headline from "MailerContrib occasionally jumps way back in history for notifications". Here's an example, taken from the Community web:
AssociationArticles   MichaelDaum   1395836777   1   
AssociationArticles   MichaelDaum   1395836777   49   
AssociationArticles   MichaelDaum   1395836973   1   
AssociationArticles   MichaelDaum   1395836974   49   minor, reprev
AssociationArticles   MichaelDaum   1395837011   49   minor, reprev
AssociationArticles   MichaelDaum   1395837035   49   minor, reprev
AssociationArticles   OliverKrueger   1395837098   50   
AssociationArticles   OliverKrueger   1395837186   50   minor, reprev
AssociationArticles   OliverKrueger   1395837558   50   minor, reprev
AssociationArticles   OliverKrueger   1395837579   50   minor, reprev
AssociationArticles   OliverKrueger   1395837596   50   minor, reprev
AssociationArticles   MichaelDaum   1395837620   1   
AssociationArticles   MichaelDaum   1395837620   51   
AssociationArticles   OliverKrueger   1395837735   1   
AssociationArticles   OliverKrueger   1395837736   52   
AssociationArticles   OliverKrueger   1395837756   0   
AssociationArticles   OliverKrueger   1395837757   52   minor, reprev

Item8460 added recoredChange calls to many operations on the store. However these operations are recorded either with a rev of 0 (which MailerContrib can always skip) but also with the rev of an attachment. It's impossible to tell from the change record which applies.

All the recordChanges calls must be reviewed, and sufficient (and correct) information passed to disambiguate them.

-- CrawfordCurrie - 27 Mar 2014

Marked the for Release 1.2, to get it into the release blocker list

-- GeorgeClark - 28 May 2014

What about recording attachments as something like AssociationArticles{TheAttachmentName} to record attachment names. And add a "more" field of "attachment" for the attachment changes.

It would then look like:
AssociationArticles   OliverKrueger   1395837596   50   minor, reprev
AssociationArticles{FoswikiArticles.tex}   MichaelDaum   1395837620   1   attachment
AssociationArticles   MichaelDaum   1395837620   51   
AssociationArticles{AnotherAttachment.txt}   OliverKrueger   1395837735   1   attachment
AssociationArticles   OliverKrueger   1395837736   52   

-- GeorgeClark - 05 Jun 2014

Something like that; I had thought of using a / separator. But that really is the tip of the iceberg; ever recordChange call has to be examined on it's own merits, and the version information selected appropriately.

-- CrawfordCurrie - 08 Jun 2014

Yeah I was thinking about delimiters, Since attachments might allow spaces, something that clearly delimits the filename is needed. And / is for subwebs. Even though .changes is specific for a web, it might be a bit confusing when someone reviews this later.

Ug. NameFilter allows { and } so we probably need to encode filename. File is tab delimited, so I guess the / is simplest even potentially confusing. We have to be a bit careful though. IIRC Sven did some work dealing with migration of ancient TWiki sites where "/" is legal in topic names. Migration assisted by disabling subweb support and removing / from NameFilter. I don't recall the exact solution, but there might be a trap waiting there.

-- GeorgeClark - 08 Jun 2014

Details on calls to recordChange() (RCS = Store/VC/Store.pm Plain = Store/Plainfile.pm )

Event Plain RCS cuid revision verb Details recorded
Auto-attach a file   x x revisionn=0 autoattach newmeta, newattachment (Note the typo in the revision key passed)
Move attachment x x x 0 update oldmeta, oldattachment, newmeta, newattachment
Copy attachment x x x 0 insert newmeta, newattachment
Move topic x x x y update handler(oldTopic),oldmeta, newmeta
  x x x y update handler(newtopic) oldmeta, newmeta
Move web x x x 0 update handler(newWeb) oldmeta, newmeta, more="moved from oldweb"
Save attachment x x x y (update -or- insert) newmeta, newattachment more=minor - plainfile only
Save topic x x x y (update -or- insert) newmeta, extra=minor?
repRev topic x x x y update newmeta, more="minor, reprev" (Plainfile includes oldmeta, more=minor,reprev,options or save )
delRev topic x x x y update newmeta ? delRev action isn't noted in the record.
remove topic or attachment x x x 0 remove oldmeta, oldattachment, more="Deleted Web.(Topic)?:(Attachment)?" Plainfile doesn't call for web. RCS skips web remove in the recordChange() routine.

Calls generally appear consistent between RCS and Plainfile, with a couple of minor differences noted,. Note found a bug in RCS store, the Auto-attach function mis-spelled revision, so it's undef instead of 0.

However the call to recordChange doesn't use all the fields.

RCS
The topic is picked up from $this->{topic} and is not taken from either new or old meta.
PlainFile
The topic is picked up from the _meta parameter, not taken from either new or oldmeta.

The only field recorded are: (Topic | . ), cuid, time, revision, more. Tab delimited.

Possible changes needed / inconsistencies:
  • Add Attachment name as a separate tabbed field.
  • Move Topic records 2 changes, Move Attachment, only records a single change.
  • delRev should indicate that the action happened.
  • Need to address PlainFile vs. RCS more= difference

Modify file format, tab delimited. Add the verb, attachment name, and minor flag / reprev/delrev commands as separate fields. Use a consistent format for any move operation, topic or attachment. Record two changes, a remove on the old topic/attachment and an insert on the new name.

Topic Attachment verb cUID time revision minor cmd more
Some topic   update someuser 1395837736 24      
Some topic Somefile insert someuser 1395837737 0      
Some topic   update admin 1395837738 23   delRev  
Some topic Somefile remove someuser 1395837738 3      
.   update admin 1395837739 0     Moved from OldWeb

-- GeorgeClark - 12 Jun 2014

Other considerations:
  • File locking. .changes is not locked.
  • File is read parsed and rewritten for every transaction. Consider move to tick_foswiki script

-- GeorgeClark - 12 Jun 2014

Running some quick tests, the whole recordChange is somewhat bogus. Both a combination of the recordChange() routine not recording information, and Store failing to pass complete information to recordChange.

Just from some quick testing, Delete attachment, doesn't pass the attachment name or any indication that it was a delete. Deleting an attachment should probably record 4 changes: remove attachment, update topic, insert attachment (trash), update topic.

What we probably need is some more thorough recordChange tests in StoreImplementation tests that makes every possible change, and verifies that useful information is recorded.

for the eachChange processing there are a couple of alternatives:
  • Add an API extension, request "version 2" to get all topic & attachment changes. Otherwise only return topic changes -or-
  • update the users of eachChange to filter out undesired changes like attachments.

-- GeorgeClark - 13 Jun 2014

In pondering this, we have an issue for any site sharing the store between two releases, (for example Release 1.1 and Trunk on foswiki.org.) If we fix this by changing the .changes file format in 1.2, we'll break processing of the shared .changes files.

Current format (tab delimited)
topic cUID timestamp rev minor, reprev
so a compatible format might be:
Topic cUID timestamp rev (more)
minor, reprev, delrev
verb Attachment

-- GeorgeClark - 18 Jul 2014

I've started checking this into Item12477 branch. My plans are:
  • tie "rev" to only ever be a topic rev, never an attachment.
  • Add Attachment / attchment rev
  • 3 "verbs" Update, Insert, Remove Rename represented by two changes remove & insert
  • add "Other web, topic, attachment & rev representing the from or to topic info for move, rename
  • consistently create 2 records for any move / rename, within or between webs. Currently some operations are represented by a single change record. So a rename would result in:
    • remove Topic rev Attachment rev (other points to new location)
    • insert Topci rev Attachment rev (other points to from location)
    • a create or remove without the "other" present would represent a new toipic/attachment or a delete.

Topic cUID Topic rev more (minor delRev repRev) verb Attachment attachRev Other Web.Topic OtherAttachment OtherAttachmentRev

-- GeorgeClark - 05 Dec 2014

OK, bottom line is that these recordChange calls are in the wrong place. It should not be up to the store implementation whether to record a change or not; it should just manage the recording.

The recordChange calls need to move to Meta, so they are done consistently for all stores. Not sure yet how to handle compatibility, but like as not I'll simply disable in-store calls to recordChange for 1.2

-- CrawfordCurrie - 17 Dec 2014

Done. Note that the format of the .changes file has changed - it is now encoded using JSON (selected JSON rather than XML to avoid adding dependencies). (RCSStoreContrib and PlainFileContrib). Old format files should still be read by the code. Third-party processors of the .changes may have problems, though (it's no longer emenable to grep, for example). George, you may want to review this, make sure it's compatible with your anti-spam efforts.

-- CrawfordCurrie - 19 Dec 2014

I don't think it will impact the AntiSpam changes. Anything web based uses our official APIs. And the plugin does't proces the .changes files. However I frequently grep the .changes manually looking for activity from shell when cleaning up spammers. This is going to be rather inconvenient I suspect. My current spam cleanup runs:
find /path/to/foswiki.org/data -name .changes -exec grep -H $1 {} \;
I'm not sure how I'll be able to quickly find all changes made by a user across all webs. Suggestions welcomed.

-- GeorgeClark - 20 Dec 2014

cat .changes |  perl -e 'use JSON; print join("\n", map {JSON::to_json($_) } @{JSON::from_json(<>)})' | grep might work

-- CrawfordCurrie - 20 Dec 2014
 
Topic revision: r23 - 05 Jul 2015, GeorgeClark
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy