Feature Proposal: Plugins need a working/temp file cleanup mechanism

Motivation

Garbage collection for plugin-created files is a sore point. Right now, every site is on its own coming up with some maintenance script - and administrators pretty much have to read the code of every plugin that they use to find out what they have to do. And do it again with every plugin release.

Some plugins create files in pub/; others in the working area. Some of this data is intended to persist; other data is per-session. File naming conventions vary. But if you do nothing, sooner or later either your disk fills, or you delete something you shouldn't have.

This is an unreasonable burden for the site administrator - especially if you aren't a perl coder.

Plugins can (and do) easily create working files, but there is no cleanup mechanism. This is a serious headache for site admins. See the May 2011 discussion under http://twiki.org/cgi-bin/view/Plugins/GaugePluginDev that brought this to a head.

Description and Documentation

See http://foswiki.org/Tasks/Item10780 where I put this first. The basic idea is that tick_foswiki.pl scans the loaded plugins list and calls a handler to allow the plugin to expire, compress, or otherwise manage its working/dynamic files. I presented a template routine for EmptyPlugin that documents how to do age-based and/or filename-based processing, as well as the (TWiki version) of the calling code.

The corresponding TWiki proposal is http://twiki.org/cgi-bin/view/Codev/PluginGarbageCollection (the spelling is different but the idea is the same).

It's a few lines of code in the tick script that I think has high payback for site admins if the Plugins make use of it.

We ought to be able to do something that suits both groups.

Examples

I did this as a TWiki prototype for my version of VarCachePlugin. A production version would also clean up empty web directories (when a web disappears due to rename, delete, or move), and would probably scan the text of existing topics to make sure that the %VARCACHE% macro is still there.

sub pluginCleanup
{
    my( $twiki, $now ) = @_;

    my $webRE = TWiki::Func::getRegularExpression('webNameRegex');
    my $topicRE = TWiki::Func::getRegularExpression('wikiWordRegex');

    my $wa = TWiki::Func::getWorkArea($pluginName);

    # Cache files are stored as <workarea>/<web>/<topic>_cache.{head,txt}
    foreach my $wf ( glob( "$wa/*/*_cache.{head,txt}" ) ) {

      # \Q...\E keeps metacharacters in the work area path from being read as regex syntax
      my( $web, $topic, $type ) = $wf =~ m!^\Q$wa\E/($webRE)/($topicRE)_cache\.(head|txt)$!;

      next unless( defined $web && defined $topic );

      # Remove cache files whose topic no longer exists
      unlink( $wf ) unless( TWiki::Func::topicExists( $web, $topic ) );
    }

}
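
The routine above is filename-based. An age-based variant might look like the following minimal sketch. The helper name expireOldFiles and its parameters are made up for illustration; they are not part of TWiki::Func or Foswiki::Func, and a real plugin would pass its own work area and naming pattern.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any wiki API): remove plain files in $dir
# whose names match $pattern and whose modification time is more than
# $maxAge seconds before $now. Returns the paths that were removed.
sub expireOldFiles {
    my ( $dir, $pattern, $maxAge, $now ) = @_;
    $now = time() unless defined $now;

    my @removed;
    opendir( my $dh, $dir ) or return;
    foreach my $name ( sort readdir($dh) ) {
        next unless $name =~ $pattern;
        my $path = "$dir/$name";
        next unless -f $path;
        my $mtime = ( stat($path) )[9];
        next unless defined $mtime;
        if ( $now - $mtime > $maxAge && unlink($path) ) {
            push @removed, $path;
        }
    }
    closedir($dh);
    return @removed;
}
```

A pluginCleanup routine could call this with the $now it is handed, for example expireOldFiles( $wa, qr/_cache\.txt$/, 30 * 24 * 60 * 60, $now ) to drop cache files untouched for 30 days.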

Impact

The plugin working areas become self-managing, easing the burden on site admins to keep up with them and ending the proliferation of un-coordinated and unmaintainable cleanup scripts.

%WHATDOESITAFFECT%

Implementation

-- Contributors: TimotheLitt - 23 May 2011

Discussion

I'm not raising an objection, but I don't really understand why we need this.

Extensions that create temp files should clean them up as good practice. Use the File::Temp UNLINK option to have files removed after use. Or when files need to be preserved for other use, clean up at the end of processing. Extensions should not be leaving behind clutter in the first place.

Persistent files that are related to topics can be handled in an afterRenameHandler, or tracked and removed in an afterSaveHandler, so they are either moved or deleted as needed.

GenPDFAddOn and DirectedGraphPlugin use a lot of temporary files as well as persistent files in the extension work_area and persistent files cached in pub/. It was a bit of effort, but hopefully they are not leaving behind cruft in the temp directories. DGP keeps a map of dynamic attachments in the work_area, so it can match up before/after each run to remove stale attachments.

Do we need a new handler, or do we need Tasks opened against extensions that leave stuff behind?

-- GeorgeClark - 23 May 2011

Fair observations, but here are some examples with my rationale:

GoogleWeatherPlugin - caches location-specific data for a period of time long enough to keep Google from blacklisting the requestor. Need to remove data for locations that time out and are no longer used. But they might be used again by some other topic before the required time between requests. So you can't hook rename/save - you need something timer-directed. I have other cases like this where webcam data is cached from servers that blacklist abusers.

Rename will handle rename, move, and move-to-trash (delete). But when the trash (which last I looked still needed a manual empty script) is emptied, what callback is going to remove the stale data? Perhaps we need one...but currently admins just groan and add a cron cleanup script.

GaugePlugin - creates dynamic data in pub that has to remain for as long as a browser might do a refresh - but needs to go away eventually. The gory details are under negotiation (I'm trying to get the maintainer on TWiki to improve things), but it also lends itself to a timer-based approach.

VarCachePlugin - handles expiration of cached data only when a topic is viewed. It has no way to empty the cache at the expiration time for a topic that's not been viewed for, say a month. You pretty much need a timer-based mechanism if you want to free the disk space.

I agree that plugins/extensions shouldn't leave stuff behind unnecessarily - and it's fair to open tasks against those that do. The ones you mentioned were deficient; I'm glad they were fixed. But I think some do require a timer mechanism because there is no (guaranteed) webserver event to trigger GC. (Note that tick_foswiki handles the login session cache and stale edit locks. Any plugin that does something that looks like this will also want to be timer driven. Maybe someday we'll have a semi-persistent editor undo buffer - that might want to be removed (by some admins) if no login for a few months.)

This proposal provides a standard way - driven by an existing mechanism - to handle the cleanup cases that want/need time-driven garbage collection. It keeps the knowledge about what the script does in the plugin - where it belongs.

So I think this proposal serves a useful purpose. I agree that when possible, it's better to have plugins clean up after each request. When it's not - or when it's very expensive or inconvenient, GC is the alternative.

This proposal is a pretty simple, clean way to provide GC to plugins that need it. No plugin is required to use it.

I know on my (small) site, it will get rid of easily a half-dozen heuristic GC scripts.

On the other hand, it may encourage lazy plugin authors to clean up this way rather than aggressively cleaning up for each request. That's not optimal, but it beats the current state where lazy plugin authors don't clean up at all :-)

-- TimotheLitt - 23 May 2011

Providing a new handler that any extension can define, and that is called by the tick script, is a great idea.

It is so silly to waste runtime as part of views, edits and saves to clean garbage. It is much better to let a cron job do this in the background (scheduler task in Windows).

Having ONE standard tick script that can run a few times per day is not a big deal to setup.

And then the plugin author can write a simple routine to remove garbage.

Same handler can also be used by an extension to send emails, and any other thing that needs to run regularly. It does not have to be limited to garbage handling.

With a little care this can become really useful.

-- KennethLavrsen - 23 May 2011

Kenneth -- Exactly, although I do think that simple cleanup shouldn't be deferred. And tick_foswiki is already part of the distribution and should already be running in every installation. And it already creates a session. That's why I picked it.

If we put other functions into this handler, we might want to have the cron job run at a relatively high frequency - maybe every 15 or 30 minutes, and pass the handler the time since the last run as well as the current time. Then the handler can decide if enough time has passed for it to run. (e.g. something that reads a lot of topics might want to run once a week, while simple garbage collection might run daily and some e-mails might be every 15 minutes....)
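
As a rough illustration of a handler deciding for itself whether enough time has passed, here is a minimal sketch. The names isDue and exampleTickHandler, the job names, and the in-memory %state hash are all made up for the example; a real handler would have to persist its last-run times somewhere between ticks.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a job that last ran at $lastRun (epoch
# seconds, or undef if it has never run) is due again at $now.
sub isDue {
    my ( $now, $lastRun, $minInterval ) = @_;
    return 1 unless defined $lastRun;
    return ( $now - $lastRun ) >= $minInterval ? 1 : 0;
}

# Hypothetical handler called on every tick. $state maps job name to the
# time it last ran. Returns the names of the jobs that ran on this tick.
sub exampleTickHandler {
    my ( $now, $state ) = @_;

    # Illustrative intervals: e-mail every 15 minutes, GC daily, re-index weekly
    my %interval = (
        mail    => 15 * 60,
        gc      => 24 * 60 * 60,
        reindex => 7 * 24 * 60 * 60,
    );

    my @ran;
    foreach my $job ( sort keys %interval ) {
        next unless isDue( $now, $state->{$job}, $interval{$job} );
        $state->{$job} = $now;    # the real work would happen here
        push @ran, $job;
    }
    return @ran;
}
```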

Thanks for all the thoughts - They'll refine the prototype.

-- TimotheLitt - 24 May 2011

Yes, I like this idea. MartinCleaver proposed something similar some years ago, but no-one ever implemented it. tick_foswiki is exactly the right place for this to be called.

It would be best if you could support registration of a listener. For example,

StaticMethod registerPulseHandler(\&handler, $schedule)

where handler is the handler function and $schedule is the schedule on which the handler function should be called. That would allow a plugin to register different handlers on different (and even user-defined) schedules. There are CPAN modules for handling cron scripts that support the specification of schedules (the default would be to simply call the handler on every pulse of tick_foswiki).

I believe there are other feature requests/tasks covering this same topic; it would be worth having a search.

(The problem with letting the handler decide whether it's time to be called is that it would require it to remember when it was last called, which means yet another timestamp file. A cron schedule could be easily and consistently specified in configure, and would support non-linear schedules.)

If we increase the calling frequency of tick_foswiki we might need to consider a daemon version that keeps the perl interpreter (and foswiki) in memory. Cross that bridge when we come to it.

-- CrawfordCurrie - 24 May 2011

You mean CPAN:Schedule::Cron? Doesn't seem to be actively maintained, but I see the possibilities. Do you have experience with this (or some other)?

I had already thought of the daemon version, though I was aiming to keep things simple.

It seems to me that the fancy scheduling can be done as a second phase, since the default would just be to call the named handler (pluginCleanup) that I started with on every tick. I think that would handle the common cases with something that's easy to backport and easy to use.

A plugin wanting fancy scheduling wouldn't have the pluginCleanup function; instead it would register a schedule in initPlugin. Not wanting to waste cycles doing that normally, I'd suggest a context variable (perhaps 'cleanup_active') that tells plugins that they can register.

I'm a bit cautious about adding schedules to configure - perhaps they're OK as expert over-rides. We already have an overwhelming amount of configurability. It's important that plugins have a sensible set of defaults so that they normally just plug and play.

A related consideration is synchronization. Schedule::Cron can fork - which is good for performance, but ripe for interaction bugs. But even if run non-forking, we still have periodic events in tick_foswiki running against webserver events. At the risk of adding complexity, perhaps we also need an api for locking a plugin's persistent data. Something like a shared(read,view)/exclusive(create,delete,write) lock on the plugin.pm file taken explicitly by the plugin during normal operations, and implicitly locked exclusive by tick_foswiki around the callbacks? Wrapped in a "lockPersistentData( 'read' | 'write')" syntax...
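
To make the locking idea concrete, here is a minimal sketch built on Perl's flock. The function names, the separate lock file path argument, and the non-blocking flag are assumptions made for the example; nothing like this exists in TWiki::Func or Foswiki::Func today, and flock locks are advisory, so they only help if every party takes them.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical API sketch: take a shared ('read') or exclusive ('write')
# advisory lock on $lockFile. With $nonBlocking set, return undef instead
# of waiting when the lock is held elsewhere. Returns the locked handle.
sub lockPersistentData {
    my ( $lockFile, $mode, $nonBlocking ) = @_;

    my $op = $mode eq 'write' ? LOCK_EX : LOCK_SH;
    $op |= LOCK_NB if $nonBlocking;

    # '+>>' creates the file if needed and never truncates it
    open( my $fh, '+>>', $lockFile ) or die "Cannot open $lockFile: $!";
    unless ( flock( $fh, $op ) ) {
        close($fh);
        return;
    }
    return $fh;
}

# Release the lock taken by lockPersistentData.
sub unlockPersistentData {
    my ($fh) = @_;
    flock( $fh, LOCK_UN );
    return close($fh);
}
```

In this sketch a plugin would take the 'read' lock around normal request handling and the tick script would take the 'write' lock around each callback.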

I've stumbled across a number of cases of plugins that don't understand concurrency issues - while they are broken today, periodic events will make things worse. At least this is an opportunity to raise consciousness by providing an API. Can someone take that to a separate feature proposal?

I'll try to run some experiments with Schedule::Cron in the next few days and see if it feels viable.

-- TimotheLitt - 24 May 2011

Registering a pulse handler when not running in the pulse service is a NOP; no need to explain the context variable to plugin authors, all they need to know is that registerPulseHandler only does something when called by a pulse service.

BTW I want to get away from "named handlers" in the plugin sense and move towards a listener/event architecture for plugins. So again, I ask you not to add a "pluginCleanup" handler, but instead support registering arbitrary functions as pulse handlers.

Schedules only need to be added to configure if a plugin needs admin configurable schedules. I'm sure much of the time the plugin author will just want to say "on every tick" - or perhaps, "no more often than once a week".

-- CrawfordCurrie - 24 May 2011

In principle, a plugin may require more than one type of processing, and the schedules for each may be different. It is much easier to do this, and also much easier to specify the schedule for each handler, if we support registering arbitrary functions as pulse handlers.

The plugin's schedules might be configurable via configure, which means that the admin could set them to be the same. This means we should not rely on the schedules being unique. So I suggest giving names to handlers, so that the pulse-scheduler has a unique identifier for each one. The scheduler should combine the caller's package with the given identifier so that different plugins may use the same identifiers without clashing.

Something like this:
# Shuffle the deck on every tick
Foswiki::Func::registerPulseHandler( 'shuffle tags', \&pulseExampleShuffler ); 

# Toss out old stuff based on the admin's schedule. The default is "do it daily"
Foswiki::Func::registerPulseHandler( 'clean up', \&pulseExampleGC, $Foswiki::cfg{Plugins}{MyPlugin}{GarbageCollectionSchedule} || '1 0 * * *' );
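
On the scheduler side, combining the caller's package with the given identifier could look roughly like this. It is only a sketch: the package PulseRegistrySketch and its helper functions are stand-ins invented for the example, registerPulseHandler does not exist in Foswiki::Func, and an undefined schedule is simply taken to mean "on every tick".

```perl
use strict;
use warnings;

package PulseRegistrySketch;    # hypothetical stand-in for the scheduler

my %handlers;    # "Caller::Package::name" => { code => ..., schedule => ... }

# Register $code under $name. The caller's package is prepended so that two
# plugins can both use a name such as 'clean up' without clashing.
# An undefined $schedule is taken to mean "on every tick".
sub registerPulseHandler {
    my ( $name, $code, $schedule ) = @_;
    my $pkg = caller();
    my $id  = $pkg . '::' . $name;
    $handlers{$id} = { code => $code, schedule => $schedule };
    return $id;
}

# List the qualified identifiers of all registered handlers.
sub registeredIds {
    my @ids = sort keys %handlers;
    return @ids;
}

# Look up the schedule string stored for a qualified identifier.
sub scheduleOf {
    my ($id) = @_;
    return exists $handlers{$id} ? $handlers{$id}{schedule} : undef;
}

package main;
```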

-- MichaelTempest - 25 May 2011

I have built a working prototype of a timed task daemon - working for TWiki, that is :-(. But that should be good enough for some feedback and to test the theories.

I didn't take all the advice, but you should recognize what's here. (He who does gets extra votes...) I think it's a reasonable start.

One of the constraints on this prototype was that I did not want to modify any core files, which some of your suggestions would require.

In the attached tar file, you'll find 3 files. Here's how to get started.

First, find your friendly TWiki test system. (I don't have a Foswiki running yet, and in any case I want them to accept it too.)

cpan install R/RO/ROLAND/Schedule-Cron-1.01_1.tar.gz note this is the latest "Developer" release; the standard release wouldn't install and has bugs.

cd to your twiki root, and unpack the tar file.

mv your tick_twiki.pl file to something like standard_tick_twiki.pl.

move (or link) etc/sysconfig/TWiki to the real etc/sysconfig. Make sure it's owned by your webserver. Edit it to match your configuration.

Create softlinks to tools/experimental_tick_twiki.pl from /etc/init.d/TWiki and tools/tick_twiki.pl

Run chkconfig --add TWiki (I haven't tested this yet, but it should work.)

You should be in business. You may want to adjust the frequency of your tick_twiki runs - all they do is restart the daemon if it's crashed. So every 30 mins is probably reasonable. But then, so is not running it at all :-)

You can run /etc/init.d/TWiki status to verify. (start if chkconfig doesn't start it for you)

You can enable PeriodicTestPlugin - it does nothing useful, but does test the APIs. Feel free to try your own clients.

There are two mechanisms provided:
  • The rock-simple pluginCleanup mechanism (Crawford may persuade me to remove it later, but I like the simplicity for the writer, and it's consistent with the way plugins work now.)
  • A full named-task mechanism that you have to register in initPlugin.
The script runs as a daemon under the webserver account.

All schedules are vixiecron format - just the first 5 fields, NOT the command. The 6th field is optional and

An internal task handles the traditional tick_twiki functions, as well as the pluginCleanup. It is scheduled by $TWiki::cfg{CleanupSchedule}, which I am currently overriding to 0-59/2 * * * * 30 at the top of the file. A real default will be negotiated later - for now, if you comment this out, you'll get the traditional daily at midnight default.
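
For anyone who wants to sanity-check a schedule string before handing it over, a first-pass check might look like the sketch below. It is not the prototype's validator: the function name splitTimespec is made up, it only counts fields and checks the characters in each one, and it assumes that an optional sixth field is numeric, as in the 0-59/2 * * * * 30 override above.

```perl
use strict;
use warnings;

# Hypothetical first-pass check of a crontab-style time spec. Returns the
# list of fields if there are 5 or 6 plausible-looking ones, and an empty
# list otherwise. It does not check ranges, month names, or day names.
sub splitTimespec {
    my ($spec) = @_;

    my @fields = split ' ', $spec;
    return unless @fields == 5 || @fields == 6;

    for my $i ( 0 .. $#fields ) {
        # Fields 1-5 may use names (Jul-Sep, Sun,Sat); the optional sixth
        # field is assumed to be numeric, so a trailing command is rejected.
        my $allowed = $i < 5 ? qr{^[A-Za-z0-9*,/-]+$} : qr{^[0-9*,/-]+$};
        return unless $fields[$i] =~ $allowed;
    }
    return @fields;
}
```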

API Highlights:
  • TWiki::Periodic is in scope when a plugin initializes under the daemon. Because I did not modify any core files, it is NOT in scope under the webserver.
  • AddTask takes a name, a sub ref, a schedule, and an arbitrary argument list. You automatically get your name and the session.
  • Schedule arguments can be:
    • Explicit crontab time strings
    • A preference variable name - the scheduler will fetch it for you
    • omitted - you get the same schedule as the tick_twiki functions, which is $TWiki::cfg{CleanupSchedule} or a default. I suggest using a $TWiki::cfg{Plugins}{YourName}{FooSchedule}.
  • You can obtain the next time a task will run (NextRuntime), replace its schedule (ReplaceSchedule), or delete it by name (DeleteTask).
  • I tried to keep these abstract enough that it might be possible to replace with a different scheduler package, but who knows?
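
A plugin using the named-task mechanism might register itself roughly as follows. This is a sketch based only on the description above: the plugin name, task name, and schedule are invented, and the order in which the daemon passes the name, session, and extra arguments to the task is not assumed. The defined check keeps the code harmless under the webserver, where TWiki::Periodic is not in scope.

```perl
use strict;
use warnings;

package TWiki::Plugins::ExamplePlugin;    # hypothetical plugin name

# Hypothetical task body. The daemon is described as supplying the task
# name and the session automatically; the argument order is not assumed
# here, so the sketch just counts what it was given.
sub cleanupTask {
    my @args = @_;
    return scalar @args;
}

sub initPlugin {

    # AddTask only exists when running under the task daemon.
    if ( defined &TWiki::Periodic::AddTask ) {

        # name, sub ref, schedule (daily at 03:15), then extra arguments
        TWiki::Periodic::AddTask( 'Cleanup', \&cleanupTask, '15 3 * * *',
            'extra' );
    }
    return 1;
}

package main;
```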
It should be trivial to migrate the mailnotify/webnotify scripts to tasks - that would be a good test.

There is considerable logging - start with -d for more, and you really DON'T want -v (I warned you). Look in the debug.log and warn*.log files.

If run under the perl debugger, you can set breakpoints in the daemon; you'll get an X-window when you hit them.

You can get a full listing of the execution queue from the command line using status dump. This signals the daemon, and writes to the debug log.

--help on the command line will give you a mini man page for the script.

This should be enough for reasonable experimentation. I know it has rough edges, and it probably has bugs. (What do you expect for a couple of hours of prototyping?)

I will qualify the task names with the caller's package in the next iteration; for now, do something like "${pluginName}_" . $name.

I suppose configure should learn about entering and validating crontab time strings. I think that's quite different between the two forks, so I'm in no rush.

I'm not sure about making core changes to provide stub routines (probably in plugins.pm). For now, we can live with the context variable.

Do not start porting to Foswiki yet. It's not stable yet, and the TWiki folks need to have their say.

However, I do think it's at about 80% (maybe better) complete. I encourage you to play with it and also to separate your thoughts on functional deficiencies from those on style. (Not that style isn't important, but it's not first on my list for this.)

Enjoy,

-- TimotheLitt - 25 May 2011

Sounds good, but I hope you will take on board what Michael and I have said about registering a handler (which is consistent with the existing registerTagHandler and registerRESTHandler) rather than having the hard-coded, only-one-function pluginCleanup approach, which is very limited. BTW pluginCleanup is not consistent with the rest of the plugin architecture, which uses handler functions to implement listeners installed at different positions in the rendering cycle (which has always been a PITA as each plugin can only register a single listener at each position). The pluginCleanup function is not a handler in this sense, so is out of band (and potentially confusing) for most plugin authors.

WRT functionality, I think you need to support the concept of different functions being applied on different schedules without requiring the plugin author to disentangle the schedule. A classic requirement for this is found in the mailer; we want to be able to mail out change details on a different schedule to mailing out digests, which have a different schedule to newsletters. At the moment we have to do this with separate cron jobs, which is error prone due to synchronisation issues.

-- CrawfordCurrie - 25 May 2011

The prototype already supports both models.
  • You can call TWiki::Periodic::AddTask as many times as you like to register as many tasks as you like. Each can have its own schedule. Or you can register the same subroutine multiple times with different schedules - as long as you use a different name each time. Names are qualified by caller's package. The only disentangling is writing the crontab time spec. I believe this is exactly what you asked for, except that there is no TWiki::Periodic::AddTask under the webserver. (And I spell things differently.) As I noted, it should work well for the mailer - I had it in mind.
and/or

  • You can also have the one pluginCleanup function. It's as simple to understand as initPlugin - you define it and you're called on the admin schedule with everyone else. No configuration, you're on whatever schedule the system (or administrator) picks for tick_twiki maintenance.
Next pass will support specifying the wiki username... (The first pass runs as 'guest', which was an oversight.)

-- TimotheLitt - 25 May 2011

Username is fixed, logging improved, and the test plugin now has some more interesting examples. Not that they do anything, but they show the scheduling.

Here is some log output showing startup, the initial queue, and the intentional error. Perhaps some examples will do a better job of showing the flexibility than a long description.
Periodic Task(I)Wed May 25 08:38:05 2011: Schedule::Cron - Starting job 0 with ('initWiki','none',{'p' => '/var/www/servers/twiki/working/tick_daemon.pid','d' => 1},bless( {...}
Periodic Task(I)Wed May 25 08:38:06 2011: AddTask: 0-59/2 * * * * 30 TWiki::Plugins::PeriodicTestPlugin::cronTask1( 1,4,19 )
Periodic Task(I)Wed May 25 08:38:06 2011: AddTask: 15 8-17/2 * * 1-5 TWiki::Plugins::PeriodicTestPlugin::Mail( runmail,Mailer.Log )
Periodic Task(I)Wed May 25 08:38:06 2011: AddTask: 18 20 * Jul-Sep Sun,Sat TWiki::Plugins::PeriodicTestPlugin::News( runnews,News.Log )
Periodic Task(I)Wed May 25 08:38:06 2011: initWiki
Periodic Task(I)Wed May 25 08:38:06 2011: AddTask: 0-59/2 * * * * 30 TWiki::Periodic::TickTock( HASH(0x87ed2dc) )
Periodic Task[24320]: Event queue listing
Periodic Task[24320]: 0-59/2 * * * * 30 Next: Wed May 25 08:38:30 2011  - TWiki::Plugins::PeriodicTestPlugin::cronTask1 (session, 1, 4, 19)
Periodic Task[24320]: 15 8-17/2 * * 1-5 Next: Wed May 25 10:15:00 2011  - TWiki::Plugins::PeriodicTestPlugin::Mail (session, runmail, Mailer.Log)
Periodic Task[24320]: 18 20 * Jul-Sep Sun,Sat Next: Sat Jul  2 20:18:00 2011  - TWiki::Plugins::PeriodicTestPlugin::News (session, runnews, News.Log)
Periodic Task[24320]: 0-59/2 * * * * 30 Next: Wed May 25 08:38:30 2011  - TWiki::Periodic::TickTock (session, HASH(0x87ed2dc))
Periodic Task[24320]: End of event queue
Periodic Task(I)Wed May 25 08:38:06 2011: initWiki finished successfully
Periodic Task(I)Wed May 25 08:38:06 2011: Schedule::Cron - Finished job 0
Periodic Task(I)Wed May 25 08:38:30 2011: Schedule::Cron - Starting job 0 with ('TWiki::Plugins::PeriodicTestPlugin::cronTask1',bless( {...}
Periodic Task(I)Wed May 25 08:38:30 2011: TWiki::Plugins::PeriodicTestPlugin::cronTask1 finished successfully
Periodic Task(I)Wed May 25 08:38:30 2011: Schedule::Cron - Finished job 0
Periodic Task(I)Wed May 25 08:38:30 2011: Schedule::Cron - Starting job 3 with ('TWiki::Periodic::TickTock',bless( {...}
Periodic Task(I)Wed May 25 08:38:30 2011: Expire sessions
Periodic Task(I)Wed May 25 08:38:30 2011: Expire leases
Periodic Task(I)Wed May 25 08:38:30 2011: Cleanup plugins
Periodic Task(I)Wed May 25 08:38:30 2011: ReplaceSchedule: New schedule for TWiki::Plugins::PeriodicTestPlugin::cronTask1: 0-59 * * * * 10
Periodic Task(E)Wed May 25 08:38:30 2011: Schedule::Cron - Error within job 3: delete at /var/www/servers/twiki/lib/TWiki/Plugins/PeriodicTestPlugin.pm line 152.

Periodic Task(W)Wed May 25 08:38:30 2011: TWiki::Periodic::TickTock exited with status 1
Periodic Task(I)Wed May 25 08:38:30 2011: Schedule::Cron - Finished job 3

-- TimotheLitt - 25 May 2011

Cool! I reserve the right to dislike pluginCleanup (there are far too many plugin handlers already) but can live with it. Not sure why you had problems with the username; that should be specified in Config.spec for the extension, I guess. I do like the fact that you are trying to keep this out of the core at this stage, so it can be used with older releases, but I suspect we should consider integrating your work directly in the core (or as a default plugin) so it's available to everyone without the need to install an additional extension. Beyond that I can't really comment until we've seen the code.

-- CrawfordCurrie - 25 May 2011

Thanks. We all have our dislikes; I won't make a final call on pluginCleanup until I have some more experience.

I was just distracted with the username, too much going on. It's now specified in sysconfig/TWiki (I need it before creating the session). It's one username for all tasks.

Yes, my hope is that you (and TWiki) will integrate this - it's very isolated (2 files), and it is a plug-in (no pun intended) replacement for the tick script. It won't be very useful unless plugins can count on it being there - and at that point, I hope the plugins will start taking responsibility for their garbage collection. Note that the plugin in the prototype is just a demo/test scaffolding. It would not be released, but EmptyPlugin would get a subset as sample code.

Code: The latest code snapshot is attached to this topic. I'm still fussing with logging and error handling, and it will need more documentation.

I may add a variant of AddTask (probably spelled AddAsyncTask) that runs the task in its own fork. This would support resource-intensive tasks - but they'd have more potential synchronization issues. Standard AddTasks will continue to run single-threaded in the daemon - but still need to worry about synchronization issues wrt. the webserver.
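
The fork-per-task idea can be sketched in a few lines. This is not the prototype's code: runAsync and waitForTask are names made up for the example, and a real daemon would reap children from a SIGCHLD handler or a periodic waitpid rather than blocking.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical sketch: run $code->(@args) in a child process and return
# the child's pid to the parent straight away.
sub runAsync {
    my ( $code, @args ) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {

        # Child: run the task, then leave without running the parent's
        # END blocks or destructors.
        my $ok = eval { $code->(@args); 1 };
        POSIX::_exit( $ok ? 0 : 1 );
    }
    return $pid;
}

# Block until the given child finishes and return its exit status.
sub waitForTask {
    my ($pid) = @_;
    waitpid( $pid, 0 );
    return $? >> 8;
}
```

Because the child gets a copy of the parent's state, anything it changes in memory is invisible to the daemon; results have to come back through files or similar, which is where the synchronization issues come in.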

That said, it's mostly there. (And quite a bit more complicated and function-rich than my original baseline.)

You're welcome to peruse, play with the prototype, & review the code. Just understand that it's still evolving. (Though that's a good time to make helpful comments. I do listen...)

I appreciate all the constructive comments and thoughts.

-- TimotheLitt - 25 May 2011

Posted V2.0-004, which has basic configure support, including pretty thorough validation of timespecs. The GUI isn't pretty, but it does seem to work. It would be nicer to at least have six text boxes - 1 for each subfield, but I didn't see how to do that. Crontab is inherently ugly.

This adds three new files to lib/TWiki/Configure/. You also need to apply a small patch to TWiki.spec; the patch file is included. The setting is in "Miscellaneous Settings".

Custom schedules for other tasks would add a couple of lines in TWiki.spec & clone the CleanupSchedule.pm file - just change the package name and the cfg key.

I've heard that Foswiki has a "new" configure architecture, but as I haven't looked at it, don't know how much work it will be to port. However, the timespec validator (about 200 lines of script!) is a separate routine - hopefully that's the hard part.

-- TimotheLitt - 26 May 2011

Please don't extend the API beyond Foswiki::Func (or TWiki::Func). This really does make it hard for new developers to learn. In my opinion, the API is already too broad and poorly defined, so please do not make it worse by adding another package to the API that plugins may use. I expect the "engine" code would not live in TWiki::Func; but please do add a wrapper in TWiki::Func. It makes the learning curve less steep :-)

-- MichaelTempest - 26 May 2011

Seems reasonable. I don't want to touch (Foswiki|TWiki)::Func.pm, but the next drop will export the API functions into the TWiki::Func namespace as

TWiki::Func::AddTask, TWiki::Func::DeleteTask, TWiki::Func::NextRuntime, and TWiki::Func::ReplaceSchedule

They only exist when running under the task daemon, and will be documented there.

TWiki seems to want this for their next release, modulo the checkin mechanics. I hope to be "done" soon...

-- TimotheLitt - 26 May 2011

Progress. I have a much closer to industrial strength prototype running. I created a documentation page - it doesn't really belong in this topic, so I put it at PeriodicTasks. I hope no-one is offended - I couldn't find a better place. It has admin and developer documentation, including installation instructions (sigh, yes for TWiki) & screenshots. It's not intended as a discussion topic, but as a start on release documentation. Comments are welcome here.

It's worth a look.

Asynchronous task support is working, logging is working and there's a configure GUI that is better than crontab - at least, I think so. I had to break my rules and make a very small patch to configure to make it work, however.

There is no longer an /etc/sysconfig file - I came up with a scheme that eliminates that.

There is a mechanism for on-the-fly reconfiguration.

I had some problems with Schedule::Cron. I've included a patch in the latest .tar file, but it's not documented in my install instructions. The owner promised to review and release a new kit late this week. It's not a small patch - I fixed a year's worth of his RT backlog as well as my 4 bugs. :-(

I hope less effort will be required with the Foswiki new Configure.

Latest code snapshot is here (same place).

Enjoy,

-- TimotheLitt - 31 May 2011

Tried to stay out of this discussion up to now but got curious as soon as PeriodicTasks materialized. I like the direction this goes but would really like to see it mature into a more general async task manager usable not only for periodic tasks but also for those tasks that better stay out of the response code flow and run only once and immediately. Examples of these kinds of tasks are

  • updating search indexes due to a saved/attached/removed/move event
  • updating search indexes by crawling external document sources
  • sending emails like group invitations
  • publishing document sets manually
Just to name a few that I can think of right now. Publishing document sets (by copying them to a different location) are normally good candidates for periodic tasks but may also happen on demand manually.

So maybe it would be good to rename this thing to a more general WikiTaskManager that handles jobs async'ly rather than restricting it to periodic tasks, as a lot of use cases require ad hoc tasks.

Conceptually such a task manager is designed around the notion of "jobs" and "queues". Haven't seen this in the specs at PeriodicTasks so far. A periodic task is a rather specific job description based on a reoccurring time event. That's different from "real time tasks" that should be executed asap.

Not sure if this has already been discussed above. /me reading up.

-- MichaelDaum - 31 May 2011

I've just finished looking over PeriodicTasks. I also like the direction (small quibble: can we have Foswiki::Func::registerTask instead of addTask :-). I'm less sure about the way it's implied to hand around state between different plugins or tasks/runs ($session). I also need to understand why the setuid/setgid bit is necessary, as it's not permitted on our webserver environment, and also why a symlink to bin is necessary (really this should be a script that's no harder to run than tick_foswiki.pl or a rest handler), but I guess that's a minor implementation detail.

What counts is the API.

I am finishing up a project in a couple of months in which I will need this functionality. Two modes: run every N minutes (I already do this with a traditional cron job, to re-generate report topics & attachments). And secondly what I'm really wanting is a "single-shot" task launched async'ly (also blocking, avoiding pile-ups). I haven't thought through how to allow the plugin to manage a schedule that gets longer and longer, further and further behind (drop tasks? implement a producer/consumer message queue thingy?)...

Cool work :-)

-- PaulHarvey - 31 May 2011

Let's see:

Michael -

Once and immediate tasks - that really wasn't my objective. I'd really need to think harder about the requirements. You may just want your own daemon; I could package up some of the infrastructure (like registering for config change notification); there are already tools like Proc::Daemon.

Paul,

register vs. add - you know, I thought about that. But add is so much less typing :-). Maybe - before there are lots of consumers. Are we that pedantic? :-)

The setuid/setgid story:
  • As long as you never run under root, you don't need it. Things will work, life will be good.
  • If you run under root as startup scripts do, there's a problem.
    • You clearly need to switch to the webserver uid/gid.
    • How do you know what it is?
    • Well, the usual trick (and my first stab) is yet another config file
      • But then, people run multiple wiki versions. So besides the annoyance of having to specify which one, you maintain several. And you have to see about nobody, apache, webserver or who-knows what.
    • So, I wanted a scheme that uses the filesystem and didn't require a config file
      • Running setuid/setgid, the script gets the numeric uid/gid of the webserver, switches and is done.
        • Note we are not setuid to root; we are switching away from root.
      • If you don't like this, wrap startup in your own script that switches first, or runs under the webserver. Just make tick_xwiki a cgi script and run it by a wget in your own startup script. By default, it will do a constart (start the daemon if it's not running). I think it's more hassle, but you can do that without changing any of my code...
    • Next, how do you find setlib.cfg if you're started in an unknown environment and don't want yet another config file?
      • The trick here is to do something based on the script that you're running.
        • I resolve $0 to a physical location (/etc/rc#.d/S??foo is always a symlink)
        • Now the physical name plus a suffix is a symlink to the directory where you want me to find setlib.cfg. And I'm done.
        • Well, almost. Now certain people come around and want to select one of several setlib.cfgs. Like developers who run wiki V1.1 and V.trunk :-)
        • For that, we need two physical names. The hardlink gets us there with minimal code. And the effort is required only by those who need the feature.
  • The previous approach (of tick_twiki.pl, which this can be a drop-in replacement for) to finding setlib.cfg was a wrapper script that read cd /path-to-bin && ./tick_twiki.pl. That's a config file by another name - and it meant a separate script (small) for each wiki. And entering an editor is more work than creating a symlink. My scheme got rid of that. (Though it doesn't care if you run in that directory, it will still follow $0).
    • I suppose I could look to see if there's a ./setlib.cfg & bypass the search if it's there.
    • Not much advantage from my point of view, but backwards (in every sense of the word) compatible.
    • In the next drop.
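The "switching away from root" step can be sketched as follows. This is an illustration under stated assumptions, not the code in the tick script: it takes the target ids from the owner of a file (via stat) rather than from the setuid/setgid bits described above, and the helper names are hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Return the numeric (uid, gid) that own $path.
sub owner_ids {
    my ($path) = @_;
    my @st = stat($path);
    die "Cannot stat $path: $!\n" unless @st;
    return @st[ 4, 5 ];
}

# If running as root, switch to ($uid, $gid) for good.
# Returns 0 (and changes nothing) when not running as root.
sub switch_away_from_root {
    my ( $uid, $gid ) = @_;
    return 0 unless $> == 0;    # not root: nothing to do
    die "Refusing to keep running as root\n" if $uid == 0;

    # Groups first: once the uid is dropped we can no longer change them.
    $) = "$gid $gid";           # effective gid; supplementary groups reduced to $gid
    POSIX::setgid($gid) or die "setgid($gid) failed: $!\n";
    POSIX::setuid($uid) or die "setuid($uid) failed: $!\n";
    return 1;
}
```

A startup wrapper might call switch_away_from_root( owner_ids($some_wiki_file) ) before loading any wiki code; which file to stat is a site decision.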
Passing state around:
  • Well, it saves building and tearing down sessions - it's conceptually no different from mod_perl, except that you start with a pre-initialized session. Pretty much everything one does needs one - or you're not using *wiki.
  • I reserved the right to pass a different session - e.g. if something dramatic happens, perhaps it'll be necessary to re-initialize. But since one can't reload plugins, I haven't found the use case yet.
  • It does seem a bit creepy at first blush, but it does grow on you.
One-shot synchronous jobs.
  • Again, not the problem I wanted to solve. But here's how you could leverage this daemon:
    • open a non-blocking listening socket in initPlugin.
    • Create a task in your plugin to run synchronously at a reasonably high rate. Pass the socket number as an argument.
    • When your task runs, select/poll your socket for requests. Handle one (or a few).
    • The conventional side of your plugin just sends its request.
    • Don't forget authentication :-)
  • This gets you a serialized stream of requests in a persistent environment, pretty much for free.
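The listening-socket recipe above can be sketched with core modules only. The function names, the one-line request format and the loopback address are assumptions for illustration; authentication is deliberately left out and would be needed in practice.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Open a listening socket on the loopback interface (done once, e.g. from
# initPlugin). Port 0 lets the operating system pick a free port.
sub open_listener {
    my ($port) = @_;
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port // 0,
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "Cannot listen: $!";
    return $listener;
}

# Called from the periodic task: accept at most $max pending connections,
# read one line from each and answer with the handler's return value.
# $timeout (seconds, default 0) bounds the wait for a new connection.
# Returns the number of requests handled.
sub poll_requests {
    my ( $listener, $handler, $max, $timeout ) = @_;
    my $select  = IO::Select->new($listener);
    my $handled = 0;
    while ( $handled < $max && $select->can_read( $timeout // 0 ) ) {
        my $conn = $listener->accept or last;
        my $line = <$conn>;    # note: a slow client stalls this read
        if ( defined $line ) {
            chomp $line;
            my $reply = $handler->($line);
            print {$conn} "$reply\n";
            $handled++;
        }
        close $conn;
    }
    return $handled;
}
```

Because every request is handled inside the task, requests are serialized without any locking in the plugin.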
Much of this is well beyond my original scope - which was to provide a reliable mechanism for plugins to clean up their working areas.

-- TimotheLitt - 31 May 2011

Are we that pedantic - yes, I'm afraid we have to be, otherwise the API becomes really difficult to use for infrequent users. We established the meme of "register" meaning "make the system aware of this" - registerTagHandler, registerRESTHandler - and to branch off to add for the sake of 4 characters of typing is rather churlish. Note also that the coding standard for both Foswiki and TWiki requires lower-case first character function names.

A couple of notes; I see from reading the code example in PeriodicTasks that the plugin author is expected to know about the context variable Periodic_Tasks. Why? Why can't this check be done in Foswiki::Func::registerTask? One less thing for the plugin author to have to worry about.

Also, your example shows a task being added from a plugin, but doesn't say how you might add a task from a Contrib (which doesn't have an init function). This is a problem that affects other plugins that can themselves be extended, and is solved by supporting registration through configure. For example, the JQueryPlugin lets you define $Foswiki::cfg{Plugins}{JQueryPlugin}{Plugins} to be a set of modules that the JQueryPlugin is to load when it is started up. That lets you register a new jquery-plugin without having to implement a Foswiki-plugin. This is needed for - for example - MailerContrib, which doesn't have a plugin. The analogous task registration might be something like this:
$Foswiki::cfg{PeriodicTasks}{Mailer}{Function} = 'Foswiki::Contrib::MailerContrib::notify';
$Foswiki::cfg{PeriodicTasks}{Mailer}{Schedule} = '1 * * * 3';
$Foswiki::cfg{PeriodicTasks}{Mailer}{Arguments} = [ 1, 2, 3 ];
$Foswiki::cfg{PeriodicTasks}{CacheCleanup}{Function} = 'Foswiki::Plugins::CacheCleanupPlugin::pluginCleanup';
$Foswiki::cfg{PeriodicTasks}{CacheCleanup}{Schedule} = '1 * * * *';

If you adopt this approach you don't actually need any changes to the Foswiki::Func API - the whole thing can be done via configure - though I confess I rather like your fine-grained task management through the Func API.
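A daemon could turn such configuration entries into callable tasks roughly as follows. This is only a sketch of the idea: tasks_from_config and the My::Contrib::Demo package are hypothetical, and a plain hash stands in for %Foswiki::cfg.

```perl
use strict;
use warnings;

{
    # Stand-in for a Contrib entry point (hypothetical).
    package My::Contrib::Demo;
    sub notify { return "notify(@_)" }
}

# Resolve every {PeriodicTasks} entry to a code reference.
# Returns a list of hashes: { name, schedule, code, args }.
sub tasks_from_config {
    my ($cfg) = @_;
    my @tasks;
    for my $name ( sort keys %{ $cfg->{PeriodicTasks} || {} } ) {
        my $def = $cfg->{PeriodicTasks}{$name};
        my ( $module, $sub ) =
          ( $def->{Function} // '' ) =~ /^((?:\w+::)*\w+)::(\w+)$/
          or die "Task $name: bad function name\n";

        # Load the module on demand if the function is not defined yet.
        my $code = $module->can($sub);
        unless ($code) {
            eval "require $module; 1"
              or die "Task $name: cannot load $module: $@";
            $code = $module->can($sub)
              or die "Task $name: $module has no $sub\n";
        }
        push @tasks,
          {
            name     => $name,
            schedule => $def->{Schedule},
            code     => $code,
            args     => $def->{Arguments} || [],
          };
    }
    return @tasks;
}
```

Each returned task can then be handed to whatever scheduler is in use, e.g. by calling $task->{code}->( @{ $task->{args} } ) when the schedule fires.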

-- CrawfordCurrie - 03 Jun 2011

Thank you for your detailed answers. I understand about the symlinks - it just doesn't "feel" the same as the other scripts (for example, running the rest script). But maybe I'm too close to it these days (I am running trunk in production). I understand the apprehension admins (other than myself?) must feel about having to enumerate specific LIB paths just to fire something that should "just work".

I'm not saying it's a bad idea, it's just that so many arbitrary inconsistencies have been painfully removed, it would be a shame to add a new one. Which might mean that we find a solution that covers all the other scripts as well?

And I understand we're dragging out the scope of your original goals. That just means you're doing something right :-)

-- PaulHarvey - 03 Jun 2011

Crawford,

I don't remember seeing a coding standard, but as I'm making other changes, I'll adapt. Pointer?

The reason that the context variable is required is that the API doesn't exist when running under the webserver (e.g. normally). The whole thing lives in what used to be tick_*wiki.pl, which materializes all this stuff before calling *wiki->new(). It's unconventional - but the idea was to avoid touching func.pm - and also, to not load the code for the services into the webserver environment. (Keep in mind that tasks generate wiki requests; unlike everything else that's oriented toward responding to them...)

So you can't call Add/Register/anything unless you know it's there...you'll die calling an undefined function. So there's no way to stub it out. Of course, if I patched *Func.pm, I could put stubroutines there - but that's just baggage for the webserver. And what would the caller do? There's nothing you can do with them, because the data isn't there. The caller doesn't want to check each call - either it's running scheduled, or it's running under the webserver. It's not a fine-grained choice. If it was just one "Add a task" call, it would be a wash. But as you'll see, probably it's a larger block of code.

And this really is a new context, so it felt reasonable. Given that EmptyPlugin will provide a template, the test will just be part of the formula for how you write a plugin.

I hadn't gotten to the Contrib problem - thanks for explaining the situation and for the concept. As a first pass, I now provide a Contrib loader that runs after the normal wiki->new initialization, but before any task is scheduled. Your contrib would have a small interface to the task scheduler. It can simply wrap your existing code, or take advantage of the other facilities.

You define what to load with these items - you can point to any module in @INC, but usually I'd expect it to be as shown:

{Periodic}{Contrib}{*}{Module} = 'Xwiki::Contrib::*::Tasks';

{Periodic}{Contrib}{*}{Version} = "3.0"; # Optional, minimum acceptable version

and it will require/import and optionally VERSION-check all listed {Module}s. I will eventually call each's initContrib with something like the same signature as initPlugin - I should be able to dig that out of the session - except maybe $installweb?

This gives the contrib a chance to initialize & decide what other part(s) of itself (if any) to load. And it can then decide what task(s) it wants to schedule. For example, MailerContrib may have multiple schedules - maybe different for news vs mail, maybe per-web. It would obtain those from the normal contrib's namespace. I don't see any point in replacing command line arguments in the framework.

I also provide a contribCleanup convenience call-if-there, so it would be exactly analogous to a plugin - except only loaded in the scheduled environment.
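The loader behaviour described here (require, optional minimum-version check, initContrib, and a call-if-there for contribCleanup) might look roughly like this. It is a sketch only: load_contrib, call_if_there and the inline My::Contrib::Demo::Tasks module are hypothetical, and initContrib is called without arguments because its final signature is not settled above.

```perl
use strict;
use warnings;

{
    # Hypothetical interface module, defined inline so the sketch is
    # self-contained; normally it would live in its own .pm file.
    package My::Contrib::Demo::Tasks;
    our $VERSION = '3.1';
    our @log;
    sub initContrib    { push @log, 'init';    return 1 }
    sub contribCleanup { push @log, 'cleanup'; return 0 }
    $INC{'My/Contrib/Demo/Tasks.pm'} = __FILE__;    # mark as already loaded
}

# Load one configured {Module}, enforce the optional minimum {Version},
# then give the contrib a chance to initialize.
sub load_contrib {
    my ( $module, $min_version ) = @_;
    $module =~ /^(?:\w+::)*\w+$/ or die "Bad module name '$module'\n";
    eval "require $module; 1" or die "Cannot load $module: $@";
    $module->VERSION($min_version) if defined $min_version;  # dies if too old
    if ( my $init = $module->can('initContrib') ) {
        $init->() or die "$module: initContrib failed\n";
    }
    return $module;
}

# Call $module::$name(@args) only if the contrib defines it.
sub call_if_there {
    my ( $module, $name, @args ) = @_;
    my $code = $module->can($name) or return;
    return $code->(@args);
}
```

The version check relies on Perl's built-in UNIVERSAL::VERSION method, so a contrib only needs to set $VERSION.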

I put an example below - which loads.

Does this seem reasonable to you?

As for nested plugins - I'm inclined to say that if JQuery loads an extension, JQuery gets to pass on the call to its initPlugin. (However that's spelled.) And, by extension, check for and call pluginCleanup from its own.

Paul,

Based on your previous comment, I will take a setlib.cfg from cwd, if there is one. If not, I'll follow the links. So for most of us, nothing changes. If you (cd bin && ../tools/tick_*wiki.pl), you'll get the setlib.cfg from there. If you (cd tools && ./tick_*wiki.pl) (where there's no setlib.cfg), I'll try to follow the links, thus supporting system startup of the (unusual) multi-version-wiki environment.
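The search order just described can be sketched as below. The function name and the suffix value are assumptions; the thread does not say which suffix the real script uses.

```perl
use strict;
use warnings;
use Cwd qw(abs_path getcwd);

# Return the directory holding setlib.cfg, or nothing if none is found.
# $script is normally $0; $suffix names the companion symlink
# (the actual suffix is not given in the discussion, so it is a parameter).
sub find_setlib_dir {
    my ( $script, $suffix ) = @_;

    # 1. Backwards-compatible case: setlib.cfg in the current directory.
    return getcwd() if -e 'setlib.cfg';

    # 2. Resolve the script to its physical location; this follows
    #    symlinks such as /etc/rc#.d/S??foo.
    my $physical = abs_path($script);
    return unless defined $physical;

    # 3. "physical name + suffix" is a symlink to the directory to use.
    my $link = $physical . $suffix;
    return unless -d $link && -e "$link/setlib.cfg";
    return abs_path($link);
}
```

A second hard link to the script gives a second physical name, and therefore a second suffix link, which is how several wiki versions can be selected.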

I've been beaten-up in the past for not thinking of the multiple-version-wiki environment. So far, the links are the best way I know to make that work for system startup... And other scripts can certainly do the same thing, including shell scripts - readlink -en is your friend in the shell. But if you have other ideas, I'm open.

And now for the other news.

I've been pondering all the feedback, and come to some conclusions.

First, I'm not developing a general queue or batch job management system. The world has enough of those. But *wiki does have a unique set of issues that do seem worth addressing. We are pure perl, we have a complex set of configurable plugins/addons. It's expensive to instantiate a session, so a persistent environment is desirable. We would like to have unified configuration and management of maintenance and some batch/off-line processing. Scheduled processing is one part of the problem, but there are other events that we want to trigger tasks. And we don't include maintenance processing as a first class construct - whether it's working area cleanup in plugins, the tick_*wiki stuff, or Contribs like Mailer.

I'm coming to think of this as an environment that processes a non-web source of wiki requests. The environment is different because under a webserver, you have to deal with time limits, users who navigate away, webserver restarts, and other external factors that raise havoc with maintenance activities. The environment I'm creating has a wiki session, but is stable and event-driven. Plus, it integrates maintenance coding into developing plugins/extensions rather than leaving it to ad-hoc cron scripts.

So, since I've said this is a prototype - I'm making some changes. Again.

Orthogonal to the synchronous/asynchronous (threaded vs. forked) task types, I'm implementing a triggering model. So in addition to a task being triggered by a cron-like schedule, it can also be triggered on anything that select (the system call, not the perl function) can wait on.

So, for example, you can register (and yes, I called the API registerFileHandles :-) ) a callback for a listening socket. I do that internally, so one can get status from the command line (or a plugin). I expect I'll have a forking version too, as most sane people don't like to write non-blocking select-threaded code.

But, you can also register other events. One that I'm building in is inotify - it makes watching for config file changes much more efficient and response more timely. And you can use that to monitor directories (e.g. under working/yourfacility/) used for request queues. So your off-line PDF generation can watch for a request, have a thread forked in real-time & put its output back in pub. Or whatever. I will fall back to polled monitoring with stat() on systems that don't support inotify, though hopefully over time others will add their equivalents. I'm still thinking about the minimum semantics to make supporting multiple systems easy.
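The stat()-based fallback can be illustrated with a small polling watcher. This is a sketch of the general technique only, with a hypothetical function name; it is not the daemon's implementation and it does not use inotify.

```perl
use strict;
use warnings;

# Build a watcher over a list of paths. Returns a closure; each call
# returns the paths whose mtime or size changed, or which appeared or
# vanished, since the previous call.
sub make_stat_watcher {
    my @paths = @_;

    # Cheap change signature: "mtime:size", or 'missing'.
    my $signature = sub {
        my @st = stat( $_[0] );
        return @st ? "$st[9]:$st[7]" : 'missing';
    };

    my %seen = map { $_ => $signature->($_) } @paths;
    return sub {
        my @changed;
        for my $path (@paths) {
            my $now = $signature->($path);
            next if $now eq $seen{$path};
            $seen{$path} = $now;
            push @changed, $path;
        }
        return @changed;
    };
}
```

Polling only notices changes at the polling rate, and a rewrite that keeps both size and mtime (one-second resolution) goes unnoticed, which is part of why inotify is preferable where it exists.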

I am also thinking about an "at (or after) a specific time" trigger. Cron is great for expressing periodic schedules, but clueless about "do this on 4-jul-1853" or "retry this once 30 seconds from now."
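A one-shot "at or after" trigger needs little more than a list of due times. The sketch below uses hypothetical names (run_at, run_after, run_due, next_timeout) and takes the current time as a parameter so it can be exercised without waiting.

```perl
use strict;
use warnings;

my @timers;    # pending one-shot tasks, kept sorted by due time

# Schedule $code to run once, at or after the epoch time $when.
sub run_at {
    my ( $when, $code ) = @_;
    push @timers, { when => $when, code => $code };
    @timers = sort { $a->{when} <=> $b->{when} } @timers;
    return;
}

# Schedule $code to run once, $delay seconds after $now (default: now).
sub run_after {
    my ( $delay, $code, $now ) = @_;
    return run_at( ( $now // time ) + $delay, $code );
}

# Run every task that is due at $now; return how many ran.
sub run_due {
    my ($now) = @_;
    $now //= time;
    my $ran = 0;
    while ( @timers && $timers[0]{when} <= $now ) {
        my $timer = shift @timers;
        $timer->{code}->($now);
        $ran++;
    }
    return $ran;
}

# Seconds until the next task is due, or undef if nothing is pending.
# Usable as the timeout of a select() based event loop.
sub next_timeout {
    my ($now) = @_;
    $now //= time;
    return undef unless @timers;
    my $wait = $timers[0]{when} - $now;
    return $wait > 0 ? $wait : 0;
}
```

Unlike a cron entry, each timer fires once and is then forgotten; a retry simply schedules itself again with run_after.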

And so I expect I'll change spellings and signatures - but then, no one else has actually coded to this yet - that I know of.

I think this will provide enough infrastructure for others to build solutions to the issues raised in prior comments.

I suspect it will be a few days - some of this is tricky, and I have other stuff in my queue as well.

By the way, I've seen a taint issue in tick_twiki (4.2.3) but haven't investigated. Anyone been there (care to?) It would be helpful, as setuid forces taint mode...and I don't need the distraction of investigating...
| 03 Jun 2011 - 12:06 | (main) Periodic Task[15812](E): Schedule::Cron - Error within job 5: Insecure dependency in unlink while running with -T switch at /var/www/servers/twiki/lib/TWiki/Store/RcsFile.pm line 732.|

Here's what a contrib interface module looks like (Unsurprisingly similar to a plugin, I hope. Not the same to catch errors.):
package TWiki::Contrib::PeriodicTasks::MailerContrib;
# Always use strict to enforce variable scoping
use warnings;
use strict;
require TWiki::Func;    # The plugins API
use vars qw( $VERSION $RELEASE );
#$VERSION = '$Rev: 15942 (11 Aug 2008) $';
$VERSION = 1.1; # Checked by loader.
$RELEASE = 'V0.000-001';
our $contribName = 'MailerContrib';
sub initContrib {
    my( $topic, $web, $user, $installWeb ) = @_;
    TWiki::Func::writeDebug( "$contribName loaded" );
    unless( TWiki::Func::getContext()->{Periodic_Task} ) {
        die "Configuration error: " . __PACKAGE__ . " ($contribName) should never be initialized by a webserver";
    }
    # Task definitions, reconfig handler, etc goes here.
    my $dummy = $TWiki::cfg{Contrib}{$contribName}{Useless};
    return 1;
}
# Task run on standard plugin/contrib cleanup schedule
#
# You need only define this subroutine for it to be called on the admin-defined schedule
# $TWiki::cfg{CleanupSchedule}
#
# For a simple contrib, this is all you need.  This sample code simply deletes old files
# in the working area.  The age is configured by a web preference or a config item.
#
# This name (contribCleanup) is required.
sub contribCleanup {
    my( $session, $now ) = @_;
    TWiki::Func::writeDebug( "$contribName: Running contribCleanup: $now" );
    my $wa = TWiki::Func::getWorkArea($contribName);
    # Maximum age for files before they are deleted.
    # Note that updating MaxAge in configure will be reflected here without any code in the contrib.
    my $maxage =  TWiki::Func::getPreferencesValue( "\U$contribName\E_MAXAGE" ) ||
                  $TWiki::cfg{Contrib}{$contribName}{MaxAge} || 24;
    my $oldest = $now - ($maxage*60*60);
    # One might want to select only certain files from the working area and/or log deletions.
    foreach my $wf ( glob( "$wa/*" ) ) {
        my( $uid, $gid, $mtime ) = (stat $wf)[4,5,9];
        next unless defined $mtime;         # stat failed (file vanished?)
        # Only delete files that we own and that are older than the limit
        if( $uid == $> && $gid == $)+0 && $mtime < $oldest) {
            $wf =~ /^(.*)$/ or next;        # Untaint so -T works
            $wf = $1;
            unlink $wf or TWiki::Func::writeWarning( "Unable to delete $wf: $!" );
        }
    }

    return 0;
}
1;

Finally, just for fun, here's a status report generated from a network-triggered synchronous task - the daemon packages it up and sends it back to your command line.
tools/experimental_tick_twiki.pl -d status list
Daemon is running (16104)
Job queue (ordered by next scheduled execution time):
Job 0    */2 * * * * 34                                               Next: Fri Jun  3 12:24:34 2011 - TWiki::Plugins::PeriodicTestPlugin::cronTask1
Job 5    */2 * * * * 34                                               Next: Fri Jun  3 12:24:34 2011 - TWiki::Periodic::TickTock
Job 6    */1 * * * * 19                                               Next: Fri Jun  3 12:25:19 2011 - TWiki::Periodic::ReConfig
Job 3    */2 * * * * 23                                               Next: Fri Jun  3 12:26:23 2011 - TWiki::Periodic::Forker-1
Job 4    */2 * * * * 23                                               Next: Fri Jun  3 12:26:23 2011 - TWiki::Periodic::Forker-2
Job 2    18 20 * Jul-Sep Sun,Sat                                      Next: Sat Jul  2 20:18:00 2011 - TWiki::Plugins::PeriodicTestPlugin::News
Job 1    1,3,4,6-17 8,15-17 * Feb,Apr,May,Aug-Dec/2 Tue,Thu 1,7,14,23 Next: Tue Aug  2 08:01:01 2011 - TWiki::Periodic::Mail
End of job queue
Active asynchronous tasks:
  PID Started                 Name
16093 Fri Jun  3 12:24:23 2011 TWiki::Periodic::Forker-1
16094 Fri Jun  3 12:24:23 2011 TWiki::Periodic::Forker-2
End of active task list

OK, it's not that exciting, but I can be easily amused. Sometimes.

-- TimotheLitt - 03 Jun 2011

Very cool! Especially the inotify stuff, it is exciting. Thank you for entertaining the ever growing scope creep :-) FWIW, I never had a problem running multiple wikis with the existing arrangements. My own scripts do, however, require you to have a FOSWIKI_LIBS envar set or run the cumbersome sudo -u www-data perl -wT -I /path/to/foswiki/lib mytick.pl.

-- PaulHarvey - 04 Jun 2011

The coding standards are at FoswikiCodingConventions (which is in turn linked from DevelopersBible, where all the developer help is portaled).

w.r.t the API - OK, I understand. You are "monkey patching" the API during task runs. That feels wrong to me, because it means that the API is different depending on the runtime context - ouch! What was the rationale for not doing this stuff in your own namespace e.g. Foswiki::Tasks ? You would still need to consult the runtime context to determine if the API is available, but in terms of code separation and encapsulation I feel it would be cleaner.
  • Later - I see that MichaelTempest talked you into extending the Func API. This would make sense if the functionality was ultimately to be adopted into Func, but my gut still tells me what you have here is so significantly more than that, that it ought to stand in its own package.
I'm confused already by initContrib. Contribs don't have an init step, because there is nothing to init them from. Plugins are registered at startup, by virtue of their entry in configure (and auto-discovery, though that's discouraged), but there is nothing analogous for Contribs. Are you advocating a general purpose init step for contribs? If not, if initContrib is specific to the task environment, then the name needs to be specific to the role - e.g. initPeriodicTasks.

I'm still struggling with the semantics of the cleanup step, especially now you have added this step to contribs. There has to be a clear definition of what cleanup actually means. There are a number of different points at which "cleanup" is appropriate; for example:
  1. End of request
  2. Session terminating (especially in a mod_perl/fcgid environment)
  3. User changing in session
There may well be more, but those are off the top of my head.

Regarding tick_twiki taint issues; as you know that module is trivial, and doesn't take any input other than what comes from data files (which may be tainted, of course). I have not seen any such issues with Foswiki, but there have been several thousand bugfixes in the core code since we forked, several of which involved taint issues, and it could be any one of those. Only by nailing down the issue to a reproducible testcase (and ideally reproducing it on Foswiki) could it be addressed.

I really like the sound of your triggering model. That's something I've wanted for the longest time :-)

Keep up the good work!

-- CrawfordCurrie - 04 Jun 2011

Timothe, exciting times... :-)

-- MichaelDaum - 04 Jun 2011

I've been scratching my head about a few things for the last few days. I have not been able to resolve these things in my head, so I figure I should mention them.

If plugins need an API to the task scheduler, then I do believe that Foswiki::Func should provide that API. However, I wonder how much of an API is needed.

As Crawford pointed out, contribs also have a use for periodic tasks and config-change-handler tasks, but contribs have no initialisation interface and so a Foswiki::Func API for adding tasks would not be useful to contribs. We could add an initialisation interface for contribs, but I think that should be the subject of a separate feature proposal. With contribs as they are today, I do not see how contribs could use the task scheduler.

I am also unclear about the usecase for replaceSchedule. When would that be used? Do we have a usecase for plugins (or contribs) to change their schedule on-the-fly? That could be powerful, but I suspect nasty surprises could lurk there. How will developers debug this, and provide support for it? Will configure be able to query the daemon about the current schedule?

Defining a task schedule via Foswiki::cfg sounds simpler and more attractive than using a run-time API for managing schedules, so I am delighted to see that PeriodicTasks now shows a configure interface. But... how does that mechanism for defining a schedule interact with addTask and replaceSchedule? I assume that addTask and replaceSchedule won't be modifying Foswiki::cfg...

In contrast, I do see the point of something like nextRuntime, which could be useful to regular web-server processing as well as the scheduled tasks.

I do take Crawford's point that perhaps the API should live in another package. I therefore suggest something like Foswiki::Func::taskScheduler which returns a reference to a task scheduler object (when executing from the daemon) or undef (when executing in a web-server environment). That object may provide the API. This would make the functionality discoverable to new or inexperienced developers (because it is accessible from Foswiki::Func), it would avoid dumping many functions into Foswiki::Func, and it would encapsulate the task API.

(Or - how about if Foswiki::Func::taskScheduler returns a reference to an object, and that object's class conforms to an interface (pure virtual base class)? The API is defined in terms of that interface. The actual class might differ between daemon and webserver usage.)
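The second variant (one interface, two implementations) could be sketched like this. All class and method names are hypothetical, task_scheduler stands in for the proposed Foswiki::Func::taskScheduler, and the context key follows the Periodic_Task spelling used in the contrib example earlier.

```perl
use strict;
use warnings;

{
    # The interface: a "pure virtual" base class.
    package My::TaskScheduler;
    sub new { my ($class) = @_; return bless { tasks => {} }, $class }
    sub isActive     { die ref( $_[0] ) . " must implement isActive\n" }
    sub registerTask { die ref( $_[0] ) . " must implement registerTask\n" }
    sub taskNames    { die ref( $_[0] ) . " must implement taskNames\n" }
}
{
    # Used inside the task daemon: really records tasks.
    package My::TaskScheduler::Daemon;
    our @ISA = ('My::TaskScheduler');
    sub isActive { return 1 }

    sub registerTask {
        my ( $self, $name, $schedule, $code ) = @_;
        $self->{tasks}{$name} = { schedule => $schedule, code => $code };
        return 1;
    }
    sub taskNames { my ($self) = @_; return sort keys %{ $self->{tasks} } }
}
{
    # Used under the webserver: accepts calls and ignores them.
    package My::TaskScheduler::Webserver;
    our @ISA = ('My::TaskScheduler');
    sub isActive     { return 0 }
    sub registerTask { return 0 }
    sub taskNames    { return }
}

# Stand-in for the proposed accessor: picks the class from the context.
sub task_scheduler {
    my ($context) = @_;
    my $class =
      $context->{Periodic_Task}
      ? 'My::TaskScheduler::Daemon'
      : 'My::TaskScheduler::Webserver';
    return $class->new;
}
```

With a null object for the webserver case, plugin code can call registerTask unconditionally instead of testing the context before every call; the first variant (returning undef) keeps the test but avoids loading any scheduler code.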

This is good and exciting work, but some aspects still have me puzzled.

-- MichaelTempest - 06 Jun 2011

Michael correctly observes that the plugins API will have to tell configure if it changes something. In reality, the daemon, and the plugins API all have to tell each other what's going on. Simplest approach is to kick the daemon in the head each time a change is made (though of course, plugins using the API and configure may still conflict).

-- CrawfordCurrie - 06 Jun 2011

Thanks for all the feedback. Sorry I've been off-line for a while.

I have a pretty good idea of what version 3 will look like - just need a few tuits to get it consolidated. I will post something when the bits are there.

A couple of quick responses. I plan to rename this - it's no longer simply periodic, nor are the tasks only cleanup. Probably "Task Framework". I've been brow-beaten into a bigger project - but it seems useful and there's this other stuff I'm avoiding by working on it :-) However, the degenerate case (I just want my plugin to delete old files once in a while) remains simple.

Think of the Tasks Environment as a new place under which the whole wiki code runs for specialized functions. A webserver "replacement" for these pesky maintenance tasks. But there's no user doing a GET or POST to trigger action. (Or to abort it at an inopportune time.) So, to run under this environment, contribs have to register themselves. That's how the environment knows they want service. I decided to use the familiar plugin model - but from a contrib's point of view, it's plugging out of wiki, and in to the framework. Current plugins, which aren't contribs, can run in both environments.

For an example, let me pick on MailerContrib again. Today, a shell script (twiki_maintenance) is started by cron using a crontab schedule. That script sequentially runs (on sub-schedules) various tasks, including run_tick_twiki, runmailnewsnotify, runmailwebnotify, and runstatistics. One has to do it this way because MailerContrib in particular gets in trouble if it runs news and web concurrently. Each of these run scripts does a cd and runs the corresponding perl notify script. And that script is just a command line UI wrapper for the worker Contrib code, which is built as an object!

Under the Tasking Framework/environment, things are parallel - but simpler. MailerContrib registers with Configure at install time (pretty much as today - I want Configure to be the management interface for wiki, we don't need yet another - quite.) The registration causes the tasking daemon to load MailerContrib's interface module. It doesn't know what MailerContrib wants, just that it needs to be activated. MailerContrib consults its config items (and an astrologer or anything else) and calls back requesting that something be called with the news argument list weekly and the webchanges argument list daily. It can defer loading most of its code until the first call - which may be in a private (asynch) fork. No crontab, no shell wrappers with magic -I switches, it just gets called. Mail generation may take a while on a large, busy web, so perhaps it asks for an asynch task that handles both. Or it takes out a lock. That's your design choice. It's somewhat less work. But all the scheduling and control is in one place for the administrator - under configure.

I'm more than happy to remove the Foswiki::Func aliases. I'm already thinking along the lines of a more object-oriented interface - it will reduce the number of API names (by turning them into methods). But the abstractions are a bit different - we have something like a task which is activated by a trigger, that may be periodic, inotify, or something else.

The current schedule - and more - is available dynamically from the daemon. In fact, it has an embedded (very limited) webserver. So debuggers can connect directly and restart, suspend and resume tasks. And if you use the magic macro, it can be embedded in a wiki page. The command line management tools just talk to the webserver and get text. There's even a magic ability to click a button and start the daemon. (Yes, it starts itself, and no, you don't need shell access.)

The daemon notices that LocalSite.cfg has been modified (polling or inotify); it reloads it on-the-fly. Your tasks can register for notifications - by specific configuration item - and will be told. So the simplest model is that configure changes a schedule, your task is registered for that config item, and calls replaceSchedule.
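Notification by configuration item can be pictured as a comparison of the old and the reloaded configuration for each watched key. The sketch below is an assumption about how that could work, with hypothetical function names and the item addressed by a '{A}{B}' style path; it handles scalar items only.

```perl
use strict;
use warnings;

my %watchers;    # config item path => list of callbacks

# Ask to be told when the item at $path (e.g. '{Mailer}{Schedule}') changes.
sub watch_config_item {
    my ( $path, $callback ) = @_;
    push @{ $watchers{$path} }, $callback;
    return;
}

# Look up a '{A}{B}' style path in a nested hash; undef if absent.
sub config_value {
    my ( $cfg, $path ) = @_;
    my $node = $cfg;
    for my $key ( $path =~ /\{([^}]+)\}/g ) {
        return undef unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

# After a reload: call the watchers of every item whose value changed.
# Callbacks receive ( $path, $new_value, $old_value ).
# Returns the number of notifications sent.
sub notify_config_changes {
    my ( $old, $new ) = @_;
    my $sent = 0;
    for my $path ( sort keys %watchers ) {
        my $was = config_value( $old, $path );
        my $is  = config_value( $new, $path );
        next if !defined $was && !defined $is;
        next if defined $was && defined $is && $was eq $is;
        for my $callback ( @{ $watchers{$path} } ) {
            $callback->( $path, $is, $was );
            $sent++;
        }
    }
    return $sent;
}
```

In the model described above, the callback for a schedule item is where a task would call replaceSchedule with the new value.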

All this is running now. What I haven't gotten to is the non-periodic stuff (except inotify on the config file, which works well), changing the API yet again, and posting updated documentation and screenshots.

Kicking daemons in the head is OK for developers and environments where not much is happening. If all the folks who've stepped up and said they want to also run queues and indexing and other long operations actually use this, it will be a big deal to stop and start. The graceful restart waits for all async (forked) kids to finish - and that could be minutes or hours, and during that time, I don't pick new tasks to run so the pipeline drains. It's not hard to deal with dynamic changes if you code for that from the start - and this is a new thing, so there's no excuse not to.
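The draining step of a graceful restart amounts to tracking forked children and waiting for each of them before shutting down. A minimal sketch, with hypothetical names and no timeout handling:

```perl
use strict;
use warnings;
use POSIX ();

my %kids;    # pid => task name, for asynchronous tasks still running

# Start an asynchronous task in a forked child and remember its pid.
sub start_async {
    my ( $name, $code ) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        my $ok = eval { $code->(); 1 };

        # _exit skips END blocks and destructors inherited from the parent.
        POSIX::_exit( $ok ? 0 : 1 );
    }
    $kids{$pid} = $name;
    return $pid;
}

# Graceful restart: the caller stops scheduling new tasks, then this
# blocks until every running child has finished.
# Returns the number of children waited for.
sub drain {
    my $count = 0;
    for my $pid ( sort { $a <=> $b } keys %kids ) {
        waitpid( $pid, 0 );
        delete $kids{$pid};
        $count++;
    }
    return $count;
}
```

The wait is as long as the slowest child, which is exactly the cost described above and the reason to prefer applying changes dynamically over restarting.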

Then again, nothing has to migrate until the owner has round (or octagonal) tuits. Cron is still there.

Some of this will be easier to deal with when you can actually touch it. I understand the "other wiki" problem. To that end, I'm trying to get a new VM running Foswiki (as well as TWiki) trunk. It hasn't been easy - I just wrote up some of the challenges. But since for now this VM is dedicated to this, I should be able to make it available to interested parties on the public network. The good news is that except for configure, it really knows very little of which wiki (or the wiki's internals) it's running over.

But it sure is far from the 25 lines of code that would have solved the "clean up your disk space" problem.

 

 

Server 9021 started Tue Jun 14 03:10:06 2011 Scheduling tasks: Next due Tue Jun 14 03:11:19 2011

All plugins initialized succesfully

Configured modules

Mailer V0.000-001 lib/TWiki/Contrib/PeriodicTasks/MailerContrib.pm

Job queue (ordered by next scheduled execution time):

Job  Schedule        Next Execution            Task
0    */2 * * * * 34  Tue Jun 14 03:10:34 2011  TWiki::Periodic::TickTock
1    */1 * * * * 19  Tue Jun 14 03:11:19 2011  TWiki::Periodic::ReConfig

Task Status Server is running on www.example.net:1026

Active clients

ID  Peer            Started
8   192.168.148.24  Tue Jun 14 03:10:33 2011

 


I hope we can merge this (see the docco at PeriodicTasks ) into foswiki 2.0 - as it will allow us to do larger rename operations, and other long running tasks without timing out the server.

It would then also mean that we can further simplify what scripts get created :-)

Putting me & Crawford as devs to merge so that we can have a commitment date.

-- SvenDowideit - 07 Sep 2011

Removing myself again, as I didn't know I had been committed, and can't commit.

-- CrawfordCurrie - 25 Apr 2014

Changing to Parked. Needs a developer to adopt.

-- GeorgeClark - 19 Nov 2015

 
Topic revision: r37 - 19 Nov 2015, GeorgeClark