Item12965: Cache misses can create deadlocks if there is a collision
Priority: Urgent
Current State: Closed
Released In: 2.0.0
Target Release: major
Applies To: Engine
Component: PageCache
Branches: master
When a single user account is shared among a large number of users, such as
WikiGuest, and caching is turned on, deadlocks are created when the server is under load.
To force this behavior, set
PageCache::getPage to always return undef, then put the server under load. I used JMeter to repeatedly load the same page until the error occurred, 10 times per second. It still took a while to show up.
I'm presuming this is caused by two
WikiGuests getting cache misses on the same page at the same time, forcing two renders. There could be other causes.
Turning caching on for a wiki for which there was formerly no caching creates a flurry of cache misses that could make this problem more prevalent than in a mature cached wiki.
--
FredTarbox - 15 Jul 2014
Could you please provide some details about your cache configuration as well as the kind of page you are testing against, just to make sure there's no other layer causing a deadlock.
Could you please enable
TRACE
in
Foswiki::PageCache
and
Foswiki::PageCache::DBI
and provide the logs here? Thanks.
--
MichaelDaum - 16 Jul 2014
The logs hit 15 MB before the first deadlock showed up, even though that only took about 30 seconds. Also, they're too detailed for me to scrub all private info from them. Can we arrange an email? I'll try to hang out in the IRC room.
--
FredTarbox - 16 Jul 2014
Settings:
$Foswiki::cfg{Cache}{Enabled} = 1;
$Foswiki::cfg{Cache}{WebDependencies} = 'WebRss, WebAtom, WebTopicList, WebIndex, WebSearch, WebSearchAdvanced';
$Foswiki::cfg{Cache}{DependencyFilter} = '$Foswiki::cfg{SystemWebName}\\..*|$Foswiki::cfg{TrashWebName}\\..*|.*Template$|TWiki\\..*';
$Foswiki::cfg{CacheManager} = 'Foswiki::Cache::FileCache';
$Foswiki::cfg{MetaCacheManager} = 'Foswiki::Cache::DB_File';
$Foswiki::cfg{Cache}{RootDir} = '$Foswiki::cfg{WorkingDir}/tmp/cache';
$Foswiki::cfg{Cache}{DBFile} = '$Foswiki::cfg{WorkingDir}/tmp/foswiki_db';
$Foswiki::cfg{Cache}{NameSpace} = '$Foswiki::cfg{DefaultUrlHost}';
$Foswiki::cfg{Cache}{MaxSize} = 1000;
$Foswiki::cfg{Cache}{Servers} = '127.0.0.1:11211';
$Foswiki::cfg{Cache}{Debug} = 0;
$Foswiki::cfg{Cache}{Implementation} = 'Foswiki::PageCache::DBI::MySQL';
$Foswiki::cfg{Cache}{DBI}{TablePrefix} = 'foswiki_cache';
$Foswiki::cfg{Cache}{DBI}{DSN} = '';
$Foswiki::cfg{Cache}{DBI}{Username} = '';
$Foswiki::cfg{Cache}{DBI}{Password} = MIME::Base64::decode_base64('');
$Foswiki::cfg{Cache}{DBI}{SQLite}{Filename} = '$Foswiki::cfg{WorkingDir}/sqlite.db';
$Foswiki::cfg{Cache}{DBI}{MySQL}{Host} = 'localhost';
$Foswiki::cfg{Cache}{DBI}{MySQL}{Port} = '3306';
$Foswiki::cfg{Cache}{DBI}{MySQL}{Database} = 'Foswiki';
$Foswiki::cfg{Cache}{DBI}{MySQL}{Username} = 'foswiki';
$Foswiki::cfg{Cache}{DBI}{MySQL}{Password} = MIME::Base64::decode_base64('scrubbed');
$Foswiki::cfg{Cache}{DBI}{PostgreSQL}{Port} = '';
$Foswiki::cfg{Cache}{DBI}{PostgreSQL}{Database} = '';
$Foswiki::cfg{Cache}{DBI}{PostgreSQL}{Username} = '';
$Foswiki::cfg{Cache}{DBI}{PostgreSQL}{Password} = MIME::Base64::decode_base64('');
--
FredTarbox - 16 Jul 2014
I also had similar troubles with sqlite, although it's not as good about producing errors for them. My mysql setup is the default.
--
FredTarbox - 16 Jul 2014
Did you test a vanilla Foswiki or are there any additional plugins installed?
Which page did you use for load-testing?
Also: please specify the way you've set up your HTTP server:
- Which one are you using?
- CGI or FCGI
- number of parallel Foswiki backends in case you are using FCGI
You mentioned on IRC that you changed
getPage
to always return
undef
. Have you got a patch of your local changes to Foswiki::PageCache:*?
Btw, I've been running 5 parallel users against a Foswiki storing cache meta in sqlite for a while now without problems.
For load-testing the page-cache with all parallel users generating new cache data on the same page all the time I've applied this patch:
--- lib/Foswiki/PageCache/DBI.pm (revision 17816)
+++ lib/Foswiki/PageCache/DBI.pm (working copy)
@@ -220,6 +220,9 @@
$variation->{data} = '' unless defined $variation->{data};
}
+### load testing
return undef;
+### load testing
return $variation;
}
Also no problems using sqlite. Will try mysql later as I can't install DBD::mysql that easily in my plenv env on the test laptop.
--
MichaelDaum - 17 Jul 2014
I'm really sorry, my company drafted me for some extracurricular stuff. I'll try to wrap up all this stuff for you on Sunday.
--
FredTarbox - 17 Jul 2014
Loaded and enabled plugins and contribs:
AttachContentPlugin
AutoCompletePlugin
AutoTemplatePlugin
BreadCrumbsPlugin
ClassificationPlugin
CommentPlugin
CompareRevisionsAddOn
DBCacheContrib
DBCachePlugin
EditRowPlugin
EditSyntaxPlugin
FamFamFamContrib
FastCGIEngineContrib
FilterPlugin
FlexFormPlugin
FlexWebListPlugin
HistoryPlugin
HomePagePlugin
ImageGalleryPlugin
ImagePlugin
InfiniteScrollContrib
InterwikiPlugin
JQGridPlugin
JQMomentContrib
JQSerialPagerContrib
JQTwistyContrib
JQueryPlugin
JSCalendarContrib
LdapContrib
MailerContrib
MediaWikiTablePlugin
MediaWikiToFoswikiContrib
MimeIconPlugin
NatEditPlugin
NatSkin
NatSkinPlugin
NewUserPlugin
PatchFoswikiContrib
PatchItem11983Contrib
PatchItem12285Contrib
PatternSkin
PlainFileStoreContrib
PreferencesPlugin
PublishPlugin
RedDotPlugin
RenderListPlugin
RenderPlugin
SetVariablePlugin
SlideShowPlugin
SmiliesPlugin
SolrPlugin
SpreadSheetPlugin
StringifierContrib
SubscribePlugin
TablePlugin
TagCloudPlugin
TinyMCEPlugin
TipsContrib
TopicInteractionPlugin
TopicUserMappingContrib
TwistyPlugin
UpdateAttachmentsPlugin
UpdateInfoPlugin
WebLinkPlugin
WikiWorkbenchContrib
WysiwygPlugin
YahooGridsAddOn
YahooUserInterfaceContrib
Server: apache 2.2.22
using mod_fcgid
Locked at 6 processes. (min and max)
I'm using Internal/IntentionallyBlank as my target page, which is (surprise) intentionally blank. I'm unable to duplicate the deadlocks on any page in System, including System/TwistyPlugin which looked promising because of the large number of %INCLUDEs in it. So, um... either System is excepted in some way, or maybe it has something to do with my
WebMenu and/or
WebLinks? I'll take direction on that.
We do have a rest call built into our UI that routes to a plugin which in turn taps our main sql database for the tiniest bit of "live" info useful to our users, no more than 10-20 characters max. But that's being called on the System pages in addition to the rest of the wiki.
The change to return undef is my only change to pagecache, and I only made the change because I could not recreate the error without it. It was a stab in the dark, it's possible the error it creates isn't my original error, but it at least produces the same error message:
Foswiki detected an internal error - please check your Foswiki logs and webserver logs for more information.
DBD::mysql::st execute failed: Deadlock found when trying to get lock; try restarting transaction
The logs reveal
stderr: DBD::mysql::st execute failed: Deadlock found when trying to get lock; try restarting transaction at /var/www/foswiki.telenetwork.com/lib/Foswiki/PageCache/DBI.pm line 368.
But the surrounding log entries are malformed, so I'm not sure what's causing it.
(Finally showed up again)
--
FredTarbox - 20 Jul 2014
Can you repro the deadlock while
all extras are disabled, i.e. stuff also interacting with the DB?
--
MichaelDaum - 21 Jul 2014
I disabled dbcacheplugin and turned off ldap based security, then disabled all plugins and could not reproduce the deadlock. I re-enabled
NatSkinPlugin (the page was rather silly looking without it) and nothing else, in particular not jquery plugin, and the deadlock returned, despite the page being pretty ugly. Without jquery plugin, none of my ui changes are in effect, they're just meaningless text now.
--
FredTarbox - 21 Jul 2014
You mentioned some DB related customizations based on
NatSkin, right? Try to disable them while still using
NatSkinPlugin.
--
MichaelDaum - 23 Jul 2014
No, I've just checked. No changes to
NatSkin's code. However! I did make a change to automatic linking. It only links to a topic if you have view rights to that topic. This makes
NatSkin loading
WebMenu fire off a huge number of security checks when rendering a page, since
WebMenu is nothing but links. When I disable the view rights check, I don't get deadlocks. Or at least, I didn't after the 60 seconds I let it run. Might've gotten lucky.
--
FredTarbox - 28 Jul 2014
Fred, as long as you are narrowing down and given the fact that I wasn't able to repro the error, I'll have to lower the prio of this bug report from Urgent to Normal. Please feel free to up the prio again as soon as you are sure where the error stems from in your setup. Thanks again for excessively testing the upcoming Foswiki code. This is much appreciated.
--
MichaelDaum - 29 Jul 2014
With the security check I added disabled and a page that has 95 distinct tests of view rights like so:
%IF{"'SomeWeb.SomePage' allows 'view'" then="YES!" else="NO!"}%
I get deadlocks. That's 95 different pages, not the same page 95 times, just to be clear.
--
FredTarbox - 29 Jul 2014
Could you please attach your
LocalSite.cfg
(passwds removed)?
--
MichaelDaum - 29 Jul 2014
Attached.
--
FredTarbox - 29 Jul 2014
Okay, I've been reading it top down. Some changes:
- remove rest from
{AuthScripts}
- switch back
{Store}{Implementation}
to RcsWrap
- switch
{PasswordManager}
back to HtPasswdUser
... your user mapper isn't ldap'ed yet anyway
--
MichaelDaum - 30 Jul 2014
Done. I've also disabled
all plugins this time, and now I've gotten a deadlock on a blank page:
%JQREQUIRE{"superfish, ui, blockui, farscroll, scrollto, imagetooltip, autosuggest, ui::tooltip, cookie, fontawesome" warn="off"}% %WEBCOMPONENT{"HtmlHead"}%
%USERACTIONS{ guest="
$login$sep$register
" format="
$account$sep$logout
" }%
%NATWEBLOGO{height="50"}%
%WEBCOMPONENT{"SideBar"}%
%WEBCOMPONENT{"TopicActions"}%
You are here: %BREADCRUMBS{separator=" » " newtopicformat="$topic"}%
%WEBCOMPONENT{"BottomBar"}%
This page was cached on 30 Jul 2014 - 09:09. Get a fresh version here.
^
%WEBCOMPONENT{"HtmlBottom"}%
Nothing resolves because all the plugins are disabled.
On the bright side, this means that I have an error log that's free of confidential information, attached.
--
FredTarbox - 30 Jul 2014
Error confirmed.
Here's a
patch.
Please try.
--
MichaelDaum - 05 Aug 2014