Cache miss stampedes
dormando at rydia.net
Sat Jul 28 09:04:39 UTC 2007
Brad: One day I'll use the totally sweet Gearman ;) I linked to your
post in the FAQ.
Steven: Turns out I'm not as dumb as I thought:
... my original FAQ entry opens (clearly, I hope) by saying what you
suggested, then wanders off into alternatives. Which was discussed
further on the mailing list.
So! Sorry for the back and forth :( Apparently I forgot what I had just
written. You're right to question why alternatives even needed
discussion, but some of us have/had fairly awkward caching primitives
which lead to something like this.
Brad Fitzpatrick wrote:
> Late to this party, but I have to mention Gearman here.
> On a cache miss, instead of going to the database directly, issue a
> Gearman request with a "uniq" property, then the Gearman server will
> combine all the duplicate requests and only dispatch one worker. The
> worker then puts the result in the cache before returning to the Gearman
> router (gearmand), and gearmand multiplexes the result back to all waiting
> clients.
> On Wed, 25 Jul 2007, dormando wrote:
>> So I'm up late adding more crap to the memcached FAQ, and I'm wondering
>> about a particular access pattern:
>> - Key A is hit very often (many times per second).
>> - Key A goes missing.
>> - Several dozen processes all get a cache miss on A at the same time,
>> then each runs a SQL query (or whatever), and tries to set or add the
>> result back into memcached.
>> Sometimes this can be destructive to a database, and can happen often if
>> the expire time on the data is low for some reason.
>> What approaches do folks typically use to deal with this more elegantly?
>> The best suggestion I've heard is to try to 'add' the key (or a
>> separate 'lock' key) back into memcached, and only run the query if
>> you 'win' that lock. Everyone else microsleeps and retries the get a
>> few times before falling back to running the query themselves.
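[A sketch of the add-as-lock approach described above, in Python. The
client object and its get/add/set/delete signatures are assumptions
(modeled loosely on common memcached clients), not a specific library's
API; `recompute` stands in for the SQL query.]

```python
import time

def fetch_with_lock(mc, key, recompute, ttl=60, lock_ttl=10,
                    retries=5, sleep=0.05):
    """Fetch key from the cache; on a miss, use add() on a lock key
    so only one client recomputes the value (hypothetical client API)."""
    value = mc.get(key)
    if value is not None:
        return value
    lock_key = key + ":lock"
    for _ in range(retries):
        # add() succeeds only if the key does not already exist,
        # so exactly one client "wins" the lock.
        if mc.add(lock_key, 1, lock_ttl):
            try:
                value = recompute()          # e.g. the SQL query
                mc.set(key, value, ttl)
                return value
            finally:
                mc.delete(lock_key)
        # Lost the race: microsleep, then re-check the cache.
        time.sleep(sleep)
        value = mc.get(key)
        if value is not None:
            return value
    # Gave up waiting; fall back to recomputing ourselves.
    return recompute()
```

The lock key's own TTL matters: if the winner dies mid-query, the lock
expires on its own rather than wedging every client forever.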
>> Also in most of these cases you should really run a tiered cache, with
>> this type of data being stored in a local cache and in memcached.
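[A minimal sketch of the tiered-cache idea above: a short-lived
in-process dict in front of the shared cache, so hot keys rarely reach
memcached at all. The class and its interface are invented for
illustration.]

```python
import time

class TieredCache:
    """Two-level cache: a local per-process layer with a short TTL,
    backed by a shared memcached-like client (hypothetical API)."""
    def __init__(self, mc, local_ttl=1.0):
        self.mc = mc
        self.local_ttl = local_ttl
        self.local = {}              # key -> (value, expires_at)

    def get(self, key):
        hit = self.local.get(key)
        if hit is not None and hit[1] > time.time():
            return hit[0]            # served locally, no network hit
        value = self.mc.get(key)
        if value is not None:
            self.local[key] = (value, time.time() + self.local_ttl)
        return value

    def set(self, key, value, ttl=60):
        self.mc.set(key, value, ttl)
        self.local[key] = (value, time.time() + self.local_ttl)
```

Even a one-second local TTL collapses "many times per second" down to
roughly one shared-cache fetch per second per process.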
>> This really isn't a common case, but sucks hard when it happens. In the
>> back of my mind I envision a different style 'get' command, which
>> defaults to a mutex operation on miss. So you'd issue the special 'get',
>> and on a miss you'd receive a special return code saying you're clear to
>> update the data; your subsequent set would release the lock. Everyone
>> else's command could optionally return immediately, or hang (for a
>> while) until the data's been updated.
>> Just throwing out ideas. Thoughts?
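[The mutex-on-miss 'get' proposed above doesn't exist in memcached; this
is a toy in-process model of the proposed semantics, with every name
invented. The first client to miss is told to fill the key; later
clients block until a set() releases them or a timeout passes.]

```python
import threading

class MutexCache:
    """Toy model of a 'get' that takes a mutex on miss."""
    def __init__(self):
        self.data = {}
        self.cond = threading.Condition()
        self.pending = set()         # keys someone is already filling

    def get_or_lock(self, key, timeout=5.0):
        with self.cond:
            if key in self.data:
                return "hit", self.data[key]
            if key not in self.pending:
                # First to miss: caller is now responsible for set().
                self.pending.add(key)
                return "miss-own", None
            # Someone else is filling it: block until the value lands.
            if self.cond.wait_for(lambda: key in self.data, timeout):
                return "hit", self.data[key]
            return "miss", None

    def set(self, key, value):
        with self.cond:
            self.data[key] = value
            self.pending.discard(key)
            self.cond.notify_all()   # release the waiting clients
```

A real server-side version would also need a TTL on the pending state,
for the same reason as the lock key above: a filler that dies must not
leave everyone hanging.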