Cache miss stampedes

Sat Jul 28 09:04:39 UTC 2007

Brad: One day I'll use the totally sweet Gearman ;) I linked to your 
post in the FAQ.

Steven: Turns out I'm not as dumb as I thought: 
http://www.socialtext.net/memcached/index.cgi?faq#how_to_prevent_clobbering_updates_stampeding_requests

... my original FAQ entry opens (clearly, I hope) by saying what you 
suggested, then wanders off into alternatives, which were discussed 
further on the mailing list.

So! Sorry for the back and forth :( Apparently I forgot what I had just 
written. You're right to question why alternatives even needed 
discussion, but some of us have/had fairly awkward caching primitives, 
which lead to something like this.

-Dormando

Brad Fitzpatrick wrote:
> Late to this party, but I have to mention Gearman here.
> 
> On a cache miss, instead of going to the database directly, issue a
> Gearman request with a "uniq" property, then the Gearman server will
> combine all the duplicate requests and only dispatch one worker.  The
> worker then puts it in the cache before returning to the Gearman router
> (gearmand), and then gearmand multiplexes the result back to all waiting
> callers.
> 
> 
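The coalescing idea above (many identical requests in, one unit of work
out, one result fanned back to every waiter) can be sketched inside a
single process. This is not Gearman and does not use its API: the
Coalescer class and its do() method are hypothetical names for
illustration, and the key passed to do() only plays the role that the
"uniq" property plays in the description above.

```python
import threading
from concurrent.futures import Future


class Coalescer:
    """Collapse concurrent calls that share a key into one execution.

    Hypothetical in-process sketch; a job server would do this across hosts.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Future shared by every waiting caller

    def do(self, key, fn):
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                # First caller for this key: register a Future for the others.
                fut = Future()
                self._inflight[key] = fut
        if not leader:
            # Duplicate request: block until the leader publishes its result.
            return fut.result()
        try:
            fut.set_result(fn())
        except Exception as exc:
            # Waiters see the same failure as the leader.
            fut.set_exception(exc)
        finally:
            with self._lock:
                del self._inflight[key]
        return fut.result()
```

On a cache miss each caller would run something like
coalescer.do("A", load_from_db), where load_from_db (also hypothetical)
runs the query and writes the result to the cache. Only one caller per
key executes it at a time; the rest receive the same return value.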
> On Wed, 25 Jul 2007, dormando wrote:
> 
>> Hey,
>>
>> So I'm up late adding more crap to the memcached FAQ, and I'm wondering
>> about a particular access pattern:
>>
>> - Key A is hit very often (many times per second).
>> - Key A goes missing.
>> - Several dozen processes all get a cache miss on A at the same time,
>> then run SQL query/whatever, and try to set or add it back into memcached.
>>
>> Sometimes this can be destructive to a database, and can happen often if
>> the expire time on the data is low for some reason.
>>
>> What approaches do folks typically use to deal with this more elegantly?
>> The better suggestion I've heard is to try to 'add' the key (or a
>> separate 'lock' key) back into memcached, and only do the query if
>> you 'win' that lock. Everyone else microsleeps and retries a few times
>> before running the query.
>>
>> Also in most of these cases you should really run a tiered cache, with
>> this type of data being stored in a local cache and in memcached.
>>
>> This really isn't a common case, but sucks hard when it happens. In the
>> back of my mind I envision a different style 'get' command, which
>> defaults to a mutex operation on miss. So you'd do the special 'get',
>> and on a miss you'd get a special return code that says you're
>> clear to update the data (which would release the lock?). Otherwise the
>> command could optionally return immediately, or hang (for a while) until
>> the data's been updated.
>>
>> Just throwing out ideas. Thoughts?
>>
>> -Dormando
>>
>>
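
The 'add'-as-lock approach from the original message can be sketched
roughly as follows. To keep the example runnable without a server,
FakeCache is a hypothetical in-memory stand-in whose add() only stores a
key that is absent, which is the property the pattern relies on. The
names get_or_rebuild, rebuild, retries and pause, and the "lock:" key
prefix, are assumptions for illustration, not part of any client library.

```python
import threading
import time


class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value

    def add(self, key, value):
        """Store only if the key is absent; return True if we stored it."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


def get_or_rebuild(cache, key, rebuild, retries=20, pause=0.01):
    value = cache.get(key)
    if value is not None:
        return value
    lock_key = "lock:" + key
    if cache.add(lock_key, 1):
        # We 'won' the lock, so we are the one process that runs the query.
        try:
            value = cache.get(key)  # re-check: someone may have just filled it
            if value is None:
                value = rebuild()
                cache.set(key, value)
            return value
        finally:
            cache.delete(lock_key)
    # Everyone else sleeps briefly and retries the cache a few times.
    for _ in range(retries):
        time.sleep(pause)
        value = cache.get(key)
        if value is not None:
            return value
    return rebuild()  # gave up waiting; fall back to running the query
```

Against a real memcached the lock key would normally be added with a
short expiry, so that a process that dies while holding it does not block
everyone until the retries run out. FakeCache has no expiry, so that part
is left out here.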