Cache miss stampedes

Wed Jul 25 10:01:50 UTC 2007

Hey,

So I'm up late adding more crap to the memcached FAQ, and I'm wondering 
about a particular access pattern:

- Key A is hit very often (many times per second).
- Key A goes missing.
- Several dozen processes all get a cache miss on A at the same time,
then run the SQL query (or whatever) and try to 'set' or 'add' the
result back into memcached.
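To make the pattern concrete, here's a rough simulation of the naive
read-through path. It's a sketch under assumptions: FakeCache is a
hypothetical in-process stand-in for a memcached client, run_query()
is a hypothetical stand-in for the SQL, and threads play the part of
processes. The barrier forces every worker to see the miss before
anyone refills the key, which is the worst case described above.

```python
import threading

class FakeCache:
    """Hypothetical in-process stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}
        self._mu = threading.Lock()

    def get(self, key):
        with self._mu:
            return self._data.get(key)

    def set(self, key, value):
        with self._mu:
            self._data[key] = value

db_queries = 0
db_mu = threading.Lock()

def run_query():
    """Hypothetical stand-in for the expensive SQL query; counts executions."""
    global db_queries
    with db_mu:
        db_queries += 1
    return "value-for-A"

def naive_reader(cache, barrier):
    value = cache.get("A")
    barrier.wait()  # every worker observes the miss before anyone refills A
    if value is None:
        cache.set("A", run_query())

cache = FakeCache()
workers = 36  # "several dozen" processes
barrier = threading.Barrier(workers)
threads = [threading.Thread(target=naive_reader, args=(cache, barrier))
           for _ in range(workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_queries)  # 36: every worker ran the query
```

Every one of the 36 workers ends up running the query for the same key.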

This can hammer the database, and it can happen often if the expiration
time on the data is short for some reason.

What approaches do folks typically use to deal with this more elegantly?
The best suggestion I've heard is to 'add' the key (or a separate
'lock' key) back into memcached, and only run the query if you 'win'
that lock. Everyone else microsleeps and retries the 'get' a few times
before giving up and running the query themselves.
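Here's a minimal sketch of that 'add'-as-lock approach, assuming a
client with get/set/add/delete. FakeCache, run_query(), and the
":lock" key suffix are hypothetical; in real memcached you'd want to
'add' the lock key with a short expiration time so a crashed winner
can't hold it forever (the fake below ignores expiry). The winner
re-checks the key after taking the lock, since a previous winner may
have refilled it and released the lock in between.

```python
import threading
import time

class FakeCache:
    """Hypothetical in-process stand-in for a memcached client.
    add() stores only if the key is absent, like memcached's 'add'.
    Expiration times are ignored here; a real lock key should expire."""

    def __init__(self):
        self._data = {}
        self._mu = threading.Lock()

    def get(self, key):
        with self._mu:
            return self._data.get(key)

    def set(self, key, value):
        with self._mu:
            self._data[key] = value

    def add(self, key, value):
        with self._mu:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._mu:
            self._data.pop(key, None)

db_queries = 0
db_mu = threading.Lock()

def run_query():
    """Hypothetical stand-in for the slow SQL query; counts executions."""
    global db_queries
    with db_mu:
        db_queries += 1
    time.sleep(0.05)  # pretend the query takes a while
    return "value-for-A"

def get_with_lock(cache, key, retries=50, sleep_s=0.01):
    value = cache.get(key)
    if value is not None:
        return value
    lock_key = key + ":lock"  # hypothetical naming scheme for the lock key
    for _ in range(retries):
        if cache.add(lock_key, "1"):  # won the lock: only we run the query
            try:
                # Re-check: a previous winner may have just refilled the key.
                value = cache.get(key)
                if value is None:
                    value = run_query()
                    cache.set(key, value)
                return value
            finally:
                cache.delete(lock_key)
        time.sleep(sleep_s)  # lost: microsleep, then see if the winner is done
        value = cache.get(key)
        if value is not None:
            return value
    return run_query()  # waited long enough; hit the database anyway

# Several dozen "processes" (threads here) miss on A at the same moment.
cache = FakeCache()
workers = 36
barrier = threading.Barrier(workers)
results = []

def worker():
    barrier.wait()
    results.append(get_with_lock(cache, "A"))

threads = [threading.Thread(target=worker) for _ in range(workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_queries)
```

With the same 36 simultaneous misses, the query should run once; the
other workers pick up the winner's value on a retry. The retry count
and sleep interval are arbitrary; the fallback at the end just bounds
how long a loser waits if the winner dies.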

Also, in most of these cases you should really run a tiered cache, with
this type of data stored both in a local cache and in memcached.
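A tiered cache might look roughly like this. It's a sketch, not a
prescribed design: TieredCache and DictCache are hypothetical names,
DictCache stands in for a memcached client, and the local tier is a
plain per-process dict with a short TTL (local_ttl is an assumed
parameter). A fake clock is injected so the example is deterministic.

```python
import time

class DictCache:
    """Hypothetical stand-in for a memcached client; counts round trips."""

    def __init__(self):
        self.data = {}
        self.gets = 0

    def get(self, key):
        self.gets += 1
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

class TieredCache:
    """Hypothetical two-tier cache: per-process dict in front of a shared cache."""

    def __init__(self, shared, local_ttl=1.0, clock=time.monotonic):
        self.shared = shared
        self.local_ttl = local_ttl
        self.clock = clock
        self.local = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self.clock()
        entry = self.local.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # served locally, no network round trip
        value = self.shared.get(key)
        if value is not None:
            self.local[key] = (value, now + self.local_ttl)
        return value

    def set(self, key, value):
        # Write through to both tiers.
        self.shared.set(key, value)
        self.local[key] = (value, self.clock() + self.local_ttl)

fake_now = [0.0]
shared = DictCache()
tiered = TieredCache(shared, local_ttl=1.0, clock=lambda: fake_now[0])
shared.set("A", "value-for-A")
for _ in range(100):
    tiered.get("A")
print(shared.gets)  # 1: the other 99 reads were served locally
fake_now[0] = 2.0   # the local copy has expired
tiered.get("A")
print(shared.gets)  # 2
```

The local TTL bounds how stale a process's copy can get. For hot keys,
the shared tier then only sees about one read per process per TTL, so
when the memcached copy disappears, far fewer requests fall through at
once.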

This really isn't a common case, but it sucks hard when it happens. In
the back of my mind I envision a different style of 'get' command,
which defaults to a mutex operation on a miss. You'd issue the special
'get', and if you got back a special return code saying it's a miss but
you're clear to update the data, you'd go refill it (and the update
would release the lock?). Otherwise, the command could either return
immediately or hang (for a while) until the data has been updated.
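To sketch what that proposed command's semantics could look like,
here's a server-side simulation. Nothing here is a real memcached
command: MutexOnMissCache, lget(), the wait_s parameter, and the
HIT / MISS_YOU_UPDATE / MISS_BUSY status codes are all hypothetical.
It assumes, as floated above, that storing the data releases the lock.

```python
import threading

# Hypothetical status codes for the proposed mutex-on-miss 'get'.
HIT, MISS_YOU_UPDATE, MISS_BUSY = "HIT", "MISS_YOU_UPDATE", "MISS_BUSY"

class MutexOnMissCache:
    def __init__(self):
        self._data = {}
        self._updating = set()  # keys some client has been cleared to refill
        self._cond = threading.Condition()

    def lget(self, key, wait_s=None):
        """Hypothetical 'get' that takes a mutex on miss; returns (status, value).
        wait_s=None returns immediately if someone else holds the mutex;
        otherwise hang up to wait_s seconds for the holder's update."""
        with self._cond:
            if key in self._data:
                return HIT, self._data[key]
            if key not in self._updating:
                self._updating.add(key)
                return MISS_YOU_UPDATE, None  # caller should refill the key
            if wait_s is None:
                return MISS_BUSY, None
            self._cond.wait_for(lambda: key in self._data, timeout=wait_s)
            if key in self._data:
                return HIT, self._data[key]
            return MISS_BUSY, None

    def set(self, key, value):
        """Storing the data also releases the mutex and wakes any waiters."""
        with self._cond:
            self._data[key] = value
            self._updating.discard(key)
            self._cond.notify_all()

cache = MutexOnMissCache()
first = cache.lget("A")   # ('MISS_YOU_UPDATE', None): we hold the mutex
second = cache.lget("A")  # ('MISS_BUSY', None): someone else is updating
updater = threading.Timer(0.05, cache.set, args=("A", "value-for-A"))
updater.start()
third = cache.lget("A", wait_s=2.0)  # hangs until the set lands
updater.join()
print(first, second, third)
```

A real version would also need the mutex to time out, so a client
that gets MISS_YOU_UPDATE and then dies doesn't leave everyone else
waiting or busy forever.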

Just throwing out ideas. Thoughts?

-Dormando