Cache miss stampedes
bradoaks at gmail.com
Wed Jul 25 17:12:02 UTC 2007
When we were replacing a shared memory cache with memcached I had to
come up with a way to keep the database from getting hammered when a
frequently requested value expires from the cache. This is my story:
Our system caches the frequently requested key for an unreasonably
long time (24hours). It also caches a "semaphore" key for the amount
of time that I want the main key's data to be refreshed (5 minutes).
When a process notices that the main key or the semaphore key no
longer exist in the cache, it checks to see if a third key exists --
this is the "updatelock" key.
If this updatelock key does not exist, the process will set the key.
The hostname and process id are stored in the value. This key is
stored with a small timeout in case the process hangs or dies after
staking a claim to the updatelock key. After attempting to set the
updatelock key, the process reads the updatelock cached key to make
sure it corresponds to its hostname and process id. If these do not
match, someone else is responsible for performing the update. Once
this is satisfied, the process performs the database lookup.
The process then sets the main key, sets a new semaphore key, and
deletes the updatelock key.
This scheme makes sure that only one of the processes is trying to hit
the database for this particulary heavily use key. If the main key
doesn't exist at all, but the updatelock is set by another process, I
just have the process sleep for a second and try again. This scenario
should only happen on a dump of the memcached pool, or if that
frequently requested key goes dormant for more than 24 hours.
We don't use this for our lower traffic keys, and only use this scheme
if the key will be accessed across an entire site on most requests.
On 7/25/07, Chris Hondl <chris at imvu.com> wrote:
> We've hit this scenario several times over the past year on different
> queries - typically slow queries (5-10 seconds) on frequently accessed web
> pages. We usually cache these for 6 hours or longer.
> We handle this with a probabilistic timeout implemented in our application.
> We store a timestamp and expiration time serialized as part of the value set
> to memcache. The last minute or so before the key expires we randomly run
> the query and set the value to memcache with a new expiry. The probability
> ramps up from 0% to 100% over the last minute or so.
> In two places we use a different algorithm. We set no expiry on the value,
> but on every get we have a 1 in N chance of refreshing the value. We tune N
> so that the value gets refreshed approximately once every 5 minutes.
> Annoying part about this is as our traffic grows we have to change N to keep
> the expected interval constant.
More information about the memcached