Why not let the fs handle this ??

Steven Grimm sgrimm at facebook.com
Wed Jun 7 16:59:24 UTC 2006


Jon Drukman wrote:
> Has this actually worked out well in practice for anybody?  I've found 
> that losing one machine (out of about 100) results in so much db 
> thrashing as the keys get repopulated into different places that the 
> site becomes basically unusable until enough of the cache has been 
> regenerated (5-10 minutes if i'm lucky).

The problem there isn't the number of memcached servers; it's that 
you're letting your client relocate your keys when a server goes down. 
Our approach is to let the missing machine cause cache misses for its 
own keys while the others continue to serve up their usual data. (A 
missing memcached never stays missing for long, since our operations 
team monitors the servers and fixes any broken ones pretty quickly.)
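
To make the difference concrete, here is a rough Python sketch. It is not 
taken from any real memcached client: it assumes plain "hash modulo 
number of servers" placement, and the host names, key names, and the 
server_for helper are all made up for illustration. It counts how many 
keys end up on a different server when the dead machine is dropped from 
the list, versus how many simply miss when the list is left alone.

```python
import zlib

def server_for(key, servers):
    # Assumed placement rule: hash the key, take it modulo the list length.
    return servers[zlib.crc32(key.encode()) % len(servers)]

servers = ["cache%03d" % i for i in range(100)]   # hypothetical host names
dead = "cache042"                                  # the machine that went away
survivors = [s for s in servers if s != dead]
keys = ["user:%d" % i for i in range(10000)]       # hypothetical cache keys

# Policy A: the client drops the dead server and rehashes over 99 servers.
# Every key whose server changes is a cache miss that falls through to the db.
moved = sum(1 for k in keys
            if server_for(k, servers) != server_for(k, survivors))

# Policy B: the client keeps the original list. Only keys that lived on the
# dead server miss; everything else is still where it was.
missed = sum(1 for k in keys if server_for(k, servers) == dead)

print("relocated under policy A:", moved)
print("missing under policy B:  ", missed)
```

With a uniform hash, going from 100 to 99 servers under modulo placement 
moves about 99% of the keys, while leaving the list alone costs only the 
roughly 1% that lived on the dead machine.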

The 1.1.13 version of memcached looks like it has some new features to 
deal with server loss without relocating all the keys while still giving 
all the keys a place to live, so that'll definitely be an improvement. 
But you can get a reasonable failure mode right now by just controlling 
what the client does when a server is down. Some of the memcached 
clients have an option you can set to control that behavior; with 
others you'll have to patch the client code yourself.
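
As a sketch of what "controlling what the client does" might look like, 
here is a toy Python client with a failover switch. Everything in it is 
hypothetical (FakeBackend, DownError, FixedPlacementClient, and the 
failover flag are invented for this example and do not correspond to any 
particular memcached client's API). With failover off, a key whose home 
server is down is reported as a miss instead of being sent elsewhere.

```python
import zlib

class DownError(Exception):
    """Hypothetical error raised when a backend cannot be reached."""

class FakeBackend:
    """In-memory stand-in for one memcached server (illustration only)."""
    def __init__(self):
        self.up = True
        self.data = {}

    def get(self, key):
        if not self.up:
            raise DownError(key)
        return self.data.get(key)

    def set(self, key, value):
        if not self.up:
            raise DownError(key)
        self.data[key] = value

class FixedPlacementClient:
    def __init__(self, backends, failover=False):
        self.backends = list(backends)
        self.failover = failover   # False: never relocate keys

    def _candidates(self, key):
        # Home server first; the rest are tried only if failover is on.
        n = len(self.backends)
        start = zlib.crc32(key.encode()) % n
        tries = n if self.failover else 1
        return [self.backends[(start + i) % n] for i in range(tries)]

    def get(self, key):
        for backend in self._candidates(key):
            try:
                return backend.get(key)
            except DownError:
                continue
        return None                # treated as an ordinary cache miss

    def set(self, key, value):
        for backend in self._candidates(key):
            try:
                backend.set(key, value)
                return True
            except DownError:
                continue
        return False               # write is dropped while the home is down

backends = [FakeBackend() for _ in range(4)]
client = FixedPlacementClient(backends, failover=False)
client.set("user:1", "alice")
```

With failover=False, taking down the home server for "user:1" makes 
get("user:1") return None until that server comes back, and no other 
server ever receives a copy of the key.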

-Steve

