Memcached Clusters?

Tue Oct 24 08:06:08 UTC 2006

Moritz Maisel wrote:
> I think there is another aspect making me want to have synced servers: 
> It would avoid race conditions with servers going up and down. 
>   

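I think rehashing when a server is down is widely considered a bad idea 
nowadays, precisely because of the cache consistency problem: keys move 
to other servers while one is down and move back when it returns, so 
clients can end up reading stale values. 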
The 1.2.0 branch has the start of a comprehensive solution ("managed" 
mode, which splits the hash space up into a large but finite number of 
buckets that get doled out to servers; search the list archives for 
Brad's description of it) but in the meantime, I can tell you that at 
Facebook we just treat requests to crashed servers as cache misses.

Of course, that strategy only works well if you have enough servers that 
losing one of them doesn't cause your database load to skyrocket to 
unacceptably high levels.
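
To make the miss-on-failure idea concrete, here is a minimal Python 
sketch. It is not any real client library: NoRehashClient, FakeServer, 
and the use of ConnectionError to signal an unreachable server are all 
assumptions made up for illustration, and CRC32 simply stands in for 
whatever hash a real client uses.

```python
import zlib


class FakeServer:
    """In-memory stand-in for one memcached server (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.up = True

    def _check(self):
        if not self.up:
            raise ConnectionError("server unreachable")

    def get(self, key):
        self._check()
        return self.data.get(key)

    def set(self, key, value):
        self._check()
        self.data[key] = value


class NoRehashClient:
    """Maps each key to a fixed server; a dead server just yields a miss."""

    def __init__(self, servers):
        self.servers = list(servers)

    def _server_for(self, key):
        # The modulus is always taken over the full server list, so a
        # crash never remaps keys onto a different server.
        index = zlib.crc32(key.encode()) % len(self.servers)
        return self.servers[index]

    def get(self, key):
        try:
            return self._server_for(key).get(key)
        except ConnectionError:
            # Treat the failure as a cache miss; the caller falls back
            # to the database as it would for any other miss.
            return None

    def set(self, key, value):
        try:
            self._server_for(key).set(key, value)
            return True
        except ConnectionError:
            return False  # the write is dropped, not redirected
```

The point of the design is what the client does not do: it never picks 
a substitute server, so no key ever lives in two places.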

Another strategy we've considered, and may still use down the road, is 
to access memcached via virtual IP addresses with something like 
Linux-HA to manage them automatically. When a server goes down, another 
server takes over its service address. When the original server comes 
back up, it does not get its address back, so the fact that it has stale 
data doesn't matter. (Or it can have its address back after the HA 
manager resets its memcached instance, making the stale data problem moot.)
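
The address bookkeeping can be modeled in a few lines of Python. This is 
only a toy model of the failover rules described above, not Linux-HA 
configuration or its API; VipManager, the spare-host pool, and the flush 
callback are hypothetical names, and I am assuming the takeover is done 
by an idle spare host.

```python
class VipManager:
    """Toy model of service-address failover (not Linux-HA itself)."""

    def __init__(self, assignments, spares):
        self.owner = dict(assignments)  # service address -> host name
        self.spares = list(spares)      # idle hosts with empty caches

    def host_down(self, host):
        # Hand every address owned by the failed host to a spare.
        for vip, current in self.owner.items():
            if current == host:
                self.owner[vip] = self.spares.pop(0) if self.spares else None

    def host_recovered(self, host, flush):
        # Reset the returning host's cache before it can serve again,
        # so whatever it held before the outage is never read.
        flush(host)
        self.spares.append(host)
        # The host does not reclaim its old address; it only picks up
        # an address that currently has no owner at all.
        for vip, current in self.owner.items():
            if current is None:
                self.owner[vip] = self.spares.pop(0)
                break
```

Clients keep talking to the same service addresses throughout, so no 
client-side rehashing is involved.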

If you really want replication, in my opinion the right approach is to 
tweak the client to send its write requests to multiple servers, say the 
server indicated by the key hash and the one after that in the list. 
That is a pretty trivial change that does not affect the server code in 
the least. (Just remember that you need to do incr, decr, delete, and 
flush_all along with set/add/replace!) The architecture of memcached 
really revolves around having relatively smart clients talking to a 
relatively dumb server; that's a lot of what makes it as fast as it is.
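
A minimal Python sketch of that client-side change follows. None of it 
is taken from an existing client: ReplicatingClient, FakeServer, and 
the ConnectionError convention are hypothetical, CRC32 stands in for the 
real key hash, and reading from the second server when the first is 
unreachable is my assumption about how the copies would be used.

```python
import zlib


class FakeServer:
    """In-memory stand-in for one memcached server (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.up = True

    def _check(self):
        if not self.up:
            raise ConnectionError("server unreachable")

    def get(self, key):
        self._check()
        return self.data.get(key)

    def set(self, key, value):
        self._check()
        self.data[key] = value

    def delete(self, key):
        self._check()
        self.data.pop(key, None)

    def incr(self, key, delta):
        self._check()
        if key not in self.data:
            return None
        self.data[key] += delta
        return self.data[key]


class ReplicatingClient:
    """Sends every write to the hashed server and the next one in the list."""

    def __init__(self, servers):
        if len(servers) < 2:
            raise ValueError("need at least two servers to replicate")
        self.servers = list(servers)

    def _targets(self, key):
        i = zlib.crc32(key.encode()) % len(self.servers)
        return [self.servers[i], self.servers[(i + 1) % len(self.servers)]]

    def _on_replicas(self, key, op):
        # Apply op to both copies; collect results from reachable servers.
        results = []
        for server in self._targets(key):
            try:
                results.append(op(server))
            except ConnectionError:
                continue
        return results

    def set(self, key, value):
        return len(self._on_replicas(key, lambda s: s.set(key, value))) > 0

    def delete(self, key):
        return len(self._on_replicas(key, lambda s: s.delete(key))) > 0

    def incr(self, key, delta=1):
        results = self._on_replicas(key, lambda s: s.incr(key, delta))
        return results[0] if results else None

    def get(self, key):
        # Answer from the first reachable copy.
        for server in self._targets(key):
            try:
                return server.get(key)
            except ConnectionError:
                continue
        return None
```

Only set, delete, incr, and get are shown; add, replace, decr, and 
flush_all would need the same treatment. The sketch also does nothing to 
bring a restarted server's copy back in line with its neighbor.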

-Steve