sgrimm at facebook.com
Tue Oct 24 08:06:08 UTC 2006
Moritz Maisel wrote:
> I think there is another aspect making me want to have synced servers:
> It would avoid race conditions with servers going up and down.
I think rehashing when you have a server down is widely considered to be
a bad idea nowadays exactly because of the cache consistency problem.
The 1.2.0 branch has the start of a comprehensive solution ("managed"
mode, which splits the hash space up into a large but finite number of
buckets that get doled out to servers; search the list archives for
Brad's description of it) but in the meantime, I can tell you that at
Facebook we just treat requests to crashed servers as cache misses.
Of course, that strategy only works well if you have enough servers that
losing one of them doesn't cause your database load to skyrocket to
unacceptably high levels.
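
To make the "dead server means cache miss" idea concrete, here is a minimal sketch in Python. It is not Facebook's client code. FakeServer, DeadServerError, and NoRehashClient are hypothetical names invented for illustration, and the crc32-modulo key mapping is an assumption. The point is only that the server list never shrinks, so a key never moves to a different server while its owner is down.

```python
import zlib


class DeadServerError(Exception):
    """Raised by the (hypothetical) connection when its server is unreachable."""


class FakeServer:
    """In-memory stand-in for one memcached connection (illustration only)."""

    def __init__(self):
        self.data = {}
        self.alive = True

    def get(self, key):
        if not self.alive:
            raise DeadServerError(key)
        return self.data.get(key)

    def set(self, key, value):
        if not self.alive:
            raise DeadServerError(key)
        self.data[key] = value


class NoRehashClient:
    """Maps each key to a fixed server; a dead server yields misses, not rehashing."""

    def __init__(self, servers):
        # The list is fixed for the life of the client, dead servers included.
        self.servers = list(servers)

    def _server_for(self, key):
        # Assumed mapping: crc32 of the key modulo the (constant) server count.
        index = zlib.crc32(key.encode("utf-8")) % len(self.servers)
        return self.servers[index]

    def get(self, key):
        try:
            return self._server_for(key).get(key)
        except DeadServerError:
            return None  # report a miss; the caller falls back to the database

    def set(self, key, value):
        try:
            self._server_for(key).set(key, value)
            return True
        except DeadServerError:
            return False  # drop the write rather than store it on another server
```

Because the mapping never changes, a server that comes back cannot disagree with another server about who owns a key; the cost is that its share of the keys goes to the database while it is down.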
Another strategy we've considered, and may still use down the road, is
to access memcached via virtual IP addresses with something like
Linux-HA to manage them automatically. When a server goes down, another
server takes over its service address. When the original server comes
back up, it does not get its address back, so the fact that it has stale
data doesn't matter. (Or it can have its address back after the HA
manager resets its memcached instance, making the stale data problem moot.)
If you really want replication, in my opinion the right approach is to
tweak the client to send its write requests to multiple servers, say the
server indicated by the key hash and the one after that in the list.
That is a pretty trivial change that does not affect the server code in
the least. (Just remember that you need to do incr, decr, delete, and
flush_all along with set/add/replace!) The architecture of memcached
really revolves around having relatively smart clients talking to a
relatively dumb server; that's a lot of what makes it as fast as it is.
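
A rough sketch of the client-side replication described above, again in Python. FakeServer and ReplicatingClient are hypothetical names, the crc32-modulo mapping is an assumption, and reading from the second server when the first has no value is my own addition (the text above only covers writes). add and replace are left out for brevity; they would fan out the same way as set.

```python
import zlib


class FakeServer:
    """In-memory stand-in for one memcached connection (illustration only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)

    def incr(self, key, delta):
        # Like memcached, only touch keys that already exist.
        if key in self.data:
            self.data[key] += delta

    def flush_all(self):
        self.data.clear()


class ReplicatingClient:
    """Sends every mutating command to the hashed server and the next one in the list."""

    def __init__(self, servers):
        if len(servers) < 2:
            raise ValueError("replication needs at least two servers")
        self.servers = list(servers)

    def _replicas(self, key):
        i = zlib.crc32(key.encode("utf-8")) % len(self.servers)
        return [self.servers[i], self.servers[(i + 1) % len(self.servers)]]

    def get(self, key):
        # Assumption: try the primary first, then fall back to its neighbour.
        for server in self._replicas(key):
            value = server.get(key)
            if value is not None:
                return value
        return None

    def set(self, key, value):
        for server in self._replicas(key):
            server.set(key, value)

    def delete(self, key):
        for server in self._replicas(key):
            server.delete(key)

    def incr(self, key, delta=1):
        for server in self._replicas(key):
            server.incr(key, delta)

    def decr(self, key, delta=1):
        for server in self._replicas(key):
            server.incr(key, -delta)

    def flush_all(self):
        # flush_all is not keyed, so it goes to every server.
        for server in self.servers:
            server.flush_all()
```

Note that this sketch does nothing about the two copies drifting apart when one write succeeds and the other fails; a real client would have to decide how to handle that.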