ketama - a consistent hashing algo for memcache clients
Timo Ewalds
timo at tzc.com
Wed Apr 11 04:18:15 UTC 2007
I'd like to point out that the problem of stale cache entries exists if
you add AND remove servers without flushing all the data. Here's the
sequence that would cause it:
1. Start a bunch of servers
2. Run for a while
3. Add 1 server
4. Run for a while
5. Remove the server added in step 3.
Now a key that mapped to one of the original servers, but moved to the
new server, maps to the old server again. If the key hasn't expired on
the old server, it'll be returned as if it were valid, even though it may
have been changed or invalidated on the new server. To avoid this, you can
never remove servers after adding servers. Adding is fine, removing is fine,
adding fresh/flushed servers after removing is fine, but removing after
adding is not.
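The five steps above can be simulated with a toy consistent-hash ring. This is a hypothetical Python sketch, not libketama's actual layout (the point count, MD5 truncation, and Ring class are all my assumptions); per-server dicts stand in for the memcached instances:

```python
import hashlib
from bisect import bisect

class Ring:
    """Toy consistent-hash ring (assumed layout, not libketama's)."""
    def __init__(self, nodes=(), points_per_node=100):
        self.points_per_node = points_per_node
        self.ring = []  # sorted list of (hash_point, node)
        for n in nodes:
            self.add(n)

    def add(self, node):
        for i in range(self.points_per_node):
            h = hashlib.md5(f"{node}-{i}".encode()).hexdigest()
            self.ring.append((int(h[:8], 16), node))
        self.ring.sort()

    def remove(self, node):
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def node_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest()[:8], 16)
        # First ring point at or after the key's hash, wrapping around.
        return self.ring[bisect(self.ring, (h,)) % len(self.ring)][1]

ring = Ring(["A", "B"])
with_c = Ring(["A", "B", "C"])  # what the ring looks like after step 3

# Find a key that the new server C "steals" from an original server.
key = next(k for k in (f"user:{i}" for i in range(100000))
           if with_c.node_for(k) == "C")

caches = {"A": {}, "B": {}, "C": {}}
old = ring.node_for(key)                 # steps 1-2: key lives on A or B
caches[old][key] = "v1"
ring.add("C")                            # step 3: add a server
caches[ring.node_for(key)][key] = "v2"   # step 4: key now updated on C
ring.remove("C")                         # step 5: remove it again
stale = caches[ring.node_for(key)][key]  # -> "v1": the old, stale value
```

After step 5 the key maps back to its original server, which still holds "v1" even though "v2" was written to C in the meantime.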
Note that this problem is very similar to the one of rehashing key
locations when a server goes down and comes back up without being
flushed (i.e. intermittent network issues). I believe the consensus fix
is simply not to rehash and to accept the extra database load from
having a downed memcached server.
As a separate issue: presumably if you're adding servers often, you've
got many servers. If you're adding 100 buckets per server (to make the
distribution even), what's the CPU usage like to figure out which server
to choose, compared to the simple hash method? I'd guess it could be
quite a bit heavier in PHP, where the bucket list needs to be rebuilt
on each page load.
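A rough micro-benchmark makes the comparison concrete. This is a Python sketch, so the absolute numbers say nothing about PHP; the server names, 100 points per server, and MD5 truncation are all assumptions. The point is the shape of the cost: ketama pays an O(servers x points) rebuild plus an O(log buckets) binary search per lookup, versus a single O(1) modulo:

```python
import hashlib
import timeit
from bisect import bisect

def build_ring(servers, points_per_server=100):
    """Rebuild the full sorted bucket list, as a PHP request might per page."""
    ring = []
    for s in servers:
        for i in range(points_per_server):
            h = hashlib.md5(f"{s}:{i}".encode()).hexdigest()
            ring.append((int(h[:8], 16), s))
    ring.sort()
    return ring

def ketama_lookup(ring, key):
    # Binary search over servers * points_per_server sorted buckets.
    h = int(hashlib.md5(key.encode()).hexdigest()[:8], 16)
    return ring[bisect(ring, (h,)) % len(ring)][1]

def modulo_lookup(servers, key):
    # The simple hash method: one hash, one modulo.
    h = int(hashlib.md5(key.encode()).hexdigest()[:8], 16)
    return servers[h % len(servers)]

servers = [f"10.0.0.{i}" for i in range(100)]
rebuild = timeit.timeit(lambda: build_ring(servers), number=10) / 10
ring = build_ring(servers)
ketama = timeit.timeit(lambda: ketama_lookup(ring, "user:42"), number=10000)
modulo = timeit.timeit(lambda: modulo_lookup(servers, "user:42"), number=10000)
print(f"rebuild ring: {rebuild * 1e3:.2f} ms; "
      f"10k ketama lookups: {ketama * 1e3:.1f} ms; "
      f"10k modulo lookups: {modulo * 1e3:.1f} ms")
```

Per-lookup cost stays small either way; it's the per-request ring rebuild (10,000 MD5 calls here) that would dominate in a rebuild-every-page setup.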
Has anyone tried using consistent hashing not to make server management
easier, but to increase locality of keys, so that get_multi calls are
more likely to hit fewer servers? Something like having the top 20 bits
be a hash of the prefix and the lower 12 bits a hash of the suffix (the
id?). With consistent hashing, keys with the same prefix would then be
nearby on the number line, and so likely to map to the same server.
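The split hash described above might be sketched like this. The ":"-delimited key format, MD5, and the 20/12 bit split are assumptions for illustration; keys sharing a prefix land within the same 4096-value window on the ring, so on a ring with reasonably coarse buckets they'd usually resolve to the same server:

```python
import hashlib

def locality_hash(key):
    """Map 'prefix:suffix' to a 32-bit ring point: top 20 bits from the
    prefix, bottom 12 from the suffix (hypothetical scheme)."""
    prefix, _, suffix = key.partition(":")
    hp = int(hashlib.md5(prefix.encode()).hexdigest()[:8], 16) & 0xFFFFF  # 20 bits
    hs = int(hashlib.md5(suffix.encode()).hexdigest()[:8], 16) & 0xFFF    # 12 bits
    return (hp << 12) | hs

# Two fields of the same object differ only in the low 12 bits, so they
# sit at most 4095 apart on the number line.
a = locality_hash("user123:name")
b = locality_hash("user123:email")
print(hex(a), hex(b), abs(a - b))
```

The trade-off is that only 12 bits separate keys within a prefix, so very hot prefixes concentrate load on one server instead of spreading it.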
Another solution which we already use for a specific case would be to
have different server pools for different data groups (I believe Steven
Grimm mentioned doing this), but it is annoying to do in the general case.
Timo
Richard Jones wrote:
> Every time we added memcached servers to the pool, the rehashing of all keys
> would kill our database while the cache filled up again. We implemented a
> consistent hashing algorithm that our memcached clients are now using to map
> keys to servers. Now we can add new memcached servers without causing
> chaos :)
>
> Here is the blurb and the code:
> http://www.last.fm/user/RJ/journal/2007/04/10/392555/
>
> Source download includes:
> libketama, a c library
> php_ketama, a php extension that uses the library
> a pure java implementation
>
> I'm interested in other approaches (or even if this is a concern for most
> people?). I read about the lots-of-virtual-buckets concept, but didn't find
> an implementation. Anyone already using consistent hashing for memcache?
>
> RJ
>
>
More information about the memcached mailing list