ketama consistent hashing update

Steven Grimm sgrimm at facebook.com
Wed Aug 8 12:25:46 UTC 2007


Alex Stapleton wrote:
> As Steve says in his follow-up we also need to handle collisions while 
> creating the continuum. Simply incrementing the point value by 1 until 
> we find an empty place is going to give us a rather biased continuum. 
> I think that incrementing the point number by 1 and recalculating the 
> point value from that should give a better distribution while being 
> fairly easy to actually implement.

I'm not a big fan of that idea because an implementer would have to be 
very careful to avoid insertion order bugs. You need the mapping from a 
key to a host to be the same whether the host list was initialized as 
(A,B,C,D) or as (A,B,D) with (C) added later, and any kind of "Oops, 
that bucket is taken already, do something else with the node I'm trying 
to insert" algorithm will, unless it's carefully implemented, 
potentially give you a subtly different data structure in those two cases.
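To make the insertion-order hazard concrete, here's a minimal sketch (not from anyone's actual implementation; the tiny 8-slot space and the host/point values are invented purely to force collisions) of a "step to the next free bucket" scheme producing two different continuums from the same set of nodes:

```python
def probe_insert(continuum, point, host, space=8):
    """'Oops, that bucket is taken' handling via linear probing:
    advance to the next empty slot. The result depends on which
    node got to the contested bucket first."""
    while point in continuum:
        point = (point + 1) % space
    continuum[point] = host

# Same three nodes, same hash points, two insertion orders.
c1 = {}
for host, pt in [("A", 3), ("B", 3), ("C", 4)]:
    probe_insert(c1, pt, host)

c2 = {}
for host, pt in [("A", 3), ("C", 4), ("B", 3)]:
    probe_insert(c2, pt, host)

# c1 maps bucket 4 to B; c2 maps bucket 4 to C. A key hashing
# near bucket 4 would be sent to different hosts depending on
# the order the continuum happened to be built in.
```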

I'm much more in favor of making "n" large enough that it's reasonable 
to say something like, "Oops, that bucket is taken already; overwrite it 
with me if the address of the node it currently points to is numerically 
less than mine, else don't insert myself here." Then you will 
deterministically end up with exactly the same search tree no matter 
what order the nodes are inserted. If you are inserting each node 10 
times in a tree and your search space is a full 32 bits, the occasional 
node that's inserted 9 times instead is going to be basically irrelevant 
to the health of the system. Not perfectly balanced, sure, but there 
will almost never be any collisions to begin with and it isn't worth 
optimizing such a rare edge case.
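The tie-break rule above can be sketched like so (a toy illustration, not ketama's real code: host-name strings stand in for "the address of the node", md5 is just a stand-in hash, and the per-host point count is arbitrary). The point is that the winner of a collision depends only on the colliding nodes themselves, never on insertion order:

```python
import hashlib

def continuum_points(hosts, points_per_host=10):
    """Build a continuum mapping 32-bit points to hosts.

    On a collision, the node whose identifier compares lower wins,
    so any insertion order yields exactly the same structure."""
    continuum = {}
    for host in hosts:
        for i in range(points_per_host):
            # Derive a 32-bit point from the host and replica index.
            digest = hashlib.md5(f"{host}-{i}".encode()).digest()
            point = int.from_bytes(digest[:4], "big")
            current = continuum.get(point)
            # Deterministic tie-break: lower identifier overwrites.
            if current is None or host < current:
                continuum[point] = host
    return continuum

# (A,B,C,D) up front and (A,B,D) with C added later agree:
assert continuum_points(["A", "B", "C", "D"]) == \
       continuum_points(["A", "B", "D", "C"])
```

With a full 32-bit space and only tens of points per host, the collision branch almost never fires at all, which is the "rare edge case" argument in the paragraph above.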

-Steve
