ketama consistent hashing update
sgrimm at facebook.com
Wed Aug 8 10:47:01 UTC 2007
Brad Fitzpatrick wrote:
> -- do we pre-compute 'n' points like my perl client?
> (note that this can be done lazily too, only evaluating each of
> 1024 positions the first time they're accessed)
I'd like to request that we standardize on a much bigger number than
1024. If you have a ton of memcached instances, "n" doesn't have to be
very high before a space of 1024 values starts to get pretty crowded, if
not completely full.
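To make the crowding concrete, here is a rough back-of-the-envelope sketch (mine, not from the post): if each server contributes some number of points hashed uniformly into a 1024-slot space, the expected number of occupied slots saturates quickly as servers are added.

```python
def expected_occupied(slots, points):
    # Each slot stays empty with probability ((slots-1)/slots)**points,
    # so the expected occupied count is slots * (1 - that probability).
    return slots * (1 - ((slots - 1) / slots) ** points)

# Hypothetical: 100 points per server, the figure ketama-style clients
# often use per target (an assumption, not a number from the post).
for servers in (10, 50, 100):
    points = servers * 100
    print(servers, "servers ->", round(expected_occupied(1024, points)),
          "of 1024 slots expected occupied")
```

With 50 servers at 100 points each, over 99% of a 1024-slot space is expected to be occupied, so further points land almost entirely on collisions.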
Speaking of which, any spec should clarify what happens when there's a
collision between two targets. I would prefer it to be deterministic
based on the targets in question, rather than based on insertion order;
then you are guaranteed to get the same results in a client which has
added new targets incrementally over time as you do in one that started
out with the complete list in some undefined order.
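One way to get that guarantee is to break ties by comparing the colliding target names themselves, rather than keeping whichever was inserted first. A minimal sketch in Python (my illustration, not a spec; the `point` hash and 100-replica count are assumptions):

```python
import hashlib

def point(target, replica):
    # Hash "target-replica" and take the first 4 bytes as a 32-bit point.
    digest = hashlib.md5(f"{target}-{replica}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def build_ring(targets, replicas=100):
    ring = {}
    for target in targets:
        for r in range(replicas):
            p = point(target, r)
            # Deterministic tie-break: on a collision, keep whichever
            # target name sorts lower. The result depends only on the
            # set of targets, never on insertion order.
            if p not in ring or target < ring[p]:
                ring[p] = target
    return ring
```

A client that added targets one at a time and one that started with the full list in any order now build identical rings.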
I am not completely convinced there's much value to limiting the space
to some arbitrarily small range like 1024 in the first place. Given that
this is on the client side (where the request rate is relatively low), a
binary search or n-ary tree traversal is likely to be indistinguishable
from an array lookup in performance terms given the likely depth of the
tree, and if you have a full 32 bits worth of number space, collisions
become very unlikely. Given that the array in question is just a cache
of the results of searching some underlying data structure -- hence the
fact that it can be populated lazily -- IMO it should have a
demonstrable performance benefit to be worth the extra layer of complexity.
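The full-32-bit variant is short enough to sketch. This is my illustration of the general technique, not anyone's client code; the hash and replica count are placeholders:

```python
import bisect
import hashlib

def hash32(s):
    # First 4 bytes of MD5 as an unsigned 32-bit value.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

class Ring:
    def __init__(self, targets, replicas=100):
        # All points for all targets, sorted once; no 1024-slot cache.
        self.points = sorted((hash32(f"{t}-{r}"), t)
                             for t in targets for r in range(replicas))
        self.keys = [p for p, _ in self.points]

    def lookup(self, key):
        # Binary search for the first point at or after the key's hash.
        i = bisect.bisect(self.keys, hash32(key))
        # Wrap around the ring if we ran off the end.
        return self.points[i % len(self.points)][1]
```

With a few hundred targets at 100 points each, the binary search is around 15 comparisons per lookup, which on the client side is lost in the noise next to the network round trip.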
The complexity, BTW, is not just in searching. The array really is a
cache, which means you have to be careful to invalidate it at the right
times. When you add a new target in your lazy-evaluated implementation,
are you quite certain the code will find it given that previous targets
may already have been cached in the array? Et cetera.
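The invalidation hazard looks something like this sketch (mine, with a placeholder `_search` standing in for the real underlying lookup): entries cached before a target was added silently keep routing to the old target set unless the cache is flushed.

```python
class LazyRing:
    def __init__(self, targets):
        self.targets = list(targets)
        # Lazily populated cache of search results, one per slot.
        self.cache = [None] * 1024

    def add_target(self, target):
        self.targets.append(target)
        # The array is derived data. Without this flush, slots cached
        # before the add would never see the new target.
        self.cache = [None] * 1024

    def lookup(self, slot):
        if self.cache[slot] is None:
            self.cache[slot] = self._search(slot)
        return self.cache[slot]

    def _search(self, slot):
        # Placeholder for searching the real underlying data structure.
        return self.targets[slot % len(self.targets)]
```

Forgetting the flush in `add_target` is exactly the kind of bug that only surfaces under incremental reconfiguration, which is the scenario the cache is supposed to survive.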