ketama consistent hashing update

Thu Aug 9 11:16:33 UTC 2007

Brad Fitzpatrick wrote:
> On Wed, 8 Aug 2007, Alex Stapleton wrote:
> 
>> Brad Fitzpatrick wrote:
> 
>>>    -- what is the input to the digest algorithm for points 1-n?
>>>          e.g.:
>>>          "127.0.0.1:11211-1"
>>>          "127.0.0.1:11211-2"
>>>          "127.0.0.1:11211-3"
>>>                  ....
>>>          "127.0.0.1:11211-<n>"
>> It seems like running multiple hashes over the string seems like a bit
>> of a waste of cycles. Modifying the hash value for each point would be
>> much more efficient and simpler to implement. Something like point value
>> = <hash value> + (<point number> * 2654435761)
> 
> Alex,
> 
> I don't think that works well.  Consider:
> 
>      item A with weight 1.
>      item B with weight 1.
> 
> When hashing item A onto the 32-bit continuum you end up with, say, 10.
> When hashing item B onto the 32-bit continuum you end up with, say, 11.
> 
> Now, because of this one unlucky distribution where A only has 1 unit of
> space between it and B, you're not saved by the multiple points.
> 
> Now item A's second point is at 2654435761+10, and item B is at
> 2654435761+11, etc...

Yes thats a good point. I have a method that doesn't have that issue 
however I can't really think of an implementation that doesn't rely on 
integer overflow semantics which could be a problem if trying to do 
implementations in some languages, e.g. Perl, Python, PHP.

If we want to make life easy for people implementing clients in 
languages which don't have a simple way to use C style integers then 
perhaps doing what you initially suggested is the best approach after 
all my whining? :)