ketama consistent hashing update
Alex Stapleton
alexs at advfn.com
Thu Aug 9 11:16:33 UTC 2007
Brad Fitzpatrick wrote:
> On Wed, 8 Aug 2007, Alex Stapleton wrote:
>
>> Brad Fitzpatrick wrote:
>
>>> -- what is the input to the digest algorithm for points 1-n?
>>> e.g.:
>>> "127.0.0.1:11211-1"
>>> "127.0.0.1:11211-2"
>>> "127.0.0.1:11211-3"
>>> ....
>>> "127.0.0.1:11211-<n>"
>> It seems like running multiple hashes over the string seems like a bit
>> of a waste of cycles. Modifying the hash value for each point would be
>> much more efficient and simpler to implement. Something like point value
>> = <hash value> + (<point number> * 2654435761)
>
> Alex,
>
> I don't think that works well. Consider:
>
> item A with weight 1.
> item B with weight 1.
>
> When hashing item A onto the 32-bit continuum you end up with, say, 10.
> When hashing item B onto the 32-bit continuum you end up with, say, 11.
>
> Now, because of this one unlucky distribution where A only has 1 unit of
> space between it and B, you're not saved by the multiple points.
>
> Now item A's second point is at 2654435761+10, and item B is at
> 2654435761+11, etc...
Yes thats a good point. I have a method that doesn't have that issue
however I can't really think of an implementation that doesn't rely on
integer overflow semantics which could be a problem if trying to do
implementations in some languages, e.g. Perl, Python, PHP.
If we want to make life easy for people implementing clients in
languages which don't have a simple way to use C style integers then
perhaps doing what you initially suggested is the best approach after
all my whining? :)
More information about the memcached
mailing list