ketama consistent hashing update

a. a at enyim.com
Thu Aug 9 10:30:51 UTC 2007


Hi, check out this page http://bretm.home.comcast.net/hash/8.html for
CRC-32.

It seems that the FNV hash, or the modified version at
http://bretm.home.comcast.net/hash/6.html, is much better suited to the
requirements. (This is what I use in my client too.)
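
For reference, a minimal sketch of a basic 32-bit FNV-1a hash in Python
(this is the unmodified variant, not the modified version discussed on
that page):

    def fnv1a_32(key):
        # FNV-1a: xor each byte into the hash, then multiply by the FNV prime
        h = 0x811c9dc5                        # 32-bit FNV offset basis
        for byte in key.encode("utf-8"):
            h = ((h ^ byte) * 0x01000193) & 0xffffffff
        return h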



On Aug 9, 2007, at 12:20 PM, Richard Jones wrote:

> Multiple hash functions are required, or you lose the benefit of multiple
> points per server in the first place, for the reasons Brad and Russ
> mentioned.
>
> I'm going to give crc32 a go (for lookups and initial hashing) and see how
> the distribution looks; it's probably cheaper, and it makes sense as it is
> more widely available. There was no particular reason why I used md5 over
> any other hash.
>
> 160 was chosen as the number of points per server because it showed a
> decent distribution... it's "big enough". Once the hashing and collision
> handling are well defined, this could potentially be tuned per-install
> and still maintain compatibility between clients. However, I think we can
> agree on a sane default with a bit more testing.
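
(For illustration, a rough sketch of how such a continuum could be built and
queried in Python, with a fixed number of points per server keyed as
"host:port-<n>". The names here are made up, and md5 is used only because
that is what the current implementation happens to hash with:)

    import hashlib
    from bisect import bisect

    POINTS_PER_SERVER = 160

    def point_hash(s):
        # any well-distributed 32-bit hash would do; md5 truncated to
        # 4 bytes is used purely for illustration
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

    def build_continuum(servers):
        # one entry per (server, point) pair, keyed "host:port-<n>"
        continuum = []
        for server in servers:
            for n in range(1, POINTS_PER_SERVER + 1):
                continuum.append((point_hash("%s-%d" % (server, n)), server))
        continuum.sort()
        return continuum

    def lookup(continuum, key):
        # first point clockwise from the key's hash, wrapping to the start
        idx = bisect(continuum, (point_hash(key),))
        return continuum[idx % len(continuum)][1]

    servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
    continuum = build_continuum(servers)
    print(lookup(continuum, "some:key"))
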
>
> I recently spotted a consistent hashing implementation in a branch of the
> php/pecl memcache CVS - http://pecl.php.net/package/memcache - which IIRC
> uses crc32 instead of md5. Need to investigate this some more.
>
> RJ
>
>
> On Wednesday 08 August 2007 13:03:37 Alex Stapleton wrote:
>> Brad Fitzpatrick wrote:
>>>    -- which digest algorithm(s) for the points and/or array lookup?
>>
>> There is a seemingly infinite array of hash functions out there, but we
>> already have code for CRC32 on a vast number of platforms; is there any
>> particular reason – be it speed or quality – that we shouldn't continue
>> to use it?
>>
>>>    -- how many points are put on the continuum for each target?
>>>       +
>>>    -- what is the input to the digest algorithm for points 1-n?
>>>          e.g.:
>>>          "127.0.0.1:11211-1"
>>>          "127.0.0.1:11211-2"
>>>          "127.0.0.1:11211-3"
>>>                  ....
>>>          "127.0.0.1:11211-<n>"
>>
>> Running multiple hashes over the string seems like a bit of a waste of
>> cycles. Modifying the hash value for each point would be much more
>> efficient and simpler to implement. Something like:
>>
>>     point value = <hash value> + (<point number> * 2654435761)
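
(A rough sketch of that idea; the function name is made up, and the constant
looks like Knuth's golden-ratio multiplicative-hash prime, 0x9E3779B1:)

    def derive_points(base_hash, num_points):
        # spread a single base hash into num_points continuum positions
        return [(base_hash + n * 2654435761) & 0xffffffff
                for n in range(1, num_points + 1)]
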
>>
>> As Steve says in his follow-up, we also need to handle collisions while
>> creating the continuum. Simply incrementing the point value by 1 until we
>> find an empty place is going to give us a rather biased continuum. I
>> think that incrementing the point number by 1 and recalculating the point
>> value from that should give a better distribution while being fairly easy
>> to actually implement.
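
(Sketching that collision strategy; hash_fn stands in for whichever hash
function ends up being chosen, and the continuum is a plain dict mapping
point value to server, purely for illustration:)

    def place_point(continuum, server, point_number, hash_fn):
        # on a collision, bump the point number and rehash rather than
        # probing linearly for the next free value
        value = hash_fn("%s-%d" % (server, point_number))
        while value in continuum:
            point_number += 1
            value = hash_fn("%s-%d" % (server, point_number))
        continuum[value] = server
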
>
>
>
> -- 
> Richard Jones
> Last.fm Ltd. | http://www.last.fm/
> Office: +44 (0) 207 780 7080


