ketama consistent hashing update
a.
a at enyim.com
Thu Aug 9 10:30:51 UTC 2007
Hi, check out this page http://bretm.home.comcast.net/hash/8.html for
CRC-32
It seems that FNV hash or the modified version at http://
bretm.home.comcast.net/hash/6.html is much better for the
requirements. (This is what I use in my client too.)
.
On Aug 9, 2007, at 12:20 PM, Richard Jones wrote:
> Multiple hash functions are required or you lose the benefit of
> multiple
> points per server in the first place, for the reasons brad and russ
> mentioned.
>
> I'm going to give crc32 a go (for lookups and initial hashing) and
> see how the
> distribution looks, it's probably cheaper, and makes sense as it is
> more
> widely available. There was no particular reason why I used md5
> over any
> other hash.
>
> 160 was chosen as the number of points per server because it showed
> a decent
> distribution.. it's "big enough". Once the hashing and collision
> handling are
> well defined, this could potentially be tuned per-install, and
> still maintain
> compatability between clients. However, i think we can agree on a sane
> default with a bit more testing.
>
> I recently spotted a consistent hashing implementation in a branch
> of php/pecl
> memcache cvs - http://pecl.php.net/package/memcache which iirc uses
> crc32
> instead of md5. Need to investigate this some more.
>
> RJ
>
>
> On Wednesday 08 August 2007 13:03:37 Alex Stapleton wrote:
>> Brad Fitzpatrick wrote:
>>> -- which digest algorithm(s) for the points and/or array lookup?
>>
>> There are a seemingly infinite array of hash functions out there
>> but we
>> already have code for CRC32 on a vast number of platforms, is
>> there any
>> particular reason – be it speed or quality – that we shouldn't
>> continue
>> to use that?
>>
>>> -- how many points are put on the continuum for each target?
>>> +
>>> -- what is the input to the digest algorithm for points 1-n?
>>> e.g.:
>>> "127.0.0.1:11211-1"
>>> "127.0.0.1:11211-2"
>>> "127.0.0.1:11211-3"
>>> ....
>>> "127.0.0.1:11211-<n>"
>>
>> It seems like running multiple hashes over the string seems like a
>> bit
>> of a waste of cycles. Modifying the hash value for each point
>> would be
>> much more efficient and simpler to implement. Something like point
>> value
>> = <hash value> + (<point number> * 2654435761)
>>
>> As Steve says in his follow-up we also need to handle collisions
>> while
>> creating the continuum. Simply incrementing the point value by 1
>> until
>> we find an empty place is going to give us a rather biased
>> continuum. I
>> think that incrementing the point number 1 and recalculating the
>> point
>> value from that should give a better distribution while being fairly
>> easy to actually implement.
>
>
>
> --
> Richard Jones
> Last.fm Ltd. | http://www.last.fm/
> Office: +44 (0) 207 780 7080
More information about the memcached
mailing list