ketama - a consistent hashing algo for memcache clients
Pierre Demartines
pierre at demartines.com
Tue Apr 10 23:20:07 UTC 2007
FYI, we found the Fowler-Noll-Vo hashes to be very quick to compute, and
yet having beautiful properties in terms of dispersion.
FWIW, here is a simple Java implementation that computes a 64-bit
hashcode for a class that extends CharSequence:
package com.yourco.util;
/**
* Enables a CharSequence with the capability to produce a
Fowler-Noll-Vo (FNV) 64-bit hashcode.<p>
*
* FNV hashes are designed to be fast while maintaining a low collision
rate.
* The FNV speed allows one to quickly hash lots of data while
maintaining a
* reasonable collision rate. The high dispersion of the FNV hashes
makes them
* well suited for hashing nearly identical strings such as URLs,
hostnames,
* filenames, text, IP addresses, etc.
*
* Usage: add "extends CharSequenceHash64" to a class that implements
CharSequence,
* and that class will have hashCode64().<p>
*
* You can daisy-chain several hashCode calculations like this:<p>
* <pre>long hashOfSeveral =
a.hashCode64(b.hashCode64(c.hashCode64()));</pre><p>
* or, alternatively, like that:
* <pre>
* long hashOfSeveral = CharSequenceHash64.FNV1_64_INIT;
* for (MyCharSequence x : collectionOfMyCharSequences) {
* hashOfSeveral = x.hashCode64(hashOfSeveral);
* }</pre><p>
* @see http://www.isthe.com/chongo/tech/comp/fnv/ and
* @see http://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash<p>
*
* @author pdemartines at quantcast.com
*/
public abstract class CharSequenceHash64 implements CharSequence {
public static final long FNV1_64_INIT = 0xcbf29ce484222325L;
private static final long FNV_64_PRIME = 0x100000001b3L;
/**
* @param initialValue -- optional: initial value
* @return a 64-bit FNV hash called fnv164
*/
public long hashCode64(long initialValue) {
long hval = initialValue;
int len = length();
for (int i=0; i<len; i++) {
hval *= FNV_64_PRIME;
hval ^= (long)charAt(i);
}
return hval;
}
public long hashCode64() {
return hashCode64(FNV1_64_INIT);
}
}
Here is a junit test for it, that shows how it can be used:
package com.yourco.util;
import java.util.HashMap;
import junit.framework.TestCase;
public class CharSequenceHash64_T extends TestCase {
public class MyCharSeq extends CharSequenceHash64 implements
CharSequence {
private char[] buffer = null;
private MyCharSeq() {}
private MyCharSeq(int length) {
buffer = new char[length];
}
public MyCharSeq(String s) {
buffer = s.toCharArray();
}
public int length() {
return (buffer == null ? 0 : buffer.length);
}
public char charAt(int index) {
if (index < length()) {
return buffer[index];
} else {
throw new IndexOutOfBoundsException();
}
}
public String toString() {
if (buffer == null) return "";
return new String(buffer);
}
public CharSequence subSequence(int start, int end) {
if (buffer == null) throw new IndexOutOfBoundsException();
int length = end-start;
MyCharSeq sub = new MyCharSeq(length);
System.arraycopy(buffer, start, sub.buffer, 0, end-start);
return sub;
}
}
// FNV's hashCode64
// Compare this function's value against known values (calculated
using bona-fide implementations of FNV164)
public void testHashCode64() {
HashMap<String,Long> value = new HashMap<String,Long>();
value.put("", new Long(0xcbf29ce484222325L));
value.put(" ", new Long(0xaf63bd4c8601b7ffL));
value.put("hello world!", new Long(0x58735284b97b86bcL));
value.put("Lorem ipsum dolor sit amet, consectetuer adipiscing
elit.", new Long(0x536c9cdee87c054aL));
value.put("wd:com.google", new Long(0xcf4e7986071b08f8L));
value.put("wd:com.google ", new Long(0x5d6176be12f03d48L));
for (String s:value.keySet()) {
long h64 = value.get(s);
assertEquals("\""+s+"\".hashCode64()", h64, new
MyCharSeq(s).hashCode64());
}
}
}
Enjoy,
~Pierre
On Tue, 2007-04-10 at 23:01 +0100, Richard Jones wrote:
> > This is a very interesting idea, and I'd like to add a consistent
> > hash driver to my java interface, but there's a bit of gap between
> > your writeup and what the implementation.
>
> Check the init() method - that builds up the appropriate data structure (a
> TreeMap called buckets iirc). You can use any hashing mechanism you want
> instead of the md5 thing, that just seemed a convenient way to generate a
> bunch of hashes. If you change it you won't be compatible with other
> implementations - not a problem if you are exclusively using java tho.
>
> Once you've build the data structure using the servers list, that is pretty
> much it in java, just check how keys are mapped to servers in findPointFor().
>
> RJ
>
>
> >
> > In the java interface, all I see is that you've added a new hash
> > type that's based off an MD5. Is that actually an important detail?
> > It seems like you should be able to use any hash that produces an
> > even 32-bits.
> >
> > Is it OK if I use your C implementation as a specification?
>
More information about the memcached
mailing list