Patch: Better hash function
Jason Titus
jazzmantitus at yahoo.com
Thu Oct 12 03:31:58 UTC 2006
Is it worth considering Judy trees ( http://judy.sourceforge.net/ )
instead of the existing hash table? The claim is that it can be
faster and more memory efficient (and even allows iteration). There
is a pretty thorough set of documentation on them at - http://
judy.sourceforge.net/application/shop_interm.pdf
I thought about this a while ago but didn't mention it because it
seemed like a big change. But if folks are considering changing the
hashing algorithm.....
Jason
On Oct 11, 2006, at 5:22 PM, Steven Grimm wrote:
> This patch replaces the old hash function with a newer one from the
> same author (see http://www.burtleburtle.net/bob/hash/doobs.html --
> this is the code from lookup3.c referenced on that page). The new
> function is significantly faster and produces better output.
>
> The only controversial thing here is that there's now a test in
> configure.ac to detect whether the system is little- or big-endian,
> because the new algorithm needs to do different things depending on
> byte order. I've tested it on MacOS X and Linux, but before I
> commit this I'd appreciate some confirmation that the byte order
> test works on other platforms.
>
> -Steve
>
> Index: assoc.c
> ===================================================================
> --- assoc.c (revision 411)
> +++ assoc.c (working copy)
> @@ -12,6 +12,7 @@
> *
> * $Id$
> */
> +#include "config.h"
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/time.h>
> @@ -39,91 +40,424 @@
> #define hashsize(n) ((ub4)1<<(n))
> #define hashmask(n) (hashsize(n)-1)
>
> +/*
> + * Since the hash function does bit manipulation, it needs to know
> + * whether it's big or little-endian. ENDIAN_LITTLE and ENDIAN_BIG
> + * are set in the configure script.
> + */
> +#if ENDIAN_BIG == 1
> +# define HASH_LITTLE_ENDIAN 0
> +# define HASH_BIG_ENDIAN 1
> +#else
> +# if ENDIAN_LITTLE == 1
> +# define HASH_LITTLE_ENDIAN 1
> +# define HASH_BIG_ENDIAN 0
> +# else
> +# define HASH_LITTLE_ENDIAN 0
> +# define HASH_BIG_ENDIAN 0
> +# endif
> +#endif
> +
> +#define rot(x,k) (((x)<<(k)) ^ ((x)>>(32-(k))))
> +
> +/*
> +---------------------------------------------------------------------
> ----------
> +mix -- mix 3 32-bit values reversibly.
> +
> +This is reversible, so any information in (a,b,c) before mix() is
> +still in (a,b,c) after mix().
> +
> +If four pairs of (a,b,c) inputs are run through mix(), or through
> +mix() in reverse, there are at least 32 bits of the output that
> +are sometimes the same for one pair and different for another pair.
> +This was tested for:
> +* pairs that differed by one bit, by two bits, in any combination
> + of top bits of (a,b,c), or in any combination of bottom bits of
> + (a,b,c).
> +* "differ" is defined as +, -, ^, or ~^. For + and -, I transformed
> + the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
> + is commonly produced by subtraction) look like a single 1-bit
> + difference.
> +* the base values were pseudorandom, all zero but one bit set, or
> + all zero plus a counter that starts at zero.
> +
> +Some k values for my "a-=c; a^=rot(c,k); c+=b;" arrangement that
> +satisfy this are
> + 4 6 8 16 19 4
> + 9 15 3 18 27 15
> + 14 9 3 7 17 3
> +Well, "9 15 3 18 27 15" didn't quite get 32 bits diffing
> +for "differ" defined as + with a one-bit base and a two-bit delta. I
> +used http://burtleburtle.net/bob/hash/avalanche.html to choose
> +the operations, constants, and arrangements of the variables.
> +
> +This does not achieve avalanche. There are input bits of (a,b,c)
> +that fail to affect some output bits of (a,b,c), especially of a.
> The
> +most thoroughly mixed value is c, but it doesn't really even achieve
> +avalanche in c.
> +
> +This allows some parallelism. Read-after-writes are good at doubling
> +the number of bits affected, so the goal of mixing pulls in the
> opposite
> +direction as the goal of parallelism. I did what I could. Rotates
> +seem to cost as much as shifts on every machine I could lay my hands
> +on, and rotates are much kinder to the top and bottom bits, so I used
> +rotates.
> +---------------------------------------------------------------------
> ----------
> +*/
> #define mix(a,b,c) \
> { \
> - a -= b; a -= c; a ^= (c>>13); \
> - b -= c; b -= a; b ^= (a<<8); \
> - c -= a; c -= b; c ^= (b>>13); \
> - a -= b; a -= c; a ^= (c>>12); \
> - b -= c; b -= a; b ^= (a<<16); \
> - c -= a; c -= b; c ^= (b>>5); \
> - a -= b; a -= c; a ^= (c>>3); \
> - b -= c; b -= a; b ^= (a<<10); \
> - c -= a; c -= b; c ^= (b>>15); \
> + a -= c; a ^= rot(c, 4); c += b; \
> + b -= a; b ^= rot(a, 6); a += c; \
> + c -= b; c ^= rot(b, 8); b += a; \
> + a -= c; a ^= rot(c,16); c += b; \
> + b -= a; b ^= rot(a,19); a += c; \
> + c -= b; c ^= rot(b, 4); b += a; \
> }
>
> /*
> ---------------------------------------------------------------------
> -hash() -- hash a variable-length key into a 32-bit value
> - k : the key (the unaligned variable-length array of bytes)
> - len : the length of the key, counting by bytes
> - initval : can be any 4-byte value
> -Returns a 32-bit value. Every bit of the key affects every bit of
> -the return value. Every 1-bit and 2-bit delta achieves avalanche.
> -About 6*len+35 instructions.
> +---------------------------------------------------------------------
> ----------
> +final -- final mixing of 3 32-bit values (a,b,c) into c
>
> -The best hash table sizes are powers of 2. There is no need to do
> -mod a prime (mod is sooo slow!). If you need less than 32 bits,
> -use a bitmask. For example, if you need only 10 bits, do
> - h = (h & hashmask(10));
> -In which case, the hash table should have hashsize(10) elements.
> +Pairs of (a,b,c) values differing in only a few bits will usually
> +produce values of c that look totally different. This was tested for
> +* pairs that differed by one bit, by two bits, in any combination
> + of top bits of (a,b,c), or in any combination of bottom bits of
> + (a,b,c).
> +* "differ" is defined as +, -, ^, or ~^. For + and -, I transformed
> + the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
> + is commonly produced by subtraction) look like a single 1-bit
> + difference.
> +* the base values were pseudorandom, all zero but one bit set, or
> + all zero plus a counter that starts at zero.
>
> -If you are hashing n strings (ub1 **)k, do it like this:
> - for (i=0, h=0; i<n; ++i) h = hash( k[i], len[i], h);
> +These constants passed:
> + 14 11 25 16 4 14 24
> + 12 14 25 16 4 14 24
> +and these came close:
> + 4 8 15 26 3 22 24
> + 10 8 15 26 3 22 24
> + 11 8 15 26 3 22 24
> +---------------------------------------------------------------------
> ----------
> +*/
> +#define final(a,b,c) \
> +{ \
> + c ^= b; c -= rot(b,14); \
> + a ^= c; a -= rot(c,11); \
> + b ^= a; b -= rot(a,25); \
> + c ^= b; c -= rot(b,16); \
> + a ^= c; a -= rot(c,4); \
> + b ^= a; b -= rot(a,14); \
> + c ^= b; c -= rot(b,24); \
> +}
>
> -By Bob Jenkins, 1996. bob_jenkins at burtleburtle.net. You may use
> this
> -code any way you wish, private, educational, or commercial. It's
> free.
> +#if HASH_LITTLE_ENDIAN == 1
> +uint32_t hash(
> + const void *key, /* the key to hash */
> + size_t length, /* length of the key */
> + uint32_t initval) /* initval */
> +{
> + uint32_t a,b,c; /*
> internal state */
> + union { const void *ptr; size_t i; } u; /* needed for Mac
> Powerbook G4 */
>
> -See http://burtleburtle.net/bob/hash/evahash.html
> -Use for hash table lookup, or anything where one collision in
> 2^^32 is
> -acceptable. Do NOT use for cryptographic purposes.
> ---------------------------------------------------------------------
> -*/
> + /* Set up the internal state */
> + a = b = c = 0xdeadbeef + ((uint32_t)length) + initval;
>
> -ub4 hash( k, length, initval)
> - register ub1 *k; /* the key */
> - register ub4 length; /* the length of the key */
> - register ub4 initval; /* the previous hash, or an arbitrary
> value */
> + u.ptr = key;
> + if (HASH_LITTLE_ENDIAN && ((u.i & 0x3) == 0)) {
> + const uint32_t *k = key; /* read 32-
> bit chunks */
> +#ifdef VALGRIND
> + const uint8_t *k8;
> +#endif // ifdef VALGRIND
> +
> + /*------ all but last block: aligned reads and affect 32 bits
> of (a,b,c) */
> + while (length > 12)
> + {
> + a += k[0];
> + b += k[1];
> + c += k[2];
> + mix(a,b,c);
> + length -= 12;
> + k += 3;
> + }
> +
> + /*----------------------------- handle the last (probably
> partial) block */
> + /*
> + * "k[2]&0xffffff" actually reads beyond the end of the
> string, but
> + * then masks off the part it's not allowed to read. Because the
> + * string is aligned, the masked-off tail is in the same word
> as the
> + * rest of the string. Every machine with memory protection
> I've seen
> + * does it on word boundaries, so is OK with this. But
> VALGRIND will
> + * still catch it and complain. The masking trick does make
> the hash
> + * noticably faster for short strings (like English words).
> + */
> +#ifndef VALGRIND
> +
> + switch(length)
> + {
> + case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
> + case 11: c+=k[2]&0xffffff; b+=k[1]; a+=k[0]; break;
> + case 10: c+=k[2]&0xffff; b+=k[1]; a+=k[0]; break;
> + case 9 : c+=k[2]&0xff; b+=k[1]; a+=k[0]; break;
> + case 8 : b+=k[1]; a+=k[0]; break;
> + case 7 : b+=k[1]&0xffffff; a+=k[0]; break;
> + case 6 : b+=k[1]&0xffff; a+=k[0]; break;
> + case 5 : b+=k[1]&0xff; a+=k[0]; break;
> + case 4 : a+=k[0]; break;
> + case 3 : a+=k[0]&0xffffff; break;
> + case 2 : a+=k[0]&0xffff; break;
> + case 1 : a+=k[0]&0xff; break;
> + case 0 : return c; /* zero length strings require no mixing */
> + }
> +
> +#else /* make valgrind happy */
> +
> + k8 = (const uint8_t *)k;
> + switch(length)
> + {
> + case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
> + case 11: c+=((uint32_t)k8[10])<<16; /* fall through */
> + case 10: c+=((uint32_t)k8[9])<<8; /* fall through */
> + case 9 : c+=k8[8]; /* fall through */
> + case 8 : b+=k[1]; a+=k[0]; break;
> + case 7 : b+=((uint32_t)k8[6])<<16; /* fall through */
> + case 6 : b+=((uint32_t)k8[5])<<8; /* fall through */
> + case 5 : b+=k8[4]; /* fall through */
> + case 4 : a+=k[0]; break;
> + case 3 : a+=((uint32_t)k8[2])<<16; /* fall through */
> + case 2 : a+=((uint32_t)k8[1])<<8; /* fall through */
> + case 1 : a+=k8[0]; break;
> + case 0 : return c; /* zero length strings require no mixing */
> + }
> +
> +#endif /* !valgrind */
> +
> + } else if (HASH_LITTLE_ENDIAN && ((u.i & 0x1) == 0)) {
> + const uint16_t *k = key; /* read 16-
> bit chunks */
> + const uint8_t *k8;
> +
> + /*--------------- all but last block: aligned reads and
> different mixing */
> + while (length > 12)
> + {
> + a += k[0] + (((uint32_t)k[1])<<16);
> + b += k[2] + (((uint32_t)k[3])<<16);
> + c += k[4] + (((uint32_t)k[5])<<16);
> + mix(a,b,c);
> + length -= 12;
> + k += 6;
> + }
> +
> + /*----------------------------- handle the last (probably
> partial) block */
> + k8 = (const uint8_t *)k;
> + switch(length)
> + {
> + case 12: c+=k[4]+(((uint32_t)k[5])<<16);
> + b+=k[2]+(((uint32_t)k[3])<<16);
> + a+=k[0]+(((uint32_t)k[1])<<16);
> + break;
> + case 11: c+=((uint32_t)k8[10])<<16; /* fall through */
> + case 10: c+=k[4];
> + b+=k[2]+(((uint32_t)k[3])<<16);
> + a+=k[0]+(((uint32_t)k[1])<<16);
> + break;
> + case 9 : c+=k8[8]; /* fall through */
> + case 8 : b+=k[2]+(((uint32_t)k[3])<<16);
> + a+=k[0]+(((uint32_t)k[1])<<16);
> + break;
> + case 7 : b+=((uint32_t)k8[6])<<16; /* fall through */
> + case 6 : b+=k[2];
> + a+=k[0]+(((uint32_t)k[1])<<16);
> + break;
> + case 5 : b+=k8[4]; /* fall through */
> + case 4 : a+=k[0]+(((uint32_t)k[1])<<16);
> + break;
> + case 3 : a+=((uint32_t)k8[2])<<16; /* fall through */
> + case 2 : a+=k[0];
> + break;
> + case 1 : a+=k8[0];
> + break;
> + case 0 : return c; /* zero length strings require no mixing */
> + }
> +
> + } else { /* need to read the key one byte
> at a time */
> + const uint8_t *k = key;
> +
> + /*--------------- all but the last block: affect some 32 bits
> of (a,b,c) */
> + while (length > 12)
> + {
> + a += k[0];
> + a += ((uint32_t)k[1])<<8;
> + a += ((uint32_t)k[2])<<16;
> + a += ((uint32_t)k[3])<<24;
> + b += k[4];
> + b += ((uint32_t)k[5])<<8;
> + b += ((uint32_t)k[6])<<16;
> + b += ((uint32_t)k[7])<<24;
> + c += k[8];
> + c += ((uint32_t)k[9])<<8;
> + c += ((uint32_t)k[10])<<16;
> + c += ((uint32_t)k[11])<<24;
> + mix(a,b,c);
> + length -= 12;
> + k += 12;
> + }
> +
> + /*-------------------------------- last block: affect all 32
> bits of (c) */
> + switch(length) /* all the case statements
> fall through */
> + {
> + case 12: c+=((uint32_t)k[11])<<24;
> + case 11: c+=((uint32_t)k[10])<<16;
> + case 10: c+=((uint32_t)k[9])<<8;
> + case 9 : c+=k[8];
> + case 8 : b+=((uint32_t)k[7])<<24;
> + case 7 : b+=((uint32_t)k[6])<<16;
> + case 6 : b+=((uint32_t)k[5])<<8;
> + case 5 : b+=k[4];
> + case 4 : a+=((uint32_t)k[3])<<24;
> + case 3 : a+=((uint32_t)k[2])<<16;
> + case 2 : a+=((uint32_t)k[1])<<8;
> + case 1 : a+=k[0];
> + break;
> + case 0 : return c; /* zero length strings require no mixing */
> + }
> + }
> +
> + final(a,b,c);
> + return c; /* zero length strings require no mixing */
> +}
> +
> +#elif HASH_BIG_ENDIAN == 1
> +/*
> + * hashbig():
> + * This is the same as hashword() on big-endian machines. It is
> different
> + * from hashlittle() on all machines. hashbig() takes advantage of
> + * big-endian byte ordering.
> + */
> +uint32_t hash( const void *key, size_t length, uint32_t initval)
> {
> - register ub4 a,b,c,len;
> + uint32_t a,b,c;
> + union { const void *ptr; size_t i; } u; /* to cast key to
> (size_t) happily */
>
> - /* Set up the internal state */
> - len = length;
> - a = b = 0x9e3779b9; /* the golden ratio; an arbitrary value */
> - c = initval; /* the previous hash value */
> + /* Set up the internal state */
> + a = b = c = 0xdeadbeef + ((uint32_t)length) + initval;
>
> - /*---------------------------------------- handle most of the
> key */
> - while (len >= 12)
> - {
> - a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) +((ub4)k[3]
> <<24));
> - b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) +((ub4)k[7]
> <<24));
> - c += (k[8] +((ub4)k[9]<<8) +((ub4)k[10]<<16)+((ub4)k
> [11]<<24));
> - mix(a,b,c);
> - k += 12; len -= 12;
> - }
> + u.ptr = key;
> + if (HASH_BIG_ENDIAN && ((u.i & 0x3) == 0)) {
> + const uint32_t *k = key; /* read 32-
> bit chunks */
> +#ifdef VALGRIND
> + const uint8_t *k8;
> +#endif // ifdef VALGRIND
>
> - /*------------------------------------- handle the last 11
> bytes */
> - c += length;
> - switch(len) /* all the case statements fall
> through */
> - {
> - case 11: c+=((ub4)k[10]<<24);
> - case 10: c+=((ub4)k[9]<<16);
> - case 9 : c+=((ub4)k[8]<<8);
> - /* the first byte of c is reserved for the length */
> - case 8 : b+=((ub4)k[7]<<24);
> - case 7 : b+=((ub4)k[6]<<16);
> - case 6 : b+=((ub4)k[5]<<8);
> - case 5 : b+=k[4];
> - case 4 : a+=((ub4)k[3]<<24);
> - case 3 : a+=((ub4)k[2]<<16);
> - case 2 : a+=((ub4)k[1]<<8);
> - case 1 : a+=k[0];
> - /* case 0: nothing left to add */
> - }
> - mix(a,b,c);
> - /*-------------------------------------------- report the
> result */
> - return c;
> + /*------ all but last block: aligned reads and affect 32 bits
> of (a,b,c) */
> + while (length > 12)
> + {
> + a += k[0];
> + b += k[1];
> + c += k[2];
> + mix(a,b,c);
> + length -= 12;
> + k += 3;
> + }
> +
> + /*----------------------------- handle the last (probably
> partial) block */
> + /*
> + * "k[2]<<8" actually reads beyond the end of the string, but
> + * then shifts out the part it's not allowed to read. Because
> the
> + * string is aligned, the illegal read is in the same word as the
> + * rest of the string. Every machine with memory protection
> I've seen
> + * does it on word boundaries, so is OK with this. But
> VALGRIND will
> + * still catch it and complain. The masking trick does make
> the hash
> + * noticably faster for short strings (like English words).
> + */
> +#ifndef VALGRIND
> +
> + switch(length)
> + {
> + case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
> + case 11: c+=k[2]&0xffffff00; b+=k[1]; a+=k[0]; break;
> + case 10: c+=k[2]&0xffff0000; b+=k[1]; a+=k[0]; break;
> + case 9 : c+=k[2]&0xff000000; b+=k[1]; a+=k[0]; break;
> + case 8 : b+=k[1]; a+=k[0]; break;
> + case 7 : b+=k[1]&0xffffff00; a+=k[0]; break;
> + case 6 : b+=k[1]&0xffff0000; a+=k[0]; break;
> + case 5 : b+=k[1]&0xff000000; a+=k[0]; break;
> + case 4 : a+=k[0]; break;
> + case 3 : a+=k[0]&0xffffff00; break;
> + case 2 : a+=k[0]&0xffff0000; break;
> + case 1 : a+=k[0]&0xff000000; break;
> + case 0 : return c; /* zero length strings require
> no mixing */
> + }
> +
> +#else /* make valgrind happy */
> +
> + k8 = (const uint8_t *)k;
> + switch(length) /* all the case statements
> fall through */
> + {
> + case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
> + case 11: c+=((uint32_t)k8[10])<<8; /* fall through */
> + case 10: c+=((uint32_t)k8[9])<<16; /* fall through */
> + case 9 : c+=((uint32_t)k8[8])<<24; /* fall through */
> + case 8 : b+=k[1]; a+=k[0]; break;
> + case 7 : b+=((uint32_t)k8[6])<<8; /* fall through */
> + case 6 : b+=((uint32_t)k8[5])<<16; /* fall through */
> + case 5 : b+=((uint32_t)k8[4])<<24; /* fall through */
> + case 4 : a+=k[0]; break;
> + case 3 : a+=((uint32_t)k8[2])<<8; /* fall through */
> + case 2 : a+=((uint32_t)k8[1])<<16; /* fall through */
> + case 1 : a+=((uint32_t)k8[0])<<24; break;
> + case 0 : return c;
> + }
> +
> +#endif /* !VALGRIND */
> +
> + } else { /* need to read the key one byte
> at a time */
> + const uint8_t *k = key;
> +
> + /*--------------- all but the last block: affect some 32 bits
> of (a,b,c) */
> + while (length > 12)
> + {
> + a += ((uint32_t)k[0])<<24;
> + a += ((uint32_t)k[1])<<16;
> + a += ((uint32_t)k[2])<<8;
> + a += ((uint32_t)k[3]);
> + b += ((uint32_t)k[4])<<24;
> + b += ((uint32_t)k[5])<<16;
> + b += ((uint32_t)k[6])<<8;
> + b += ((uint32_t)k[7]);
> + c += ((uint32_t)k[8])<<24;
> + c += ((uint32_t)k[9])<<16;
> + c += ((uint32_t)k[10])<<8;
> + c += ((uint32_t)k[11]);
> + mix(a,b,c);
> + length -= 12;
> + k += 12;
> + }
> +
> + /*-------------------------------- last block: affect all 32
> bits of (c) */
> + switch(length) /* all the case statements
> fall through */
> + {
> + case 12: c+=k[11];
> + case 11: c+=((uint32_t)k[10])<<8;
> + case 10: c+=((uint32_t)k[9])<<16;
> + case 9 : c+=((uint32_t)k[8])<<24;
> + case 8 : b+=k[7];
> + case 7 : b+=((uint32_t)k[6])<<8;
> + case 6 : b+=((uint32_t)k[5])<<16;
> + case 5 : b+=((uint32_t)k[4])<<24;
> + case 4 : a+=k[3];
> + case 3 : a+=((uint32_t)k[2])<<8;
> + case 2 : a+=((uint32_t)k[1])<<16;
> + case 1 : a+=((uint32_t)k[0])<<24;
> + break;
> + case 0 : return c;
> + }
> + }
> +
> + final(a,b,c);
> + return c;
> }
> +#else // HASH_XXX_ENDIAN == 1
> +#error Must define HASH_BIG_ENDIAN or HASH_LITTLE_ENDIAN
> +#endif // hash_XXX_ENDIAN == 1
>
> static item** hashtable = 0;
>
> Index: configure.ac
> ===================================================================
> --- configure.ac (revision 411)
> +++ configure.ac (working copy)
> @@ -129,6 +129,32 @@
>
> AC_C_SOCKLEN_T
>
> +dnl Check if we're a little-endian or a big-endian system, needed
> by hash code
> +AC_DEFUN(AC_C_ENDIAN,
> +[AC_CACHE_CHECK(for endianness, ac_cv_c_endian,
> +[
> + AC_RUN_IFELSE(
> + [AC_LANG_PROGRAM([], [dnl
> + long val = 1;
> + char *c = (char *) &val;
> + exit(*c == 1);
> + ])
> + ],[
> + ac_cv_c_endian=big
> + ],[
> + ac_cv_c_endian=little
> + ])
> +])
> +if test $ac_cv_c_endian = big; then
> + AC_DEFINE(ENDIAN_BIG, 1, [machine is bigendian])
> +fi
> +if test $ac_cv_c_endian = little; then
> + AC_DEFINE(ENDIAN_LITTLE, 1, [machine is littleendian])
> +fi
> +])
> +
> +AC_C_ENDIAN
> +
> AC_CHECK_FUNCS(mlockall)
>
> AC_CONFIG_FILES(Makefile doc/Makefile)
More information about the memcached
mailing list