Patch: Better hash function

Marc Marc at facebook.com
Thu Oct 12 04:07:32 UTC 2006


I don't think so.

Steve benchmarked it and saw no improvement over the latest hash lookup
(Jenkins 2006).

Here's a good write-up that really dives into the concerns I have about Judy.

   http://www.nothings.org/computer/judy/

The money quote:

"There are a few potential disadvantages to keep in mind about the Judy
approach. Its author argues that approaching the problem as engineering
means accepting that complexity is how you get good performance; simplicity
is for theoreticians. But this complexity leads to things like a data
structure with 20,000 lines of code designed on the assumption that a
cache-line is 64 bytes being used primarily on machines with 32-byte cache
lines; a redesign optimized for 32-byte cache lines would come close to
starting from scratch.

Additionally, some of Judy's space and speed improvements come from
strategies that could be used by any data structure but aren't because
they're not good software engineering. Most abstract data types give you a
fixed pointer good for the life of that datatype; Judy omits one level of
indirection and gives you a pointer that may change during the life of the
associative array as it grows and shrinks; if you want multiple objects to
have pointers to the same Judy array, you'll need to wrap the
changing-pointer in another data structure--adding back the level of
indirection that Judy omitted. Judy saves space by encoding information
in the bottom bits of
the pointer; this would interfere with any other use of the bottom bits,
e.g. in a language that uses tagged boxed data like OCaml."
 
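To make the pointer-stability complaint concrete, here is a minimal C
sketch (hypothetical names, not Judy's actual API) of the extra
indirection the article describes once the array's root pointer can move
on insert or delete:

    #include <stdlib.h>

    /* Hypothetical stand-in for a Judy-style root pointer that any
     * insert or delete may replace as the structure reshapes itself. */
    typedef void *judy_root;

    /* Because the root can move, callers that share one array hold a
     * stable handle and reach the current root through it. */
    typedef struct {
        judy_root root;
    } judy_handle;

    /* Illustrative insert: takes the handle so it can swap in a new
     * root when the underlying structure relocates. */
    static void judy_insert(judy_handle *h, unsigned long key) {
        (void)h; (void)key;
        /* a real implementation would descend from h->root and might
         * finish with: h->root = new_root; */
    }

    int main(void) {
        judy_handle *h = calloc(1, sizeof *h);
        judy_handle *owner_a = h, *owner_b = h; /* both see root updates */

        judy_insert(owner_a, 42);  /* may replace h->root... */
        (void)owner_b;             /* ...owner_b still finds it via h */

        free(h);
        return 0;
    }

That wrapper is exactly the fixed pointer other data structures already
give you for free, which is the article's point.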

On 10/11/06 8:31 PM, "Jason Titus" <jazzmantitus at yahoo.com> wrote:

> Is it worth considering Judy trees ( http://judy.sourceforge.net/ )
> instead of the existing hash table?  The claim is that it can be
> faster and more memory efficient (and even allows iteration).  There
> is a pretty thorough set of documentation on them at
> http://judy.sourceforge.net/application/shop_interm.pdf
> 
> I thought about this a while ago but didn't mention it because it
> seemed like a big change.  But if folks are considering changing the
> hashing algorithm.....
> 
> Jason
> 
> On Oct 11, 2006, at 5:22 PM, Steven Grimm wrote:
> 
>> This patch replaces the old hash function with a newer one from the
>> same author (see http://www.burtleburtle.net/bob/hash/doobs.html --
>> this is the code from lookup3.c referenced on that page). The new
>> function is significantly faster and produces better output.
>> 
>> The only controversial thing here is that there's now a test in
>> configure.ac to detect whether the system is little- or big-endian,
>> because the new algorithm needs to do different things depending on
>> byte order. I've tested it on MacOS X and Linux, but before I
>> commit this I'd appreciate some confirmation that the byte order
>> test works on other platforms.
>> 
>> -Steve
>> 
>> Index: assoc.c
>> ===================================================================
>> --- assoc.c (revision 411)
>> +++ assoc.c (working copy)
>> @@ -12,6 +12,7 @@
>>   *
>>   * $Id$
>>   */
>> +#include "config.h"
>>  #include <sys/types.h>
>>  #include <sys/stat.h>
>>  #include <sys/time.h>
>> @@ -39,91 +40,424 @@
>>  #define hashsize(n) ((ub4)1<<(n))
>>  #define hashmask(n) (hashsize(n)-1)
>> 
>> +/*
>> + * Since the hash function does bit manipulation, it needs to know
>> + * whether it's big or little-endian. ENDIAN_LITTLE and ENDIAN_BIG
>> + * are set in the configure script.
>> + */
>> +#if ENDIAN_BIG == 1
>> +# define HASH_LITTLE_ENDIAN 0
>> +# define HASH_BIG_ENDIAN 1
>> +#else
>> +# if ENDIAN_LITTLE == 1
>> +#  define HASH_LITTLE_ENDIAN 1
>> +#  define HASH_BIG_ENDIAN 0
>> +# else
>> +#  define HASH_LITTLE_ENDIAN 0
>> +#  define HASH_BIG_ENDIAN 0
>> +# endif
>> +#endif
>> +
>> +#define rot(x,k) (((x)<<(k)) ^ ((x)>>(32-(k))))
>> +
>> +/*
>> +-------------------------------------------------------------------------------
>> +mix -- mix 3 32-bit values reversibly.
>> +
>> +This is reversible, so any information in (a,b,c) before mix() is
>> +still in (a,b,c) after mix().
>> +
>> +If four pairs of (a,b,c) inputs are run through mix(), or through
>> +mix() in reverse, there are at least 32 bits of the output that
>> +are sometimes the same for one pair and different for another pair.
>> +This was tested for:
>> +* pairs that differed by one bit, by two bits, in any combination
>> +  of top bits of (a,b,c), or in any combination of bottom bits of
>> +  (a,b,c).
>> +* "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
>> +  the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
>> +  is commonly produced by subtraction) look like a single 1-bit
>> +  difference.
>> +* the base values were pseudorandom, all zero but one bit set, or
>> +  all zero plus a counter that starts at zero.
>> +
>> +Some k values for my "a-=c; a^=rot(c,k); c+=b;" arrangement that
>> +satisfy this are
>> +    4  6  8 16 19  4
>> +    9 15  3 18 27 15
>> +   14  9  3  7 17  3
>> +Well, "9 15 3 18 27 15" didn't quite get 32 bits diffing
>> +for "differ" defined as + with a one-bit base and a two-bit delta.  I
>> +used http://burtleburtle.net/bob/hash/avalanche.html to choose
>> +the operations, constants, and arrangements of the variables.
>> +
>> +This does not achieve avalanche.  There are input bits of (a,b,c)
>> +that fail to affect some output bits of (a,b,c), especially of a.  The
>> +most thoroughly mixed value is c, but it doesn't really even achieve
>> +avalanche in c.
>> +
>> +This allows some parallelism.  Read-after-writes are good at doubling
>> +the number of bits affected, so the goal of mixing pulls in the opposite
>> +direction as the goal of parallelism.  I did what I could.  Rotates
>> +seem to cost as much as shifts on every machine I could lay my hands
>> +on, and rotates are much kinder to the top and bottom bits, so I used
>> +rotates.
>> +-------------------------------------------------------------------------------
>> +*/
>>  #define mix(a,b,c) \
>>  { \
>> -  a -= b; a -= c; a ^= (c>>13); \
>> -  b -= c; b -= a; b ^= (a<<8); \
>> -  c -= a; c -= b; c ^= (b>>13); \
>> -  a -= b; a -= c; a ^= (c>>12);  \
>> -  b -= c; b -= a; b ^= (a<<16); \
>> -  c -= a; c -= b; c ^= (b>>5); \
>> -  a -= b; a -= c; a ^= (c>>3);  \
>> -  b -= c; b -= a; b ^= (a<<10); \
>> -  c -= a; c -= b; c ^= (b>>15); \
>> +  a -= c;  a ^= rot(c, 4);  c += b; \
>> +  b -= a;  b ^= rot(a, 6);  a += c; \
>> +  c -= b;  c ^= rot(b, 8);  b += a; \
>> +  a -= c;  a ^= rot(c,16);  c += b; \
>> +  b -= a;  b ^= rot(a,19);  a += c; \
>> +  c -= b;  c ^= rot(b, 4);  b += a; \
>>  }
>> 
>>  /*
>> ---------------------------------------------------------------------
>> -hash() -- hash a variable-length key into a 32-bit value
>> -  k       : the key (the unaligned variable-length array of bytes)
>> -  len     : the length of the key, counting by bytes
>> -  initval : can be any 4-byte value
>> -Returns a 32-bit value.  Every bit of the key affects every bit of
>> -the return value.  Every 1-bit and 2-bit delta achieves avalanche.
>> -About 6*len+35 instructions.
>> +-------------------------------------------------------------------------------
>> +final -- final mixing of 3 32-bit values (a,b,c) into c
>> 
>> -The best hash table sizes are powers of 2.  There is no need to do
>> -mod a prime (mod is sooo slow!).  If you need less than 32 bits,
>> -use a bitmask.  For example, if you need only 10 bits, do
>> -  h = (h & hashmask(10));
>> -In which case, the hash table should have hashsize(10) elements.
>> +Pairs of (a,b,c) values differing in only a few bits will usually
>> +produce values of c that look totally different.  This was tested for
>> +* pairs that differed by one bit, by two bits, in any combination
>> +  of top bits of (a,b,c), or in any combination of bottom bits of
>> +  (a,b,c).
>> +* "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
>> +  the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
>> +  is commonly produced by subtraction) look like a single 1-bit
>> +  difference.
>> +* the base values were pseudorandom, all zero but one bit set, or
>> +  all zero plus a counter that starts at zero.
>> 
>> -If you are hashing n strings (ub1 **)k, do it like this:
>> -  for (i=0, h=0; i<n; ++i) h = hash( k[i], len[i], h);
>> +These constants passed:
>> + 14 11 25 16 4 14 24
>> + 12 14 25 16 4 14 24
>> +and these came close:
>> +  4  8 15 26 3 22 24
>> + 10  8 15 26 3 22 24
>> + 11  8 15 26 3 22 24
>> +-------------------------------------------------------------------------------
>> +*/
>> +#define final(a,b,c) \
>> +{ \
>> +  c ^= b; c -= rot(b,14); \
>> +  a ^= c; a -= rot(c,11); \
>> +  b ^= a; b -= rot(a,25); \
>> +  c ^= b; c -= rot(b,16); \
>> +  a ^= c; a -= rot(c,4);  \
>> +  b ^= a; b -= rot(a,14); \
>> +  c ^= b; c -= rot(b,24); \
>> +}
>> 
>> -By Bob Jenkins, 1996.  bob_jenkins at burtleburtle.net.  You may use this
>> -code any way you wish, private, educational, or commercial.  It's free.
>> +#if HASH_LITTLE_ENDIAN == 1
>> +uint32_t hash(
>> +  const void *key,       /* the key to hash */
>> +  size_t      length,    /* length of the key */
>> +  uint32_t    initval)   /* initval */
>> +{
>> +  uint32_t a,b,c;                                          /* internal state */
>> +  union { const void *ptr; size_t i; } u;     /* needed for Mac Powerbook G4 */
>> 
>> -See http://burtleburtle.net/bob/hash/evahash.html
>> -Use for hash table lookup, or anything where one collision in 2^^32 is
>> -acceptable.  Do NOT use for cryptographic purposes.
>> ---------------------------------------------------------------------
>> -*/
>> +  /* Set up the internal state */
>> +  a = b = c = 0xdeadbeef + ((uint32_t)length) + initval;
>> 
>> -ub4 hash( k, length, initval)
>> -     register ub1 *k;        /* the key */
>> -     register ub4  length;   /* the length of the key */
>> -     register ub4  initval;  /* the previous hash, or an arbitrary value */
>> +  u.ptr = key;
>> +  if (HASH_LITTLE_ENDIAN && ((u.i & 0x3) == 0)) {
>> +    const uint32_t *k = key;                           /* read 32-bit chunks */
>> +#ifdef VALGRIND
>> +    const uint8_t  *k8;
>> +#endif // ifdef VALGRIND
>> +
>> +    /*------ all but last block: aligned reads and affect 32 bits of (a,b,c) */
>> +    while (length > 12)
>> +    {
>> +      a += k[0];
>> +      b += k[1];
>> +      c += k[2];
>> +      mix(a,b,c);
>> +      length -= 12;
>> +      k += 3;
>> +    }
>> +
>> +    /*----------------------------- handle the last (probably partial) block */
>> +    /*
>> +     * "k[2]&0xffffff" actually reads beyond the end of the string, but
>> +     * then masks off the part it's not allowed to read.  Because the
>> +     * string is aligned, the masked-off tail is in the same word as the
>> +     * rest of the string.  Every machine with memory protection I've seen
>> +     * does it on word boundaries, so is OK with this.  But VALGRIND will
>> +     * still catch it and complain.  The masking trick does make the hash
>> +     * noticeably faster for short strings (like English words).
>> +     */
>> +#ifndef VALGRIND
>> +
>> +    switch(length)
>> +    {
>> +    case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
>> +    case 11: c+=k[2]&0xffffff; b+=k[1]; a+=k[0]; break;
>> +    case 10: c+=k[2]&0xffff; b+=k[1]; a+=k[0]; break;
>> +    case 9 : c+=k[2]&0xff; b+=k[1]; a+=k[0]; break;
>> +    case 8 : b+=k[1]; a+=k[0]; break;
>> +    case 7 : b+=k[1]&0xffffff; a+=k[0]; break;
>> +    case 6 : b+=k[1]&0xffff; a+=k[0]; break;
>> +    case 5 : b+=k[1]&0xff; a+=k[0]; break;
>> +    case 4 : a+=k[0]; break;
>> +    case 3 : a+=k[0]&0xffffff; break;
>> +    case 2 : a+=k[0]&0xffff; break;
>> +    case 1 : a+=k[0]&0xff; break;
>> +    case 0 : return c;  /* zero length strings require no mixing */
>> +    }
>> +
>> +#else /* make valgrind happy */
>> +
>> +    k8 = (const uint8_t *)k;
>> +    switch(length)
>> +    {
>> +    case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
>> +    case 11: c+=((uint32_t)k8[10])<<16;  /* fall through */
>> +    case 10: c+=((uint32_t)k8[9])<<8;    /* fall through */
>> +    case 9 : c+=k8[8];                   /* fall through */
>> +    case 8 : b+=k[1]; a+=k[0]; break;
>> +    case 7 : b+=((uint32_t)k8[6])<<16;   /* fall through */
>> +    case 6 : b+=((uint32_t)k8[5])<<8;    /* fall through */
>> +    case 5 : b+=k8[4];                   /* fall through */
>> +    case 4 : a+=k[0]; break;
>> +    case 3 : a+=((uint32_t)k8[2])<<16;   /* fall through */
>> +    case 2 : a+=((uint32_t)k8[1])<<8;    /* fall through */
>> +    case 1 : a+=k8[0]; break;
>> +    case 0 : return c;  /* zero length strings require no mixing */
>> +    }
>> +
>> +#endif /* !valgrind */
>> +
>> +  } else if (HASH_LITTLE_ENDIAN && ((u.i & 0x1) == 0)) {
>> +    const uint16_t *k = key;                           /* read 16-bit chunks */
>> +    const uint8_t  *k8;
>> +
>> +    /*--------------- all but last block: aligned reads and different mixing */
>> +    while (length > 12)
>> +    {
>> +      a += k[0] + (((uint32_t)k[1])<<16);
>> +      b += k[2] + (((uint32_t)k[3])<<16);
>> +      c += k[4] + (((uint32_t)k[5])<<16);
>> +      mix(a,b,c);
>> +      length -= 12;
>> +      k += 6;
>> +    }
>> +
>> +    /*----------------------------- handle the last (probably partial) block */
>> +    k8 = (const uint8_t *)k;
>> +    switch(length)
>> +    {
>> +    case 12: c+=k[4]+(((uint32_t)k[5])<<16);
>> +             b+=k[2]+(((uint32_t)k[3])<<16);
>> +             a+=k[0]+(((uint32_t)k[1])<<16);
>> +             break;
>> +    case 11: c+=((uint32_t)k8[10])<<16;     /* fall through */
>> +    case 10: c+=k[4];
>> +             b+=k[2]+(((uint32_t)k[3])<<16);
>> +             a+=k[0]+(((uint32_t)k[1])<<16);
>> +             break;
>> +    case 9 : c+=k8[8];                      /* fall through */
>> +    case 8 : b+=k[2]+(((uint32_t)k[3])<<16);
>> +             a+=k[0]+(((uint32_t)k[1])<<16);
>> +             break;
>> +    case 7 : b+=((uint32_t)k8[6])<<16;      /* fall through */
>> +    case 6 : b+=k[2];
>> +             a+=k[0]+(((uint32_t)k[1])<<16);
>> +             break;
>> +    case 5 : b+=k8[4];                      /* fall through */
>> +    case 4 : a+=k[0]+(((uint32_t)k[1])<<16);
>> +             break;
>> +    case 3 : a+=((uint32_t)k8[2])<<16;      /* fall through */
>> +    case 2 : a+=k[0];
>> +             break;
>> +    case 1 : a+=k8[0];
>> +             break;
>> +    case 0 : return c;  /* zero length strings require no mixing */
>> +    }
>> +
>> +  } else {                        /* need to read the key one byte at a time */
>> +    const uint8_t *k = key;
>> +
>> +    /*--------------- all but the last block: affect some 32 bits of (a,b,c) */
>> +    while (length > 12)
>> +    {
>> +      a += k[0];
>> +      a += ((uint32_t)k[1])<<8;
>> +      a += ((uint32_t)k[2])<<16;
>> +      a += ((uint32_t)k[3])<<24;
>> +      b += k[4];
>> +      b += ((uint32_t)k[5])<<8;
>> +      b += ((uint32_t)k[6])<<16;
>> +      b += ((uint32_t)k[7])<<24;
>> +      c += k[8];
>> +      c += ((uint32_t)k[9])<<8;
>> +      c += ((uint32_t)k[10])<<16;
>> +      c += ((uint32_t)k[11])<<24;
>> +      mix(a,b,c);
>> +      length -= 12;
>> +      k += 12;
>> +    }
>> +
>> +    /*-------------------------------- last block: affect all 32 bits of (c) */
>> +    switch(length)                   /* all the case statements fall through */
>> +    {
>> +    case 12: c+=((uint32_t)k[11])<<24;
>> +    case 11: c+=((uint32_t)k[10])<<16;
>> +    case 10: c+=((uint32_t)k[9])<<8;
>> +    case 9 : c+=k[8];
>> +    case 8 : b+=((uint32_t)k[7])<<24;
>> +    case 7 : b+=((uint32_t)k[6])<<16;
>> +    case 6 : b+=((uint32_t)k[5])<<8;
>> +    case 5 : b+=k[4];
>> +    case 4 : a+=((uint32_t)k[3])<<24;
>> +    case 3 : a+=((uint32_t)k[2])<<16;
>> +    case 2 : a+=((uint32_t)k[1])<<8;
>> +    case 1 : a+=k[0];
>> +             break;
>> +    case 0 : return c;  /* zero length strings require no mixing */
>> +    }
>> +  }
>> +
>> +  final(a,b,c);
>> +  return c;
>> +}
>> +
>> +#elif HASH_BIG_ENDIAN == 1
>> +/*
>> + * hashbig():
>> + * This is the same as hashword() on big-endian machines.  It is different
>> + * from hashlittle() on all machines.  hashbig() takes advantage of
>> + * big-endian byte ordering.
>> + */
>> +uint32_t hash( const void *key, size_t length, uint32_t initval)
>>  {
>> -    register ub4 a,b,c,len;
>> +  uint32_t a,b,c;
>> +  union { const void *ptr; size_t i; } u; /* to cast key to (size_t) happily */
>> 
>> -    /* Set up the internal state */
>> -    len = length;
>> -    a = b = 0x9e3779b9;  /* the golden ratio; an arbitrary value */
>> -    c = initval;         /* the previous hash value */
>> +  /* Set up the internal state */
>> +  a = b = c = 0xdeadbeef + ((uint32_t)length) + initval;
>> 
>> -    /*---------------------------------------- handle most of the key */
>> -    while (len >= 12)
>> -        {
>> -            a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) +((ub4)k[3]<<24));
>> -            b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) +((ub4)k[7]<<24));
>> -            c += (k[8] +((ub4)k[9]<<8) +((ub4)k[10]<<16)+((ub4)k[11]<<24));
>> -            mix(a,b,c);
>> -            k += 12; len -= 12;
>> -        }
>> +  u.ptr = key;
>> +  if (HASH_BIG_ENDIAN && ((u.i & 0x3) == 0)) {
>> +    const uint32_t *k = key;                           /* read 32-
>> bit chunks */
>> +#ifdef VALGRIND
>> +    const uint8_t  *k8;
>> +#endif // ifdef VALGRIND
>> 
>> -    /*------------------------------------- handle the last 11 bytes */
>> -    c += length;
>> -    switch(len)              /* all the case statements fall through */
>> -        {
>> -        case 11: c+=((ub4)k[10]<<24);
>> -        case 10: c+=((ub4)k[9]<<16);
>> -        case 9 : c+=((ub4)k[8]<<8);
>> -            /* the first byte of c is reserved for the length */
>> -        case 8 : b+=((ub4)k[7]<<24);
>> -        case 7 : b+=((ub4)k[6]<<16);
>> -        case 6 : b+=((ub4)k[5]<<8);
>> -        case 5 : b+=k[4];
>> -        case 4 : a+=((ub4)k[3]<<24);
>> -        case 3 : a+=((ub4)k[2]<<16);
>> -        case 2 : a+=((ub4)k[1]<<8);
>> -        case 1 : a+=k[0];
>> -            /* case 0: nothing left to add */
>> -        }
>> -    mix(a,b,c);
>> -    /*-------------------------------------------- report the result */
>> -    return c;
>> +    /*------ all but last block: aligned reads and affect 32 bits of (a,b,c) */
>> +    while (length > 12)
>> +    {
>> +      a += k[0];
>> +      b += k[1];
>> +      c += k[2];
>> +      mix(a,b,c);
>> +      length -= 12;
>> +      k += 3;
>> +    }
>> +
>> +    /*----------------------------- handle the last (probably partial) block */
>> +    /*
>> +     * "k[2]<<8" actually reads beyond the end of the string, but
>> +     * then shifts out the part it's not allowed to read.  Because the
>> +     * string is aligned, the illegal read is in the same word as the
>> +     * rest of the string.  Every machine with memory protection I've seen
>> +     * does it on word boundaries, so is OK with this.  But VALGRIND will
>> +     * still catch it and complain.  The masking trick does make the hash
>> +     * noticeably faster for short strings (like English words).
>> +     */
>> +#ifndef VALGRIND
>> +
>> +    switch(length)
>> +    {
>> +    case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
>> +    case 11: c+=k[2]&0xffffff00; b+=k[1]; a+=k[0]; break;
>> +    case 10: c+=k[2]&0xffff0000; b+=k[1]; a+=k[0]; break;
>> +    case 9 : c+=k[2]&0xff000000; b+=k[1]; a+=k[0]; break;
>> +    case 8 : b+=k[1]; a+=k[0]; break;
>> +    case 7 : b+=k[1]&0xffffff00; a+=k[0]; break;
>> +    case 6 : b+=k[1]&0xffff0000; a+=k[0]; break;
>> +    case 5 : b+=k[1]&0xff000000; a+=k[0]; break;
>> +    case 4 : a+=k[0]; break;
>> +    case 3 : a+=k[0]&0xffffff00; break;
>> +    case 2 : a+=k[0]&0xffff0000; break;
>> +    case 1 : a+=k[0]&0xff000000; break;
>> +    case 0 : return c;              /* zero length strings require no mixing */
>> +    }
>> +
>> +#else  /* make valgrind happy */
>> +
>> +    k8 = (const uint8_t *)k;
>> +    switch(length)                   /* all the case statements fall through */
>> +    {
>> +    case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
>> +    case 11: c+=((uint32_t)k8[10])<<8;  /* fall through */
>> +    case 10: c+=((uint32_t)k8[9])<<16;  /* fall through */
>> +    case 9 : c+=((uint32_t)k8[8])<<24;  /* fall through */
>> +    case 8 : b+=k[1]; a+=k[0]; break;
>> +    case 7 : b+=((uint32_t)k8[6])<<8;   /* fall through */
>> +    case 6 : b+=((uint32_t)k8[5])<<16;  /* fall through */
>> +    case 5 : b+=((uint32_t)k8[4])<<24;  /* fall through */
>> +    case 4 : a+=k[0]; break;
>> +    case 3 : a+=((uint32_t)k8[2])<<8;   /* fall through */
>> +    case 2 : a+=((uint32_t)k8[1])<<16;  /* fall through */
>> +    case 1 : a+=((uint32_t)k8[0])<<24; break;
>> +    case 0 : return c;
>> +    }
>> +
>> +#endif /* !VALGRIND */
>> +
>> +  } else {                        /* need to read the key one byte at a time */
>> +    const uint8_t *k = key;
>> +
>> +    /*--------------- all but the last block: affect some 32 bits of (a,b,c) */
>> +    while (length > 12)
>> +    {
>> +      a += ((uint32_t)k[0])<<24;
>> +      a += ((uint32_t)k[1])<<16;
>> +      a += ((uint32_t)k[2])<<8;
>> +      a += ((uint32_t)k[3]);
>> +      b += ((uint32_t)k[4])<<24;
>> +      b += ((uint32_t)k[5])<<16;
>> +      b += ((uint32_t)k[6])<<8;
>> +      b += ((uint32_t)k[7]);
>> +      c += ((uint32_t)k[8])<<24;
>> +      c += ((uint32_t)k[9])<<16;
>> +      c += ((uint32_t)k[10])<<8;
>> +      c += ((uint32_t)k[11]);
>> +      mix(a,b,c);
>> +      length -= 12;
>> +      k += 12;
>> +    }
>> +
>> +    /*-------------------------------- last block: affect all 32 bits of (c) */
>> +    switch(length)                   /* all the case statements fall through */
>> +    {
>> +    case 12: c+=k[11];
>> +    case 11: c+=((uint32_t)k[10])<<8;
>> +    case 10: c+=((uint32_t)k[9])<<16;
>> +    case 9 : c+=((uint32_t)k[8])<<24;
>> +    case 8 : b+=k[7];
>> +    case 7 : b+=((uint32_t)k[6])<<8;
>> +    case 6 : b+=((uint32_t)k[5])<<16;
>> +    case 5 : b+=((uint32_t)k[4])<<24;
>> +    case 4 : a+=k[3];
>> +    case 3 : a+=((uint32_t)k[2])<<8;
>> +    case 2 : a+=((uint32_t)k[1])<<16;
>> +    case 1 : a+=((uint32_t)k[0])<<24;
>> +             break;
>> +    case 0 : return c;
>> +    }
>> +  }
>> +
>> +  final(a,b,c);
>> +  return c;
>>  }
>> +#else // HASH_XXX_ENDIAN == 1
>> +#error Must define HASH_BIG_ENDIAN or HASH_LITTLE_ENDIAN
>> +#endif // hash_XXX_ENDIAN == 1
>> 
>>  static item** hashtable = 0;
>> 
>> Index: configure.ac
>> ===================================================================
>> --- configure.ac (revision 411)
>> +++ configure.ac (working copy)
>> @@ -129,6 +129,32 @@
>> 
>>  AC_C_SOCKLEN_T
>> 
>> +dnl Check if we're a little-endian or a big-endian system, needed by hash code
>> +AC_DEFUN(AC_C_ENDIAN,
>> +[AC_CACHE_CHECK(for endianness, ac_cv_c_endian,
>> +[
>> +  AC_RUN_IFELSE(
>> +    [AC_LANG_PROGRAM([], [dnl
>> +        long val = 1;
>> +        char *c = (char *) &val;
>> +        exit(*c == 1);
>> +    ])
>> +  ],[
>> +    ac_cv_c_endian=big
>> +  ],[
>> +    ac_cv_c_endian=little
>> +  ])
>> +])
>> +if test $ac_cv_c_endian = big; then
>> +  AC_DEFINE(ENDIAN_BIG, 1, [machine is bigendian])
>> +fi
>> +if test $ac_cv_c_endian = little; then
>> +  AC_DEFINE(ENDIAN_LITTLE, 1, [machine is littleendian])
>> +fi
>> +])
>> +
>> +AC_C_ENDIAN
>> +
>>  AC_CHECK_FUNCS(mlockall)
>> 
>>  AC_CONFIG_FILES(Makefile doc/Makefile)
> 
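
To sanity-check the configure.ac byte-order probe on a new platform
without rerunning autoconf, here is the same test as a standalone C
program (illustrative only; the patch itself relies on the
ENDIAN_LITTLE/ENDIAN_BIG defines that configure sets):

    #include <stdio.h>

    int main(void) {
        long val = 1;
        char *c = (char *) &val;

        /* On a little-endian machine the least significant byte of val
         * is stored first, so *c == 1; on big-endian it is 0. */
        printf("%s-endian\n", (*c == 1) ? "little" : "big");
        return 0;
    }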


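For context on how assoc.c consumes the result: the 32-bit hash is
masked down to the table's power-of-two size with the existing
hashsize()/hashmask() macros, roughly like this (a sketch; bucket_of()
and the hash power of 16 are illustrative, not names from the patch):

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    typedef uint32_t ub4;
    #define hashsize(n) ((ub4)1<<(n))
    #define hashmask(n) (hashsize(n)-1)

    /* hash() as added by the patch (defined in assoc.c) */
    uint32_t hash(const void *key, size_t length, uint32_t initval);

    /* Map a key to a bucket in a hashsize(16) == 65536-slot table. */
    static ub4 bucket_of(const char *key) {
        return hash(key, strlen(key), 0) & hashmask(16);
    }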
