Moving key hashing into the client
Steven Grimm
sgrimm at facebook.com
Sat Jan 12 18:53:52 UTC 2008
On Jan 12, 2008, at 6:24 AM, Tomash Brechko wrote:
> I mean the server computes some hash value to decide where the key
> belongs in server's hash table. I.e. it is how the server finds the
> key in its memory.
If the client passes in the same hash that it uses for server
selection, won't that lead to nonuniform distribution of keys to hash
buckets? Ideally you want a completely different hash function for
server selection than for item bucketing within the server. A simple
example:
Say you have two memcached hosts. The client computes a hash value and
does a simple "modulo by number of hosts" to pick one of them. Then it
sends its request along with the hash value.
The server does a simple "modulo by number of buckets" to figure out
where to put the value. If the number of buckets is a power of two,
half of them are guaranteed to always be unused in this scheme,
because, e.g., only keys whose hash values have the low bit set will
be sent to server #2, whereas only hash values whose low bit is clear
will be sent to server #1. So you end up either wasting memory on
empty buckets or, more likely, not even noticing that the linked lists
in your buckets are twice as long as you'd like just because you're
basing the "do I auto-expand the hashtable?" decision on number of
items / number of buckets.
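A minimal sketch of that effect (the hash values here are just illustrative integers, not memcached's real hash output): with 2 servers and a power-of-two bucket count, every key routed to server #1 has an even hash, so it can only ever land in an even-numbered bucket.

```python
NUM_SERVERS = 2
NUM_BUCKETS = 8  # power of two, as with a doubling hashtable

# Keys whose hash has the low bit clear all go to server #1...
hashes_on_server1 = [h for h in range(1000) if h % NUM_SERVERS == 0]

# ...and those same (even) hashes mod a power of two hit only even buckets.
buckets_used = {h % NUM_BUCKETS for h in hashes_on_server1}

print(sorted(buckets_used))  # [0, 2, 4, 6] -- half the buckets never used
```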
This is less of an issue if the number of servers is a prime number,
or if you change memcached to always use a prime number of buckets,
but I think using the same hash function for server selection and for
internal server data structures is likely to lead to weird, hard to
diagnose inefficiencies.
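To see why a prime bucket count helps, the same sketch with 7 buckets instead of 8: the even hashes routed to one server now reach every bucket, because 2 and 7 share no common factor.

```python
NUM_SERVERS = 2
NUM_BUCKETS = 7  # prime, coprime with the server count

# Same set of even hashes routed to server #1 as before...
hashes_on_server1 = [h for h in range(1000) if h % NUM_SERVERS == 0]

# ...but mod a prime they cover all 7 bucket indices.
buckets_used = {h % NUM_BUCKETS for h in hashes_on_server1}

print(len(buckets_used))  # 7 -- every bucket is reachable
```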
The bigger question: Are you actually seeing a server that's
bottlenecked on the cost of computing the key hash? In our tests,
hashing was so cheap that I didn't even bother moving the hash
computation outside the mutex-protected part of the code. It is
lightning fast because the entire key tends to fit in the CPU's L1
cache, so you pay basically nothing to scan it and do the simple hash
computation memcached uses internally.
If you're seeing lots of lock contention in threaded mode and hashing
looks like a culprit, I'd try moving the hash computation up into the
non-lock-protected code before I'd go messing with the protocol and
breaking compatibility with all existing clients just to put that
minuscule piece of work on the client's plate.
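A hedged sketch of that refactor in Python pseudocode (the names and the toy hash function are illustrative, not memcached's actual internals): compute the key's hash before taking the lock, so the critical section covers only the hashtable access itself.

```python
import threading

lock = threading.Lock()
table = {}  # stand-in for the bucket array

def hash_key(key: bytes) -> int:
    # Placeholder for the server's internal hash function.
    h = 0
    for b in key:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h

def store(key: bytes, value):
    h = hash_key(key)  # cheap, done outside the lock
    with lock:         # lock held only for the table update
        table[h] = value

store(b"foo", 42)
```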
-Steve