Structure of the memcached KEY

Steven Grimm sgrimm at facebook.com
Fri Nov 3 16:05:33 UTC 2006


If you use a good hash function like SHA-1, false collisions are 
basically a nonissue. You're more likely to be killed by a falling 
airplane while holding a winning lottery ticket than to ever see an 
SHA-1 collision between valid SQL queries. Distributed version control 
systems like git and monotone use hashes of file contents to uniquely 
identify revisions of a source tree; to my knowledge nobody has ever 
come across a collision, despite the fact that they're used on 
frequently-modified projects like the Linux kernel.

Even a fast 64-bit hash like Jenkins2006 (which you'll find in the 
memcached Subversion repository, BTW) will essentially never give you a 
collision on two different pieces of text as long as you use the full 
hash value. Of course, you can try it easily enough: write out a log of 
all your SQL queries for a month and run them all through different hash 
functions to see if any of them collide. If you find collisions, it will 
probably be of no small interest to encryption researchers, since the 
likelihood of collision between two similar pieces of text is pretty 
much the #1 criterion people use to rate the quality of a hash function.

-Steve


Randy Wigginton wrote:
> Just FYI, in our application we use the full SQL query to generate a 
> UUID.  We are far more concerned about false collisions than about the 
> overhead, which is tiny (and once the sql is transformed on the client 
> side, we save the result in a local hashtable, thus only having to 
> generate the UUID once).
>



More information about the memcached mailing list