Structure of the memcached KEY
Steven Grimm
sgrimm at facebook.com
Fri Nov 3 16:05:33 UTC 2006
If you use a good hash function like SHA-1, false collisions are
basically a nonissue. You're more likely to be killed by a falling
airplane while holding a winning lottery ticket than to ever see an
SHA-1 collision between valid SQL queries. Distributed version control
systems like git and monotone use hashes of file contents to uniquely
identify revisions of a source tree; to my knowledge nobody has ever
come across a collision, despite the fact that they're used on
frequently-modified projects like the Linux kernel.
Even a fast 64-bit hash like Jenkins2006 (which you'll find in the
memcached Subversion repository, BTW) will essentially never give you a
collision on two different pieces of text as long as you use the full
hash value. Of course, you can try it easily enough: write out a log of
all your SQL queries for a month and run them all through different hash
functions to see if any of them collide. If you find collisions, it will
probably be of no small interest to encryption researchers, since the
likelihood of collision between two similar pieces of text is pretty
much the #1 criterion people use to rate the quality of a hash function.
-Steve
Randy Wigginton wrote:
> Just FYI, in our application we use the full SQL query to generate a
> UUID. We are far more concerned about false collisions than about the
> overhead, which is tiny (and once the sql is transformed on the client
> side, we save the result in a local hashtable, thus only having to
> generate the UUID once).
>
More information about the memcached
mailing list