benchmarking Perl client hash functions

Brad Fitzpatrick brad@danga.com
Sat, 17 Jul 2004 16:59:44 -0700 (PDT)


On Fri, 16 Jul 2004, Larry Leszczynski wrote:
>
> 1) For test keys I used all the words in my /usr/share/dict/words, plus
> each of the same words with the characters reversed (about 90,000 keys
> total, average length 8 characters).  Is there any reason to think this
> might skew the results, either because of the length or because the test
> keys only contain alpha characters?  If so, any ideas for generating more
> realistic test keys?

In practice, real keys end up looking like:

   foo:34
   foo:234
   foo:23
   foo:213234
   bar:234234
   bar:23
   bar:8289

That is, lots of common prefixes.  I suppose I could dump the keys we
actually have in use as a better starting point for testing, but I have no
tool for that.
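Until such a dump exists, keys of that shape can be generated synthetically.
Below is a minimal sketch in Python, not the Perl client's code. The prefix
list is made up, and it assumes the simple "hash(key) % number of servers"
mapping. It uses CRC32 (zlib.crc32, the same checksum String::CRC32
computes) as one candidate hash and counts how many keys land on each
server.

```python
import random
import zlib
from collections import Counter

# Hypothetical prefixes; a real test would use the namespaces actually in use.
PREFIXES = ["foo", "bar", "user", "sess"]

def make_keys(n, seed=0):
    """Generate n distinct keys shaped like 'prefix:number' (e.g. foo:234)."""
    rng = random.Random(seed)
    keys = set()
    while len(keys) < n:
        prefix = rng.choice(PREFIXES)
        # Vary the digit count so lengths differ, as with foo:23 vs foo:213234.
        digits = rng.randint(1, 6)
        num = rng.randint(0, 10 ** digits - 1)
        keys.add(f"{prefix}:{num}")
    return sorted(keys)

def crc32(data: bytes) -> int:
    """CRC32 as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF

def bucket_counts(keys, num_servers, hash_fn):
    """Count keys per server under the assumed hash(key) % num_servers mapping."""
    counts = Counter(hash_fn(k.encode()) % num_servers for k in keys)
    return [counts.get(i, 0) for i in range(num_servers)]

if __name__ == "__main__":
    keys = make_keys(20000)
    print(bucket_counts(keys, 8, crc32))
```

A roughly even spread across the buckets suggests the shared prefixes are
not hurting the hash. A lopsided spread would point to the kind of skew the
dictionary-word test could miss.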

> 2) I haven't done any profiling of the Perl client code so I don't know
> how much of a bottleneck (if any) the hashing part might be.  I can
> probably rig up a patch that uses String::CRC32 if it exists, or falls
> back to the current algorithm if not - should I go ahead and give that a
> try?

That could be disastrous if one client has the module and another
doesn't.  Sets would go to one server and deletes to another... it would
get ugly, not to mention the low hit rates.

I think if we switch, the thing we switch to should be required, or the
caller must explicitly choose their hashing scheme (when they provide more
than one server?).
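To make the failure mode concrete, here is a minimal Python sketch (again,
not the Perl client). It measures how often two clients using different hash
functions pick different servers for the same key. The MD5-based second hash
is only a stand-in for "whatever the other client uses". The Client class and
its hash_name parameter are hypothetical. The rule that a scheme must be named
once more than one server is configured follows the parenthetical suggestion
above.

```python
import hashlib
import zlib

def crc32_hash(key: str) -> int:
    return zlib.crc32(key.encode()) & 0xFFFFFFFF

def md5_hash(key: str) -> int:
    # Stand-in alternative hash: first 4 bytes of MD5 as an unsigned int.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

HASHES = {"crc32": crc32_hash, "md5": md5_hash}

class Client:
    """Hypothetical client that refuses to guess a hash scheme for multiple servers."""

    def __init__(self, servers, hash_name=None):
        if len(servers) > 1 and hash_name is None:
            raise ValueError("hash_name is required with more than one server")
        if hash_name is not None and hash_name not in HASHES:
            raise ValueError(f"unknown hash scheme: {hash_name}")
        self.servers = list(servers)
        self.hash_fn = HASHES[hash_name] if hash_name else None

    def server_for(self, key: str) -> str:
        if len(self.servers) == 1:
            return self.servers[0]  # no hashing needed for a single server
        return self.servers[self.hash_fn(key) % len(self.servers)]

def mismatch_rate(keys, servers):
    """Fraction of keys a crc32 client and an md5 client route differently."""
    a = Client(servers, "crc32")
    b = Client(servers, "md5")
    return sum(a.server_for(k) != b.server_for(k) for k in keys) / len(keys)
```

With N servers and two unrelated hashes, roughly (N - 1) / N of keys would
be routed differently. A set from one client and a get or delete from the
other would then usually hit different servers.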

- Brad