benchmarking Perl client hash functions
Brad Fitzpatrick
brad@danga.com
Sat, 17 Jul 2004 16:59:44 -0700 (PDT)
On Fri, 16 Jul 2004, Larry Leszczynski wrote:
>
> 1) For test keys I used all the words in my /usr/share/dict/words, plus
> each of the same words with the characters reversed (about 90,000 keys
> total, average length 8 characters). Is there any reason to think this
> might skew the results, either because of the length or because the test
> keys only contain alpha characters? If so, any ideas for generating more
> realistic test keys?
In practice, real keys end up looking like:
foo:34
foo:234
foo:23
foo:213234
bar:234234
bar:23
bar:8289
That is, lots of common prefixes. I suppose I could make a dump of our
keys in use as a better testing starting point, but I have no tool for
that.
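
Short of a real key dump, synthetic keys with the same shape are easy to generate. The following is only a rough sketch in Python: the function name make_test_keys, the prefix list, and the ID range are all made up for illustration and are not taken from any real key set.

```python
import random

def make_test_keys(prefixes, count, max_id=1_000_000, seed=0):
    """Return `count` distinct keys shaped like "<prefix>:<integer id>".

    All parameters are illustrative assumptions, not real production values.
    """
    if count > len(set(prefixes)) * max_id:
        raise ValueError("not enough distinct prefix/id combinations")
    rng = random.Random(seed)  # fixed seed so benchmark runs are repeatable
    keys = set()
    while len(keys) < count:
        keys.add(f"{rng.choice(prefixes)}:{rng.randint(1, max_id)}")
    return sorted(keys)

# Roughly the same number of keys as the dictionary-word test set.
keys = make_test_keys(["foo", "bar"], 90_000)
```

Keys built this way share a handful of prefixes and differ only in a numeric suffix, which is closer to the pattern above than dictionary words are.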
> 2) I haven't done any profiling of the Perl client code so I don't know
> how much of a bottleneck (if any) the hashing part might be. I can
> probably rig up a patch that uses String::CRC32 if it exists, or falls
> back to the current algorithm if not - should I go ahead and give that a
> try?
That could be disastrous if one client has the module and another
doesn't. Sets would go one place, deletes another... would get ugly, not
to mention low hit rates.
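
To make the failure mode concrete, here is a small Python sketch of two clients that pick servers with different hash functions. The server addresses are invented, and fallback_hash is a hypothetical stand-in, not the hash the Perl client actually uses; only zlib.crc32 is a real library call.

```python
import zlib

# Made-up server list, for illustration only.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def crc32_hash(key):
    """Hash used by the client that has a CRC32 module available."""
    return zlib.crc32(key.encode("utf-8"))

def fallback_hash(key):
    """Hypothetical stand-in for the other client's hash (NOT the real one)."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

def pick_server(key, hash_fn):
    """Map a key to a server by hashing it modulo the server count."""
    return SERVERS[hash_fn(key) % len(SERVERS)]

keys = [f"foo:{i}" for i in range(1000)]
disagree = sum(
    pick_server(k, crc32_hash) != pick_server(k, fallback_hash) for k in keys
)
print(f"{disagree} of {len(keys)} keys map to different servers")
```

Every key counted in disagree is one where a set from the first client and a delete from the second would land on different servers.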
I think if we switch, the thing we switch to should be required, or the
caller must explicitly choose their hashing scheme (when they provide more
than one server?).
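
One possible shape for the "caller must choose" option, sketched in Python rather than as a patch to the Perl client. The Client class, the hasher argument, and the HASHERS table are all hypothetical names invented for this sketch.

```python
import zlib

# Hypothetical registry of named hashing schemes.
HASHERS = {
    "crc32": lambda key: zlib.crc32(key.encode("utf-8")),
}

class Client:
    """Toy client that refuses to guess a hash when it matters."""

    def __init__(self, servers, hasher=None):
        if not servers:
            raise ValueError("at least one server is required")
        if hasher is not None and hasher not in HASHERS:
            raise ValueError(f"unknown hashing scheme: {hasher!r}")
        if len(servers) > 1 and hasher is None:
            # With several servers, a silent default could differ between
            # clients, so make the caller name the scheme.
            raise ValueError("multiple servers: choose a hashing scheme")
        self.servers = list(servers)
        self._hash = HASHERS.get(hasher)

    def server_for(self, key):
        """Return the server responsible for `key`."""
        if len(self.servers) == 1:
            return self.servers[0]  # no hashing needed with one server
        return self.servers[self._hash(key) % len(self.servers)]
```

With a single server no scheme is needed; with two or more, construction fails unless the caller passes one, so every client configured the same way sends a given key to the same place.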
- Brad