Patch: Dynamic hashtable sizing
sgrimm at facebook.com
Mon Aug 21 06:23:27 UTC 2006
Brad Fitzpatrick wrote:
> Regarding the 'gotcha', though: How hard would an incremental copy be,
> copying only 'n' buckets per event loop? You'd have to have a
> 'currently_at' position kept, and any changes made before that point while
> you're still copying would need to be mirrored in the still-growing-slowly
> larger copy. Then when you're all the way caught up, remove the old one,
> and zero out the 'currently_at' position, indicating there's no copy in
> progress.
That could work. If someone else is willing to be a guinea pig for this
change, though, I'd like some feedback on whether the rehashing is
actually a problem. It's a fair bit of added complexity, and it's only
worth doing if it solves a real problem. (But see below for a
possible compromise.)
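For concreteness, here's a minimal sketch of the incremental-copy scheme Brad describes, with lookups probing both tables while the copy is live. The names (`ht_step`, `rehash_pos`, etc.) and the chaining layout are assumptions for illustration, not memcached's actual code:

```c
#include <stdlib.h>
#include <string.h>

typedef struct item {
    struct item *next;
    const char *key;
    void *value;
} item;

typedef struct {
    item **old_buckets;   /* table being drained; NULL when no copy is live */
    item **new_buckets;
    size_t old_size, new_size;
    size_t rehash_pos;    /* the "currently_at" position */
} ht;

static size_t ht_hash(const char *key, size_t nbuckets) {
    size_t h = 5381;                       /* djb2, just for the sketch */
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h % nbuckets;
}

/* Changes made while copying go straight into the still-growing table. */
static void ht_insert(ht *t, item *it) {
    size_t b = ht_hash(it->key, t->new_size);
    it->next = t->new_buckets[b];
    t->new_buckets[b] = it;
}

/* Move up to n old buckets into the new table; called once per event loop. */
static void ht_step(ht *t, size_t n) {
    if (!t->old_buckets) return;
    while (n-- && t->rehash_pos < t->old_size) {
        item *it = t->old_buckets[t->rehash_pos];
        while (it) {
            item *next = it->next;
            ht_insert(t, it);
            it = next;
        }
        t->old_buckets[t->rehash_pos++] = NULL;
    }
    if (t->rehash_pos == t->old_size) {   /* all caught up: drop the old copy */
        free(t->old_buckets);
        t->old_buckets = NULL;
        t->old_size = 0;
        t->rehash_pos = 0;                /* i.e. no copy in progress */
    }
}

/* While a copy is live, lookups must probe both tables: any old bucket
 * below rehash_pos has already been drained into the new one. */
static item *ht_find(ht *t, const char *key) {
    if (t->old_buckets) {
        size_t b = ht_hash(key, t->old_size);
        if (b >= t->rehash_pos)           /* not yet migrated */
            for (item *it = t->old_buckets[b]; it; it = it->next)
                if (strcmp(it->key, key) == 0) return it;
    }
    for (item *it = t->new_buckets[ht_hash(key, t->new_size)]; it; it = it->next)
        if (strcmp(it->key, key) == 0) return it;
    return NULL;
}
```

This is exactly the complexity being weighed here: every read path grows a second probe and a position check for as long as a copy can be in flight.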
> Or, even simpler: no auto-resizing and just make the initial hashtable
> size a function of the max memory parameter. A lot of data structures in
> the Linux kernel work like that.
The average size of a data item can vary dramatically between memcached
instances if you're using separate pools for different kinds of data,
which means the optimal number of hash buckets isn't simply a function
of max memory usage. In one of our pools, each data item is exactly 4
bytes; in another, the average size is several kilobytes. So at the very
least there needs to be some kind of tuning parameter to control the
expected average value size.
With dynamic resizing, we can assign any memcached instance to any pool
as needed and it just works. (In fact, we're working on a failover
system that assigns a hot-spare memcached instance to replace a failed
node automatically.) With static sizing, even with a tuning parameter,
we have to go mess with the settings on each memcached instance based on
which pool we expect to assign it to.
> I'm just afraid this gotcha would result in unnecessary pain on the
> mailing list. And if people are doing aggressive monitoring, alerts
> going off, and clients with small timeouts, rehashing unnecessarily.
> Without knowing their CPU/memory speed, we can't even reliably say what
> the delay would be.
> So I'd like to go with either an incremental or dynamically-tuned-at-init
> approach. Preference?
How about a third option, sort of: a tuning setting for the initial
bucket count, keeping the existing (blocking) resize logic but making it
switchable via a command-line setting, with the default size based on
the memory limit.
Then someone who is extremely pause-sensitive can crank up the initial
bucket count high enough to cover their expected cache contents and turn
off the resizing.
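A hypothetical sketch of that default, deriving the initial bucket count from the memory limit and an operator-supplied expected item size, rounded up to a power of two. The function names and the one-bucket-per-item target are assumptions for illustration, not anything memcached actually does:

```c
#include <stdint.h>

/* Round n up to the next power of two (hash sizes are usually kept
 * as powers of two so the modulo reduces to a mask). */
static uint64_t next_pow2(uint64_t n) {
    uint64_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* max_memory_bytes: the configured memory limit; avg_item_size: the
 * operator's estimate of average value size (the tuning knob above).
 * Aim for roughly one bucket per expected item. */
static uint64_t initial_buckets(uint64_t max_memory_bytes,
                                uint64_t avg_item_size) {
    uint64_t expected_items = max_memory_bytes / avg_item_size;
    if (expected_items == 0) expected_items = 1;
    return next_pow2(expected_items);
}
```

This also illustrates the pool problem above: a 64 MB instance holding 4-byte items wants on the order of 2^24 buckets, while the same 64 MB holding 4 KB items wants about 2^14, so no single memory-based default fits both.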
That seems like it'd get the job done without the added complexity of
dealing with requests against a halfway-rehashed cache. What do you think?