Patch: Dynamic hashtable sizing
sgrimm at facebook.com
Mon Aug 21 06:23:27 UTC 2006
Brad Fitzpatrick wrote:
> Regarding the 'gotcha', though: How hard would an incremental copy be,
> copying only 'n' buckets per event loop? You'd have to have a
> 'currently_at' position kept, and any changes made before that point while
> you're still copying would need to be mirrored in the still-growing-slowly
> larger copy. Then when you're all the way caught up, remove the old one,
> and zero out the 'currently_at' position, indicating there's no copy in
> progress.
That could work. If someone else is willing to be a guinea pig for this
change, though, I'd like some feedback on whether the rehashing is
actually a problem. It's a fair bit of added complexity, and it's only
worth doing if it solves a real problem. (But see below for a
possible compromise.)
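For concreteness, here's a minimal sketch of the incremental-copy scheme Brad describes, with lookups probing both tables while the copy is live. The names (`ht_step`, `rehash_pos`, etc.) and the chaining layout are assumptions for illustration, not memcached's actual code:

```c
#include <stdlib.h>
#include <string.h>

typedef struct item {
    struct item *next;
    const char *key;
    void *value;
} item;

typedef struct {
    item **old_buckets;   /* table being drained; NULL when no copy is live */
    item **new_buckets;
    size_t old_size, new_size;
    size_t rehash_pos;    /* the "currently_at" position */
} ht;

static size_t ht_hash(const char *key, size_t nbuckets) {
    size_t h = 5381;                       /* djb2, just for the sketch */
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h % nbuckets;
}

/* Changes made while copying go straight into the still-growing table. */
static void ht_insert(ht *t, item *it) {
    size_t b = ht_hash(it->key, t->new_size);
    it->next = t->new_buckets[b];
    t->new_buckets[b] = it;
}

/* Move up to n old buckets into the new table; called once per event loop. */
static void ht_step(ht *t, size_t n) {
    if (!t->old_buckets) return;
    while (n-- && t->rehash_pos < t->old_size) {
        item *it = t->old_buckets[t->rehash_pos];
        while (it) {
            item *next = it->next;
            ht_insert(t, it);
            it = next;
        }
        t->old_buckets[t->rehash_pos++] = NULL;
    }
    if (t->rehash_pos == t->old_size) {   /* all caught up: drop the old copy */
        free(t->old_buckets);
        t->old_buckets = NULL;
        t->old_size = 0;
        t->rehash_pos = 0;                /* i.e. no copy in progress */
    }
}

/* While a copy is live, lookups must probe both tables: any old bucket
 * below rehash_pos has already been drained into the new one. */
static item *ht_find(ht *t, const char *key) {
    if (t->old_buckets) {
        size_t b = ht_hash(key, t->old_size);
        if (b >= t->rehash_pos)           /* not yet migrated */
            for (item *it = t->old_buckets[b]; it; it = it->next)
                if (strcmp(it->key, key) == 0) return it;
    }
    for (item *it = t->new_buckets[ht_hash(key, t->new_size)]; it; it = it->next)
        if (strcmp(it->key, key) == 0) return it;
    return NULL;
}
```

This is exactly the complexity being weighed here: every read path grows a second probe and a position check for as long as a copy can be in flight.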
> Or, even simpler: no auto-resizing and just make the initial hashtable
> size a function of the max memory parameter. A lot of data structures in
> the Linux kernel work like that.
The average size of a data item can vary dramatically between memcached
instances if you're using separate pools for different kinds of data,
which means the optimal number of hash buckets isn't simply a function
of max memory usage. In one of our pools, each data item is exactly 4
bytes; in another, the average size is several kilobytes. So at the very
least there needs to be some kind of tuning parameter to control the
expected average value size.
With dynamic resizing, we can assign any memcached instance to any pool
as needed and it just works. (In fact, we're working on a failover
system that assigns a hot-spare memcached instance to replace a failed
node automatically.) With static sizing, even with a tuning parameter,
we have to go mess with the settings on each memcached instance based on
which pool we expect to assign it to.
> I'm just afraid this gotcha would result in unnecessary pain on the
> mailing list. And if people are doing aggressive monitoring, alerts
> going off, and clients with small timeouts, rehashing unnecessarily.
> Without knowing their CPU/memory speed, we can't even reliably say what
> the delay would be.
> So I'd like to go with either an incremental or dynamically-tuned-at-init
> approach. Preference?
How about a third option, sort of: a tuning setting for the initial
bucket count, keeping the existing (blocking) resize logic but making it
switchable via a command-line setting, with the default size based on
the memory limit.
Then someone who is extremely pause-sensitive can crank up the initial
bucket count high enough to cover their expected cache contents and turn
off the resizing.
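A hypothetical sketch of that default, deriving the initial bucket count from the memory limit and an operator-supplied expected item size, rounded up to a power of two. The function names and the one-bucket-per-item target are assumptions for illustration, not anything memcached actually does:

```c
#include <stdint.h>

/* Round n up to the next power of two (hash sizes are usually kept
 * as powers of two so the modulo reduces to a mask). */
static uint64_t next_pow2(uint64_t n) {
    uint64_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* max_memory_bytes: the configured memory limit; avg_item_size: the
 * operator's estimate of average value size (the tuning knob above).
 * Aim for roughly one bucket per expected item. */
static uint64_t initial_buckets(uint64_t max_memory_bytes,
                                uint64_t avg_item_size) {
    uint64_t expected_items = max_memory_bytes / avg_item_size;
    if (expected_items == 0) expected_items = 1;
    return next_pow2(expected_items);
}
```

This also illustrates the pool problem above: a 64 MB instance holding 4-byte items wants on the order of 2^24 buckets, while the same 64 MB holding 4 KB items wants about 2^14, so no single memory-based default fits both.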
That seems like it'd get the job done without the added complexity of
dealing with requests against a halfway-rehashed cache. What do you think?