Patch: Dynamic hashtable sizing

Mon Aug 21 20:27:50 UTC 2006

On Sun, 20 Aug 2006, Steven Grimm wrote:

> Brad Fitzpatrick wrote:
> > Regarding the 'gotcha', though:  How hard would an incremental copy be,
> > copying only 'n' buckets per event loop?  You'd have to have a
> > 'currently_at' position kept, and any changes made before that point while
> > you're still copying would need to be mirrored in the still-growing-slowly
> > larger copy.  Then when you're all the way caught up, remove the old one,
> > and zero out the 'currently_at' position, indicating there's no copy in
> > progress.
> >
>
> That could work. If someone else is willing to be a Guinea pig for this
> change, though, I'd like some feedback on whether the rehashing is
> actually a problem. That's a fair bit of added complexity and it's only
> worth doing if it actually solves a real problem. (But see below for a
> counterproposal.)

It's the only solution I'm really excited about.  For stress testing, we
could add a mode (runtime or otherwise), the resizes the hashtable
non-stop, always bouncing between random (power-of-two) sizes, whenever
it's finished with the previous resize, even going down smaller.  Then we
run it with load in that mode and any problems will quickly shake out.

I'm going to build a better test suite for memcached, both correctness and
stress tester, along with a sort of check-out-and-build-and-test tinderbox
for a bunch of Xen instances of different OSes, so I'm not worried about
the additional complexity not getting testing... we'll stress the hell out
of it and LJ could even run it in production on a few nodes as well.  And
if it's off by default, that's even more peace of mind.

> > Or, even simpler:  no auto-resizing and just make the initial hashtable
> > size a function of the max memory parameter.  A lot of data structures in
> > the Linux kernel work like that.
> >
>
> The average size of a data item can vary dramatically between memcached
> instances if you're using separate pools for different kinds of data,

True.

> How about a third option, sort of: a tuning setting to set the initial
> bucket count, with the existing (blocking) resize logic (switchable via
> a command-line setting) and with the default size based on memory size.
> Then someone who is extremely pause-sensitive can crank up the initial
> bucket count high enough to cover their expected cache contents and turn
> off the resizing.

Eh... I hate options.  They clutter code and docs and confuse users.  I
like things that automagically work.  :)

> That seems like it'd get the job done without the added complexity of
> dealing with requests against a halfway-rehashed cache. What do you think?

I vote automagic complexity with testing.  :)  I don't think it's actually
too hard?

- Brad