Multithreading (was Re: first time user with out of memory question)
Steven Grimm
sgrimm at facebook.com
Mon Jun 12 19:56:19 UTC 2006

Ivan Krstic wrote:
> Memcached has a pretty great track record when it comes to bugs. The
> significant speed decrease and complexity increase that would result
> from threading is absolutely not worth the help that automated
> debugging tools would give you on such a small codebase.
>

We have at various times considered making memcached multithreaded. The
big advantage would be when running it on a large multiprocessor box;
with the current single-threaded architecture, if you're CPU-bound, you
have to run multiple instances on a multiprocessor server. That works
fine, but since the keys are then spread across several smaller caches,
you lose some of the batching of keys in "get" requests. We've never
actually started in on it; the likelihood of introducing new bugs into a
piece of software that runs so smoothly hasn't seemed worth it.
However, I can share a few of the design thoughts we've had, in case
someone else wants to have at it at some point.

First of all, you can make good use of a small number of processors
(2-4) without actually changing all that much of the structure of the
code. For example, a two-threaded version of memcached could use one
thread to handle network I/O and another thread to parse and execute
incoming requests. The advantage of a very simple setup like that is
that each thread enjoys exclusive access to most of the data structures
it needs; thus there is a minimal amount of time spent blocking on locks
(near zero if you did it right, I'd think) and very little chance of
obscure deadlocks or race conditions.
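
To make that concrete, here is a rough sketch of what the hand-off
between those two threads might look like. This is not actual memcached
code (all the names are made up, stdin stands in for the network, and
there is no error handling or clean shutdown), but it shows the one
piece of shared state, the request queue, and the one lock you'd need:

/* Sketch of the two-thread split: one thread does the I/O (stdin
 * stands in for the network here), the other owns the cache data and
 * executes requests.  The only shared state is this hand-off queue,
 * so it is the only thing that needs a lock. */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct request {
    char line[256];              /* raw command text from a client */
    struct request *next;
};

static struct request *queue_head, *queue_tail;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_nonempty = PTHREAD_COND_INITIALIZER;

/* I/O thread: read requests and hand them off to the worker. */
static void *io_thread(void *arg)
{
    char buf[256];
    while (fgets(buf, sizeof(buf), stdin) != NULL) {
        struct request *req = calloc(1, sizeof(*req));
        strncpy(req->line, buf, sizeof(req->line) - 1);

        pthread_mutex_lock(&queue_lock);
        if (queue_tail != NULL)
            queue_tail->next = req;
        else
            queue_head = req;
        queue_tail = req;
        pthread_cond_signal(&queue_nonempty);
        pthread_mutex_unlock(&queue_lock);
    }
    return NULL;
}

/* Worker thread: sole owner of the cache, so it never locks to touch
 * it; the only locking is to pull the next request off the queue. */
static void *worker_thread(void *arg)
{
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL)
            pthread_cond_wait(&queue_nonempty, &queue_lock);
        struct request *req = queue_head;
        queue_head = req->next;
        if (queue_head == NULL)
            queue_tail = NULL;
        pthread_mutex_unlock(&queue_lock);

        /* parse and execute req->line against the thread-private
         * hash table here; for the sketch, just echo it */
        printf("worker got: %s", req->line);
        free(req);
    }
    return NULL;
}

int main(void)
{
    pthread_t io, worker;
    pthread_create(&worker, NULL, worker_thread, NULL);
    pthread_create(&io, NULL, io_thread, NULL);
    pthread_join(io, NULL);   /* a real server would also shut the
                                 worker down cleanly */
    return 0;
}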

Taking that a step further, you could separate the network code into
input and output threads (though those two will necessarily have to
share more data) and separate "get" requests from writes (ditto).
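
For the get-versus-write split in particular, a reader-writer lock
around the hash table is probably the obvious starting point: any
number of "get" threads can hold the read lock at once, and a write
still gets the table to itself. Again, just a sketch; the item type and
the hash_find()/hash_insert() functions below are imaginary stand-ins
for the real hash table code:

/* Readers share the lock, writers get it exclusively.  The types and
 * hash table functions here are placeholders, not real memcached code. */

#include <pthread.h>

typedef struct item item;                       /* placeholder item type */
extern item *hash_find(const char *key);        /* imaginary lookup */
extern void  hash_insert(const char *key, item *it);  /* imaginary store */

static pthread_rwlock_t cache_lock = PTHREAD_RWLOCK_INITIALIZER;

/* called from the "get" thread(s); readers never block each other */
item *cache_get(const char *key)
{
    pthread_rwlock_rdlock(&cache_lock);
    item *it = hash_find(key);
    /* a real version would need to refcount the item before dropping
     * the lock so a writer can't free it out from under us */
    pthread_rwlock_unlock(&cache_lock);
    return it;
}

/* called from the write thread(s); takes the table exclusively */
void cache_set(const char *key, item *it)
{
    pthread_rwlock_wrlock(&cache_lock);
    hash_insert(key, it);
    pthread_rwlock_unlock(&cache_lock);
}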

Going beyond 2-4 processors is where you start having to make so many
changes that it almost becomes a rewrite. I would probably not go with a
traditional worker-thread pool model since memcached requests tend to be
extremely short-lived and the overhead of managing the thread pool would
probably be higher than the time spent servicing a typical request. A
thread-per-connection model makes a bit more sense, but you'd end up
wasting a lot of memory on threads that are mostly just sitting around
doing nothing.
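
For what it's worth, the thread-per-connection version is almost
trivial to write, which is part of its appeal; the problem is what it
costs while all those threads sit waiting for their connections to say
something. Sketch only, with the per-connection handler left imaginary:

/* Thread-per-connection sketch.  Every accepted socket gets its own
 * detached thread, and every one of those threads reserves a full
 * stack (commonly megabytes by default) even while its connection is
 * idle, which is where the wasted memory comes from.
 * handle_connection() is imaginary. */

#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>

extern void *handle_connection(void *fd_as_ptr);  /* imaginary: parses and
                                                     serves one client fd */

void accept_loop(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0)
            continue;

        pthread_t tid;
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
        /* smuggle the fd through the void* argument for the sketch */
        pthread_create(&tid, &attr, handle_connection, (void *)(intptr_t)fd);
        pthread_attr_destroy(&attr);
    }
}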

But all in all, my guess is people will continue to reach the same
conclusion that we have: it's just not worth the extra complexity and
risk of bugs for a relatively small gain.

-Steve