Patch: CPU efficiency, UDP support, and other changes
brad at danga.com
Thu May 4 21:54:03 UTC 2006
I'm actively working on reviewing/merging it. It's a bit of a pain as a
massive patch, but I'm doing my best.
On Thu, 4 May 2006, Cahill, Earl wrote:
> Any chance of the patch getting accepted? Would it be hard to redo the
> patch off of 1.1.13 once that releases? Based on your description, we
> would be quite interested in what you did.
> Now if you could just add some real namespace support . . . . Wouldn't
> facebook be interested in such a thing?
> > -----Original Message-----
> > From: memcached-bounces at lists.danga.com [mailto:memcached-
> > bounces at lists.danga.com] On Behalf Of Steven Grimm
> > Sent: Wednesday, May 03, 2006 10:07 AM
> > To: memcached at lists.danga.com
> > Subject: Patch: CPU efficiency, UDP support, and other changes
> > This big patch contains all the changes we've made to memcached 1.1.12
> > at facebook.com. It includes some changes I've sent to the list as
> > separate patches:
> > * Memory efficiency is increased; we get about 40% more items in a given
> > amount of memory vs. the standard 1.1.12 memcached. (This patch has a
> > couple tweaks that aren't in the previous smaller one since they tie
> > into other changes.)
> > * Support for large memory sizes (64-bit pointers and size_t).
> > * Fix for bogus "out of memory" errors caused by memory filling up
> > before a slab class has any slabs.
> > But the big changes here are not in the other patches:
> > * CPU consumption is reduced 25-30%.
> > * A UDP-based interface is supported in addition to the standard TCP
> > interface.
> > No doubt some will ask, "Why are you sending this out as one big patch
> > instead of splitting everything out into small independent patches?"
> > I'll include a section answering just that question at the end of this
> > message.
> > Details follow. Some of this will look familiar if you've seen the
> > earlier patches.
> > Memory consumption
> > ------------------
> > The slab allocator's powers-of-2 size strategy is now a powers-of-N
> > strategy, where N may be specified on the command line using the new
> > "-f" option. The default is 1.25. For a large memcached instance,
> > where there are enough items of enough different sizes that the
> > increased number of slab classes isn't itself a waste of memory, this
> > is a significant win: items are placed in chunks whose sizes are much
> > closer to the item size, wasting less memory.
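A minimal sketch of the powers-of-N idea, not the patch's actual code. The
64-byte starting size, the integer rounding, and the helper names are
assumptions made for illustration; the real allocator also accounts for the
item header.

```python
def chunk_sizes(smallest, factor, largest):
    """Return chunk sizes growing geometrically from smallest up to largest."""
    sizes = []
    size = smallest
    while size < largest:
        sizes.append(size)
        # Always advance by at least one byte so small sizes cannot stall.
        size = max(size + 1, int(size * factor))
    sizes.append(largest)
    return sizes


def wasted_bytes(item_size, sizes):
    """Bytes lost when an item lands in the smallest chunk that fits it."""
    chunk = next(s for s in sizes if s >= item_size)
    return chunk - item_size


# Hypothetical size tables: the old doubling scheme and a factor of 1.25.
powers_of_two = chunk_sizes(64, 2.0, 1024 * 1024)
powers_of_n = chunk_sizes(64, 1.25, 1024 * 1024)
```

With these assumed tables, an 1100-byte item occupies a 2048-byte chunk under
doubling (948 bytes wasted) and a 1151-byte chunk with a factor of 1.25
(51 bytes wasted), at the cost of a longer list of size classes.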
> > One consequence of this is that slabs are no longer fixed-size; by
> > default they are no bigger than 1MB each, but are only as big as they
> > need to be to hold a whole number of chunks. That causes the "slabs
> > reassign" command to be unavailable; it can be reenabled by compiling
> > with -DALLOW_SLABS_REASSIGN at the expense of some wasted memory (all
> > slabs will be 1MB).
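The slab sizing rule described above can be sketched as follows. This is an
illustration under the stated 1MB cap, not the patch's code; the function
name and the example chunk size are hypothetical.

```python
SLAB_LIMIT = 1024 * 1024  # 1MB cap per slab, as described in the message


def slab_layout(chunk_size, slab_limit=SLAB_LIMIT):
    """Return (chunks per slab, slab size in bytes) for one slab class.

    The slab is only as large as needed to hold a whole number of chunks.
    """
    chunks = slab_limit // chunk_size
    return chunks, chunks * chunk_size


chunks, slab_bytes = slab_layout(1151)
leftover_if_fixed_1mb = SLAB_LIMIT - slab_bytes
```

For a 1151-byte chunk, the slab holds 911 chunks in 1,048,561 bytes; a fixed
1MB slab would leave 15 bytes unused, which is the kind of waste that
fixed-size slabs (as with -DALLOW_SLABS_REASSIGN) accept.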
> > The minimum amount of space for data in chunks of the smallest slab
> > class may be adjusted on the command line using the new "-s" option.
> > Each chunk is that many bytes plus the size of the fixed-size item
> > header. If you have a lot of items with small fixed-size keys and
> > values, you can use this option to maximize the number of items per
> > slab in the smallest slab class; experiment with your particular data
> > to find the optimal value.
> > Item expiration times and access times are now stored as 32-bit
> > integers (number of seconds relative to server start) rather than
> > time_t, which is 64 bits on some platforms. This saves 8 bytes per item
> > when compiled in 64-bit mode, and is harmless otherwise.
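The 8-byte saving can be checked with a small sketch. The field layout and
function names here are assumptions for illustration only; the actual item
header in the patch is a C struct with other fields.

```python
import struct


def pack_absolute(exptime, atime):
    # Two signed 64-bit fields, as with a 64-bit time_t.
    return struct.pack("<qq", exptime, atime)


def pack_relative(exptime, atime, server_start):
    # Two unsigned 32-bit fields holding seconds since server start.
    return struct.pack("<II", exptime - server_start, atime - server_start)


def unpack_relative(blob, server_start):
    # Recover absolute times by adding the server start time back in.
    rel_exptime, rel_atime = struct.unpack("<II", blob)
    return rel_exptime + server_start, rel_atime + server_start
```

Two 64-bit fields take 16 bytes and two 32-bit fields take 8, which matches
the saving quoted above for the two timestamps.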
> > CPU consumption
> > ---------------
> > The implementation of the "get" request is substantially reworked. Now
> > the entire response is composed in memory ahead of time, and we write it
> > out in (usually) just one system call using sendmsg()'s scatter/gather
> > capability. Since we are no longer doing small writes, the
> > TCP_CORK/NOPUSH code is not needed and we can simply set the TCP socket
> > to TCP_NODELAY at connect time, saving a couple more system calls per
> > request.
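A rough sketch of the scatter/gather approach, written in Python rather than
the server's C and not taken from the patch. The function names and the
(key, flags, value) tuple shape are assumptions; the "VALUE" and "END" lines
follow the standard text protocol.

```python
import socket


def disable_nagle(sock):
    # Set TCP_NODELAY once, when the connection is established.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def send_get_response(sock, items):
    """Send a whole "get" response with a single sendmsg() call.

    items is a list of (key, flags, value) tuples with bytes keys and values.
    """
    buffers = []
    for key, flags, value in items:
        # Header, payload, and trailer stay separate buffers; the kernel
        # gathers them, so the values are never copied into one big string.
        buffers.append(b"VALUE %s %d %d\r\n" % (key, flags, len(value)))
        buffers.append(value)
        buffers.append(b"\r\n")
    buffers.append(b"END\r\n")
    # sendmsg() may send fewer bytes than requested; a real server would
    # have to resume from the returned offset.
    return sock.sendmsg(buffers)
```

The return value is the number of bytes actually written, which is why the
message above says "usually" one system call: a partial write needs a
follow-up call.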
> > The "VALUE" line (response to a "get" request) is rendered once at item
> > creation time, rather than re-rendered on each fetch.
> > The current system time is stored in a global variable that's updated
> > every second by a libevent timer; this eliminates several time() calls
> > per request. A minor improvement, but a cycle saved is a cycle earned.
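The cached-clock idea, sketched with hypothetical names. In the server the
update is driven by a libevent timer; here tick() stands in for that
callback, and the injectable time source exists only so the behavior can be
observed.

```python
import time


class CachedClock:
    """Holds the current time so request handlers avoid calling time()."""

    def __init__(self, time_source=time.time):
        self._time_source = time_source
        self.current_time = int(time_source())

    def tick(self):
        # Intended to be called once per second by the event loop's timer.
        self.current_time = int(self._time_source())


def is_expired(exptime, clock):
    """Check expiry against the cached time; 0 is taken to mean 'never'."""
    return exptime != 0 and exptime <= clock.current_time
```

Any number of is_expired() checks between ticks read the stored value and
make no call to the time source.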
> > UDP support
> > -----------
> > For large installations with tens of thousands of clients, the amount of
> > memory consumed by per-TCP-connection kernel buffers can grow large,
> > reducing the amount of memory that can be used by memcached. There is
> > now a UDP protocol, which supports an arbitrarily large number of
> > clients using a constant amount of server memory.
> > In the interest of efficiency and simplicity of implementation, the UDP
> > protocol does not support reliable delivery; it should therefore be used
> > for "get" requests where a dropped response would simply result in a
> > recoverable cache miss. For write requests (set, delete, etc.) or very
> > large "get" requests, a nonpersistent TCP connection should be used.
> > (This is simply advice; the code will happily accept any kind of request
> > via its UDP interface.)
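On the client side, "a dropped response is a cache miss" might look like the
sketch below. This is an assumption-laden illustration: udp_fetch is a
hypothetical helper, the timeout value is arbitrary, and the request bytes
are sent as-is without the framing that doc/protocol.txt defines for the
real UDP protocol.

```python
import socket


def udp_fetch(server_addr, request, timeout=0.05):
    """Send one datagram; return the reply bytes, or None if none arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, server_addr)
        try:
            reply, _ = sock.recvfrom(65535)
        except OSError:
            # Timeout or network error: no retry, just report "no answer".
            return None
        return reply
```

A caller would treat None exactly like a miss and fall back to the
authoritative data source, which is why losing a datagram is recoverable
for reads but not for writes.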
> > UDP support is only enabled if a UDP port is specified on the command
> > line.
> > The UDP protocol is described at the bottom of doc/protocol.txt.
> > Large memory support
> > --------------------
> > This mostly involves using size_t rather than unsigned int in a few
> > places and compiling in 64-bit mode, which gives us 64-bit pointers and
> > makes size_t 64 bits.
> > Fix for "out of memory" errors
> > ------------------------------
> > Rather than preallocate a slab in each slab class as the memcached
> > 1.1.13 prerelease does, we decided to instead allow memcached to exceed
> > its memory limit slightly. When a "set" request comes in that requires a
> > slab whose slab class is empty, we always allocate a slab, even if
> > memcached is already at its configured memory limit.
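The allocation rule just described, reduced to a decision function. The name
and parameters are hypothetical and the real check lives in the C slab
allocator; this only restates the policy.

```python
def may_allocate_slab(mem_used, mem_limit, slab_size, slabs_in_class):
    """Decide whether a slab class may obtain another slab."""
    if slabs_in_class == 0:
        # An empty class always gets its first slab, even over the limit.
        return True
    # Otherwise the configured memory limit applies as before.
    return mem_used + slab_size <= mem_limit
```

Because only the first slab of a class is exempt, the overshoot is bounded
by one slab per slab class.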
> > Our memcached instances are large enough that going over the limit by a
> > few megabytes is barely even detectable. If you are running in a very
> > constrained environment, you can lower the memory limit slightly to
> > account for this change, but bear in mind that this change will only
> > exceed the memory limit if a "set" request requires it (which will not
> > happen if your data always falls within a limited range of sizes).
> > Why is this one patch?
> > ----------------------
> > First, this patch is tested. It runs 24x7 on a large number of
> > memcached hosts. Thoroughly testing every possible permutation of
> > changes isn't really feasible.
> > Second, the changes are not all easily separable. For example, adding
> > the UDP support required reorganizing memcached's implementation of the
> > "get" request, and that reorganization also resulted in most of the CPU
> > time improvement. Similarly, one of the memory efficiency tweaks is
> > only required because compiling in 64-bit mode (for large memory
> > support) increases the size of a particular data type, and the
> > implementation of that tweak results in part of the CPU time savings.
> > Third, I *did* send it out as separate patches to the extent it made
> > sense to separate out the changes. But rather than excluding those
> > changes from the not-easily-separable stuff, I think it makes more sense
> > to include it all together. Otherwise anyone who wants to combine
> > everything will have to do tedious error-prone manual editing to merge
> > it all together, since some of the changes conflict. For example, both
> > the large memory support and the slab allocator modification involve
> > changing the parameters to slabs_init(), so it would be impossible to
> > produce two independent patches against the 1.1.12 release that could be
> > applied successfully one after the other.
> > Credits
> > -------
> > These changes were made by David Fetterman, Steven Grimm, and Scott
> > Marlette. Send comments to Steven Grimm (sgrimm at facebook.com) or,
> > preferably, to the memcached mailing list.