sgrimm at facebook.com
Mon Mar 26 18:25:50 UTC 2007
Okay, that's reasonable. Not what I normally think of as fragmentation,
but I suppose it's fair to classify it that way.
You are quite correct that memcached wastes memory if the sizes of
objects (plus header sizes) don't precisely match the item sizes. This
falls into two categories. The first, most of what you'll find at the
low end of the scale, is due to word alignment. On a 64-bit system, the
start of each object must be 8-byte aligned since the fixed-size header
contains pointers. So if, for example, you have a 41-byte object, you
will always waste 7 bytes. There's pretty much no getting around that
without killing performance -- you don't want to be doing
non-word-aligned memory accesses, at least on most of the modern CPU
architectures I'm aware of.
The second category is what you described, where the increment between
slab item sizes causes memcached to skip over the optimal (word-aligned)
size for some group of objects. That absolutely happens. It is much
better now than it used to be; originally memcached used a simple
powers-of-two sizing strategy for items (each distinct item size is
called a "slab class" internally), but now it uses a powers-of-n
strategy where 1 < n.
You can tune n on the command line; if you set it extremely low, you can
get up to 199 different slab classes, which should give you very little
wasted memory. You can also tune the starting item size, useful if you
have fixed-size items.
So why isn't it just set low all the time? Because -- and this is the
problem with creating new slab classes on the fly -- having tons of slab
classes causes another sort of fragmentation: it fragments the LRU
queue, which operates on a per-slab-class basis. If you have 199 slab
classes, you have 199 separate LRU queues, which can cause undesirable
behavior from the application's point of view. Even with the existing
default class sizes we occasionally notice cases where some objects live
in the cache for weeks and others for days; the more finely you break
your sizes down, the more frequent that kind of behavior will be. And
when you're caching less-frequently-used objects at the expense of
more-frequently-used ones, that is every bit as much a waste of memory
as you get from size mismatches.
Also, it's not actually that bad at the moment. Of course, this will
vary depending on what's in your particular cache, but in our cache at
least, there are vastly more small objects than large ones. And at the
low end of the size scale (again, depending on what you set the sizing
scale factor to), the unavoidable loss from byte alignment is actually
the dominant cause of wasted bytes.
We've talked internally about eventually moving away from the
slab-class-based system. It should be possible to run a job in the
background (especially in the multithreaded version, but even in the
single-threaded one) to shuffle objects around in memory to pack them
optimally. If you separate the fixed-size headers from the item data,
you can even do away with byte-alignment-based waste. That would give
you 100% memory efficiency for any arbitrary mix of item sizes, and true
LRU behavior across the whole cache, at the cost of greater CPU consumption.
In my opinion that's a direction that's far more worthy of exploration
than fiddling further with the slab allocator's sizing strategy.
Jure Petrovic wrote:
> Oh, just theoretical discussion for my thesis. I've read Bonwick paper
> referenced in Brad's "Distributed caching with Memcached". It is said
> that resizing slab classes could sometimes reduce internal fragmentation.
> So, for example let's say we allocate 700 bytes for 64-byte slab class
> and all of the objects stored have 40 bytes. This means we can store
> 17 objects and leave 20 bytes unused. (700bytes = 17objects * 40bytes +
> 20bytes)
> In such cases (if I get that right) Bonwick suggests re-allocating that
> 700 bytes in such manner, we wouldn't get that unused 20 bytes. That
> means allocating either 680 or 720 bytes (17 or 18 objects).
> Please check attached schematics in pdf to see if I got that right. And
> you say that allocation in memcached is fixed for each particular
> slab class?
> On Sun, 2007-03-25 at 11:30 -0700, Steven Grimm wrote:
>> Slab sizes -- or rather, the sizes of the items in each particular class
>> of slabs -- are fixed at startup. Can you give an example of the kind of
>> fragmentation you're seeing (or expecting to see)?