sgrimm at facebook.com
Mon Mar 26 18:25:50 UTC 2007
Okay, that's reasonable. Not what I normally think of as fragmentation,
but I suppose it's fair to classify it that way.
You are quite correct that memcached wastes memory if the sizes of
objects (plus header sizes) don't precisely match the item sizes. This
falls into two categories. The first, most of what you'll find at the
low end of the scale, is due to word alignment. On a 64-bit system, the
start of each object must be 8-byte aligned since the fixed-size header
contains pointers. So if, for example, you have a 41-byte object, you
will always waste 7 bytes. There's pretty much no getting around that
without killing performance -- you don't want to be doing
non-word-aligned memory accesses, at least on most of the modern CPU
architectures I'm aware of.
The second category is what you described, where the increment between
slab item sizes causes memcached to skip over the optimal (word-aligned)
size for some group of objects. That absolutely happens. It is much
better now than it used to be; originally memcached used a simple
powers-of-two sizing strategy for items (each distinct item size is
called a "slab class" internally), but now it uses a powers-of-n
strategy where 1 < n.
You can tune n on the command line; if you set it extremely low, you can
get up to 199 different slab classes, which should give you very little
wasted memory. You can also tune the starting item size, useful if you
have fixed-size items.
So why isn't it just set low all the time? Because -- and this is the
problem with creating new slab classes on the fly -- having tons of slab
classes causes another sort of fragmentation: it fragments the LRU
queue, which operates on a per-slab-class basis. If you have 199 slab
classes, you have 199 separate LRU queues, which can cause undesirable
behavior from the application's point of view. Even with the existing
default class sizes we occasionally notice cases where some objects live
in the cache for weeks and others for days; the more finely you break
your sizes down, the more frequent that kind of behavior will be. And
when you're caching less-frequently-used objects at the expense of
more-frequently-used ones, that is every bit as much a waste of memory
as you get from size mismatches.
Also, it's not actually that bad at the moment. Of course, this will
vary depending on what's in your particular cache, but in our cache at
least, there are vastly more small objects than large ones. And at the
low end of the size scale (again, depending on what you set the sizing
scale factor to), the unavoidable loss from byte alignment is actually
the dominant cause of wasted bytes.
We've talked internally about eventually moving away from the
slab-class-based system. It should be possible to run a job in the
background (especially in the multithreaded version, but even in the
single-threaded one) to shuffle objects around in memory to pack them
optimally. If you separate the fixed-size headers from the item data,
you can even do away with byte-alignment-based waste. That would give
you 100% memory efficiency for any arbitrary mix of item sizes, and true
LRU behavior across the whole cache, at the cost of greater CPU consumption.
In my opinion that's a direction that's far more worthy of exploration
than fiddling further with the slab allocator's sizing strategy.
Jure Petrovic wrote:
> Oh, just theoretical discussion for my thesis. I've read Bonwick paper
> referenced in Brad's "Distributed caching with Memcached". It is said
> that resizing slab classes could sometimes reduce internal fragmentation.
> So, for example let's say we allocate 700 bytes for 64-byte slab class
> and all of the objects stored have 40 bytes. This means we can store
> 17 objects and leave 20 bytes unused. (700bytes = 17objects * 40bytes +
> 20bytes)
> In such cases (if I get that right) Bonwick suggests re-allocating that
> 700 bytes in such manner, we wouldn't get that unused 20 bytes. That
> means allocating either 680 or 720 bytes (17 or 18 objects).
> Please check attached schematics in pdf to see if I got that right. And
> you say that allocation in memcached is fixed for each particular
> slab class?
> On Sun, 2007-03-25 at 11:30 -0700, Steven Grimm wrote:
>> Slab sizes -- or rather, the sizes of the items in each particular class
>> of slabs -- are fixed at startup. Can you give an example of the kind of
>> fragmentation you're seeing (or expecting to see)?