memory layout idea: the bulldozer
Thu, 8 Apr 2004 15:29:54 -0700 (PDT)
Avva and I have been busy talking about new memory layout strategies that
would be better inside memcached. The current approach is super paranoid,
is easy, and uses very minimal CPU, but has two annoying properties:
-- ~35% memory is wasted
-- it requires slab page reassignment as usage patterns change,
especially for smaller memcached instances
So we've been talking about LRU'ing slabs instead of items (see previous
mail), but has some annoying properties as well. Active items with
sibling inactive items can hog slabs. So then we're talking about doing
both item and slab LRUs and always deciding to either free the oldest item
in the right size class, or free an entire slab (of any class), depending
on what's older. That helps a little, but not much. There are then
strategies to move all the items of like-size and like-popularity onto the
same slabs, so slabs are uniformly popular or boring and the boring ones
die off quickly. A coworker of mine pointed me at the "Train GC
algorithm" for more on that, but I haven't read it yet.
In the end, I'm really thinking a slab allocator isn't what we need.
I've been thinking about the following model, which I've been explaining
to people with a bulldozer analogy:
-- You have a bunch of fairly large (~1MB) contig regions that form
one huge logical region. (the small 1MB chunks are just an
implementation detail to make sure the program is able to get
2GB or so of memory total, which it might not be able to do in
one chunk? maybe so.)
-- Starting from the left, you pack items in as tight as possible.
(pointer aligned, basically. no free space between items)
-- The point at which items are being added to this region is
-- The bulldozer point wraps around. Once it gets to the right,
it starts over.
-- If any item's in the way, they're freed. (this will require
an index of the left side of all items...)
-- If any item is within range of the bulldozer's horn when a GET
is done on it, it's relocated behind the bulldozer and the
bulldozer advances. (the horn might be 25% or 50% of the total
-- So the only things that are bulldozed are things that were not
accessed and moved before the bulldozer got to them.
-- item updates can happen in place, if they're the same size or
smaller. (like increments). or they can just move immediately.
probably wouldn't matter much.
Key advantages to this scheme:
-- pretty damn simple
-- no size classes. no tuning. no page reassignment.
-- global item LRU, kinda. rough. should work?
-- memory is packed pretty tight, and constantly being
repacked by the bulldozer.
-- keep index of free regions and use those instead of the bulldozer.
that is, when something is deleted or GET'ed and found too old,
we keep track of the hole in the compacted memory and reuse it later.
then do some least-fit algorithm, finding the smallest free region
new stores can go into. failing that, move the bulldozer and put it
problem with that: the free regions will need to be bulldozed off
the index too. which means:
index to used items
index from free size to free region
index from free region to free size (to the previous index)
we also probably want to bulldoze free regions a little bit further
than normal items, since it'd be dumb to be putting new items just
where they'd be soon to be bulldozed.
so the whole free region thing may be more trouble than it's worth.