memory layout idea: the bulldozer

Brad Fitzpatrick
Thu, 8 Apr 2004 15:29:54 -0700 (PDT)

Avva and I have been busy talking about new memory layout strategies that
would be better inside memcached.  The current approach is super paranoid,
is easy, and uses very minimal CPU, but has two annoying properties:

   -- ~35% memory is wasted

   -- it requires slab page reassignment as usage patterns change,
      especially for smaller memcached instances

So we've been talking about LRU'ing slabs instead of items (see previous
mail), but has some annoying properties as well.  Active items with
sibling inactive items can hog slabs.  So then we're talking about doing
both item and slab LRUs and always deciding to either free the oldest item
in the right size class, or free an entire slab (of any class), depending
on what's older.  That helps a little, but not much.  There are then
strategies to move all the items of like-size and like-popularity onto the
same slabs, so slabs are uniformly popular or boring and the boring ones
die off quickly.  A coworker of mine pointed me at the "Train GC
algorithm" for more on that, but I haven't read it yet.

In the end, I'm really thinking a slab allocator isn't what we need.

I've been thinking about the following model, which I've been explaining
to people with a bulldozer analogy:

-- You have a bunch of fairly large (~1MB) contig regions that form
   one huge logical region.  (the small 1MB chunks are just an
   implementation detail to make sure the program is able to get
   2GB or so of memory total, which it might not be able to do in
   one chunk?  maybe so.)

-- Starting from the left, you pack items in as tight as possible.
   (pointer aligned, basically.  no free space between items)

-- The point at which items are being added to this region is
   the bulldozer.

-- The bulldozer point wraps around.  Once it gets to the right,
   it starts over.

-- If any item's in the way, they're freed.  (this will require
   an index of the left side of all items...)

-- If any item is within range of the bulldozer's horn when a GET
   is done on it, it's relocated behind the bulldozer and the
   bulldozer advances.  (the horn might be 25% or 50% of the total

-- So the only things that are bulldozed are things that were not
   accessed and moved before the bulldozer got to them.

-- item updates can happen in place, if they're the same size or
   smaller.  (like increments).  or they can just move immediately.
   probably wouldn't matter much.

Key advantages to this scheme:

-- pretty damn simple

-- no size classes.  no tuning.  no page reassignment.

-- global item LRU, kinda.  rough.  should work?

-- memory is packed pretty tight, and constantly being
   repacked by the bulldozer.

Optional additions:

-- keep index of free regions and use those instead of the bulldozer.
   that is, when something is deleted or GET'ed and found too old,
   we keep track of the hole in the compacted memory and reuse it later.
   then do some least-fit algorithm, finding the smallest free region
   new stores can go into.  failing that, move the bulldozer and put it
   behind that.

   problem with that:  the free regions will need to be bulldozed off
   the index too.  which means:

     index to used items
     index from free size to free region
     index from free region to free size (to the previous index)

   we also probably want to bulldoze free regions a little bit further
   than normal items, since it'd be dumb to be putting new items just
   where they'd be soon to be bulldozed.

   so the whole free region thing may be more trouble than it's worth.