memory layout idea: the bulldozer

Joachim Bauernberger joachim.bauernberger@friendscout24.de
Fri, 9 Apr 2004 10:33:02 +0200


Hi,

the following might be completely uninteresting or whack :-)

I once played around with the Linux kernel's tmpfs, which allowed
me to use the write/read system calls to write stuff to memory.
That way there is neither malloc nor disk access, etc. to worry about.

I am no wizard when it comes to understanding the differences in memory
management at the OS level, so please be gentle in case this approach
sucks :-)

http://www-106.ibm.com/developerworks/library/l-fs3.html
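
Roughly what I mean, as a tiny C sketch (the /dev/shm path is just an
example of a tmpfs mount point, so treat it as an assumption):

    /* Write to and read from a file on a tmpfs mount: the data lives in
     * RAM-backed pages, so plain write()/read() never touch a disk. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/dev/shm/tmpfs-demo";   /* assumed tmpfs mount */
        char buf[32];
        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }

        write(fd, "hello", 5);               /* goes straight to memory */
        lseek(fd, 0, SEEK_SET);
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n >= 0) { buf[n] = '\0'; printf("%s\n", buf); }

        close(fd);
        unlink(path);
        return 0;
    }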

Regards,
~/joachim



On Friday 09 April 2004 00:29, Brad Fitzpatrick wrote:
> Avva and I have been busy talking about new memory layout strategies that
> would be better inside memcached.  The current approach is super paranoid,
> is easy, and uses very minimal CPU, but has two annoying properties:
>
>    -- ~35% memory is wasted
>
>    -- it requires slab page reassignment as usage patterns change,
>       especially for smaller memcached instances
>
> So we've been talking about LRU'ing slabs instead of items (see previous
> mail), but that has some annoying properties as well.  Active items with
> sibling inactive items can hog slabs.  So then we're talking about doing
> both item and slab LRUs and always deciding to either free the oldest item
> in the right size class, or free an entire slab (of any class), depending
> on what's older.  That helps a little, but not much.  There are then
> strategies to move all the items of like-size and like-popularity onto the
> same slabs, so slabs are uniformly popular or boring and the boring ones
> die off quickly.  A coworker of mine pointed me at the "Train GC
> algorithm" for more on that, but I haven't read it yet.
>
> In the end, I'm really thinking a slab allocator isn't what we need.
>
> I've been thinking about the following model, which I've been explaining
> to people with a bulldozer analogy:
>
> -- You have a bunch of fairly large (~1MB) contig regions that form
>    one huge logical region.  (the small 1MB chunks are just an
>    implementation detail to make sure the program is able to get
>    2GB or so of memory total, which it might not be able to do in
>    one chunk?  maybe so.)
>
> -- Starting from the left, you pack items in as tight as possible.
>    (pointer aligned, basically.  no free space between items)
>
> -- The point at which items are being added to this region is
>    the bulldozer.
>
> -- The bulldozer point wraps around.  Once it gets to the right,
>    it starts over.
>
> -- If any items are in the way, they're freed.  (this will require
>    an index of the left side of all items...)
>
> -- If any item is within range of the bulldozer's horn when a GET
>    is done on it, it's relocated behind the bulldozer and the
>    bulldozer advances.  (the horn might be 25% or 50% of the total
>    memory?)
>
> -- So the only things that are bulldozed are things that were not
>    accessed and moved before the bulldozer got to them.
>
> -- item updates can happen in place, if they're the same size or
>    smaller.  (like increments).  or they can just move immediately.
>    probably wouldn't matter much.
>
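
A rough C sketch of the bulldozer scheme quoted above.  The arena size,
horn fraction, header layout and all names here are made-up assumptions,
the ~1MB chunks are collapsed into one flat array, and the hash-table
side (unlinking freed or relocated keys) is left out:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define ARENA_SIZE (16 * 1024 * 1024)    /* one logical region (assumed) */
    #define HORN_SIZE  (ARENA_SIZE / 4)      /* relocate items this close ahead */
    #define ALIGN(n)   (((n) + sizeof(void *) - 1) & ~(sizeof(void *) - 1))

    struct item_hdr {              /* every stored item starts with this */
        uint32_t total_len;        /* header + key + value, aligned */
        uint8_t  freed;            /* 1 once the bulldozer has evicted it */
    };

    static unsigned char arena[ARENA_SIZE];
    static size_t dozer = 0;       /* the blade: next write offset */
    static size_t old_front = 0;   /* left edge of the next old item ahead */

    /* Mark every old item whose left edge lies before `upto` as freed. */
    static void bulldoze(size_t upto)
    {
        while (old_front < upto && old_front < ARENA_SIZE) {
            struct item_hdr *h = (struct item_hdr *)(arena + old_front);
            if (h->total_len == 0) {         /* virgin memory: nothing ahead */
                old_front = ARENA_SIZE;
                break;
            }
            h->freed = 1;                    /* not popular enough to survive */
            old_front += h->total_len;
        }
    }

    /* Pack a new item right at the blade, wrapping around when needed. */
    static void *dozer_alloc(size_t payload)
    {
        size_t need = ALIGN(sizeof(struct item_hdr) + payload);
        if (need > ARENA_SIZE)
            return NULL;
        if (dozer + need > ARENA_SIZE) {     /* wrap: start over at the left */
            bulldoze(ARENA_SIZE);            /* evict the leftover tail */
            memset(arena + dozer, 0, ARENA_SIZE - dozer);
            dozer = 0;
            old_front = 0;
        }
        bulldoze(dozer + need);              /* anything in the way gets freed */
        struct item_hdr *h = (struct item_hdr *)(arena + dozer);
        h->total_len = (uint32_t)need;
        h->freed = 0;
        dozer += need;
        return h + 1;
    }

    /* On GET: if the item sits within the horn ahead of the blade, copy it
     * to the blade so it survives the next pass; otherwise leave it alone. */
    static void *touch_on_get(struct item_hdr *h, size_t payload)
    {
        size_t off   = (size_t)((unsigned char *)h - arena);
        size_t ahead = (off >= dozer) ? off - dozer
                                      : ARENA_SIZE - dozer + off;
        if (ahead >= HORN_SIZE)
            return h + 1;                    /* far from the blade: stay put */

        /* Stage through a temporary buffer, since the new slot can overlap
         * the old one once the blade advances over it. */
        void *tmp = malloc(payload);
        if (tmp == NULL)
            return h + 1;
        memcpy(tmp, h + 1, payload);
        h->freed = 1;
        void *copy = dozer_alloc(payload);
        if (copy != NULL)
            memcpy(copy, tmp, payload);
        free(tmp);
        return copy;
    }

The old_front walk stands in for part of the "index of the left side of
all items" mentioned above; a real version still needs that index so the
hash table can drop keys whose items were bulldozed or moved.
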
> Key advantages to this scheme:
>
> -- pretty damn simple
>
> -- no size classes.  no tuning.  no page reassignment.
>
> -- global item LRU, kinda.  rough.  should work?
>
> -- memory is packed pretty tight, and constantly being
>    repacked by the bulldozer.
>
> Optional additions:
>
> -- keep index of free regions and use those instead of the bulldozer.
>    that is, when something is deleted or GET'ed and found too old,
>    we keep track of the hole in the compacted memory and reuse it later.
>    then do some least-fit algorithm, finding the smallest free region
>    new stores can go into.  failing that, move the bulldozer and put it
>    behind that.
>
>    problem with that:  the free regions will need to be bulldozed off
>    the index too.  which means:
>
>      index to used items
>      index from free size to free region
>      index from free region to free size (to the previous index)
>
>    we also probably want to bulldoze free regions a little bit further
>    than normal items, since it'd be dumb to be putting new items just
>    where they'd be soon to be bulldozed.
>
>    so the whole free region thing may be more trouble than it's worth.
>
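
For the optional free-region index quoted above, a minimal sketch of the
hole table and the lookup (what's called "least-fit" there: take the
smallest hole that fits, usually known as best-fit).  The two maps
(size -> region, region -> size) are collapsed into one flat table here,
purely as an illustrative assumption:

    #include <stddef.h>

    #define MAX_HOLES 1024

    struct hole { size_t off, len; };        /* a gap in the compacted arena */

    static struct hole holes[MAX_HOLES];
    static size_t nholes = 0;

    /* Remember a hole left behind by a delete or an expired GET. */
    static void note_hole(size_t off, size_t len)
    {
        if (nholes < MAX_HOLES) {
            holes[nholes].off = off;
            holes[nholes].len = len;
            nholes++;
        }
        /* else: forget it; the bulldozer reclaims it on its next pass */
    }

    /* Return the offset of the smallest hole that can hold `need`, or
     * (size_t)-1 if none fits and the caller should use the bulldozer. */
    static size_t hole_alloc(size_t need)
    {
        size_t best = (size_t)-1, best_len = (size_t)-1;
        for (size_t i = 0; i < nholes; i++)
            if (holes[i].len >= need && holes[i].len < best_len) {
                best = i;
                best_len = holes[i].len;
            }
        if (best == (size_t)-1)
            return (size_t)-1;
        size_t off = holes[best].off;
        if (holes[best].len > need) {        /* keep the leftover as a hole */
            holes[best].off += need;
            holes[best].len -= need;
        } else {
            holes[best] = holes[--nholes];   /* drop the exact-fit hole */
        }
        return off;
    }

As noted above, the holes would also have to be dropped from this table
once the bulldozer rolls over them, which is where the bookkeeping starts
to outweigh the benefit.
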
> Thoughts?

-- 
Web: http://www.bauernberger.de