Theoretical set assoc cache org performance boosts?

Fri Dec 28 08:28:02 UTC 2007

Page alignment for cache efficiency was actually one of the things I was hoping to gain from grafting (experimentally, of course) the libumem[1] slab allocator onto memcached.  I started this just before the hackathon we hosted at Sun, but it took a bit longer than I'd thought and wasn't a high priority for anyone.

Of course, if memcached gets into areas where locality or SMP scaling becomes more critical, I think it could be a bigger issue.  Otherwise I agree with Steve: with the way most deployments are done today, you probably wouldn't see a big difference from this alone.

- Matt

1. libumem is the userspace implementation of Solaris's kernel slab allocator.  This is also where memcached's slab allocator design came from (Jeff Bonwick's USENIX paper, I think).  Wez Furlong and some other folks at omniti.com liked it so much that they ported it to other OSes: https://labs.omniti.com/trac/portableumem

Also worth reading: http://blogs.sun.com/bonwick/entry/now_it_can_be_told
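
To put Steve's estimate (in the quoted thread below) into numbers, here is a small back-of-the-envelope sketch. The figures are hypothetical and chosen only to match the low end of the "four or five orders of magnitude" ratio he cites; they are not measurements of memcached, and `server_share` is just an illustrative helper name.

```python
def server_share(network_s, server_s):
    """Fraction of client-observed latency spent inside the server."""
    return server_s / (network_s + server_s)


# Hypothetical numbers, picked only to give a 10^4 ratio (not measured).
server_s = 1e-6               # assumed per-request processing time
network_s = server_s * 1e4    # assumed network delay, four orders larger

share = server_share(network_s, server_s)
print(f"server share of total latency: {share:.4%}")
```

Under these assumptions the server accounts for roughly 0.01% of what the client sees, so driving server time to zero would change the total by about that much.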

----- Original Message -----
From: Brian P Brooks <Brian.Brooks at Colorado.EDU>
Date: Thursday, December 27, 2007 10:37 pm
Subject: Re: Theoretical set assoc cache org performance boosts?
To: memcached <memcached at lists.danga.com>

> Is it realistically possible for a small server project like memcached 
> to ever pose the major bottleneck as the server's processing time 
> rather than connections/networking/protocol?  Is this even possible 
> for a lightweight server?  Could the binary protocol offer such a 
> performance boost where server processing could be a competitor for 
> major profiling bottlenecks?
>  
>  Of course this is all in curiosity disregarding network speeds, API 
> speeds, etc...
>  
>  Brian Brooks
>  http://csel.cs.colorado.edu/~brooksbp/
>  Cell: (303)319-8663
>  
>  
>  ---- Original message ----
>  >Date: Thu, 27 Dec 2007 22:18:47 -0800
>  >From: Steven Grimm <sgrimm at facebook.com>  
>  >Subject: Re: Theoretical set assoc cache org performance boosts?  
>  >To: Brian P Brooks <Brian.Brooks at Colorado.EDU>
>  >Cc: memcached <memcached at lists.danga.com>
>  >
>  >I doubt that would make the network round-trips any faster, and
>  >network delay is, to be very conservative about it, four or five
>  >orders of magnitude greater than the total request processing time
>  >inside the server. You could reduce the server's processing time to
>  >zero and it would have no measurable effect on response times or
>  >throughput from the client's point of view.
>  >
>  >Of course, it's open source and you're welcome to experiment; nobody
>  >would say no to a significant performance improvement. But I really
>  >doubt there's much to be gained there.
>  >
>  >-Steve
>  >
>  >
>  >On Dec 27, 2007, at 9:59 PM, Brian P Brooks wrote:
>  >
>  >> To my understanding, at the server level, Memcached is implemented
>  >> by a fully associative cache -- most likely using an LRU stack for
>  >> overwriting comparisons.  Would it be theoretically beneficial if
>  >> Memcached were to use a 2 or 4 way set associative cache?  Of course
>  >> there would be some changes i.e. would have to statically alloc RAM
>  >> so it could partition its blocks.
>  >>
>  >> But, this would definitely help for apps that cache for speed rather
>  >> than cache hit reliability.
>  >>
>  >> The only way I could see implementing any sort of direct mapped /
>  >> set associative cache organization other than specifying the
>  >> blocks/partitions you write to in your application (ie spec'ing out
>  >> the direct mapped design in your application).  Although, I could
>  >> see how this could give you more control of the cache, and could
>  >> probably result in faster caching performance (both reads and
>  >> writes), but lower hit rates.
>  >>
>  >> Any thoughts?
>  >>
>  >> Brian Brooks
>  >> http://csel.cs.colorado.edu/~brooksbp/
>  >> Cell: (303)319-8663
>  >
>
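
For readers unfamiliar with the idea Brian raises at the start of the thread, here is a minimal toy sketch of an N-way set associative cache with LRU replacement inside each set. It is not how memcached is implemented; the class name, the method names, and the use of crc32 to pick a set are all assumptions made for illustration.

```python
from collections import OrderedDict
import zlib


class SetAssociativeCache:
    """Toy N-way set associative cache; LRU replacement within each set."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One small insertion-ordered map per set; oldest entry comes first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set_for(self, key):
        # crc32 is used only because it is deterministic across runs.
        index = zlib.crc32(key.encode("utf-8")) % self.num_sets
        return self.sets[index]

    def get(self, key):
        s = self._set_for(key)
        if key not in s:
            return None          # miss
        s.move_to_end(key)       # mark as most recently used
        return s[key]

    def set(self, key, value):
        s = self._set_for(key)
        if key in s:
            s.move_to_end(key)
        elif len(s) >= self.ways:
            s.popitem(last=False)  # evict this set's least recently used entry
        s[key] = value


# With a single set, the cache behaves like one fully associative LRU.
cache = SetAssociativeCache(num_sets=1, ways=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # touch "a" so that "b" becomes the LRU entry
cache.set("c", 3)    # set is full, so "b" is evicted
```

With more than one set, a key can be evicted because its own set is full even while other sets still have free slots. Lookups only ever touch one small set, but that kind of conflict eviction is where the lower hit rate Brian mentions would come from.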