Theoretical set assoc cache org performance boosts?
Matt.Ingenthron at Sun.COM
Fri Dec 28 08:28:02 UTC 2007
Page alignment for cache efficiency was actually one of the things I was hoping to gain by (experimentally, of course) grafting the libumem slab allocator onto memcached. I started this just before the hackathon we hosted at Sun, but found it was taking a bit longer than I'd thought, and it wasn't a high priority for anyone.
Of course, if memcached gets into areas where locality or SMP scaling becomes more critical, I think it could be a bigger issue. Otherwise I agree with Steve: with the way most deployments are done today, you probably wouldn't see a big difference from this alone.
1. libumem is the userspace implementation of Solaris's kernel slab allocator. This is also where memcached's slab allocator design came from (Jeff Bonwick's USENIX paper, I think). Wez Furlong and some other folks at omniti.com liked it so much that they ported it to other OSes: https://labs.omniti.com/trac/portableumem
Also worth reading: http://blogs.sun.com/bonwick/entry/now_it_can_be_told
----- Original Message -----
From: Brian P Brooks <Brian.Brooks at Colorado.EDU>
Date: Thursday, December 27, 2007 10:37 pm
Subject: Re: Theoretical set assoc cache org performance boosts?
To: memcached <memcached at lists.danga.com>
> Is it realistically possible for a small server project like memcached
> to ever have the server's processing time, rather than
> connections/networking/protocol, be the major bottleneck? Is this even
> possible for a lightweight server? Could the binary protocol offer such
> a performance boost that server processing becomes a contender for the
> major bottleneck when profiling?
> Of course this is all out of curiosity, disregarding network speeds, API
> speeds, etc...
> Brian Brooks
> Cell: (303)319-8663
> ---- Original message ----
> >Date: Thu, 27 Dec 2007 22:18:47 -0800
> >From: Steven Grimm <sgrimm at facebook.com>
> >Subject: Re: Theoretical set assoc cache org performance boosts?
> >To: Brian P Brooks <Brian.Brooks at Colorado.EDU>
> >Cc: memcached <memcached at lists.danga.com>
> >I doubt that would make the network round-trips any faster, and
> >network delay is, to be very conservative about it, four or five
> >orders of magnitude greater than the total request processing time
> >inside the server. You could reduce the server's processing time to
> >zero and it would have no measurable effect on response times or
> >throughput from the client's point of view.
> >Of course, it's open source and you're welcome to experiment; nobody
> >would say no to a significant performance improvement. But I really
> >doubt there's much to be gained there.
> >On Dec 27, 2007, at 9:59 PM, Brian P Brooks wrote:
> >> To my understanding, at the server level, Memcached is implemented
> >> by a fully associative cache -- most likely using an LRU stack for
> >> overwriting comparisons. Would it be theoretically beneficial if
> >> Memcached were to use a 2 or 4 way set associative cache? Of course
> >> there would be some changes, i.e. it would have to allocate statically
> >> so it could partition its blocks.
> >> But this would definitely help for apps that cache for speed rather
> >> than cache hit reliability.
> >> The only other way I could see of implementing any sort of direct
> >> mapped / set associative cache organization is specifying the
> >> partitions you write to in your application (i.e. spec'ing out the
> >> direct mapped design in your application). I could see how this
> >> would give you more control of the cache, and would probably result
> >> in faster caching performance (both reads and writes), but lower
> >> hit rates.
> >> Any thoughts?
> >> Brian Brooks
> >> http://csel.cs.colorado.edu/~brooksbp/
> >> Cell: (303)319-8663
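
For reference, the 2 or 4 way set associative organization asked about in the original question can be sketched roughly as follows. This is only an illustration under stated assumptions, not memcached's actual design or code: the class name `SetAssocCache`, the parameters `num_sets` and `ways`, and the use of CRC32 to pick a set are all hypothetical choices made for the example.

```python
import zlib
from collections import OrderedDict


class SetAssocCache:
    """Toy N-way set associative key/value cache (hypothetical, illustration only)."""

    def __init__(self, num_sets=1024, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        # One small LRU-ordered dict per set; each holds at most `ways` items.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set_for(self, key):
        # The key's hash selects the set. CRC32 is an arbitrary stand-in hash
        # (an assumption of this sketch, not what memcached uses).
        return self.sets[zlib.crc32(key.encode("utf-8")) % self.num_sets]

    def get(self, key):
        s = self._set_for(key)
        if key not in s:
            return None
        s.move_to_end(key)  # mark as most recently used within its set
        return s[key]

    def set(self, key, value):
        s = self._set_for(key)
        if key in s:
            s.move_to_end(key)
        elif len(s) >= self.ways:
            # Evict the least recently used item in this set only.
            s.popitem(last=False)
        s[key] = value
```

With `num_sets=1` this degenerates into a single LRU over all items (the fully associative case), and with `ways=1` it becomes direct mapped. Because eviction is local to one set, an item can be evicted while other sets still have room, which is the lower hit rate traded for simpler bookkeeping.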