Largest production memcached install?

Fri May 4 05:41:25 UTC 2007

Hi Steve,

A bit off topic, but I can't help wondering:

Your memcache nodes are nice, beefy boxes (32G RAM, 4 cores of 
probably at least 2GHz each -- generally a good amount of power for a 
database). Maybe they don't have any spindles at all, but suppose they 
did have a few disks each, say up to 4.

If you then split (federated) your database into 100 chunks (the 
remaining 100 boxes would be hot spares of the first 100 and could even 
be used to serve reads), wouldn't that take care of all your database 
load needs and pretty much eliminate the need for memcache? Wouldn't 50 
such boxes be enough in reality?

I do realize that 200 machines with no hard drives cost less both to 
buy and to maintain. But what about 50? (Just throwing out random 
numbers.) In the 
past you've also said that some of your memcache nodes do 30-60k 
reqs/sec, which would be very high in db speak, but I assume that that's 
the exception rather than the rule because 6 to 12 million memcache 
reqs/sec in aggregate sounds a bit out of this world.
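
For reference, here is the back-of-the-envelope arithmetic behind that 
aggregate figure. It is only a sketch: it assumes all 200 nodes sustain 
the quoted 30-60k rate at the same time, which is exactly the assumption 
I doubt. The function name is made up for illustration.

```python
def aggregate_rate(nodes, per_node_low, per_node_high):
    """Return (low, high) total reqs/sec if every node ran at the given rate."""
    return nodes * per_node_low, nodes * per_node_high


# Assumption: 200 memcache nodes, each doing 30k-60k reqs/sec simultaneously.
low, high = aggregate_rate(200, 30_000, 60_000)
print(f"{low:,} to {high:,} reqs/sec")  # prints "6,000,000 to 12,000,000 reqs/sec"
```

If only a handful of nodes ever reach that rate, the real aggregate would 
be far lower than this upper bound.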

Thanks

Steve Grimm wrote:
> Our db load averages tend to range from 0.25 to 4.5 or so, depending 
> on which particular hosts you’re looking at. More of them at the lower 
> end of that range than the upper end.
>
> When we need to do more major surgery to our memcached configuration, 
> we do it at the lowest-usage time of day to minimize the impact on the 
> site. Our cache is partitioned into different sections so we can take 
> down part of it at a time (to upgrade to a new memcached build, say) 
> without losing the whole cache.
>
> We consider memcached a critical part of our infrastructure. The 
> benefit of memcached in a typical setup is to reduce the amount of 
> database hardware you need to support an application; if you have 
> enough database horsepower to run unimpaired with most of your 
> memcached servers out of service, then there’s probably no point using 
> memcached at all, since it without a doubt adds extra complexity to 
> your application code. But if you go that route you’ll probably spend 
> many times as much money and burden yourself with a great deal more 
> administrative hassle (DB servers typically being more expensive and 
> more work to keep running smoothly than memcached servers are.)
>
> -Steve
>
>
> On 5/3/07 2:16 PM, "Cal Heldenbrand" <cal at fbsdata.com> wrote:
>
>     Steve,
>
>     Just curious: what are the OS load averages on your database
>     servers? Have you expanded Facebook to the point where losing most
>     of the memcache servers would cause your entire application to
>     grind to a halt?
>
>     During my initial thoughts on integrating memcache into our
>     product, I could see it eventually becoming a crutch and we
>     wouldn't have enough database hardware to support the application
>     anymore. I wonder if that's a good thing or a bad thing?
>
>     Thanks!
>
>     --Cal
>
>     On 5/3/07, *Steve Grimm* <sgrimm at facebook.com> wrote:
>
>         We rebuild from the database. We have enough memcached servers
>         that losing one has a relatively small effect on our cache hit
>         rate. Not to say there's no effect -- our DB load spikes up
>         for a little while when we lose a memcached server -- but we
>         build out our infrastructure such that even at peak load,
>         repopulating an empty memcached instance or two doesn't slow
>         things down noticeably for the users.
>
>         -Steve
>
>
>
>         On 5/3/07 12:23 PM, "Murty Chittivenkata" <murty at aol.net> wrote:
>
>             Steve,
>
>             Are you replicating the hash data to hot spares, or
>             rebuilding from the backend database in the event of failure?
>
>
>             Thanks
>             Murty
>
>
>
>
>                     We have a home-built management and monitoring
>                     system that keeps track of all our servers, both
>                     memcached and other custom backend stuff. Some of
>                     our other backend services are written
>                     memcached-style with fully interchangeable
>                     instances; for such services, the monitoring
>                     system knows how to take a hot spare and swap it
>                     into place when a live server has a failure. When
>                     one of our memcached servers dies, a replacement
>                     is always up and running in under a minute.