Another Max Operations Measurement

Wed Jul 26 18:31:00 UTC 2006

Quick background: we released some new visualisations for Digg yesterday, 
which dramatically increased traffic to our site. The initial 
implementation of the visualisations (a simplification, but a useful 
one) treated memcached as a memory store and did linear scans through it 
looking for bits of data. As you can imagine, this taxed our memcached 
servers.
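
As a rough illustration of why that access pattern is expensive, the 
sketch below contrasts a linear scan with a single lookup. Everything in 
it is hypothetical: CountingCache is an in-memory stand-in for a 
memcached client, and the "story:<n>" and "index:<user>" key schemes are 
invented for the example rather than taken from our code.

```python
class CountingCache:
    """In-memory stand-in for a memcached client that counts get() calls."""

    def __init__(self):
        self._data = {}
        self.gets = 0

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        self.gets += 1
        return self._data.get(key)


def find_by_scan(cache, n_items, wanted_user):
    """Linear scan: one get per candidate key, every key is visited."""
    found = []
    for i in range(n_items):
        item = cache.get("story:%d" % i)
        if item is not None and item["user"] == wanted_user:
            found.append(i)
    return found


def find_by_index(cache, wanted_user):
    """Direct lookup: one get for a precomputed list of matching ids."""
    return cache.get("index:%s" % wanted_user) or []


# Hypothetical data set: 1000 stories spread over 50 users.
cache = CountingCache()
for i in range(1000):
    cache.set("story:%d" % i, {"user": "u%d" % (i % 50)})
cache.set("index:u7", [i for i in range(1000) if i % 50 == 7])

scan_result = find_by_scan(cache, 1000, "u7")
scan_gets = cache.gets      # one get per story examined

cache.gets = 0
index_result = find_by_index(cache, "u7")
index_gets = cache.gets     # a single get for the same answer
```

Both functions return the same ids, but the scan costs one get per 
stored item for every question asked, so the load on the servers grows 
with the data set as well as with the traffic.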

With plain vanilla memcached 1.1.12 on a quad Opteron running Debian 
Linux, the maximum gets per second for a given memcached server hovered 
around 15k-16k. The CPU usage reported by memcached itself scales 
linearly with the number of gets.
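
For anyone wanting to take the same kind of measurement, here is a 
minimal sketch of turning two snapshots of the "stats" output into gets 
per second and a CPU share. It assumes the counters cmd_get, rusage_user 
and rusage_system are present. The formatting of the rusage values is an 
assumption on my part (it may be "seconds.microseconds" or 
"seconds:microseconds" depending on the version), so the helper accepts 
both. The sample numbers are made up, not our real readings.

```python
def parse_stats(text):
    """Turn 'STAT <name> <value>' lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def cpu_seconds(value):
    """Convert an rusage value to seconds (format is an assumption)."""
    if ":" in value:
        sec, usec = value.split(":")
        return int(sec) + int(usec) / 1e6
    return float(value)


def rates(before, after, elapsed_seconds):
    """Return (gets per second, CPU seconds used per wall-clock second)."""
    gets = int(after["cmd_get"]) - int(before["cmd_get"])
    cpu_before = (cpu_seconds(before["rusage_user"])
                  + cpu_seconds(before["rusage_system"]))
    cpu_after = (cpu_seconds(after["rusage_user"])
                 + cpu_seconds(after["rusage_system"]))
    return gets / elapsed_seconds, (cpu_after - cpu_before) / elapsed_seconds


# Made-up snapshots taken 10 seconds apart.
sample_a = ("STAT cmd_get 1000000\r\n"
            "STAT rusage_user 100.000000\r\n"
            "STAT rusage_system 50.000000\r\nEND\r\n")
sample_b = ("STAT cmd_get 1155000\r\n"
            "STAT rusage_user 104.000000\r\n"
            "STAT rusage_system 55.500000\r\nEND\r\n")

gets_per_sec, cpu_share = rates(parse_stats(sample_a),
                                parse_stats(sample_b), 10.0)
```

A cpu_share close to 1.0 means the daemon is using roughly one whole 
CPU, which is the ceiling for a single-threaded process.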

One of our memcached servers stopped responding entirely. The daemon was 
still running, but as soon as I connected to it ("telnet <host> 11211") 
it would close the connection again. The other servers seemed to keep 
running fine, albeit at maximum utilisation.
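
The telnet check can be scripted. The sketch below is one way to do it, 
not something we actually ran: probe() and its three result strings are 
names invented for the example. It opens a connection, sends the 
protocol's "version" command and reports whether a reply came back.

```python
import socket


def probe(host, port=11211, timeout=2.0):
    """Return 'ok', 'no_reply' or 'unreachable' for one memcached server."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"    # refused, timed out, no route, ...
    try:
        conn.sendall(b"version\r\n")
        reply = conn.recv(128)
    except OSError:
        return "no_reply"       # reset or timed out after connecting
    finally:
        conn.close()
    # An empty read means the peer closed the connection without answering.
    return "ok" if reply.startswith(b"VERSION") else "no_reply"
```

A server that accepts the connection and then drops it, as ours did, 
would come back as "no_reply" rather than "unreachable".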

It appears that memcached maxed out the (single) CPU it was given, and 
that this is what capped the number of gets (total CPU usage on the 
machines was around 110-120% out of a possible 400%). I can't correlate 
the CPU usage reported by the OS exactly with the number of gets, 
because unfortunately our memcached servers live on machines that do 
other semi-CPU-intensive tasks. So I have to trust the statistic 
reported by memcached (I haven't found a reason not to, I'm just trying 
to be thorough).

Just thought some of you might be interested.

(Afterword: we changed our code to stop using the "linear scan" method 
of access sometime last night, and our memcached usage has dropped 
back to a reasonably comfortable level.)

--
timeless