Another Max Operations Measurement
time at digg.com
Wed Jul 26 18:31:00 UTC 2006
Quick background: we released some new visualisations for Digg
yesterday, which dramatically increased traffic to our site. The initial
implementation of our visualisations basically (not exactly, but a
useful generalisation) used memcached as a memory store through which it
would do linear scans looking for bits of data. As you can imagine, this
taxed our memcached servers.
With plain vanilla memcached 1.1.12 on a quad Opteron running Debian
Linux, our maximum gets per second for a given memcached server hovered
around 15k-16k. The CPU usage reported by memcached itself scales
linearly with the number of gets.
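
For anyone who wants to measure a gets-per-second figure on their own
servers, here is a minimal sketch. It assumes the rate is taken from two
samples of the "cmd_get" and "time" counters in the reply to the
memcached "stats" command; the function names and the sample numbers are
made up for illustration, and this is not necessarily how the figures
above were collected.

```python
import socket


def parse_stats(raw):
    """Turn the text reply of a memcached 'stats' command into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        # Data lines look like "STAT <name> <value>"; the reply ends with "END".
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host, port=11211, timeout=2.0):
    """Send 'stats' over the text protocol and read until the END line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", "replace"))


def gets_per_second(before, after):
    """Average get rate between two samples, using the server's own clock."""
    elapsed = int(after["time"]) - int(before["time"])
    if elapsed <= 0:
        raise ValueError("samples must be at least one second apart")
    return (int(after["cmd_get"]) - int(before["cmd_get"])) / elapsed
```

Typical use would be to call fetch_stats() twice, some seconds apart,
and pass both results to gets_per_second(). A longer interval smooths
out bursts, since "time" only has one-second resolution.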
One of our memcached servers stopped responding entirely. The daemon was
still running, but as soon as I connected to it ("telnet <host> 11211")
it would close the connection again. The rest seemed to keep running
fine, albeit at maximum utilisation.
It appears that memcached maxed out the (single) CPU it was able to use,
and this is what capped the number of gets (total CPU usage on the
machine was around 110-120% out of a possible 400%). I am unable to
correlate the CPU usage reported by the OS exactly with gets because,
unfortunately, our memcached servers live on machines that also run
other semi-CPU-intensive tasks. I have to trust the memcached-reported
statistic (I haven't found a reason not to; I'm just trying to be
thorough).
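
As a rough illustration of reading the memcached-reported CPU figure
alongside gets: the sketch below assumes the "rusage_user",
"rusage_system", "cmd_get" and "time" values from two "stats" replies
have already been collected into dicts of strings. The function names
and the numbers in the usage note are hypothetical, and the rusage
values are assumed to be printed as seconds and microseconds separated
by "." (a ":" separator is also accepted, in case a version prints it
that way).

```python
def cpu_seconds(value):
    """Convert 'seconds.microseconds' (or 'seconds:microseconds') to a float.

    Assumes the microseconds part is zero-padded to six digits.
    """
    return float(value.replace(":", "."))


def process_cpu(sample):
    """Total CPU time (user + system) the memcached process has consumed."""
    return cpu_seconds(sample["rusage_user"]) + cpu_seconds(sample["rusage_system"])


def cpu_percent(before, after):
    """CPU usage of the process between two samples; 100.0 means one full CPU."""
    elapsed = int(after["time"]) - int(before["time"])
    if elapsed <= 0:
        raise ValueError("samples must be at least one second apart")
    return 100.0 * (process_cpu(after) - process_cpu(before)) / elapsed


def cpu_microseconds_per_get(before, after):
    """Average CPU cost of one get over the interval, in microseconds."""
    gets = int(after["cmd_get"]) - int(before["cmd_get"])
    if gets <= 0:
        raise ValueError("no gets were served between the samples")
    return 1e6 * (process_cpu(after) - process_cpu(before)) / gets
```

With made-up samples ten seconds apart in which the process used 9.5
CPU-seconds, cpu_percent() would return 95.0, i.e. one CPU nearly
saturated. Because these counters cover only the memcached process,
other work on the same machine does not affect them.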
Just thought some of you might be interested.
(Afterword: we changed our code to stop doing the "linear scan" method
of access sometime last night, and our memcached usage has dropped
back to a reasonably comfortable level.)