MOM: Memcached Operations Monitoring
krw at nobugz.com
Thu Oct 11 21:16:01 UTC 2007
My company does something unusual with memcached that is extremely
valuable to us, and I'm wondering if others would find the code
useful. I apologize if this email is long, but what I'm proposing
requires some background explanation, as it is radically different
from the typical use of memcached. The ultimate question is whether
it is worthwhile to make the code available for others to use.
One problem with running a large site (hundreds of millions of hits
per hour) is keeping track of what is going on. At one point I
worked on the eBay SWAT team, and I would get calls at 2 in the
morning from the operations center, wondering why a set of machines
was acting up. Without instrumentation, it is nearly impossible to
figure out. With the proper instrumentation, it is child's play.
eBay has a system that tracks all activity on the site; however,
that system is VERY large and VERY expensive. Using memcached, I've
developed something that gives you 90% of the value of eBay's system
for perhaps 1% of the cost.
I have modified memcached as well as the Java client library. With
these modifications, and very few lines of code in the application, I
can tell precisely how many URLs and SQL statements are executing on a
particular machine in any given minute or in any given hour. I can
tell you the average execution time, the maximum execution time, and
the number of failures. I can tell you which URLs were expensive,
which URLs invoked SQL statements, and which URLs failed most often.
Coupled with a small MySQL database, I can give you more operational
statistics on our site than many larger sites have.
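The post does not include the modified memcached or Java client code. As a rough illustration of the kind of aggregation described (per-machine counts for each URL or SQL statement per minute and per hour, with average time, maximum time, and failure counts), here is a minimal in-process Java sketch. It does not talk to memcached at all; the class `OpStats`, the key format, and the "min"/"hour" granularity labels are hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OpStats {
    // Running totals for one (host, kind, name, granularity, bucket) combination.
    static final class Bucket {
        long count;
        long failures;
        long totalMillis;
        long maxMillis;

        synchronized void record(long millis, boolean failed) {
            count++;
            totalMillis += millis;
            if (millis > maxMillis) maxMillis = millis;
            if (failed) failures++;
        }

        synchronized double averageMillis() {
            return count == 0 ? 0.0 : (double) totalMillis / count;
        }
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    // Hypothetical key scheme: host|kind|name|granularity|bucketIndex
    static String key(String host, String kind, String name,
                      String granularity, long bucketIndex) {
        return host + "|" + kind + "|" + name + "|" + granularity + "|" + bucketIndex;
    }

    // Records one execution (e.g. kind "URL" or "SQL") into both its
    // per-minute and its per-hour bucket.
    public void record(String host, String kind, String name,
                       long startEpochMillis, long elapsedMillis, boolean failed) {
        long minute = startEpochMillis / 60_000L;
        long hour = startEpochMillis / 3_600_000L;
        buckets.computeIfAbsent(key(host, kind, name, "min", minute), k -> new Bucket())
               .record(elapsedMillis, failed);
        buckets.computeIfAbsent(key(host, kind, name, "hour", hour), k -> new Bucket())
               .record(elapsedMillis, failed);
    }

    // Returns the bucket, or null if nothing was recorded for that key.
    public Bucket get(String host, String kind, String name,
                      String granularity, long bucketIndex) {
        return buckets.get(key(host, kind, name, granularity, bucketIndex));
    }

    public static void main(String[] args) {
        OpStats stats = new OpStats();
        long t = 1_192_137_360_000L; // arbitrary minute-aligned timestamp
        stats.record("web01", "URL", "/search", t, 120, false);
        stats.record("web01", "URL", "/search", t + 5_000, 480, true);
        Bucket b = stats.get("web01", "URL", "/search", "min", t / 60_000L);
        // Prints: 2 1 480 300.0
        System.out.println(b.count + " " + b.failures + " " + b.maxMillis + " " + b.averageMillis());
    }
}
```

In the setup described in the post, the aggregates live in a shared memcached instance so that hundreds of machines can report into one place. The sketch keeps them in a local map only so that it stays self-contained.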
Just to reassure those who assume this must be a very expensive use
of memcached, I can say from experience that a single instance
running on a Linux box with a mere 10 MB of memory assigned
aggregates information on about 20 pools and several hundred
machines at a rate of 10-15K operations per second, and has never
come close to capacity.
Would anyone else be interested in this? Or is this too far off the
beaten path? It is mostly helpful for very busy sites. Thanks.