Memcached as a Sessions Store (etc)
timeless
time at digg.com
Wed Jun 28 20:30:28 UTC 2006
The background:
For those of you who don't follow Digg, we run a fairly popular news
aggregator site. We run several memcached nodes of 2GB each.
Before Monday's rollout of the new version of Digg, each of our
memcached nodes served about 500 gets/sec at peak.
After the rollout (due both to a huge increase in site traffic and to a
redesign of the back-end code that utilises memcached more), each node
now sustains about 2k gets/sec at peak. Nearly all of our gets are hits
(i.e. a >95% get hit ratio). We now commonly surpass the default limit
of 1024 simultaneous connections per memcached server.
We store hundreds of thousands of objects in slab classes 7 through 19.
Most of our items live in the 2^8 class, and only 5 objects live in
the 2^19 class. The distribution is weighted toward 2^8 and trails off
significantly across the 2^14-2^19 classes.
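To make the class numbers concrete: if class n holds chunks of 2^n
bytes, as the 2^8 notation above suggests, then class 8 is 256 bytes and
class 19 is 512KB. A minimal Python sketch under that assumption,
placing an item in the smallest class that fits and ignoring per-item
header overhead. The function name is hypothetical, and the default
bounds are simply the range reported in this post, not limits taken from
memcached itself.

```python
def slab_class(item_size: int, min_class: int = 7, max_class: int = 19) -> int:
    """Return the smallest power-of-two class whose chunk fits item_size bytes.

    Assumes class n holds chunks of 2**n bytes. The default bounds are the
    class range mentioned in the post, used here only for illustration.
    """
    if item_size < 1:
        raise ValueError("item_size must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    cls = max(min_class, (item_size - 1).bit_length())
    if cls > max_class:
        raise ValueError("item too large for the largest class")
    return cls
```

Under these assumptions a 256-byte item lands in class 8, while a
257-byte item spills into class 9 and leaves almost half of its chunk
unused.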
We use the PHP memcached client with our own PHP class (closely based on
the class in the comments on PHP.net's memcached section) for load
balancing.
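The load-balancing class itself is not shown in this post. As a rough
illustration of what client-side balancing of this kind typically does,
the Python sketch below hashes each key to one node in a fixed list. The
node addresses, the CRC32-modulo scheme, and the function names are all
assumptions made for illustration; this is not Digg's implementation.

```python
import zlib
from collections import Counter

# Hypothetical node list; the real cluster layout is not given in the post.
NODES = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211", "10.0.0.4:11211"]


def node_for_key(key: str, nodes=NODES) -> str:
    """Pick a node by hashing the key, so every client agrees on the mapping."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def distribution(keys, nodes=NODES) -> Counter:
    """Count how many of the given keys land on each node."""
    return Counter(node_for_key(k, nodes) for k in keys)
```

Calling distribution() over a large batch of keys shows how evenly the
load spreads. With a simple modulo scheme like this one, changing the
number of nodes remaps most keys, which is acceptable for a cache but
worth knowing before resizing a cluster.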
What prompted using Memcached as a session store:
Shortly after the rollout of Digg v3, the non-redundant MySQL session
store hardware crashed. This led to a Digg outage. We had always planned
that in such a case we would just roll a (trivial) change to put
sessions into Memcached rather than MySQL to see how it fared.
We have done so. As you might guess from the volumes I posted above, the
change to store sessions in Memcached nodes was barely noticeable on our
graphs. We store several million sessions per day, but each
memcached node sees only a few hundred session-related gets/sets per
second.
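The change itself is not included in this post. A minimal Python sketch
of the general idea, assuming sessions are serialised and written to the
cache under a per-session key with an expiry. FakeCache is an in-memory
stand-in written for this example rather than a real memcached client,
and the key prefix, JSON encoding, and 24-hour lifetime are likewise
assumptions.

```python
import json
import time


class FakeCache:
    """In-memory stand-in for a memcached client (set/get with expiry only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl):
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._data[key]
            return None
        return value


SESSION_TTL = 24 * 3600  # assumed session lifetime, in seconds


def save_session(cache, session_id, data, ttl=SESSION_TTL):
    """Serialise the session and store it under a prefixed key."""
    cache.set("session:" + session_id, json.dumps(data), ttl)


def load_session(cache, session_id):
    """Return the session dict, or None on a miss."""
    raw = cache.get("session:" + session_id)
    # A miss (expired, evicted, or never stored) is treated as "no session".
    return json.loads(raw) if raw is not None else None
```

The point of the sketch is that a cache miss has to be an acceptable
outcome: the application treats it as a logged-out or new visitor rather
than as an error.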
Thus far (>24 hours of sample period) there have been no problems
related to storing sessions in memcached. In my opinion, at the volumes
we're sustaining, this is a significant enough time period to be
statistically meaningful.
Some things to note:
(1) We don't run the Facebook patches, but since we're using our regular
memcached cluster for our sessions store, essentially we've already
pre-allocated slabs of every class that a session might land in.
(2) Our memcached nodes are 2GB each. The age of the oldest item in each
node tends to equal the node's uptime. It appears we still aren't using
Memcached to a degree that would evict items or lead to memory
allocation errors. If our oldest items were younger than our session
lifetime, we would need to allocate more nodes or more RAM per node.
(3) Earlier I reported that doubling Memcached usage didn't seem to
double CPU usage. Now that our memcached usage is heavier, the ratio
between work and CPU usage looks linear.
(4) Eventually, I plan to remove a few memcached nodes from our cluster
to see how the CPU/work ratio scales into the higher ranges. This
probably won't happen for weeks.
(5) In our case this change increases reliability, as sessions are now
stored across several machines rather than just one. A total machine
failure now affects only a percentage of our userbase rather than
everyone.
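The check described in note (2) can be sketched in Python as follows,
assuming the per-class ages come from the text of memcached's
"stats items" command (lines of the form "STAT items:<class>:age
<seconds>"). The sample text, the function names, and the 24-hour
session lifetime are hypothetical, and the result is only a heuristic:
a class can report a young oldest item for reasons other than memory
pressure.

```python
# Hypothetical sample; real values come from running "stats items" on a node.
SAMPLE = """\
STAT items:8:number 120000
STAT items:8:age 93211
STAT items:19:number 5
STAT items:19:age 90012
END
"""


def oldest_ages(stats_items_text: str) -> dict:
    """Map slab class id -> reported age (seconds) of its oldest item."""
    ages = {}
    for line in stats_items_text.splitlines():
        parts = line.split()
        # Expected shape: STAT items:<class>:age <seconds>
        if len(parts) == 3 and parts[0] == "STAT":
            fields = parts[1].split(":")
            if len(fields) == 3 and fields[0] == "items" and fields[2] == "age":
                ages[int(fields[1])] = int(parts[2])
    return ages


def classes_at_risk(ages: dict, session_ttl: int, uptime: int) -> list:
    """Classes whose oldest item is younger than the session lifetime.

    A node that has been up for less than one session lifetime cannot be
    judged yet, so it reports nothing.
    """
    if uptime < session_ttl:
        return []
    return sorted(c for c, age in ages.items() if age < session_ttl)
```

With the sample above, a 24-hour session lifetime, and a node uptime of
100000 seconds, no class is flagged, which matches the situation
described in note (2).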
--
timeless