Caching collections of objects

Brian Moon brianm at dealnews.com
Sun May 27 14:54:45 UTC 2007


We have been caching at dealnews in some form since the company began 10 
years ago.  We have been using memcached for over a year.

> 1. Caching of collections (eg: "give me all the user comments related to 
> foo")
> 2. Caching entire page outputs based upon the unique url (eg: "give me 
> the xhtml output for foo/bar?baz=1")
> 3. A combination of #1 & #2

We do both at dealnews.

> Getting the data into the cache and retrieving it when appropriate is a 
> simple matter. It is when the data in the cache has become stale and I 
> need to flush it from the cache that I become stuck as to how best to 
> solve the problem.

I have seen this worry a lot on this list.  For us it is all about the 
ttl.  We decided on a ttl we could live with for each kind of object.  It's 
just 2 minutes for our front page.  But, with a 2 minute ttl we get an 85% 
cache hit rate.  Well worth it.  For other pages it's 15 minutes.  For some 
it's an hour, and really old content is cached for a day.  When you start 
getting into serious traffic, you have to let go of the obsession that the 
content all gets updated at the exact same time on every page 
everywhere.  It's just not realistic anymore.
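A minimal Python sketch of per-page-type ttls, using the numbers above. It assumes an in-memory dict as a stand-in for memcached and an injectable clock so expiry can be shown without waiting. The page-type names and the `TTLCache`/`cache_page` helpers are hypothetical, not dealnews code.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# TTLs in seconds, using the figures from the post. The page-type names are hypothetical.
TTL_BY_PAGE_TYPE: Dict[str, int] = {
    "front_page": 2 * 60,           # 2 minutes
    "other_page": 15 * 60,          # 15 minutes
    "hourly_page": 60 * 60,         # 1 hour
    "old_content": 24 * 60 * 60,    # 1 day
}


class TTLCache:
    """In-memory stand-in for memcached (an assumption): entries just expire, nothing is purged."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: int) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # past its TTL: treat it as a miss
            return None
        return value


def cache_page(cache: TTLCache, page_type: str, url: str, html: str) -> None:
    """Store rendered page output under its URL with the TTL chosen for its page type."""
    cache.set("page:" + url, html, TTL_BY_PAGE_TYPE[page_type])


# Usage with a fake clock so expiry can be observed without waiting.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache_page(cache, "front_page", "/", "<html>front</html>")
now[0] = 119.0
print(cache.get("page:/"))   # still cached
now[0] = 120.0
print(cache.get("page:/"))   # expired after 2 minutes
```

Nothing ever deletes a page here; staleness is bounded only by the ttl picked for that page type.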

For object level stuff, we do some updating.  We have processes that 
regenerate content, and they freshen the memcached data when needed.  But 
only a very few objects are hooked into our existing publishing system 
that way.  We don't go looking for every place that object X may be on a 
page and remove it.  We let the ttl take care of that.
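A sketch of the "freshen a few objects, let the ttl handle the rest" split. The `publish_object` hook, the key names and the in-memory `Cache` stand-in are all hypothetical; the point is only that publishing overwrites the object's own entry and never hunts for the pages that embed it.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class Cache:
    """Minimal in-memory stand-in for memcached with per-key TTLs (assumption, not a real client)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self.items: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: int) -> None:
        self.items[key] = (value, self.clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        item = self.items.get(key)
        if item is None or self.clock() >= item[1]:
            self.items.pop(key, None)  # missing or expired
            return None
        return item[0]


def publish_object(cache: Cache, object_id: str, rendered: str, ttl: int = 15 * 60) -> None:
    """Hypothetical publishing-system hook, run after an object is regenerated.

    It overwrites the object's own entry with fresh content. It deliberately does not
    search for cached pages that embed the object; those keep their old copy until
    their own TTL runs out.
    """
    cache.set("object:" + object_id, rendered, ttl)


now = [0.0]
cache = Cache(clock=lambda: now[0])
cache.set("object:42", "old deal text", 15 * 60)
cache.set("page:/", "<p>old deal text</p>", 2 * 60)  # front page embedding object 42

publish_object(cache, "42", "new deal text")
print(cache.get("object:42"))  # the refreshed object
print(cache.get("page:/"))     # the page still shows the old copy
now[0] = 120.0
print(cache.get("page:/"))     # page expired; the next request re-renders it
```

Overwriting with `set` rather than deleting means readers of the object never see a miss during a publish.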

The important part of this method, however, is that you must be able to 
deal gracefully with your cache expiring at some point.  If your site 
can't cope with having a couple of stale cached pieces on a page, then 
you will be in trouble.
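Handling expiry gracefully mostly means treating a miss as a normal code path. Below is a cache-aside sketch, assuming a hypothetical `get_or_build` helper and an in-memory stand-in for memcached; `render_comments` stands in for whatever expensive query or render you would otherwise run.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class Cache:
    """In-memory stand-in for memcached with per-key TTLs (an assumption for this sketch)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self.items: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, ttl: int) -> None:
        self.items[key] = (value, self.clock() + ttl)

    def get(self, key: str) -> Optional[str]:
        item = self.items.get(key)
        if item is None or self.clock() >= item[1]:
            self.items.pop(key, None)
            return None
        return item[0]


def get_or_build(cache: Cache, key: str, build: Callable[[], str], ttl: int) -> str:
    """Cache-aside read: a miss (expired or never cached) is an ordinary event, not an error."""
    value = cache.get(key)
    if value is None:
        value = build()             # fall back to the source of truth, e.g. the database
        cache.set(key, value, ttl)  # repopulate so the next reader gets a hit
    return value


# Usage: record when the (hypothetical) expensive render actually runs.
now = [0.0]
cache = Cache(clock=lambda: now[0])
renders: List[float] = []


def render_comments() -> str:
    renders.append(now[0])
    return "<ul>comments for foo</ul>"


get_or_build(cache, "comments:foo", render_comments, ttl=120)  # miss: renders
get_or_build(cache, "comments:foo", render_comments, ttl=120)  # hit
now[0] = 120.0
get_or_build(cache, "comments:foo", render_comments, ttl=120)  # expired: renders again
print(len(renders))
```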

> One solution for approach #2 above is to simply flush all cached page 
> data whenever there are writes to the database. Though this is 
> sub-optimal and would result in low cache hit rates I'm assuming.

I don't know how often you write to your database.  But, yeah, that 
would be quite useless for us.  If you are using MySQL, you can just use 
the MySQL query cache to get that effect, since it already throws away 
cached results for a table whenever that table is modified.
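For comparison, here is one way the flush-on-every-write approach from the question could be built without calling memcached's `flush_all`: a generation number baked into every page key (a common memcached idiom, not something described in this post). `PageCache` and its methods are hypothetical, and a dict stands in for memcached.

```python
from typing import Callable, Dict


class PageCache:
    """Sketch of "flush all cached pages on every database write" via a generation counter.

    Assumption: a plain dict stands in for memcached. In a real deployment the counter
    would itself live in memcached (bumped with incr) so every web server sees it, and
    orphaned old-generation keys would age out via their TTL or LRU eviction.
    """

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.generation = 0
        self.hits = 0
        self.misses = 0

    def _key(self, url: str) -> str:
        return "page:v%d:%s" % (self.generation, url)

    def get_page(self, url: str, render: Callable[[str], str]) -> str:
        key = self._key(url)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        html = render(url)
        self.store[key] = html
        return html

    def on_database_write(self) -> None:
        # Moving to a new key namespace makes every previously cached page unreachable at once.
        self.generation += 1


def render(url: str) -> str:
    return "<html>%s</html>" % url


pages = PageCache()
for _ in range(3):
    pages.get_page("/foo/bar?baz=1", render)   # 1 miss, then 2 hits
pages.on_database_write()                       # e.g. someone posts a comment
pages.get_page("/foo/bar?baz=1", render)        # miss again, whether or not the write touched this page
print(pages.hits, pages.misses)
```

With frequent writes the generation changes before most pages are requested a second time, which is where the low hit rate comes from.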

-- 

Brian Moon
Senior Developer
------------------------------
http://dealnews.com/
It's good to be cheap =)


More information about the memcached mailing list