Memcached Database Use

Fri Jun 22 22:51:51 UTC 2007

On Jun 22, 2007, at 13:56 , Chris Miller wrote:

> I see how storing database results in memcached would be  
> very helpful, but how does memcached know when the result set in  
> cache has changed?

	Long answer:

	I wrote an app called diggwatch[0] that uses the digg API as my  
primary data store, and stores all the useful information in  
memcached locally.  Cache misses for me are really expensive, and the  
digg API makes certain operations I want to perform somewhat difficult.

	For example, the primary thing I wanted this app to do for me is  
tell me when anyone responds to any comment I make on a digg  
article.  Basically, that looks like this:

	1) Ask for any recent comments by username.
	2) Ask for all of the stories to which any of these comments belong  
so I can put useful titles on things.
	3) Ask for any child comments of #1 (or children of the comments'  
parent as defined by the old system).
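	The three steps above could be sketched roughly like this, with no caching yet.  The digg API calls and data are hypothetical stubs (`fetch_recent_comments`, `fetch_stories`, `fetch_replies` are illustrative names, not the real API), and the "children of the comments' parent" variant is left out:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    id: int
    story_id: int
    parent_id: Optional[int]  # None for top-level comments
    user: str


# Hypothetical sample data standing in for the digg API's responses.
COMMENTS = [
    Comment(1, 100, None, "dustin"),
    Comment(2, 100, 1, "alice"),  # a reply to dustin's comment 1
    Comment(3, 200, None, "dustin"),
]
STORIES = {100: "Story A", 200: "Story B"}


# Hypothetical API wrappers; the real digg API calls are not shown here.
def fetch_recent_comments(user):
    return [c for c in COMMENTS if c.user == user]


def fetch_stories(story_ids):
    return {sid: STORIES[sid] for sid in story_ids}


def fetch_replies(comment):
    return [c for c in COMMENTS if c.parent_id == comment.id]


def replies_to_user(user):
    mine = fetch_recent_comments(user)                  # step 1
    titles = fetch_stories({c.story_id for c in mine})  # step 2: one batched request
    found = []
    for c in mine:                                      # step 3: one request per comment
        for reply in fetch_replies(c):
            found.append((titles[c.story_id], reply.user, reply.id))
    return found


print(replies_to_user("dustin"))  # [('Story A', 'alice', 2)]
```

	Step 3 is the one that scales with the number of comments, which is why it gets the most aggressive caching.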

	As this is primarily used (at least by me) as an RSS provider, that  
request occurs several times throughout the day and I'd like it to be  
cached.  However, I'd *also* like it to be fresh, and I don't get  
notifications from digg.

	I cache the result of #1 for about a minute -- a fairly insignificant  
amount of time, but I don't consider that request very expensive.
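	A short TTL like this is plain cache-aside: look the key up, and on a miss call the API and store the result with an expiry (memcached takes the expiry in seconds on each set).  A minimal sketch, assuming a hypothetical in-memory `TTLCache` in place of a real memcached client, whose method names and signatures may differ:

```python
import time


class TTLCache:
    """In-memory stand-in for a memcached client (hypothetical); real
    clients have their own method names and signatures."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None  # miss, like most memcached clients
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def get_or_compute(cache, key, ttl, compute):
    """Cache-aside: return the cached value, or compute it, store it and return it."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl)
    return value


# Usage with a fake clock so expiry can be demonstrated without sleeping.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
calls = []


def fetch_recent_comments():  # hypothetical step #1 API call
    calls.append(now[0])
    return ["comment-1", "comment-2"]


get_or_compute(cache, "comments:dustin", 60, fetch_recent_comments)  # miss: API call
now[0] = 30
get_or_compute(cache, "comments:dustin", 60, fetch_recent_comments)  # hit
now[0] = 61
get_or_compute(cache, "comments:dustin", 60, fetch_recent_comments)  # expired: API call
print(len(calls))  # 2
```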

	I cache the results from #2 for about five minutes.  It's a single  
request for up to something like 100 stories, and I can optimize some  
of it out if I have some of the stories in my cache already.
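	"Optimizing out" stories already in the cache can be done with one multi-key lookup followed by a single batched request for only the missing ids.  A sketch under the assumption of a hypothetical `MiniCache` with `get_multi`/`set_multi` and a hypothetical batched fetch function:

```python
import time


class MiniCache:
    """Hypothetical in-memory stand-in for memcached with multi-key get/set."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get_multi(self, keys):
        now = time.monotonic()
        found = {}
        for key in keys:
            entry = self._data.get(key)
            if entry is not None and now < entry[1]:
                found[key] = entry[0]
        return found

    def set_multi(self, mapping, ttl):
        expires_at = time.monotonic() + ttl
        for key, value in mapping.items():
            self._data[key] = (value, expires_at)


STORY_TTL = 5 * 60  # "about five minutes", in seconds


def get_stories(cache, story_ids, fetch_stories):
    """Return {story_id: story}, asking the API only for ids not already cached."""
    keys = {f"story:{sid}": sid for sid in story_ids}
    cached = cache.get_multi(list(keys))
    stories = {keys[key]: value for key, value in cached.items()}
    missing = [sid for sid in story_ids if sid not in stories]
    if missing:
        fetched = fetch_stories(missing)  # one batched request for the rest
        cache.set_multi({f"story:{sid}": s for sid, s in fetched.items()}, STORY_TTL)
        stories.update(fetched)
    return stories


requested = []


def fake_fetch(ids):  # hypothetical batched digg API call
    requested.append(sorted(ids))
    return {sid: f"title {sid}" for sid in ids}


cache = MiniCache()
get_stories(cache, [1, 2], fake_fetch)
get_stories(cache, [1, 2, 3], fake_fetch)  # only story 3 is fetched
print(requested)  # [[1, 2], [3]]
```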

	#3 is the most expensive query, because I need to run it almost once  
per comment (result of #1).  I cache these for about a day, *but* the  
key includes the number of comments on a given story (which I get in  
the result of #2).  If a story's comment count hasn't changed, nobody  
has commented on it since I last looked, so I can be guaranteed that  
nobody has commented on a thread I'm involved in within that story.
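	Putting the comment count into the key means a changed count produces a brand new key, so a long TTL is safe.  A sketch with a plain dict standing in for memcached and hypothetical names (`replies_key`, `get_replies`, `fetch_replies`):

```python
cache = {}  # plain dict standing in for memcached; expiry omitted for brevity
api_calls = []


def replies_key(comment_id, story_comment_count):
    # The story's comment count is part of the key: a new comment anywhere in
    # the story changes the count, so the old entry is simply never read again.
    return f"replies:{comment_id}:{story_comment_count}"


def fetch_replies(comment_id):  # hypothetical expensive API call (step #3)
    api_calls.append(comment_id)
    return [f"reply to {comment_id}"]


def get_replies(comment_id, story_comment_count):
    key = replies_key(comment_id, story_comment_count)
    if key not in cache:
        # With memcached this would be a set() with an expiry of about a day.
        cache[key] = fetch_replies(comment_id)
    return cache[key]


get_replies(42, 17)  # miss: asks the API
get_replies(42, 17)  # hit: count unchanged, so nobody has replied in this story
get_replies(42, 18)  # count changed: new key, so the API is asked again
print(api_calls)  # [42, 42]
```

	Nothing ever deletes the stale `replies:42:17` entry; in memcached it would simply age out at its expiry or be evicted by the LRU when the memory is needed.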

	It's not perfect, but it's quite effective and greatly reduces the  
number of trips to digg without letting my latency rise above ~5 minutes.


	Short answer:

	Depends on your application, but think of it less as working with  
result sets and more as working with objects.  I cache collections of  
pre-built objects and mash them together in my application code.

	A neat benefit of doing things this way (going back to the long  
answer above) is that understanding my data at this level allows me  
to generate smarter ETags, so the typical response my app sends to an  
RSS reader is 0 bytes (after headers).
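	The post doesn't say how diggwatch builds its ETags.  One possible scheme, purely as an assumption: hash the ids of the replies the feed would list, which are already known from cached data, and answer a matching `If-None-Match` with a 304 Not Modified and an empty body.  `feed_etag`, `respond` and `render` are hypothetical names, and the header comparison is simplified to an exact match (no ETag lists or weak validators):

```python
import hashlib


def feed_etag(reply_ids):
    """Hypothetical scheme: hash the ids of the replies the feed would list.
    The same set of replies always yields the same ETag."""
    digest = hashlib.sha1()
    for reply_id in sorted(reply_ids):
        digest.update(f"{reply_id},".encode())
    return '"' + digest.hexdigest() + '"'  # ETags are quoted strings


def respond(reply_ids, if_none_match, render):
    """Return (status, headers, body); 304 with an empty body when unchanged.
    Simplification: exact string match only, no ETag lists or weak validators."""
    etag = feed_etag(reply_ids)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, render(reply_ids)


def render(reply_ids):  # hypothetical RSS renderer
    return b"<rss>...</rss>"


status, headers, body = respond([2, 7], None, render)
status2, _, body2 = respond([7, 2], headers["ETag"], render)
print(status, status2, len(body2))  # 200 304 0
```

	The point of the design is that the ETag comes from the cached objects themselves, so an unchanged feed never has to be rendered at all.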


[0] http://bleu.west.spy.net/diggwatch/

-- 
Dustin Sallings