Few queries on atomicity of requests

Sat Jun 21 19:47:01 UTC 2008

On Jun 21, 2008, at 11:35, Daniel wrote:

>> - Overcomplicated. See dustin's rails examples for easy
>> abstractions on getting to the 90% mark.
>
> You are absolutely correct...  But wouldn't it be worth making a
> memcached/database combo that's overcomplicated if we could get a
> performance boost from every app that uses it just like they use a
> database?

	It just feels like the wrong level.  If you're just caching database  
results, you're not going to be getting the best usage out of your  
app.  I mean, you're not likely to be doing all that much better than  
what any other query cache does today.

	If you're caching the objects you build from the results of a query,  
you can do a *lot* better.  You can start doing things like simply not  
joining tables, not doing N+1 queries, etc...

	I had an app that, under normal circumstances, would have needed five  
or so DB queries (optimal) to get all of the information required to  
build the objects necessary to process a request, and instead used  
exactly one cache lookup to do the same.  It would complete a  
transaction with exactly one cache write and one DB transaction over  
some large number of queries (which were all performed asynchronously  
to the client request -- in this particular app, there was no value in  
making the client wait for positive confirmation on DB writes).

	No matter how big or optimal your DB is, a single round trip for a  
graph of data from one of many memcached instances will be faster than  
multiple queries on a more centralized ACID database.
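
	A minimal sketch of that idea in Python, not the app I described: the  
object graph is built once from several queries, then stored under a  
single key so later requests need one cache round trip.  FakeCache,  
FakeDB, load_profile, the table names and the "profile:<id>" key format  
are all hypothetical stand-ins for illustration.

```python
import json


class FakeCache:
    """Hypothetical stand-in for a memcached client; counts round trips."""

    def __init__(self):
        self.store = {}
        self.gets = 0

    def get(self, key):
        self.gets += 1
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value


class FakeDB:
    """Hypothetical stand-in for a database; counts queries."""

    def __init__(self):
        self.queries = 0
        self.tables = {
            "users": {1: {"name": "alice"}},
            "orders": {1: [{"id": 10, "total": 25}]},
            "prefs": {1: {"theme": "dark"}},
        }

    def query(self, table, user_id):
        self.queries += 1
        return self.tables[table][user_id]


def load_profile(cache, db, user_id):
    """Return the whole object graph, using one cache lookup when possible."""
    key = f"profile:{user_id}"
    blob = cache.get(key)
    if blob is not None:
        return json.loads(blob)  # hit: one round trip, no DB work
    # Miss: pay for the individual queries once, then cache the built graph.
    profile = {
        "user": db.query("users", user_id),
        "orders": db.query("orders", user_id),
        "prefs": db.query("prefs", user_id),
    }
    cache.set(key, json.dumps(profile))
    return profile
```

	The first call costs three queries; every call after that costs one  
cache get until the key is invalidated or evicted.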

> Even with a full fledged database cluster doing the work, the cluster
> ends up running more slowly because it has to handle all of the extra
> database needs, replication, journaling, failover, version control,
> blocking, etc etc.

	I thought you were advocating MVCC in memcached earlier?  I don't  
think you can reach the levels of guarantee you're asking for over DB  
transactions without something like MVCC and 2pc.  If you start adding  
these types of things in, you're just making another database, and  
it'll be as slow as any of them.

> FYI, Oracle's product, TimesTen seems to have some performance metrics
> that I believe apply...  They reported on page 44 of one financial
> trading system having an order of magnitude performance increase:

	Well, yes, databases that don't have to write to filesystems are  
faster.  Caches that don't try to be databases are faster than those.

	There are plenty of open source in-memory databases available today  
(e.g. sqlite or mysql).

>
> http://www.oracle.com/technology/products/timesten/pdf/oow2007/oow07_s291347_timesten_caching_use_cases.pdf
>
> I just thought of another way of describing it. The CDD's are like
> Reader Databases, while the core database handles all writing.

	I think many of us build our applications this way, but  
deliberately.  If it were to happen automatically, it'd be at a cost  
of performance.

> Yes, that is a better memcached-specific application design. What I'm
> suggesting will be significantly slower than an integrated,
> designed-for-memcached app. That's why I believe it seems worthwhile
> to include the current memcached functionality in the CDD.

	I think the best way to approach this is by building an architecture  
that fits it well.  Turns out, that's pretty hard, and then nobody  
wants to use it.

	I get what you're saying.  You want to make something everyone can  
use, but doesn't do all that much.  It kind of sounds like mysql's  
query cache.  I don't know how it'd be better than that.  I've not  
heard anything particularly wonderful about it.

	I've written some activerecord extensions that can do automatic  
caching and invalidation of objects by relationship.  This lives  
within the ORM because it can act on real live objects and has a deep  
understanding of when things change and knows what to do about it.   
Although it's in its infancy stage, it basically works.  However,  
getting it to do all of the stuff it should/could would require a  
*lot* of work and could very well make things slower.  Sometimes it's  
just better to say what you mean.

> I don't get it...  Here in "memcached land" we're dealing with
> situations where NOT warming up the cache before going live can make
> sites blow up, meanwhile people are saying/thinking that a generic
> memcached/database combination isn't worth the trouble.

	The way I've taken care of this kind of situation in the past is to  
rate control going live.  If I'm down, that's generally bad.  If I come  
back up, that's great, but I get a huge traffic spike at the worst  
time.  So I just have a probabilistic request acceptance filter that  
goes from accepting 0% of requests to 100% of requests over n  
minutes.  Adding ceilings and different slopes on various request  
types can help, too.

	So you're not down, but you're also not completely up for a period of  
time.  Doing this makes the warming pretty much take care of itself.
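
	A rough sketch of such a filter in Python, assuming a linear ramp.  
RampUpFilter and its parameters (ramp_seconds, ceiling, clock, rng) are  
names made up for this example, not code from any of my apps.

```python
import random
import time


class RampUpFilter:
    """Accept a growing fraction of requests after going live.

    The acceptance rate rises linearly from 0 to `ceiling` over
    `ramp_seconds`. `clock` and `rng` are injectable so the filter can be
    tested deterministically.
    """

    def __init__(self, ramp_seconds, ceiling=1.0,
                 clock=time.monotonic, rng=random.random):
        if ramp_seconds <= 0:
            raise ValueError("ramp_seconds must be positive")
        self.ramp_seconds = ramp_seconds
        self.ceiling = ceiling
        self.clock = clock
        self.rng = rng
        self.start = clock()  # the moment we "went live"

    def acceptance_rate(self):
        elapsed = self.clock() - self.start
        # Slope is 1 / ramp_seconds; never exceed the ceiling.
        return min(self.ceiling, max(0.0, elapsed / self.ramp_seconds))

    def accept(self):
        # rng() is in [0.0, 1.0), so a rate of 0 rejects everything and a
        # rate of 1.0 accepts everything.
        return self.rng() < self.acceptance_rate()
```

	One filter per request type, each with its own ramp_seconds and  
ceiling, gives the different slopes and ceilings mentioned above.  
Rejected requests would get whatever "try again shortly" response the  
app already uses.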

> Memcached is great as it is, but it sometimes returns old data.

	``There are only two hard things in Computer Science: cache  
invalidation and naming things.'' -- Phil Karlton

> The
> database knows when this data changes. Why can't we develop a system
> that will make it so the database alerts/updates the cache when data
> is changed?

	My ORM knows when data changes, and already has a *really* easy way  
to perform cache invalidations.  However, I still can't always get  
everything.  There are lots of things that are cached as derived  
values -- pages, parts of pages, objects that contain other objects,  
etc...  The application has far, far better insight into all of the  
places a given piece of information is used, but it'll be a long time  
before it can automatically track all of them.
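
	To illustrate why the application is the one that has to know: a  
small Python sketch where the app explicitly registers which derived  
entries depend on a source record.  DependencyTrackingCache and the key  
names are hypothetical, this is not my ORM extension, and it only  
handles one level of dependency.

```python
from collections import defaultdict


class DependencyTrackingCache:
    """In-process cache where derived entries declare their sources."""

    def __init__(self):
        self.store = {}
        self.dependents = defaultdict(set)  # source key -> derived keys

    def set(self, key, value, depends_on=()):
        self.store[key] = value
        for source in depends_on:
            self.dependents[source].add(key)

    def get(self, key):
        return self.store.get(key)

    def invalidate(self, source):
        """Drop the source entry and everything registered as derived from it."""
        self.store.pop(source, None)
        for key in self.dependents.pop(source, set()):
            self.store.pop(key, None)
```

	Anything cached without a depends_on entry (a page fragment someone  
forgot to register, say) stays stale after invalidate, which is the  
gap described above.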

> We're all familiar with those gnarly queries that need to be run
> repeatedly. I believe it's possible to create a system where the query
> is kept up to date by the Caching Database Daemons (CDDs), adding a
> very small overhead to the core database if there are enough CDDs to
> keep the underlying data in cache memory.

	We do that in our apps pretty easily today, except it scales  
horizontally by not requiring some centralized service to do it.  One  
of the front-ends changes something, and updates one of the memcached  
servers.
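
	Roughly, in Python: each front-end hashes the key to pick one cache  
server and writes the fresh value there right after the DB write.  
ShardedCache and save_user are hypothetical names, plain dicts stand in  
for the memcached servers and the database, and the modulo hashing is  
the simplest possible scheme (real clients often use consistent hashing  
instead).

```python
import hashlib


class ShardedCache:
    """Client-side sharding: every key maps to exactly one server."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self.servers = servers  # dicts standing in for memcached instances

    def _pick(self, key):
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.servers)
        return self.servers[index]

    def get(self, key):
        return self._pick(key).get(key)

    def set(self, key, value):
        self._pick(key)[key] = value


def save_user(db, cache, user_id, fields):
    """Write to the database, then refresh the one cache entry affected."""
    db[user_id] = dict(fields)
    cache.set(f"user:{user_id}", dict(fields))
```

	No coordinator is involved: any front-end with the same server list  
computes the same server for a given key.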

> When the application wants the results, its CDD simply sends a
> request to all of the other CDDs and sorts/limits/distincts the
> results.  What a cool performance boost!


	That assumes it can read the data I've cached (or, in this case, it's  
caching for its own benefit).  That application seems really  
specialized.  With as little code as I generally write to get caching  
working in apps, I don't see how something like that would be less work.

-- 
Dustin Sallings