Few queries on atomicity of requests
Dustin Sallings
dustin at spy.net
Sat Jun 21 19:47:01 UTC 2008
On Jun 21, 2008, at 11:35, Daniel wrote:
>> - Overcomplicated. See dustin's rails examples for easy
>> abstractions on
>> getting to the 90% mark.
>
> You are absolutely correct... But wouldn't it be worth making a
> memcached/database combo that's overcomplicated if we could get a
> performance boost from every app that uses it just like they use a
> database?
It just feels like the wrong level. If you're just caching database
results, you're not going to be getting the best usage out of your
app. I mean, you're not likely to be doing all that much better than
what any other query cache does today.
If you're caching the objects you build from the results of a query,
you can do a *lot* better. You can start doing things like simply not
joining tables, not doing N+1 queries, etc...
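To make that concrete, here's a minimal sketch (in Ruby, with a plain Hash standing in for a memcached client; all names are illustrative, not from any real app) of caching a whole object graph under one key, so a request needs a single cache lookup instead of several queries:

```ruby
require 'json'

CACHE = {}  # stands in for a memcached client

# Hypothetical expensive build: in a real app this would be several
# DB queries (users, posts, etc.) assembled into one object graph.
def build_user_page(user_id)
  { 'user'  => { 'id' => user_id, 'name' => "user#{user_id}" },
    'posts' => [{ 'id' => 1, 'title' => 'hello' }] }
end

def fetch_user_page(user_id)
  key = "user_page:#{user_id}"
  # One cache lookup replaces the whole set of queries; the builder
  # only runs on a miss.
  CACHE[key] ||= JSON.generate(build_user_page(user_id))
  JSON.parse(CACHE[key])
end
```

Serializing the graph as one value is what buys the single round trip; the trade-off is that any change to a member object invalidates the whole key.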
I had an app that, under normal circumstances, could replace the five
or so DB queries (optimal) needed to gather all of the information
required to build the objects for a request with exactly one cache
lookup. It would complete a transaction with exactly one cache write
and one DB transaction covering some large number of queries (all
performed asynchronously to the client request -- in this particular
app, there was no value in making the client wait for positive
confirmation on DB writes).
No matter how big or optimal your DB is, a single round trip for a
graph of data from one of many memcached instances will be faster than
multiple queries on a more centralized ACID database.
> Even with a full fledged database cluster doing the work, the cluster
> ends up running more slowly because it has to handle all of the extra
> database needs, replication, journaling, failover, version control,
> blocking, etc etc.
I thought you were advocating MVCC in memcached earlier? I don't
think you can reach the levels of guarantee you're asking for over DB
transactions without something like MVCC and 2pc. If you start adding
these types of things in, you're just making another database, and
it'll be as slow as any of them.
> FYI, Oracle's product, TimesTen, seems to have some performance metrics
> that I believe apply... They reported on page 44 that one financial
> trading system had an order-of-magnitude performance increase:
Well, yes, databases that don't have to write to filesystems are
faster. Caches that don't try to be databases are faster than those.
There are plenty of open source in-memory databases available today
(e.g. sqlite or mysql).
>
> http://www.oracle.com/technology/products/timesten/pdf/oow2007/oow07_s291347_timesten_caching_use_cases.pdf
>
> I just thought of another way of describing it. The CDDs are like
> Reader Databases, while the core database handles all writing.
I think many of us build our applications this way, but
deliberately. If it were to happen automatically, it'd come at a
cost in performance.
> Yes, that is a better memcached-specific application design. What I'm
> suggesting will be significantly slower than an integrated,
> designed-for-memcached app. That's why I believe it seems worthwhile
> to include the current memcached functionality in the CDD.
I think the best way to approach this is by building an architecture
that fits it well. Turns out, that's pretty hard, and then nobody
wants to use it.
I get what you're saying. You want to make something everyone can
use, but that doesn't do all that much. It kind of sounds like mysql's
query cache. I don't know how it'd be better than that, and I've not
heard anything particularly wonderful about it.
I've written some activerecord extensions that can do automatic
caching and invalidation of objects by relationship. This lives
within the ORM because it can act on real live objects and has a deep
understanding of when things change and knows what to do about it.
Although it's in its infancy, it basically works. However,
getting it to do all of the stuff it should/could would require a
*lot* of work and could very well make things slower. Sometimes it's
just better to say what you mean.
> I don't get it... Here in "memcached land" we're dealing with
> situations where failing to warm up the cache before going live can
> make sites blow up, meanwhile people are saying/thinking that a
> generic memcached/database combination isn't worth the trouble.
The way I've taken care of this kind of situation in the past is to
rate-control going live. If I'm down, that's generally bad. If
I'm up, that's great, but I get a huge traffic spike at the worst
time. So I just have a probabilistic request-acceptance filter that
goes from accepting 0% of requests to 100% of requests over n
minutes. Adding ceilings and different slopes for various request
types can help, too.
So you're not down, but you're also not completely up for a period of
time. Doing this makes the warming pretty much take care of itself.
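A sketch of that filter in Ruby (the class name and parameters are illustrative; this is the general technique, not code from any particular app):

```ruby
# Probabilistic request-acceptance filter: over a ramp window after
# start, accept a linearly growing fraction of requests and shed the
# rest, so a freshly started site warms its caches gradually.
class RampFilter
  def initialize(ramp_seconds, start_time = Time.now)
    @ramp  = ramp_seconds.to_f
    @start = start_time
  end

  # Fraction of requests to accept at a given time, clamped to [0, 1].
  def acceptance(now = Time.now)
    [[(now - @start) / @ramp, 0.0].max, 1.0].min
  end

  def accept?(now = Time.now)
    rand < acceptance(now)
  end
end
```

Halfway through the ramp, acceptance is about 0.5; per-request-type filters with different slopes or ceilings are a straightforward extension.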
> Memcached is great as it is, but it sometimes returns old data.
"There are only two hard things in Computer Science: cache
invalidation and naming things." -- Phil Karlton
> The
> database knows when this data changes. Why can't we develop a system
> that will make it so the database alerts/updates the cache when data
> is
> changed.
My ORM knows when data changes, and already has a *really* easy way
to perform cache invalidations. However, I still can't always get
everything. There are lots of things that are cached as derived
values -- pages, parts of pages, objects that contain other objects,
etc... The application has far, far better insight into all of the
places a given piece of information is used, but it'll be a long
time before it can automatically track all of them.
> We're all familiar with those gnarly queries that need to be run
> repeatedly. I believe it's possible to create a system where the query
> is kept up to date by the Caching Database Daemons (CDDs), adding a
> very small overhead to the core database if there are enough CDDs to
> keep the underlying data in cache memory.
We do that in our apps pretty easily today, except it scales
horizontally by not requiring some centralized service to do it. One
of the front-ends changes something, and updates one of the memcached
servers.
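As a rough sketch of that pattern (Ruby, with Hashes standing in for the real database and memcached pool; the setting names are illustrative): the front-end that performs a write pushes the fresh value into the shared cache at the same time, so no centralized invalidation service is needed:

```ruby
DB    = {}  # stands in for the real database
CACHE = {}  # stands in for the shared memcached pool

# Write-through: the front-end that changes the data updates the
# cache at the same time as the durable store.
def save_setting(key, value)
  DB[key] = value
  CACHE["setting:#{key}"] = value
end

def read_setting(key)
  # Other front-ends read the cache first, falling back to the DB.
  CACHE.fetch("setting:#{key}") { DB[key] }
end
```

The weakness is visible in the sketch itself: anything that writes to the DB without going through `save_setting` leaves the cache stale, which is exactly why this lives in the application rather than in a generic layer.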
> When the application wants the results, its CDD simply sends a
> request to all of the other CDDs and applies sort/limit/distinct to
> the results. What a cool performance boost!
That assumes it can read the data I've cached (or, in this case, it's
caching for its own benefit). That application seems really
specialized. With as little code as I generally write to get caching
working in apps, I don't see how something like that would be less work.
--
Dustin Sallings
More information about the memcached mailing list