Few queries on atomicity of requests

Mon Jun 23 06:55:29 UTC 2008

Hi Dormando

Thanks.

I'm going to try to keep this e-mail short for a change...

> > I don't get it...  Here in "memcached land" we're dealing with
> > situations where NOT warming up the cache before going live can
> > make sites blow up; meanwhile, people are saying/thinking that a generic
> > memcached/database combination isn't worth the trouble.

I don't think I made my point here very well...  Yes, clustered
databases are fast, but memcached seems to let a system grab data
much, much faster, with the understanding that it may be stale.

Why isn't there much interest in looking to see how these two systems
could be combined? The database programmers I've spoken to know how much
work and effort has gone into optimizing the database, and don't take
lightly any suggestion that performance can easily be improved.

Many people here realize that a memcached/database solution would be
slower than app-specific memcached calls. Alternatively, they feel that
an "app wrapper" toolkit like Django, which handles database caching, is
good enough.

I believe that databases could be improved by integrating the benefits
of memcached caching into CDDs, along with a modified core database
that communicates with the CDDs.

The beauty and the trouble of this idea of making a cache database
daemon (CDD) that is able to provide guaranteed accurate data is that
the core database would have to report to the CDD when data in or out of
a transaction is updated or deleted.

One other benefit I didn't really think to mention until now is that,
when the CDD can provide guaranteed accurate data, the core database
will be able to request data from the CDD rather than having to go get
it from disk.
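The invalidation and read path described in the last two paragraphs can be sketched in Python. This is a minimal, hypothetical sketch: `CoreDatabase`, `CacheDaemon`, and their methods are invented for illustration, a dict stands in for on-disk storage, and notifications are sent only after a commit (a real design would also have to deal with in-flight transactions and concurrent readers).

```python
from typing import Dict, List, Optional


class CacheDaemon:
    """Hypothetical CDD: an in-memory copy of rows that the core database keeps exact."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def on_update(self, key: str, value: str) -> None:
        # The core database pushes each committed value, so this copy matches the last commit.
        self.rows[key] = value

    def on_delete(self, key: str) -> None:
        self.rows.pop(key, None)

    def get(self, key: str) -> Optional[str]:
        return self.rows.get(key)


class CoreDatabase:
    """Hypothetical core database; the `disk` dict stands in for on-disk storage."""

    def __init__(self) -> None:
        self.disk: Dict[str, str] = {}
        self.daemons: List[CacheDaemon] = []
        self.disk_reads = 0

    def register(self, cdd: CacheDaemon) -> None:
        self.daemons.append(cdd)

    def commit_update(self, key: str, value: str) -> None:
        self.disk[key] = value
        # Every committed write fans out to every CDD: this is the "trouble" part.
        for cdd in self.daemons:
            cdd.on_update(key, value)

    def commit_delete(self, key: str) -> None:
        self.disk.pop(key, None)
        for cdd in self.daemons:
            cdd.on_delete(key)

    def read(self, key: str) -> Optional[str]:
        # Because the CDDs are kept exact, the core can ask them before touching disk.
        for cdd in self.daemons:
            value = cdd.get(key)
            if value is not None:
                return value
        self.disk_reads += 1
        return self.disk.get(key)


db = CoreDatabase()
cdd = CacheDaemon()
db.register(cdd)
db.commit_update("user:1", "daniel")
print(db.read("user:1"), db.disk_reads)  # daniel 0
```

The fan-out loop in `commit_update` is where the cost lands: every write does work proportional to the number of CDDs.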

Thanks

Daniel

On Sun, 2008-06-22 at 20:37 -0700, dormando wrote:
> I think this is a bit off topic now... There are two separate threads
> going on: the first is people giving many examples of how using
> memcached is actually pretty easy, and the other is about an in-memory
> database with loose specs... I'd love to have brainstorming sessions on
> memcached design patterns, but I'm really not seeing how this database
> could be performant at all.
> 
> > You are absolutely correct...  But wouldn't it be worth making a
> > memcached/database combo that's overcomplicated if we could get a
> > performance boost for every app that uses it just like they use a
> > database?
> 
> If that's easy, go ahead and patch MySQL to do it? I think we can go
> back and forth on pooh-poohing ideas, but if you can prove us wrong then
> that's fine. That needs specs, proofs of concept, algorithmic tests, etc.
> 
> > First, in-memory databases are, or at least can be, far faster than
> > databases requiring spinning-disk access. Flash disk accesses and
> > remote in-memory accesses are somewhere in between.
> 
> Eh, I'll expand a little:
> 
> Take a machine with 32G of RAM, and your dataset is 20G (data + indexes
> + overhead in InnoDB). Your InnoDB logfiles are 256 megs apiece, and
> you have 512M of battery-backed write cache over an 8-disk RAID10 or
> whatever.
> 
> Fire up your database and run select * against every table with an
> impossible constraint on each index (a ghetto trick for preloading all
> indexes + data in InnoDB; see mysqlperformanceblog). As you do further
> reads, InnoDB converts what it can into a hash table, and the writes you
> do commit into the writeback memory. So you're not blocked on disk unless
> you start writing very heavily or your dataset is bigger than memory.
> Most of us use these BBUs now... But it's irrelevant. We statically cache
> the results of the database queries to avoid the overhead of the database
> parser/optimizer/query execution and storage engine data
> fetch/conversion. We do it because memcached itself is distributed, and
> adding more servers helps.
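Caching query results so repeat lookups skip the parser, optimizer, and storage engine can be sketched like this. It is a minimal sketch, assuming an in-memory SQLite table in place of InnoDB and a plain dict in place of memcached; `QueryCache` is a hypothetical name. The key is a SHA-1 hex digest of the query and its parameters because memcached keys are limited to 250 bytes and may not contain spaces.

```python
import hashlib
import sqlite3
from typing import Any, Dict, List, Tuple


class QueryCache:
    """Hypothetical cache-aside wrapper: results are stored under a key derived from the query."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.store: Dict[str, List[tuple]] = {}  # stand-in for a memcached client
        self.db_queries = 0

    def query(self, sql: str, params: Tuple[Any, ...] = ()) -> List[tuple]:
        # Hash the SQL text and parameters into a short, space-free key.
        key = "q:" + hashlib.sha1(repr((sql, params)).encode()).hexdigest()
        if key not in self.store:
            # Miss: pay for parsing, planning, and execution once.
            self.db_queries += 1
            self.store[key] = self.conn.execute(sql, params).fetchall()
        # Note: nothing here invalidates the entry when the table changes.
        return self.store[key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "dormando"), (2, "daniel")])

qc = QueryCache(conn)
for _ in range(3):
    rows = qc.query("SELECT name FROM users WHERE id = ?", (2,))
print(rows, qc.db_queries)  # [('daniel',)] 1
```

The trade-off is the staleness Daniel mentions: cached rows stay as they were until the application deletes or overwrites the key.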
> 
> > Even with a full-fledged database cluster doing the work, the cluster
> > ends up running more slowly because it has to handle all of the extra
> > database needs: replication, journaling, failover, version control,
> > blocking, etc.
> 
> Sure? I guess? Replication's not much overhead...
> 
> > In the system I'm describing, the app and the CDD run on the same
> > machine to avoid extra remote accesses. More important, however...
> 
> That doesn't make a huge difference... Everyone thought that remote
> access would make a huge difference, so no one had really popularized
> memcached until Brad wrote it and started proving that it wasn't such a
> huge deal. Even then, communicating with a process on the same machine
> isn't free, as you have scheduler overhead and system CPU blown on the
> pipe.
> 
> Sure, network roundtrips suck. Yawn. Batch what you can into the same
> network roundtrip (multiget), and for the rest, the time your apps spend
> idle on the network instead of on CPU tells you how many extra parallel
> processes you can fire up on your app servers. If my individual processes
> spend 50% of their time in CPU land and 50% of their time in wait, I
> can run 2x per core on the box. The only real limiting factor it gives
> you is rendering time, and if you're not querying a system that's as
> *dead simple* as memcached, you're going to pay that overhead regardless.
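Both points can be made concrete with a toy model. This is a sketch under stated assumptions: `FakeClient` is a hypothetical stand-in that only counts round trips (memcached's text protocol does accept several keys in one `get` command), and `processes_per_core` is idealized, ignoring context-switch and memory costs.

```python
from typing import Dict, Iterable, Optional


class FakeClient:
    """Hypothetical memcached stand-in that counts network round trips."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = data
        self.round_trips = 0

    def get(self, key: str) -> Optional[str]:
        self.round_trips += 1  # one request/response per key
        return self.data.get(key)

    def get_multi(self, keys: Iterable[str]) -> Dict[str, str]:
        self.round_trips += 1  # the whole batch shares one request/response
        return {k: self.data[k] for k in keys if k in self.data}


def processes_per_core(cpu_fraction: float) -> float:
    """If each process is on-CPU only `cpu_fraction` of the time, this many keep a core busy."""
    return 1.0 / cpu_fraction


keys = [f"user:{i}" for i in range(10)]
client = FakeClient({k: "..." for k in keys})
one_by_one = [client.get(k) for k in keys]  # 10 round trips
batched = client.get_multi(keys)            # 1 more round trip
print(client.round_trips, processes_per_core(0.5))  # 11 2.0
```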
> 
> (I'm skipping the oracle bit; too much bias)
> 
> > I just thought of another way of describing it. The CDDs are like
> > reader databases, while the core database handles all writing.
> 
> I don't really understand why this is better... you're caching more
> redundant data everywhere. Memcached gives you higher cache efficiency
> by allowing you to use every last drop of storage you throw at it. For
> any amount of redundancy you lose cache size, and cache misses are
> almost guaranteed to suck more than the network roundtrip to memcached.
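The cache-efficiency argument is mostly arithmetic: a memcached client spreads keys across servers, so each key lives on one node and the nodes' memory adds up, while a full copy on every app machine only ever holds one node's worth of distinct data. A minimal sketch, assuming simple modulo hashing over `zlib.crc32` (many real clients use consistent hashing instead) and hypothetical server names:

```python
import zlib
from typing import List


def server_for(key: str, servers: List[str]) -> str:
    # Each key maps to exactly one server, so no memory is spent on duplicates.
    return servers[zlib.crc32(key.encode()) % len(servers)]


def distinct_capacity_gb(per_node_gb: float, nodes: int, replicated: bool) -> float:
    # Partitioned: nodes hold different keys, so capacities add.
    # Fully replicated: every node holds the same keys, so you get one node's worth.
    return per_node_gb if replicated else per_node_gb * nodes


servers = ["cache1:11211", "cache2:11211", "cache3:11211"]
placement = {f"user:{i}": server_for(f"user:{i}", servers) for i in range(1000)}
print(distinct_capacity_gb(4.0, 10, replicated=False))  # 40.0
print(distinct_capacity_gb(4.0, 10, replicated=True))   # 4.0
```

With ten 4 GB nodes, partitioning gives 40 GB of distinct cache versus 4 GB if every node holds the same data.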
> 
> > Yes, that is a better memcached-specific application design. What I'm
> > suggesting will be significantly slower than an app designed for
> > memcached from the start. That's why I believe it's worthwhile to
> > include the current memcached functionality in the CDD.
> 
> That's fine. I'd rather my whole app be fast and my developers
> understand what's going on... Simple abstractions and design patterns
> in the application can still give you most of that benefit, as Dustin's
> many examples continue to show. What's better is that it'll give you
> *more* than just that "my app thinks it's a database but it's a *little*
> faster!" benefit: since you would have already built memcached into your
> app, it'll be simpler to plug it in anywhere else. It reads like you
> want to try really hard to get a 50% speedup, instead of trying a lot
> less to get a 90% speedup everywhere.
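As one example of the kind of simple abstraction meant here (not necessarily one of Dustin's), a caching decorator lets any expensive function use the cache without the rest of the app knowing. A minimal sketch: `DictCache` is a hypothetical stand-in for a memcached client with `get`/`set`, and `load_profile` is a made-up function standing in for a database query.

```python
import functools
import json
from typing import Any, Callable, Dict, List, Optional


class DictCache:
    """Hypothetical stand-in for a memcached client exposing get/set."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value


def cached(cache: DictCache, prefix: str) -> Callable:
    """Look results up in `cache` before calling the wrapped function."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any) -> Any:
            # Compact JSON keeps the key free of spaces, e.g. "profile:[7]".
            key = prefix + ":" + json.dumps(args, separators=(",", ":"))
            value = cache.get(key)
            if value is None:  # miss (a None result is never cached in this sketch)
                value = fn(*args)
                cache.set(key, value)
            return value
        return wrapper
    return decorator


cache = DictCache()
calls: List[int] = []


@cached(cache, "profile")
def load_profile(user_id: int) -> Dict[str, Any]:
    calls.append(user_id)  # stands in for the real database query
    return {"id": user_id, "name": f"user{user_id}"}


load_profile(7)
load_profile(7)
print(len(calls))  # 1
```

Because the caching lives in one wrapper, the same decorator can be dropped onto any other slow call in the app.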
> 
> > I don't get it...  Here in "memcached land" we're dealing with
> > situations where NOT warming up the cache before going live can
> > make sites blow up; meanwhile, people are saying/thinking that a generic
> > memcached/database combination isn't worth the trouble.
> 
> Dustin touched on this too: if it hurts when you do that, then don't do
> it. Regardless of how your system is designed, if you turn it all on at
> the same time it won't work immediately. You have to preload data
> somewhere, or do the simple thing and let traffic in slowly. I yakked
> about this same problem at SXSW, using gaiaonline's many troubled
> feature launches as examples. The solution was trivial; it was a
> marketing problem.
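"Let traffic in slowly" is often done with a percentage rollout. A minimal sketch, assuming users are bucketed by a `zlib.crc32` hash of a user ID (the function name and ramp steps are hypothetical): because a user's bucket never changes, everyone admitted at 5% stays admitted at 25%, so the caches fill gradually instead of all at once.

```python
import zlib
from typing import List


def admitted(user_id: str, percent: int) -> bool:
    """Deterministically admit roughly `percent`% of users to the new feature."""
    bucket = zlib.crc32(user_id.encode()) % 100  # stable bucket in 0..99
    return bucket < percent


users: List[str] = [f"user{i}" for i in range(10_000)]
for percent in (1, 5, 25, 100):  # hypothetical ramp schedule
    count = sum(admitted(u, percent) for u in users)
    print(f"{percent}% -> {count} users admitted")
```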
> 
> > As another way of thinking about it, first implement the things that
> > memcached can do easily, and let the more complex tasks fall through to
> > the database. Surely that can be done without slowing the database down.
> 
> If it's easy to add those features, add them to memcached's base and
> submit patches, please! :) The server as-is already has that
> functionality, so surely you can build on that :)
> 
> > For the fun of it, let me revisit one of the more complex tasks, and see
> > if you can't see how this could result in an incredible performance
> > boost.
> 
> Map-reduce with cache? Something that has to touch every part of a
> system isn't fundamentally scalable; making that query fast usually
> requires a design tradeoff.
> 
> -Dormando