How to delete lots of related keys at once

Clint Webb webb.clint at gmail.com
Fri Aug 3 03:39:37 UTC 2007


Yes, of course you are right there.  I was being overly generic.  Or,
rather, I was being specific to the original poster's problem, along with
the rash of people asking questions about similar problems.

The reason the query cache on MySQL is dumped whenever anything in the
table is updated is that it is the safest option.  When you put memcache in
front of the database you can, of course, work out yourself whether a given
cached entry is still valid, but then you have this problem: figuring out
which keys are now invalid because of some new data that has been added.

Something has to be the source of truth.  In most cases it would be the
database.  In a normal real-life situation it is impossible to say that your
memcache and your database will be entirely in sync, and that both can be
the source of truth.  For example, a node in your cache cluster may die, and
you may lose 25% of your keys.  So if a key is missing, build it from the
database.
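That "if a key is missing, build it from the database" approach is the usual
read-through pattern.  A minimal sketch (a plain dict stands in for a real
memcached client here, and the loader and key names are made up for
illustration):

```python
# Read-through caching sketch: treat the cache as "mostly true" and
# rebuild any missing entry from the authoritative database.
cache = {}  # stand-in for a memcached client

def load_user_from_db(user_id):
    # placeholder for the real database query (the source of truth)
    return {"id": user_id, "name": "user%d" % user_id}

def get_user(user_id):
    key = "user:%d" % user_id
    value = cache.get(key)
    if value is None:       # miss: node died, key expired, or never set
        value = load_user_from_db(user_id)
        cache[key] = value  # repopulate; the next read is a hit
    return value
```

The point is that losing the cache (or part of it) is never fatal; the worst
case is extra trips to the database while it refills.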

Therefore, your cache can only be considered "mostly true".  If you've been
very careful, it could be very nearly in sync with the database... if it
never goes offline, none of your keys expire unexpectedly, etc.

My long-winded point is: it might be perfectly alright to cache your
query results, but it should be an explicit decision to do that, for a
specific reason.  I would hate for people to have picked up memcached,
plopped it on their servers, and merely cached every query they run for a
quick speed boost, because there can be ramifications that are difficult to
overcome.  Especially since you can't do a wildcard delete (yet, maybe
ever).
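In the absence of a wildcard delete, one widely used workaround (not from
this thread, just a common trick) is namespace versioning: prefix a group of
related keys with a version number that itself lives in the cache, and
"delete" the whole group by bumping the version, so the old keys simply fall
out of the cache unreferenced.  A sketch, again with a dict standing in for
the memcached client:

```python
cache = {}  # stand-in for a memcached client

def ns_version(namespace):
    # the namespace's current version lives in the cache itself
    return cache.setdefault("ns:%s" % namespace, 1)

def ns_key(namespace, key):
    # every key in the group embeds the current version
    return "%s:%d:%s" % (namespace, ns_version(namespace), key)

def invalidate_namespace(namespace):
    # "wildcard delete": bump the version; all old keys become unreachable
    cache["ns:%s" % namespace] = ns_version(namespace) + 1
```

The old entries are never actually removed; they just stop being looked up
and eventually get evicted like any other cold data.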

As far as speed boosts go, make sure you have appropriate indexes on your
tables.  You might be surprised how many programmers out there have built
database apps and never thought about indexes, because, quite frankly,
everything works without them... until you have a million records and your
queries start taking 20 seconds to process.  Of course, the programmer's
easy fix for that is to add memcache in front of the queries :)
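To see the index point concretely, here is a small demonstration (SQLite
stands in for MySQL purely because it needs no server; the table and column
names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, "u%d@example.com" % i) for i in range(1000)])

# Without an index, the lookup has to scan every row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("u500@example.com",)).fetchall()
print(plan)  # a full SCAN of the table

# Add the index and the same query becomes a direct lookup.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("u500@example.com",)).fetchall()
print(plan)  # a SEARCH using idx_users_email
```

On a thousand rows you won't feel the difference, but the scan grows with the
table while the index lookup does not, which is exactly the million-record
cliff described above.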

Btw, I agree that spreading the query cache over a cluster is a very good
reason.

On 8/3/07, Don MacAskill <don at smugmug.com> wrote:
>
>
>
> Clint Webb wrote:
> >
> > Why is everyone using memcache for the SQL query as a key?  That is what
> > the query cache on your database is for.  You gain very little doing it
> > this way, except now you have to manage the invalidation yourself, which
> > the query-cache does for you.
> >
>
> While I have no experience with, say, PostgreSQL's or Oracle's query
> caches, MySQL's is terrible for this.
>
> The way it invalidates the cache is "if TABLE has changed in any way, get
> rid of any cache entries referencing this TABLE":
>
> INSERT INTO table (id) VALUES (1);
> SELECT * FROM table WHERE id=1;
> // this query is now cached
> INSERT INTO table (id) VALUES (2);
> // the first SELECT is now not cached anymore, despite the 2nd insert
> having nothing to do with it
>
> You can, of course, do much better than this yourself if your goal is to
> cache certain SQL queries, which is why some people do this, or
> something like it, using memcached.
>
> There are other reasons, too, like memory size.  What if you want to
> cache, say, 1TB of queries?  Most of us don't have DB boxes with 1TB of
> RAM...  But putting together 32 boxes with 32GB each is easily doable
> (easily being relative to getting a 1TB server).
>
> Don
>



-- 
"Be excellent to each other"

