Slashdot's patch

Brad Fitzpatrick brad@danga.com
Sat, 26 Jul 2003 23:11:08 -0700 (PDT)


On Sat, 26 Jul 2003, Jamie McCarthy wrote:
> Hi all,
>
> Slashdot's been using memcached to serve its comment text for a
> couple of weeks now and it's working great.  All part of our push
> to get the number of SQL queries per page down to a bare minimum.

Good to hear!

> Next up is saving user information, but I have issues with
> atomicity there.  I'd love to put our getUser() call into
> memcached, but we store the data for each user in 100 separate
> columns, not as one giant structure, and we call setUser() to
> update individual columns from dozens of places throughout our
> code.  If I put all 100 pieces of data into one giant structure
> that gets stored in memcached, then to update one piece of data
> for a user, I have to get, edit, and set -- and if two processes
> do that at the same time, one of the updates could be lost.

You can mitigate the problem a bit by doing:

    get from memcache (or database)
    write edit to database
    get good copy from database
    save good copy to memcache, with expiration time

That's what we do on LiveJournal.  Considering that user edits are so rare
(compared to our hundreds upon hundreds of read hits/second), the db hit
of doing that extra read after the write is minimal, especially because
it's almost certainly in the db cache after you just updated it.
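
A minimal sketch of that sequence, in Python rather than LJ's actual Perl.
The two dicts stand in for the database and memcache, and the names
(get_user, set_user, cache_get, cache_set, CACHE_TTL) are made up for
illustration, not taken from LJ or Slashdot code:

```python
import time

db = {}      # stand-in for the database: user_id -> dict of columns
cache = {}   # stand-in for memcache: key -> (value, expires_at)

CACHE_TTL = 15 * 60  # assumed expiration time, in seconds


def cache_set(key, value, ttl=CACHE_TTL):
    """Store a value together with the time at which it expires."""
    cache[key] = (value, time.time() + ttl)


def cache_get(key):
    """Return the cached value, or None on a miss or an expired entry."""
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() >= expires_at:
        del cache[key]
        return None
    return value


def get_user(user_id):
    """Read from the cache first; fall back to the database on a miss."""
    key = f"user:{user_id}"
    user = cache_get(key)
    if user is None:
        user = dict(db[user_id])
        cache_set(key, user)
    return user


def set_user(user_id, column, value):
    """Write to the database, then refresh the cache from the database."""
    db[user_id][column] = value            # write edit to database
    fresh = dict(db[user_id])              # get good copy from database
    cache_set(f"user:{user_id}", fresh)    # save good copy, with expiration
```

The point of re-reading is that the cached object is always rebuilt from
what the database holds, never from an edited copy of a possibly stale
cache entry.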

Alternatively, you could store each column as its own memcache object.  We
do that for LJ's user objects as well.  The core object (some 20-ish
columns) we store as a Storable object, and do the hack I listed above.
But userprops (of which there are tons) are stored individually, including
storing 0-byte values in memcache when that prop isn't defined for a user
(which ends up taking like 30 bytes + key length in memcache, but that's
better than doing a DB seek).

> One of the features of memcached is that it never blocks -- which
> means, I think, that there won't ever be a way to lock data after
> a get, in preparation for a set.  I can think of some tricks with
> incr and decr that I can probably use, but I'm still trying to
> figure out the best way to do that, and how much that will cut
> into the performance boost I hope to get.  We'll see :)

We could add locking to memcache without making it block.

We could have per-connection locks on certain objects.  If the connection
dies, all its locks are released.  While a connection is holding a lock,
nobody else can write to it (the writes just fail), but others could read
it.  Or, we could have two lock types.  With one type, other clients could
read.  With the second, other clients doing a get wouldn't find the key,
as if it didn't exist.

But this just moves the blocking out of the server into the client
library.
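
To make the locking idea concrete, here is a toy Python model of how such
locks could behave. Nothing like this exists in memcached; the class, the
method names and the two mode names are all hypothetical:

```python
READ_OK = "read_ok"   # hypothetical lock type 1: other clients may still read
HIDDEN = "hidden"     # hypothetical lock type 2: other clients' gets miss


class LockingCache:
    """Toy model of non-blocking, per-connection locks on cache keys."""

    def __init__(self):
        self._items = {}
        self._locks = {}  # key -> (conn_id, mode)

    def _locked_by_other(self, conn_id, key):
        holder = self._locks.get(key)
        return holder is not None and holder[0] != conn_id

    def lock(self, conn_id, key, mode=READ_OK):
        """Take the lock, or fail immediately; never wait."""
        if self._locked_by_other(conn_id, key):
            return False
        self._locks[key] = (conn_id, mode)
        return True

    def set(self, conn_id, key, value):
        """Writes by anyone other than the lock holder just fail."""
        if self._locked_by_other(conn_id, key):
            return False
        self._items[key] = value
        return True

    def get(self, conn_id, key):
        """Under a HIDDEN lock, other clients see a miss."""
        if self._locked_by_other(conn_id, key) and self._locks[key][1] == HIDDEN:
            return None
        return self._items.get(key)

    def disconnect(self, conn_id):
        """When a connection dies, all of its locks are released."""
        self._locks = {k: h for k, h in self._locks.items() if h[0] != conn_id}
```

A client whose set fails has to retry or give up, which is where the
waiting ends up.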

Better solution:  the hack above, but with version numbers on items.
Imagine a 'set' or 'replace' command that only replaced the existing item
if the incoming version number is greater than what's there.

Then you either version your rows in the database (whenever you modify any
column, the version is also incremented), or you can be lazy and use the row's
timestamp (if you have one).
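
A sketch of what such a versioned set could look like, modeled in Python.
This is a hypothetical command, not something memcached has; the class and
method names are invented for the example:

```python
class VersionedCache:
    """Toy cache whose set only succeeds for a strictly newer version."""

    def __init__(self):
        self._items = {}  # key -> (version, value)

    def set_if_newer(self, key, version, value):
        """Store the value unless an equal or newer version is present."""
        current = self._items.get(key)
        if current is not None and version <= current[0]:
            return False  # stale write: keep what is already there
        self._items[key] = (version, value)
        return True

    def get(self, key):
        item = self._items.get(key)
        return None if item is None else item[1]
```

With this, two processes that each re-read the row and push it to the cache
can arrive in either order: the copy carrying the higher row version wins,
and the older one is dropped.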

In any case, memcache isn't a database.  It's a cache to optimize read
performance.  If you think you'll have atomicity issues and you can't
figure out a good way to use memcache to limit the damage a race condition
could cause, use expiration times, and let the problem fix itself in 15/30
minutes, when a get will fail and your app will fall back to the database
and then repopulate memcache.
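
The self-healing effect of expiration can be shown with a toy cache and an
explicit clock. This is a Python sketch; ExpiringCache, read_through and
the now parameter are illustrative names, and the 15-minute default is just
one of the figures mentioned above:

```python
class ExpiringCache:
    """Toy stand-in for memcache with per-item expiration."""

    def __init__(self):
        self._items = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl, now):
        self._items[key] = (value, now + ttl)

    def get(self, key, now):
        """Return the value, or None if it is missing or has expired."""
        item = self._items.get(key)
        if item is None or now >= item[1]:
            self._items.pop(key, None)
            return None
        return item[0]


def read_through(cache, db, key, now, ttl=15 * 60):
    """Use the cache if possible; otherwise read the db and repopulate."""
    value = cache.get(key, now)
    if value is None:
        value = db[key]
        cache.set(key, value, ttl, now)
    return value
```

If a race leaves a stale value in the cache, reads keep returning it only
until the item expires; the next read after that goes to the database and
stores the current value.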

Look at LJ's cgi-bin/ljlib.pl:sub update_user ... you'll see not only how
we do the thing above, but how we pack the hashref down (in
cgi-bin/LJ/MemCache.pm) so it takes 4x less memory in memcache.

> Anyway, I figured I'd share with y'all the patch that we've been
> using to install this on Slashdot (and my own sites)... nothing
> fancy just a bit of cleanup IMHO.  And I like having access to the
> stats from the perl side.  This still doesn't install the perl
> client automatically with a 'make install', you need to cd into
> api/perl and 'perl Makefile.PL && make install'.

Thanks!

I've committed most, and added you to CONTRIBUTORS.  I have a few others
to do which conflict, but I'll get them all in, then make a new release
here shortly.


- Brad