memcached protocol (questions)

Brad Fitzpatrick brad@danga.com
Sat, 26 Jul 2003 21:59:25 -0700 (PDT)


On Sat, 27 Jul 2003, JM Ibanez wrote:

> 1) Are ints stored directly?

The Perl and PHP clients store them as decimal strings, but there's no
reason you couldn't store them natively as 4 bytes of binary data if you
wanted.  The advantage of storing them as text is that the incr/decr
commands will only work on them if they're text.

> What exactly are incr and decr for?

LiveJournal uses them (well, not yet, soon) for reply counts to posts.
When a new comment comes in, we update the database summary table (UPDATE
foo SET replycount=replycount+1 WHERE ...) but we also update memcache.
Each post has a replycount object in memcache.

> From how I understand it, incr/decr increments/decrements a value stored
> in the server in-place; the value is assumed to be a 32-bit unsigned
> int.

Almost.  The value is assumed to be a string containing the decimal
representation of an integer in the range allowed by a 32-bit uint.
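
To make that concrete, here is a rough Python sketch of what an incr on a
decimal-string value amounts to.  This is not the server's code: the function
name and signature are made up for illustration, and wrapping around at 32
bits on overflow is an assumption, since the text above only gives the
allowed range.

```python
UINT32_MAX = 2**32 - 1

def incr(stored: bytes, delta: int) -> bytes:
    """Treat `stored` as ASCII decimal text, add `delta`, return new text.

    Hypothetical helper for illustration only.
    """
    # A value stored as 4 raw binary bytes is not decimal text, so it
    # cannot be incremented this way.
    if not stored.isdigit():
        raise ValueError("value is not a decimal string")
    value = int(stored)
    # Assumption: overflow wraps around at 32 bits.
    new_value = (value + delta) & UINT32_MAX
    return str(new_value).encode("ascii")

print(incr(b"41", 1))   # b'42'
```

A value stored as native binary (say b"\x00\x00\x00\x29") fails the isdigit()
check here, which is the practical reason to store counters as text.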

> So, that means that the server can manipulate stored values. I thought
> the values would be opaque to the server? Someone please clarify me on
> this.

incr/decr are the only commands that manipulate the value.  I thought
they'd be less frequently used, but both Slashdot and Dormando (who posted
previously) are interested in using them.

I need to include Perl support for them.  Jamie sent a patch, but his
patch doesn't make incr/decr return the new value, which would be damn
useful for implementing transient queues in memcache.

LiveJournal has an async job system (so web requests can request a job be
done later, without blocking the client), but jobs get filed by writing
them, arguments included, into a db table.  DB writes suck.  For all our
unimportant jobs (updating the in-memory last posts, or posting to
weblogs.com), we're thinking about putting the async queue for those job
types in memcache by having objects:

    jobq_<job>_head   = 2  (example)
    jobq_<job>_item_2 = <binary encoding of job>
    jobq_<job>_item_3 = <binary encoding of job>
    jobq_<job>_item_4 = <binary encoding of job>
    jobq_<job>_tail   = 4

Now, if I want to append a job, I do an incr on jobq_<job>_tail, get the
return value, and use it to name the new _item_ key.
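
A rough Python sketch of that append, run against a hypothetical in-memory
stand-in for a memcache client.  FakeMemcache, its method names, and
append_job are all made up for illustration and are not the Perl or PHP
client API.  The one thing it relies on is incr returning the new value.

```python
class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcache client."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def incr(self, key, delta=1):
        # Counters are stored as decimal strings; return the new value.
        new_value = int(self.data[key]) + delta
        self.data[key] = str(new_value)
        return new_value


def append_job(mc, job, payload):
    """Claim the next slot by bumping the tail, then store the job there."""
    slot = mc.incr(f"jobq_{job}_tail")
    mc.set(f"jobq_{job}_item_{slot}", payload)
    return slot


mc = FakeMemcache()
mc.set("jobq_weblogs_head", "2")
mc.set("jobq_weblogs_tail", "4")
slot = append_job(mc, "weblogs", b"<binary encoding of job>")
print(slot)   # 5
```

Because the incr hands back the slot number, two clients appending at the
same time each get their own _item_ key without any extra locking.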

If that memcache server were to die... no big loss.  Any important async
job we'd back with the database queue.

> 2) Is data replicated across servers, or is a particular key unique to a
> particular server? I've been looking through both the Perl and PHP
> implementations, and I see that the server is selected via a hash value
> created from the user key. Does this mean the client is responsible for
> selecting which server it stores data in? Or I'm completely misreading
> the implementations? (NOTE: I know enough of Perl and PHP to read code,
> but not write code)

Replication would kill performance.  Each server is totally independent of
the others.  It's the client's job to hash the request onto the right
server.

The Perl and PHP clients both let the key also contain the explicit
hashing value.  LiveJournal uses this to keep all of a user's data on the
same actual memcached process.  All our keys are of the form:

   [$userid, "uprop:$userid:foo"]

So instead of the Perl library hashing "uprop:$userid:foo" and finding a
number, then taking that large number modulo the number of memcache
"buckets", it just uses $userid % $num_buckets.

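Sketched in Python, the bucket selection could look like the following.  The
function name is hypothetical, and CRC32 is only an assumed stand-in for
whatever hash function a given client library actually uses on plain string
keys.

```python
import zlib

def bucket_for(key, num_buckets):
    """Return (bucket_index, real_key) for a plain or [hash_value, key] key."""
    if isinstance(key, (list, tuple)):
        # Explicit hashing value supplied: use it directly.
        hash_value, real_key = key
        return int(hash_value) % num_buckets, real_key
    # Plain key: hash the string (assumption: CRC32), then take the modulo.
    return zlib.crc32(key.encode("utf-8")) % num_buckets, key


userid = 1234
print(bucket_for([userid, f"uprop:{userid}:foo"], 10))   # (4, 'uprop:1234:foo')
print(bucket_for([userid, f"uprop:{userid}:bar"], 10))   # (4, 'uprop:1234:bar')
```

Both keys land in the same bucket because only the user id takes part in the
choice, which is what keeps one user's data on one memcached process.
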
The Perl/PHP APIs default to each ip/port pair being one bucket, but if
some machines are running with larger max memory, you give those machines
a higher weight (= bucket count) than 1 in your client config.
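
A small Python sketch of weight = bucket count.  The config shape (a bare
address, or an (address, weight) tuple), the function name, and the
addresses are assumptions for illustration, not the actual client config
format.

```python
def build_buckets(servers):
    """Expand a server list into a bucket list, one entry per unit of weight."""
    buckets = []
    for entry in servers:
        if isinstance(entry, tuple):
            addr, weight = entry
        else:
            addr, weight = entry, 1   # default: one bucket per ip/port pair
        buckets.extend([addr] * weight)
    return buckets


# Hypothetical addresses; the second machine has more memory, so weight 3.
buckets = build_buckets(["10.0.0.1:11211", ("10.0.0.2:11211", 3)])
print(len(buckets))           # 4
print(buckets[1234 % len(buckets)])
```

With four buckets, the heavier machine is picked for roughly three out of
every four hash values.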

I imagine a client could be smart and query each server for its max
memory, but I don't like the idea of the library on startup having to do
an O(n) connect scan.  Doesn't scale, and there's that extra start-up
latency.  Perhaps we could make servers periodically send UDP broadcasts
of their IP/port/mem_size, and each server keeps track of the rest of
them, so then a smart client would only have to connect to one to get the
entire list, but I don't like that much either.

Anyway, currently there are no known scaling issues.

Out of memory?  Add more processes/servers. Out of CPU?  Add more servers.
(hah.. unlikely.  it takes hardly any CPU.  we ran it in production on a
533 MHz fanless Mini-ITX.)


> Please enlighten me :)

Let me know if you have more questions.

- Brad

>
> --
> Jan Michael Ibanez
> Student, University of Asia & the Pacific
>
> CELL   +63919 422 1141
> WEB    one Generic lizard Geek's LiveJournal
>        http://www.livejournal.com/~cyberlizard
> WEB    CyberLizard productions (coming soon)
>        http://www.mycgiserver.com/~butiki
>
>
>