Hackathon / Multidimensional keys / Wildcard deletes

Sun Jul 8 04:42:35 UTC 2007

There's a tension here: if you look at profiling data, command parsing
actually turns out to be the single most expensive part of the code. That is
still true after several rounds of optimization of the parser. So we
certainly want a nice verbose extensible human-readable protocol, but we
also need to move in the opposite direction: a protocol that can be parsed
just about for free. That is almost certainly going to be a binary protocol
(or at the very least a less freeform text protocol than what we have now).
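
To make the difference concrete, here is a rough sketch in Python, not
memcached's actual code. The text parser handles today's
"set <key> <flags> <exptime> <bytes>" line. The binary header layout (opcode,
key length, flags, exptime, body length) and the names HEADER,
parse_binary_set and encode_binary_set are invented for illustration; no such
wire format has been agreed on.

```python
import struct

# Hypothetical fixed-width header, big-endian, no padding:
# opcode (1 byte), key length (2), flags (4), exptime (4), body length (4).
HEADER = struct.Struct("!BHIII")
OP_SET = 1  # made-up opcode value


def parse_text_set(line):
    # Text form: tokenize on spaces, then convert each number from ASCII.
    cmd, key, flags, exptime, nbytes = line.rstrip(b"\r\n").split(b" ")
    if cmd != b"set":
        raise ValueError("not a set command")
    return key, int(flags), int(exptime), int(nbytes)


def encode_binary_set(key, flags, exptime, nbytes):
    # Build the hypothetical binary request: header followed by the key.
    return HEADER.pack(OP_SET, len(key), flags, exptime, nbytes) + key


def parse_binary_set(buf):
    # Binary form: one fixed-offset unpack, no scanning for delimiters.
    opcode, keylen, flags, exptime, nbytes = HEADER.unpack_from(buf, 0)
    if opcode != OP_SET:
        raise ValueError("not a set command")
    key = buf[HEADER.size:HEADER.size + keylen]
    return key, flags, exptime, nbytes
```

The binary parser never has to search for spaces or line endings, and the
integers arrive already in machine form; that is where I would expect the
savings to come from.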

Obviously both can be supported. But it's something to be aware of: sending
the server a bunch of commands that take even more time than the current
protocol's to parse will probably have a much bigger than expected negative
impact on memcached's CPU efficiency.

That said, this is probably only an actual issue for a fraction of a percent
of the sites that use memcached. On most small-scale sites with a moderate
amount of traffic, it's rare to see memcached even show up in "top", and if
you quadrupled the cost of command parsing, that would still likely be the
case. So as long as there is a high-performance protocol, and there are
clients available that speak it, I suspect none of us who run high-volume
sites would object to there also being a more verbose human-readable one.

As an aside, if you have a robust HTTP server running inside memcached, the
temptation will be to use it for benchmarking, e.g., by running "ab" against
a memcached instance. Of course memcached isn't going to be *slow* in such a
benchmark, but you will be tying memcached's hands behind its back in some
sense: you will be using the lowest performance interface available (and the
interface is the single most expensive part of the code, so that's
significant) and you will, most likely, be giving up stuff like multi-key
"get" requests, which actually turn out to be a huge efficiency win. I bet
it won't take long for the first "I thought memcached was supposed to be
fast, but look at these mediocre numbers from ab!" blog post.
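
As a small illustration of the multi-key point, here is a sketch in Python of
what a client puts on the wire in each case with the existing text protocol.
The helper names are mine, not from any real client library.

```python
def single_get_requests(keys):
    # One request line (and so one command parse and one response) per key.
    return [b"get " + key + b"\r\n" for key in keys]


def multi_get_request(keys):
    # All keys on a single "get" line: one command parse for the whole batch.
    return b"get " + b" ".join(keys) + b"\r\n"
```

An HTTP frontend that maps one URL to one key ends up in the first case by
default.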

One other thought: since, if you buy what I said above, HTTP is not going to
be the choice of the performance-sensitive, maybe it makes sense to consider
implementing the HTTP support as a completely separate frontend that acts as
a proxy to a memcached instance that speaks the high-speed protocol. I am
not sure that's actually a good idea but I thought I'd toss it out there.

-Steve

On 7/7/07 9:11 PM, "Paul Querna" <chip at corelands.com> wrote:

> Paul Querna wrote:
>> Dustin Sallings wrote:
>>>     There'd be indexing overhead, but you could have an O(1)
>>> invalidation if the tags themselves were versioned.
>>> 
>>>     Assuming the cache time is short or you're accessing these records,
>>> cleanup should pretty much take care of itself.
>>> 
>>>     Protocol-wise, would it make sense to have the tags be additional
>>> tokens on the mutation line?  i.e.:
>>> 
>>>     <command name> <key> <flags> <exptime> <bytes> [<tag> [...]]\r\n
>> 
>> 
>> Well, it does bring up a wider issue of protocol versioning....
>> 
>> I was thinking about a more generic structure, if we ever did a protocol
>> revamp, something like:
>> 
>> <command>\n
>> <meta>=<string||int>\n
>> data=<data>\n
>> END
>> 
>> So, for example a SET today would be like:
>> SET
>> key=foobar
>> flags=400
>> bytes=1000
>> data=...data...
> 
> And, thinking about that syntax a little more, we are just reinventing
> HTTP, so, why not make memcached's protocol v2 just be HTTP :-) ?
> 
> -Paul
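
For reference, here is one way to read the versioned-tags idea quoted above,
as a rough in-process sketch in Python. It is only my interpretation, not a
proposed implementation: the class and method names are hypothetical, and a
real server would also need expiry, eviction and memory accounting.

```python
class TaggedCache:
    def __init__(self):
        self._items = {}         # key -> (value, {tag: version at store time})
        self._tag_versions = {}  # tag -> current version number

    def set(self, key, value, tags=()):
        # Record the current version of every tag alongside the value.
        snapshot = {t: self._tag_versions.setdefault(t, 0) for t in tags}
        self._items[key] = (value, snapshot)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, snapshot = entry
        for tag, version in snapshot.items():
            if self._tag_versions[tag] != version:
                # A tag was invalidated after this item was stored:
                # drop the item lazily, on access.
                del self._items[key]
                return None
        return value

    def invalidate_tag(self, tag):
        # O(1): bump one counter, never scan the stored items.
        self._tag_versions[tag] = self._tag_versions.get(tag, 0) + 1
```

The indexing overhead is the per-item snapshot of tag versions, and stale
items are only reclaimed when they are next read (or, in a real cache, when
they expire), which matches the "cleanup should pretty much take care of
itself" remark.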