Hackathon / Multidimensional keys / Wildcard deletes

Paul Querna chip at corelands.com
Sun Jul 8 15:54:23 UTC 2007


On 7/7/07, Steve Grimm <sgrimm at facebook.com> wrote:
>
> There's a tension here: if you look at profiling data, command parsing
> actually turns out to be the single most expensive part of the code. That is
> still true after several rounds of optimization of the parser. So we
> certainly want a nice verbose extensible human-readable protocol, but we
> also need to move in the opposite direction: a protocol that can be parsed
> just about for free. That is almost certainly going to be a binary protocol
> (or at the very least a less freeform text protocol than what we have now.)


Right, I haven't honestly done any significant profiling -- and if command
parsing is the slowest part.. well.. Ugh.
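
As a rough illustration of why the two styles differ in parsing cost, here is
a minimal Python sketch. The text form follows the
"<command name> <key> <flags> <exptime> <bytes>" line quoted further down in
this thread; the binary header layout (HEADER, OP_SET, build_binary) is
invented for this example and is not any real memcached wire format. Nothing
here is measured, it only shows the shape of the work each parser does.

```python
import struct

# Hypothetical fixed-size header (an assumption for illustration only):
# opcode (1 byte), key length (2), flags (4), exptime (4), body length (4),
# all in network byte order with no padding.
HEADER = struct.Struct("!BHIII")
OP_SET = 1


def parse_text(line: bytes):
    """Parse '<command> <key> <flags> <exptime> <bytes>\\r\\n' by tokenizing."""
    cmd, key, flags, exptime, nbytes = line.rstrip(b"\r\n").split(b" ")
    # Every numeric field has to be converted from ASCII digits.
    return cmd.decode("ascii"), key, int(flags), int(exptime), int(nbytes)


def build_binary(op: int, key: bytes, flags: int, exptime: int, nbytes: int) -> bytes:
    """Encode a request as the hypothetical fixed header followed by the key."""
    return HEADER.pack(op, len(key), flags, exptime, nbytes) + key


def parse_binary(buf: bytes):
    """Unpack the fixed header, then slice the key out by its stated length."""
    op, keylen, flags, exptime, nbytes = HEADER.unpack_from(buf, 0)
    key = buf[HEADER.size:HEADER.size + keylen]
    return op, key, flags, exptime, nbytes
```

The text parser has to scan for delimiters and convert digits; the binary
parser reads fields at fixed offsets.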

> Obviously both can be supported. But it's something to be aware of: sending
> the server a bunch of commands that take even more time than the current
> protocol's to parse will probably have a much bigger than expected negative
> impact on memcached's CPU efficiency.


In general, I disagree -- I don't like having multiple protocol interfaces
to one thing like memcached -- all it means is various clients will only
implement one of X possible protocol formats, all with different feature
sets, creating a massive mess on the client side of memcached, something
that isn't often discussed here.


> As an aside, if you have a robust HTTP server running inside memcached, the
> temptation will be to use it for benchmarking, e.g., by running "ab" against
> a memcached instance. Of course memcached isn't going to be *slow* in such a
> benchmark, but you will be tying memcached's hands behind its back in some


Bleh. Idiots will continue to write stupid blog posts about how ab over
localhost gave them an amazing 10,000 requests per second. Just try to ignore
them and don't let them drive the technical decisions on a project.

> sense: you will be using the lowest performance interface available (and the
> interface is the single most expensive part of the code, so that's
> significant) and you will, most likely, be giving up stuff like multi-key
> "get" requests, which actually turn out to be a huge efficiency win. I bet
> it won't take long for the first "I thought memcached was supposed to be
> fast, but look at these mediocre numbers from ab!" blog post.


There is no reason we would have to give up multi-gets.  They could be
implemented as a new/custom HTTP method; there is nothing forbidding that.
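
To make that concrete, here is a minimal Python sketch of what a multi-get
over a custom method might look like on the wire. The method name "MGET", the
use of repeated "key=" query parameters, and both function names are
assumptions made up for this example; nothing like this exists in memcached.

```python
from urllib.parse import quote, unquote


def build_mget_request(keys, host="localhost"):
    """Format a request using a made-up MGET method, one key= pair per key."""
    query = "&".join("key=" + quote(k, safe="") for k in keys)
    return f"MGET /?{query} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_mget_request(raw: bytes):
    """Return the list of requested keys from a hypothetical MGET request."""
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, target, _version = request_line.split(" ")
    if method != "MGET":
        raise ValueError("not a multi-get request")
    _path, _sep, query = target.partition("?")
    # Each pair looks like key=<percent-encoded key>.
    return [unquote(pair.split("=", 1)[1]) for pair in query.split("&") if pair]
```

One request carries all the keys, so the round-trip savings of a multi-key
"get" are kept even though the framing is HTTP.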

I wouldn't be so quick to dismiss HTTP.. There are some interesting ideas on
how to do async parsing of the protocol in projects like serf:
http://code.google.com/p/serf/

> One other thought: since, if you buy what I said above, HTTP is not going to
> be the choice of the performance-sensitive, maybe it makes sense to consider
> implementing the HTTP support as a completely separate frontend that acts as
> a proxy to a memcached instance that speaks the high-speed protocol. I am
> not sure that's actually a good idea but I thought I'd toss it out there.


I think there is a more interesting thought that this is leading to -- that
memcached should be more modular.  In the last couple weeks people have
brought up having multiple backend storage methods.  Perhaps we should look
at making a *compile* time selection system for both a frontend command
parser and a backend storage system?
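
A rough Python sketch of that split, for illustration only. Python picks the
pieces at runtime, so this stands in for what would be a compile-time choice
in the C server. All class names are hypothetical, and the toy
"set <key> <value>" / "get <key>" commands are a simplification, not the real
memcached protocol.

```python
from typing import Optional, Protocol


class Frontend(Protocol):
    """Turns raw bytes from the wire into (operation, *arguments)."""
    def parse(self, raw: bytes) -> tuple: ...


class Backend(Protocol):
    """Stores and retrieves values; knows nothing about the wire format."""
    def get(self, key: bytes) -> Optional[bytes]: ...
    def set(self, key: bytes, value: bytes) -> None: ...


class TextFrontend:
    def parse(self, raw: bytes) -> tuple:
        parts = raw.rstrip(b"\r\n").split(b" ", 2)
        return (parts[0].decode("ascii"), *parts[1:])


class DictBackend:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class Server:
    """Glue: any frontend can be paired with any backend."""
    def __init__(self, frontend: Frontend, backend: Backend):
        self.frontend = frontend
        self.backend = backend

    def handle(self, raw: bytes):
        op, *args = self.frontend.parse(raw)
        if op == "set":
            key, value = args
            self.backend.set(key, value)
            return b"STORED"
        if op == "get":
            (key,) = args
            return self.backend.get(key)  # None on a miss
        raise ValueError(f"unknown operation: {op}")
```

An HTTP or binary frontend would only need to provide parse(), and a
different storage engine would only need get() and set().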

-Paul



> -Steve
>
>
> On 7/7/07 9:11 PM, "Paul Querna" <chip at corelands.com> wrote:
>
> > Paul Querna wrote:
> >> Dustin Sallings wrote:
> >>>     There'd be indexing overhead, but you could have an O(1)
> >>> invalidation if the tags themselves were versioned.
> >>>
> >>>     Assuming the cache time is short or you're accessing these records,
> >>> cleanup should pretty much take care of itself.
> >>>
> >>>     Protocol-wise, would it make sense to have the tags be additional
> >>> tokens on the mutation line?  i.e.:
> >>>
> >>>     <command name> <key> <flags> <exptime> <bytes> [<tag> [...]]\r\n
> >>
> >>
> >> Well, it does bring up a wider issue of protocol versioning....
> >>
> >> I was thinking about a more generic structure, if we ever did a protocol
> >> revamp, something like:
> >>
> >> <command>\n
> >> <meta>=<string||int>\n
> >> data=<data>\n
> >> END
> >>
> >> So, for example a SET today would be like:
> >> SET
> >> key=foobar
> >> flags=400
> >> bytes=1000
> >> data=...data...
> >
> > And, thinking about that syntax a little more, we are just reinventing
> > HTTP, so, why not make memcached's protocol v2 just be HTTP :-) ?
> >
> > -Paul
>
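
The versioned-tag idea quoted above (O(1) invalidation, with cleanup
happening as records are accessed) can be sketched in a few lines of Python.
This is only one reading of the suggestion: the class and method names are
hypothetical, and expiry by cache time is left out to keep the example short.

```python
class TaggedCache:
    """Toy in-process cache where tags carry a version number."""

    def __init__(self):
        self._items = {}         # key -> (value, {tag: version when written})
        self._tag_versions = {}  # tag -> current version

    def set(self, key, value, tags=()):
        # Record the current version of every tag alongside the value.
        snapshot = {t: self._tag_versions.setdefault(t, 0) for t in tags}
        self._items[key] = (value, snapshot)

    def invalidate_tag(self, tag):
        # O(1): bump one counter, no scan over the stored items.
        self._tag_versions[tag] = self._tag_versions.get(tag, 0) + 1

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, snapshot = entry
        for tag, version in snapshot.items():
            if self._tag_versions[tag] != version:
                # Stale entry: clean it up lazily, on access.
                del self._items[key]
                return None
        return value
```

The indexing overhead mentioned in the quote shows up as the per-item
snapshot and the per-get version comparison.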