On 7/7/07, <b class="gmail_sendername">Steve Grimm</b> <<a href="mailto:sgrimm@facebook.com">sgrimm@facebook.com</a>> wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
There's a tension here: if you look at profiling data, command parsing<br>actually turns out to be the single most expensive part of the code. That is<br>still true after several rounds of optimization of the parser. So we
<br>certainly want a nice verbose extensible human-readable protocol, but we<br>also need to move in the opposite direction: a protocol that can be parsed<br>just about for free. That is almost certainly going to be a binary protocol
<br>(or at the very least a less freeform text protocol than what we have now.)</blockquote><div><br>Right, I haven't honestly done any significant profiling -- and if command parsing is the slowest part.. well.. Ugh.
</div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Obviously both can be supported. But it's something to be aware of: sending<br>
the server a bunch of commands that take even more time than the current<br>protocol's to parse will probably have a much bigger than expected negative<br>impact on memcached's CPU efficiency.</blockquote><div><br>
In general, I disagree -- I don't like having multiple protocol interfaces to one thing like memcached -- all it means is various clients will only implement one of X possible protocol formats, all with different feature sets, creating a massive mess on the client side of memcached, something that isn't often discussed here.
<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">As an aside, if you have a robust HTTP server running inside memcached, the<br>temptation will be to use it for benchmarking,
e.g., by running "ab" against<br>a memcached instance. Of course memcached isn't going to be *slow* in such a<br>benchmark, but you will be tying memcached's hands behind its back in some</blockquote><div>
<br>Bleh, idiots will continue to write stupid blog posts about how ab over localhost gave them an amazing 10,000 requests per second. Just try to ignore them and don't let them drive the technical decisions on a project.<br>
</div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">sense: you will be using the lowest performance interface available (and the<br>interface is the single most expensive part of the code, so that's
<br>significant) and you will, most likely, be giving up stuff like multi-key<br>"get" requests, which actually turn out to be a huge efficiency win. I bet<br>it won't take long for the first "I thought memcached was supposed to be
<br>fast, but look at these mediocre numbers from ab!" blog post.</blockquote><div><br>There is no reason we would have to give up multi-gets. They could be implemented as a new/custom HTTP method; there is nothing forbidding that.
</div><br>I wouldn't be so quick to dismiss HTTP.. There are some interesting ideas on how to do async parsing of the protocol in projects like serf: <a href="http://code.google.com/p/serf/">http://code.google.com/p/serf/
</a><br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">One other thought: since, if you buy what I said above, HTTP is not going to<br>
be the choice of the performance-sensitive, maybe it makes sense to consider<br>implementing the HTTP support as a completely separate frontend that acts as<br>a proxy to a memcached instance that speaks the high-speed protocol. I am
<br>not sure that's actually a good idea but I thought I'd toss it out there.</blockquote><div><br>I think there is a more interesting thought that this is leading to -- that memcached should be more modular. In the last couple weeks people have brought up having multiple backend storage methods. Perhaps we should look at making a *compile* time selection system for both a frontend command parser and a backend storage system?
<br><br>-Paul<br><br> <br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">-Steve<br><br><br>On 7/7/07 9:11 PM, "Paul Querna" <
<a href="mailto:chip@corelands.com">chip@corelands.com</a>> wrote:<br><br>> Paul Querna wrote:<br>>> Dustin Sallings wrote:<br>>>> There'd be indexing overhead, but you could have an O(1)<br>>>> invalidation if the tags themselves were versioned.
<br>>>><br>>>> Assuming the cache time is short or you're accessing these records,<br>>>> cleanup should pretty much take care of itself.<br>>>><br>>>> Protocol-wise, would it make sense to have the tags be additional
<br>>>> tokens on the mutation line? i.e.:<br>>>><br>>>> <command name> <key> <flags> <exptime> <bytes> [<tag> [...]]\r\n<br>>><br>>><br>>> Well, it does bring up a wider issue of protocol versioning....
<br>>><br>>> I was thinking about a more generic structure, if we ever did a protocol<br>>> revamp, something like:<br>>><br>>> <command>\n<br>>> <meta>=<string||int>\n
<br>>> data=<data>\n<br>>> END<br>>><br>>> So, for example a SET today would be like:<br>>> SET<br>>> key=foobar<br>>> flags=400<br>>> bytes=1000<br>>> data=...data...
<br>><br>> And, thinking about that syntax a little more, we are just reinventing<br>> HTTP, so, why not make memcached's protocol v2 just be HTTP :-) ?<br>><br>> -Paul<br></blockquote></div><br>
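For reference, the line-oriented layout floated in the quoted message (a command line, `name=value` metadata lines, a `data=` payload, then `END`) could be parsed roughly as follows. This is only a sketch of that proposal, not memcached's actual protocol: the function name `parse_request`, the `INT_FIELDS` set, the choice to take the payload length from the `bytes` field, and the requirement that `\nEND` follow the payload are all assumptions made for illustration.

```python
# Hypothetical: which metadata fields are treated as integers.
INT_FIELDS = {"flags", "bytes", "exptime"}


def parse_request(buf: bytes):
    """Parse one request in the proposed layout.

    Returns (command, meta, data). Raises ValueError on malformed input.
    """
    # First line is the command name, e.g. b"SET".
    end = buf.index(b"\n")
    command = buf[:end].decode("ascii")
    pos = end + 1

    # Metadata lines of the form name=value, up to the data= line.
    meta = {}
    while not buf.startswith(b"data=", pos):
        end = buf.index(b"\n", pos)  # ValueError if input runs out
        name, _, value = buf[pos:end].partition(b"=")
        key = name.decode("ascii")
        text = value.decode("ascii")
        meta[key] = int(text) if key in INT_FIELDS else text
        pos = end + 1

    # Assumption: the payload is exactly meta["bytes"] long, so it may
    # contain newlines without confusing the parser.
    length = meta.get("bytes")
    if not isinstance(length, int):
        raise ValueError("missing bytes field")
    pos += len(b"data=")
    data = buf[pos:pos + length]
    if len(data) != length:
        raise ValueError("truncated data")
    pos += length

    # Assumption: the request is closed by a newline followed by END.
    if buf[pos:pos + 4] != b"\nEND":
        raise ValueError("missing END terminator")
    return command, meta, data


request = b"SET\nkey=foobar\nflags=400\nbytes=5\ndata=hello\nEND\n"
command, meta, data = parse_request(request)
```

Even in this small form the parser has to scan for a delimiter on every metadata line, which is the kind of per-request work the profiling comments earlier in the thread are concerned about; a length-prefixed binary layout would avoid those scans.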