On 7/7/07, <b class="gmail_sendername">Steve Grimm</b> &lt;<a href="mailto:sgrimm@facebook.com">sgrimm@facebook.com</a>&gt; wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

There&#39;s a tension here: if you look at profiling data, command parsing<br>actually turns out to be the single most expensive part of the code. That is<br>still true after several rounds of optimization of the parser. So we

<br>certainly want a nice verbose extensible human-readable protocol, but we<br>also need to move in the opposite direction: a protocol that can be parsed<br>just about for free. That is almost certainly going to be a binary protocol

<br>(or at the very least a less freeform text protocol than what we have now.)</blockquote><div><br>Right, I honestly haven&#39;t done any significant profiling -- and if command parsing is the slowest part... well... ugh.&nbsp;

</div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Obviously both can be supported. But it&#39;s something to be aware of: sending<br>

the server a bunch of commands that take even more time than the current<br>protocol&#39;s to parse will probably have a much bigger than expected negative<br>impact on memcached&#39;s CPU efficiency.</blockquote><div><br>

In general, I disagree -- I don&#39;t like having multiple protocol interfaces to one thing like memcached -- all it means is various clients will only implement one of X possible protocol formats, all with different feature sets, creating a massive mess on the client side of memcached, something that isn&#39;t often discussed here.

<br>&nbsp;</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">As an aside, if you have a robust HTTP server running inside memcached, the<br>temptation will be to use it for benchmarking, 

e.g., by running &quot;ab&quot; against<br>a memcached instance. Of course memcached isn&#39;t going to be *slow* in such a<br>benchmark, but you will be tying memcached&#39;s hands behind its back in some</blockquote><div>

<br>Bleh, idiots will continue to write stupid blog posts about how ab over localhost gave them an amazing 10,000 requests per second. Just try to ignore them and don&#39;t let them drive the technical decisions on a project.<br>

</div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">sense: you will be using the lowest performance interface available (and the<br>interface is the single most expensive part of the code, so that&#39;s

<br>significant) and you will, most likely, be giving up stuff like multi-key<br>&quot;get&quot; requests, which actually turn out to be a huge efficiency win. I bet<br>it won&#39;t take long for the first &quot;I thought memcached was supposed to be

<br>fast, but look at these mediocre numbers from ab!&quot; blog post.</blockquote><div><br>There is no reason we would have to give up multi-gets.&nbsp; They could be implemented as a new/custom HTTP method; nothing forbids that.&nbsp;

</div><br>I wouldn&#39;t be so quick to dismiss HTTP. There are some interesting ideas on how to do async parsing of the protocol in projects like serf: <a href="http://code.google.com/p/serf/">http://code.google.com/p/serf/

</a><br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">One other thought: since, if you buy what I said above, HTTP is not going to<br>

be the choice of the performance-sensitive, maybe it makes sense to consider<br>implementing the HTTP support as a completely separate frontend that acts as<br>a proxy to a memcached instance that speaks the high-speed protocol. I am

not sure that&#39;s actually a good idea but I thought I&#39;d toss it out there.</blockquote><div>I think there is a more interesting thought that this is leading to -- that memcached should be more modular.&nbsp; In the last couple of weeks people have brought up having multiple backend storage methods.&nbsp; Perhaps we should look at making a *compile*-time selection system for both a frontend command parser and a backend storage system?
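<div>As a rough illustration of the modular split suggested above, here is a minimal sketch with a frontend parser and a backend store behind separate interfaces. Everything in it is an assumption made for illustration: the names (Frontend, Backend, TextFrontend, DictBackend, build_server), the simplified one-line "set key value" command, and the two module-level constants, which only stand in for compile-time options since Python has no compile step. It is not memcached&#39;s actual design.</div>

```python
from typing import Dict, List, Optional, Protocol, Tuple


class Backend(Protocol):
    """Hypothetical storage interface."""
    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...


class Frontend(Protocol):
    """Hypothetical command-parser interface."""
    def parse(self, line: str) -> Tuple[str, List[str]]: ...


class DictBackend:
    """In-memory storage; a stand-in for any real storage method."""
    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._items.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._items[key] = value


class TextFrontend:
    """Whitespace-delimited text commands (simplified, not the real protocol)."""
    def parse(self, line: str) -> Tuple[str, List[str]]:
        command, *args = line.split()
        return command.lower(), args


FRONTENDS = {"text": TextFrontend}
BACKENDS = {"dict": DictBackend}

# Assumption: these constants stand in for build-time options.
SELECTED_FRONTEND = "text"
SELECTED_BACKEND = "dict"


class Server:
    def __init__(self, frontend: Frontend, backend: Backend) -> None:
        self.frontend = frontend
        self.backend = backend

    def handle(self, line: str):
        command, args = self.frontend.parse(line)
        if command == "set":
            key, value = args
            self.backend.set(key, value.encode("ascii"))
            return "STORED"
        if command == "get":
            # Multi-key get: return only the keys that are present.
            found = {}
            for key in args:
                value = self.backend.get(key)
                if value is not None:
                    found[key] = value
            return found
        raise ValueError("unknown command: " + command)


def build_server() -> Server:
    """Wire up whichever frontend and backend were selected."""
    return Server(FRONTENDS[SELECTED_FRONTEND](), BACKENDS[SELECTED_BACKEND]())
```

<div>The server only talks to the two interfaces, so swapping the parser or the storage method would not touch the command handling in between.</div>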

<br><br>-Paul<br><br>&nbsp;<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">-Steve<br><br><br>On 7/7/07 9:11 PM, &quot;Paul Querna&quot; &lt;

<a href="mailto:chip@corelands.com">chip@corelands.com</a>&gt; wrote:<br>&gt; Paul Querna wrote:<br>&gt;&gt; Dustin Sallings wrote:<br>&gt;&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp; There&#39;d be indexing overhead, but you could have an O(1)<br>&gt;&gt;&gt; invalidation if the tags themselves were versioned.

<br>&gt;&gt;&gt;<br>&gt;&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp; Assuming the cache time is short or you&#39;re accessing these records,<br>&gt;&gt;&gt; cleanup should pretty much take care of itself.<br>&gt;&gt;&gt;<br>&gt;&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp; Protocol-wise, would it make sense to have the tags be additional

<br>&gt;&gt;&gt; tokens on the mutation line?&nbsp;&nbsp;i.e.:<br>&gt;&gt;&gt;<br>&gt;&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp; &lt;command name&gt; &lt;key&gt; &lt;flags&gt; &lt;exptime&gt; &lt;bytes&gt; [&lt;tag&gt; [...]]\r\n<br>&gt;&gt;<br>&gt;&gt;<br>&gt;&gt; Well, it does bring up a wider issue of protocol versioning....

<br>&gt;&gt;<br>&gt;&gt; I was thinking about a more generic structure, if we ever did a protocol<br>&gt;&gt; revamp, something like:<br>&gt;&gt;<br>&gt;&gt; &lt;command&gt;\n<br>&gt;&gt; &lt;meta&gt;=&lt;string||int&gt;\n

<br>&gt;&gt; data=&lt;data&gt;\n<br>&gt;&gt; END<br>&gt;&gt;<br>&gt;&gt; So, for example a SET today would be like:<br>&gt;&gt; SET<br>&gt;&gt; key=foobar<br>&gt;&gt; flags=400<br>&gt;&gt; bytes=1000<br>&gt;&gt; data=...data...

<br>&gt;<br>&gt; And, thinking about that syntax a little more, we are just reinventing<br>&gt; HTTP, so, why not make memcached&#39;s protocol v2 just be HTTP :-) ?<br>&gt;<br>&gt; -Paul<br></blockquote></div><br>
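<div>For illustration only, here is a minimal sketch of how the key=value framing proposed in the quoted message might be parsed. It assumes several details the thread does not settle: the command sits alone on the first line, a "bytes" entry always precedes "data=", the payload is read by its declared length so that it may contain newlines, and a line reading END closes the request. The function name parse_request is hypothetical.</div>

```python
def parse_request(raw: bytes) -> dict:
    """Parse one request in the hypothetical key=value framing."""
    command, _, rest = raw.partition(b"\n")
    request = {"command": command.decode("ascii"), "meta": {}, "data": None}
    while rest:
        if rest.startswith(b"data="):
            # The payload may contain newlines, so rely on the declared length.
            length = request["meta"].get("bytes")
            if not isinstance(length, int):
                raise ValueError("data= must be preceded by an integer bytes=")
            start = len(b"data=")
            payload = rest[start:start + length]
            if len(payload) != length:
                raise ValueError("payload shorter than declared bytes")
            if rest[start + length:] not in (b"\nEND", b"\nEND\n"):
                raise ValueError("missing END terminator")
            request["data"] = payload
            return request
        line, _, rest = rest.partition(b"\n")
        if line == b"END":
            return request
        name, sep, value = line.partition(b"=")
        if not sep:
            raise ValueError("malformed meta line: %r" % line)
        text = value.decode("ascii")
        # Values made only of digits become ints; everything else stays a string.
        request["meta"][name.decode("ascii")] = int(text) if text.isdigit() else text
    raise ValueError("request ended before END")
```

<div>A request such as b"SET\nkey=foobar\nflags=400\nbytes=5\ndata=hello\nEND\n" would come back as a dict holding the command, the meta fields, and the 5-byte payload. Even in this small form the parser has to split and convert every meta line, which is the per-request parsing cost raised earlier in the thread.</div>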