memcached - text protocol?

Anatoly Vorobey mellon@pobox.com
Thu, 15 Jul 2004 21:37:41 +0300


On Thu, Jul 15, 2004 at 09:17:28AM -0700, Brad Fitzpatrick wrote:
> Michal,
> 
> A binary protocol would be a nice addition.  In particular, the whitespace
> problem in keys bugs me as well.  I want that fixed.
> 
> And while we're happy with the current performance, I'm sure it could
> improve.
> 
> Let's discuss both the format of the binary protocol and its
> implementation on this list.  It should be very easy to add.

I disagree. There's a reason why all successful network protocols happen 
to be text protocols. FTP, SMTP, NNTP, HTTP, you name it. And it wasn't 
done this way to make testing with telnet easier. 

This in particular:

> > process_command() function is horrorible, all these "if(strncmp...)" takes
> > some CPU and these are unnecessary.

is just ridiculous. These CPU cycles are a negligible fraction of 
what is spent reading/writing to the network, looking things up, or, for 
that matter, calculating hash values! I mean, rewriting memcached's hash 
function in optimised x86 assembly would "speed things up" 3-4 times more 
than what we'd win by moving to a binary protocol and eliminating all the 
horrible strncmp() calls. And the speedup, although 3-4 times larger, 
would also be almost as negligible, because it's also far from being the 
bottleneck. 

Text protocols are easy to implement both on client and on server sides 
(and on the client side, we have several different programming 
languages, where working with text is always more straightforward). 
They are, more importantly, easy to maintain, to debug and to extend. 
Adding a new minor command is just a matter of sticking another if 
clause on the server, and putting some text to the socket on the client. 
In a binary format, you have to carefully allocate the command id, check 
that the existing command struct is adequate for your needs, swear and 
extend it for everyone if it isn't, match the numeric id->symbolic 
name mapping on the client side, encode parameters, decode parameters 
(a matter of simple sprintf/sscanf or similar functions in a text 
protocol), and so on. It just becomes too bothersome to try out and 
experiment with stuff.
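
As a rough illustration of the point above, here is a minimal sketch in C
of how a text command can be written by a client and parsed by a server
with nothing more than snprintf/sscanf. The struct, the function names,
and the "ping" command are hypothetical and not taken from memcached's
source. Only the "set <key> <flags> <exptime> <bytes>" line shape follows
the existing protocol.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of a "set" command line (illustrative names). */
struct set_cmd {
    char key[251];          /* room for a 250-byte key plus the NUL */
    unsigned int flags;
    long exptime;
    int nbytes;
};

/* Client side: "putting some text to the socket" starts with formatting it.
 * Returns the number of bytes the full line needs (snprintf semantics). */
int format_set(char *buf, size_t n, const char *key,
               unsigned int flags, long exptime, int nbytes)
{
    return snprintf(buf, n, "set %s %u %ld %d\r\n",
                    key, flags, exptime, nbytes);
}

/* Server side: returns 1 if line is a well-formed "set" command. */
int parse_set(const char *line, struct set_cmd *out)
{
    if (strncmp(line, "set ", 4) != 0)
        return 0;
    return sscanf(line + 4, "%250s %u %ld %d",
                  out->key, &out->flags, &out->exptime, &out->nbytes) == 4;
}

/* Adding a new minor command is one more branch here.
 * "ping" is a made-up command used only for illustration. */
const char *dispatch(const char *line, struct set_cmd *cmd)
{
    if (parse_set(line, cmd))
        return "handle_set";
    if (strncmp(line, "ping", 4) == 0)
        return "handle_ping";
    return "ERROR";
}
```

A binary equivalent would additionally need a command id table and a
packed struct layout agreed on by every client library.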

It's not that I'm happy with everything about the current protocol. It 
has, for instance, the very painful drawback of not specifying the 
command line's length in advance. When the server reads data in, it 
can't read exactly the number of bytes that make up the command, so it 
has to mingle the command buffer and the data buffer at some point. A 
binary protocol would help with that, but this reason hasn't even been 
named so far. Things like strncmp() CPU cycles are goofy reasons to 
switch to binary, not real reasons. And the problem I just mentioned 
doesn't really necessitate a switch to a binary protocol either: it 
could be solved by prepending each command line with its length (in 
decimal, padded to three bytes with spaces on the left if necessary). 
Though it'd make telnetting harder, so I'm not all that eager to 
propose it ;-)
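
The length-prefix idea can be sketched in C roughly as follows. This is
only an illustration under stated assumptions: the function names are
hypothetical, and the length is assumed to count the whole command line
including its trailing "\r\n", which the text above does not specify.

```c
#include <stdio.h>
#include <string.h>

/* Write a 3-byte, left-space-padded decimal length followed by the
 * command line itself. Returns the total number of bytes written, or -1
 * if the line does not fit in three digits or in the output buffer. */
int frame_command(const char *line, char *out, size_t outsize)
{
    size_t len = strlen(line);
    if (len > 999 || outsize < len + 4)   /* 3 prefix bytes + line + NUL */
        return -1;
    return snprintf(out, outsize, "%3zu%s", len, line);
}

/* Decode the 3-byte prefix. Spaces are accepted only before the first
 * digit. Returns the command line length, or -1 if malformed. */
int parse_length_prefix(const char *prefix)
{
    int len = 0, seen_digit = 0;
    for (int i = 0; i < 3; i++) {
        char c = prefix[i];
        if (c == ' ' && !seen_digit)
            continue;                     /* leading padding */
        if (c < '0' || c > '9')
            return -1;                    /* junk, or a space after a digit */
        len = len * 10 + (c - '0');
        seen_digit = 1;
    }
    return seen_digit ? len : -1;
}
```

With such a prefix the server could read three bytes, decode them, and
then read exactly that many bytes for the command, keeping the command
buffer separate from the data that follows.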

Whitespace characters - I don't know about that. Do we really suffer 
from not being able to use them in keys? Do other people on this list?
We could always have spaces but not \n's in keys and delimit them with 
\n's.
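
A minimal sketch of what newline-delimited keys might look like on the
server side, in C. The function name and the exact framing are
assumptions made for illustration. The only rule taken from the text is
that a key may contain spaces but ends at the first \n.

```c
#include <string.h>

/* Hypothetical helper: if buf starts with a complete key terminated by
 * '\n', return the key's length in bytes (spaces inside are allowed).
 * Returns -1 if no '\n' has arrived yet or the key is empty. */
long key_length(const char *buf, size_t buflen)
{
    const char *nl = memchr(buf, '\n', buflen);
    if (nl == NULL || nl == buf)
        return -1;
    return (long)(nl - buf);
}
```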

-- 
avva
"There's nothing simply good, nor ill alone" -- John Donne