memcached - text protocol?
Anatoly Vorobey
mellon@pobox.com
Thu, 15 Jul 2004 21:37:41 +0300
On Thu, Jul 15, 2004 at 09:17:28AM -0700, Brad Fitzpatrick wrote:
> Michal,
>
> A binary protocol would be a nice addition. In particular, the whitespace
> problem in keys bugs me as well. I want that fixed.
>
> And while we're happy with the current performance, I'm sure it could
> improve.
>
> Let's discuss both the format of the binary protocol and its
> implementation on this list. It should be very easy to add.
I disagree. There's a reason why all successful network protocols happen
to be text protocols. FTP, SMTP, NNTP, HTTP, you name it. And it wasn't
done this way to make testing with telnet easier.
This in particular:
> > process_command() function is horrible, all these "if(strncmp...)" take
> > some CPU and these are unnecessary.
is just ridiculous. These CPU cycles are a negligible fraction of
what is spent reading/writing to the network, looking things up, or, for
that matter, calculating hash values! I mean, rewriting memcached's hash
function in optimised x86 assembly would "speed things up" 3-4 times more
than what we'd win by moving to a binary protocol and eliminating all the
horrible strcmp() calls. And the speedup, although 3-4 times larger,
would also be almost as negligible, because it's also far from being the
bottleneck.
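For a sense of scale, here is a minimal C sketch of the kind of strncmp() dispatch chain being complained about. The command words follow memcached's text protocol, but the enum and the function name dispatch() are hypothetical and this is not memcached's actual process_command(). Since strncmp() stops at the first differing byte, classifying a line this way costs at most 4+4+4+8+7 = 27 byte comparisons, and most mismatches are caught on the first byte.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical command codes, for illustration only. */
enum cmd { CMD_UNKNOWN, CMD_GET, CMD_SET, CMD_ADD, CMD_REPLACE, CMD_DELETE };

/* Classify a text command line by its first word with a chain of
 * strncmp() calls. Each call stops at the first differing byte, so a
 * non-matching test usually costs only one or two comparisons. */
enum cmd dispatch(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "get ", 4) == 0)     return CMD_GET;
    if (strncmp(line, "set ", 4) == 0)     return CMD_SET;
    if (strncmp(line, "add ", 4) == 0)     return CMD_ADD;
    if (strncmp(line, "replace ", 8) == 0) return CMD_REPLACE;
    if (strncmp(line, "delete ", 7) == 0)  return CMD_DELETE;
    return CMD_UNKNOWN;
}
```

Compared with hashing every byte of the key, or with a single read() or write() system call, this is a handful of byte comparisons per request.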
Text protocols are easy to implement both on client and on server sides
(and on the client side, we have several different programming
languages, where working with text is always more straightforward).
They are, more importantly, easy to maintain, to debug and to extend.
Adding a new minor command is just a matter of sticking another if
clause on the server, and writing some text to the socket on the client.
In a binary format, you have to carefully allocate the command id, check
that the existing command struct is adequate for your needs, swear and
extend it for everyone if it isn't, match the numeric id->symbolic
name mapping on the client side, encode parameters, decode parameters
(a matter of simple sprintf/sscanf or similar functions in a text
protocol), etc. etc. It just becomes too bothersome to try out and
experiment with stuff.
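As an illustration of the sprintf/sscanf point, the following C sketch encodes and decodes a storage command line of the form "set <key> <flags> <exptime> <bytes>\r\n". The struct, the function names encode_set() and decode_set(), and the 250-byte key limit are assumptions made for this example, not memcached's code.

```c
#include <assert.h>
#include <stdio.h>

/* Parsed form of a storage command line (field layout assumed). */
struct store_cmd {
    char key[251];          /* assumed 250-byte key limit, plus NUL */
    unsigned flags;
    long exptime;
    unsigned long bytes;
};

/* Encoding is one snprintf() call. Returns the line length, or -1 if the
 * line did not fit in buf. */
int encode_set(char *buf, size_t len, const struct store_cmd *c)
{
    assert(buf != NULL && c != NULL);
    int n = snprintf(buf, len, "set %s %u %ld %lu\r\n",
                     c->key, c->flags, c->exptime, c->bytes);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Decoding is one sscanf() call. Returns 1 if all four fields were read. */
int decode_set(const char *line, struct store_cmd *c)
{
    assert(line != NULL && c != NULL);
    return sscanf(line, "set %250s %u %ld %lu",
                  c->key, &c->flags, &c->exptime, &c->bytes) == 4;
}
```

Adding a parameter in this style means adding one conversion to each format string; there is no command id to allocate and no shared struct layout to renegotiate with every client.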
It's not that I'm happy with everything about the current protocol. It
has, for instance, a very painful drawback: it doesn't specify the command
line's length in advance, so when the server reads data in, it can't
read in exactly the bytes that make up the command; it has to mingle
the command buffer and the data buffer at some point. A binary protocol
would help with that, but nobody has even named this reason so far. Things
like strcmp() CPU cycles are goofy reasons to switch to binary, not real
reasons. And the stuff I just mentioned doesn't really necessitate
a switch to a binary protocol either, it could be solved by prepending
each command line with its length (in decimal, padded to three bytes
with spaces on the left if necessary). Though it'd make telnetting
harder, so I'm not all that eager to propose it ;-)
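A minimal C sketch of that length-prefix idea, assuming the length counts every byte of the command line (terminator included) and that lines over 999 bytes are simply rejected. The names frame_command(), next_command(), NEED_MORE and BAD_HEADER are hypothetical. The point is that the server learns the exact size of the command before consuming it, so the data block that follows never has to share the command buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NEED_MORE  (-1)  /* header or line not fully buffered yet */
#define BAD_HEADER (-2)  /* prefix is not a space-padded decimal number */

/* Client side: write "<len><line>" into out, where <len> is the length of
 * line in decimal, right-aligned in 3 bytes and padded with spaces
 * (printf's "%3zu"). Returns the framed size, or -1 if it cannot be framed. */
int frame_command(char *out, size_t outlen, const char *line)
{
    assert(out != NULL && line != NULL);
    size_t n = strlen(line);
    if (n > 999 || outlen < n + 4)      /* 3-byte prefix + line + NUL */
        return -1;
    snprintf(out, outlen, "%3zu%s", n, line);
    return (int)(n + 3);
}

/* Server side: look at the first `avail` buffered bytes and return the
 * length of the next command line, which starts at buf + 3. */
long next_command(const char *buf, size_t avail)
{
    size_t i = 0;
    long len = 0;

    if (avail < 3)
        return NEED_MORE;
    while (i < 3 && buf[i] == ' ')      /* skip left padding */
        i++;
    if (i == 3)
        return BAD_HEADER;              /* no digits at all */
    for (; i < 3; i++) {
        if (buf[i] < '0' || buf[i] > '9')
            return BAD_HEADER;
        len = len * 10 + (buf[i] - '0');
    }
    if (avail < 3 + (size_t)len)
        return NEED_MORE;               /* wait for the rest of the line */
    return len;
}
```

For example, "get foo\r\n" (9 bytes) goes on the wire as "  9get foo\r\n", which also shows why typing commands by hand over telnet would get more awkward.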
Whitespace characters - I don't know about that. Do we really suffer
from not being able to use them in keys? Do other people on this list?
We could always have spaces but not \n's in keys and delimit them with
\n's.
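One possible reading of the \n-delimited key idea, sketched in C: for commands whose last field is the key, the key runs from the first space after the command name up to the '\n', so it may contain spaces but never a newline. The line format and the helpers key_ok() and split_line() are hypothetical; commands with fields after the key would need a different layout.

```c
#include <assert.h>
#include <string.h>

/* Client-side check: a key is usable under this scheme if it is
 * non-empty and contains no '\n'. */
int key_ok(const char *key)
{
    assert(key != NULL);
    return key[0] != '\0' && strchr(key, '\n') == NULL;
}

/* Split "<command> <key>\n" in place. The key is everything between the
 * first space and the '\n', spaces included. On success, *cmd and *key
 * point into line and 1 is returned; otherwise line is left unchanged
 * and 0 is returned. */
int split_line(char *line, char **cmd, char **key)
{
    assert(line != NULL && cmd != NULL && key != NULL);
    char *sp = strchr(line, ' ');
    char *nl = strchr(line, '\n');

    if (sp == NULL || nl == NULL || nl <= sp + 1)
        return 0;               /* no command, no terminator, or empty key */
    *sp = '\0';
    *nl = '\0';
    *cmd = line;
    *key = sp + 1;
    return 1;
}
```

With this layout, "get my key\n" yields the command "get" and the key "my key".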
--
avva
"There's nothing simply good, nor ill alone" -- John Donne