Binary protocol questions

Tomash Brechko tomash.brechko at gmail.com
Sun Nov 11 11:30:26 UTC 2007


On Sun, Nov 11, 2007 at 12:22:20 +0300, Tomash Brechko wrote:
> Binary protocol is there for someone, while text protocol is here for
> everyone.

AFAIU, this will change, and eventually the binary protocol will be
accepted into the mainline.  Little is known about it, but from
binary-protocol-plan.txt (which was said to be a bit outdated) we
can see:

  Notes regarding the proposed binary protocol from Facebook's hosted
  memcached hackathon on 2007-07-09:

  REQUEST STRUCTURE:

    * Magic byte / version
    * Cmd byte
    ...

But what if the 'version' becomes 0x61 ('a')?  How will the server
distinguish between the text and binary protocols then?  Can't you
prepend another byte that would solve this (zero, for instance)?  Some
may think, "but why would anyone use the text protocol when I use
binary?", but the answer is already in this doc file:

  RESPONSE STRUCTURE:

    * Magic byte / version (different from req's magic byte/version,
      to distinguish that it's a response for, say, protocol
      analyzers)


But memcached is one of zillions of projects, and it will take a long
time until all of its protocol versions are incorporated into all
packet analyzers (if this happens at all).  So having a working text
protocol might be useful.
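
To make the magic byte concern concrete, here is a minimal Python
sketch of a server that tells the two protocols apart by looking at
the first byte of a request.  It is not memcached's actual dispatch
code: the function names, the short list of text commands, and the
idea of passing the magic byte as a parameter are all assumptions made
for illustration.

```python
# Hypothetical sketch, not memcached's real dispatch logic.
# An illustrative subset of text protocol commands.
TEXT_COMMANDS = (b"get", b"set", b"add", b"replace", b"delete")


def looks_like_text(first_byte: int) -> bool:
    """Return True if some text command starts with this byte."""
    return any(cmd[0] == first_byte for cmd in TEXT_COMMANDS)


def classify(request: bytes, magic: int) -> str:
    """Classify a request by its first byte, given the binary magic byte."""
    if not request:
        raise ValueError("empty request")
    first = request[0]
    if first == magic and looks_like_text(first):
        # The magic byte collides with the first letter of a text command.
        return "ambiguous"
    if first == magic:
        return "binary"
    return "text"


if __name__ == "__main__":
    # A magic byte of 0x61 ('a') collides with the text command "add".
    print(classify(b"add k 0 0 1\r\n", 0x61))
    # A zero magic byte cannot start any text command.
    print(classify(bytes([0x00, 0x61, 0x01]), 0x00))
    print(classify(b"get k\r\n", 0x00))
```

With a zero byte in front, no text command can be mistaken for a
binary request, whatever value the version byte takes later.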

As you understand, any protocol has syntax and semantics, and text vs
binary is a _purely syntactic_ issue.  More than that, if both evolve
separately, then a text protocol with well-thought-out semantics may
outperform a binary one with poor semantics (parsing text is slower,
but the number of commands, and thus key lookups, needed to do useful
things might be smaller).  Shouldn't the semantics be defined first,
and only then their encoding in text or binary form?

As an example of a semantic decision we may take this: "The 'replace'
command replaces the data for the given key if this key is present in
the cache.  It may also update meta-data, like expiration time and
flags."  Note that this doesn't force you to _always_ set exptime
and flags, which you may not know, so it is very flexible and
_efficient_: you don't have to query for the old values just to set
them again.
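
The 'replace' semantics described above can be sketched in a few lines
of Python.  This is only an illustration of the proposed behaviour,
not how memcached stores items: the Cache and Item classes and their
method names are made up for this example, and None is assumed to mean
"leave this meta-data untouched".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Item:
    data: bytes
    flags: int = 0
    exptime: int = 0


class Cache:
    """Hypothetical in-memory cache used only to illustrate the semantics."""

    def __init__(self) -> None:
        self._items: Dict[str, Item] = {}

    def set(self, key: str, data: bytes, flags: int = 0, exptime: int = 0) -> None:
        self._items[key] = Item(data, flags, exptime)

    def get(self, key: str) -> Optional[Item]:
        return self._items.get(key)

    def replace(self, key: str, data: bytes,
                flags: Optional[int] = None,
                exptime: Optional[int] = None) -> bool:
        """Replace data only if the key exists; meta-data is optional."""
        item = self._items.get(key)
        if item is None:
            return False
        item.data = data
        # Assumption: None means "keep the stored value".
        if flags is not None:
            item.flags = flags
        if exptime is not None:
            item.exptime = exptime
        return True
```

A client that only knows the new data calls replace(key, data) and
never has to fetch the old flags or exptime first.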

Once the semantics of the whole protocol are defined, it's time to
think about its text and binary syntax.  And because the above implies
optional parameters, both syntaxes should support these.  Then you
implement such syntax, and run the benchmarks.  If you find that
parsing of variable-width commands is indeed an issue, only then do you
add more fixed-width commands for frequent use (leaving the
variable-width commands in place, of course).
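
As a rough sketch of "one semantics, two syntaxes", the Python below
encodes the same 'replace' request with optional flags and exptime in
a text form and in a binary form.  Neither layout is memcached's
actual wire format or the one from binary-protocol-plan.txt: the
"name=value" text parameters, the presence bitmask, and the field
widths are all assumptions chosen for illustration.

```python
import struct
from typing import Optional

# Hypothetical presence bits for the optional fields.
HAS_FLAGS = 0x01
HAS_EXPTIME = 0x02


def encode_text(key: str, data: bytes,
                flags: Optional[int] = None,
                exptime: Optional[int] = None) -> bytes:
    """Text form: optional parameters appear only when given."""
    parts = ["replace", key, str(len(data))]
    if flags is not None:
        parts.append(f"flags={flags}")
    if exptime is not None:
        parts.append(f"exptime={exptime}")
    return " ".join(parts).encode("ascii") + b"\r\n" + data + b"\r\n"


def encode_binary(key: str, data: bytes,
                  flags: Optional[int] = None,
                  exptime: Optional[int] = None) -> bytes:
    """Binary form: a bitmask says which optional fields follow."""
    present = 0
    optional = b""
    if flags is not None:
        present |= HAS_FLAGS
        optional += struct.pack("!I", flags)
    if exptime is not None:
        present |= HAS_EXPTIME
        optional += struct.pack("!I", exptime)
    key_bytes = key.encode("ascii")
    # Header: presence mask (1 byte), key length (2), data length (4).
    header = struct.pack("!BHI", present, len(key_bytes), len(data))
    return header + optional + key_bytes + data
```

Both encoders take the same arguments, so the choice between them
stays a purely syntactic one, and a fixed-width variant could be added
later without changing what 'replace' means.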

This is what the hackathon should have produced.


So can someone calm my fears that:

  - the binary protocol will _replace_ the text protocol.

  - the binary protocol will repeat some of the shortcomings of the
    current text protocol (like mandatory exptime and flags).

  - the binary protocol will be developed elsewhere, and then pushed
    to the mainline on a "Works for us!" basis.


Though it may sound like a continuation of the 'noreply' fight, it
is not.  I too want to have a binary protocol, but not as a
_replacement_ for the text protocol, and definitely not a broken one.


-- 
   Tomash Brechko

