binary protocol notes from the facebook hackathon

Brad Fitzpatrick brad at danga.com
Tue Jul 10 16:23:25 UTC 2007


Last night at the Facebook memcached hackathon (a wonderful event, btw!),
a bunch of us server and client authors got in a room and discussed the
oft-requested memcached binary protocol in depth for some time.

After a dozen false starts and backtracks, we finally arrived at something
the whole room was surprisingly happy with, and that seemed to solve all
our prior objections and complications.

This may not be perfect, but we'd like to solicit community feedback:

http://code.sixapart.com/svn/memcached/trunk/server/doc/binary-protocol-plan.jpg
http://code.sixapart.com/svn/memcached/trunk/server/doc/binary-protocol-plan.txt

The text write-up is included below, for ease of quoting in questions.

I'm sure my notes are missing details, so feel free to point out
omissions, whether or not you were there last night.  I'll update
the text file in svn appropriately, as well as reply to the list.

It was also mutually agreed that:

   * the binary protocol would be the "core" one implemented in
     the server, performance being most important

   * the ASCII protocol will live on, unchanged, but will be moved
     into protocol_ascii_compat.c, or similar.

   * we're all willing to take a speed-hit on ASCII protocol if we
     have to (may not be needed), if the compat code isn't quite
     as efficient as it is now.  code readability/maintainability
     more important.  plus our time making binary protocol faster
     more important.  ASCII viewed as debugging aid, or for low-CPU
     solution for old clients.

   * will most likely add an HTTP protocol as well, implemented in
     protocol_http.c or something.  not to be exposed to the world,
     but still a lot of use cases for internal-network-only.

With that, the notes...

                                ---------

Notes regarding the proposed binary protocol from Facebook's hosted
memcached hackathon on 2007-07-09:

REQUEST STRUCTURE:

  * Magic byte / version
  * Cmd byte
  * Key len byte  (if no key, 0)
  * Reserved byte (should be 0)

  * 4 byte opaque id.  (will be copied back in response; means nothing to server)

  * 4 byte body length (network order; not including 12 byte header)

  [ cmd-specific fixed-width fields ]

  * key, if key length above is non-zero.

  [ cmd-specific variable-width field ]
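
To make the layout above concrete, here is a minimal sketch of packing a
request.  The notes do not assign a magic byte or command codes, so
REQ_MAGIC and CMD_GET below are made-up placeholders, and build_request
is a hypothetical helper.  Only the body length is stated to be in
network order; packing the opaque id the same way is an assumption
(the server never interprets it).

```python
import struct

# Hypothetical values: the notes do not assign a magic byte or command codes.
REQ_MAGIC = 0x0F
CMD_GET = 0x00

# magic, cmd, key length, reserved, 4 byte opaque id, 4 byte body length
REQUEST_HEADER = struct.Struct("!BBBBII")


def build_request(cmd, key=b"", fixed=b"", variable=b"", opaque=0):
    """Lay out one request: header, fixed-width fields, key, variable field."""
    if len(key) > 0xFF:
        raise ValueError("key length must fit in one byte")
    # The body length counts everything after the 12 byte header.
    body = fixed + key + variable
    header = REQUEST_HEADER.pack(REQ_MAGIC, cmd, len(key), 0, opaque, len(body))
    return header + body


packet = build_request(CMD_GET, key=b"foo", opaque=42)
```

A get carries no fixed-width or variable-width fields, so its body is
just the key.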


RESPONSE STRUCTURE:

  * Magic byte / version (different from req's magic byte/version, to distinguish
    that it's a response for, say, protocol analyzers)
  * cmd byte (same as request it goes to)
  * err code byte (0 on success, else errcode.  high bit set if fatal/non-normal error)
  * Reserved byte (should be 0)

  * 4 byte opaque id copied back from request

  * 4 byte body length (network order; not including 12 byte header)

  [cmd-specific body]
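
A client-side reading of the response header might look like the sketch
below.  parse_response_header and the sample bytes (including the 0xF0
magic and the 0x81 error code) are invented for illustration; the notes
only say that the high bit of the error code marks a fatal/non-normal
error, which is taken here to mean bit 0x80.

```python
import struct

# magic, cmd, error code, reserved, 4 byte opaque id, 4 byte body length
RESPONSE_HEADER = struct.Struct("!BBBBII")
FATAL_BIT = 0x80  # assumed meaning of "high bit" of the error code byte


def parse_response_header(data):
    """Split the 12 byte response header into named fields."""
    magic, cmd, err, _reserved, opaque, body_len = RESPONSE_HEADER.unpack_from(data)
    return {
        "magic": magic,
        "cmd": cmd,
        "err": err,
        "ok": err == 0,
        "fatal": bool(err & FATAL_BIT),
        "opaque": opaque,      # copied back from the request
        "body_len": body_len,  # bytes that follow the header
    }


# A made-up response: magic 0xF0, cmd 0, err 0x81, opaque 42, 5 byte body.
sample = bytes([0xF0, 0x00, 0x81, 0x00]) + (42).to_bytes(4, "big") + (5).to_bytes(4, "big")
fields = parse_response_header(sample)
```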


COMMANDS:  (for cmd byte)

  get    - single key get (no more multi-get; clients should pipeline)
  getq   - like get, but quiet.  that is, on cache miss, return nothing.

      Note: clients should implement multi-get (still important for
            reducing network roundtrips!) as n pipelined requests, the
            first n-1 being getq, the last being a regular
            get.  that way you're guaranteed to get a response, and
            you know when the server's done.  you can also do the naive
            thing and send n pipelined gets, but then you could potentially
            get back a lot of "NOT_FOUND!" error code packets.
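
The multi-get recipe in the note can be sketched as follows.  The magic
byte and the command codes are placeholders (the notes assign none), and
using the opaque id as an index into the key list is just one way a
client could match responses back to keys.

```python
import struct

# Hypothetical values: the notes do not assign a magic byte or command codes.
REQ_MAGIC = 0x0F
CMD_GET = 0x00
CMD_GETQ = 0x01


def get_request(cmd, key, opaque):
    # get/getq carry no fixed-width fields, so the body is just the key.
    header = struct.pack("!BBBBII", REQ_MAGIC, cmd, len(key), 0, opaque, len(key))
    return header + key


def multi_get(keys):
    """Return one buffer holding n pipelined requests: n-1 getq, then a get."""
    if not keys:
        return b""
    # Quiet gets stay silent on a miss; the opaque id records the key's index.
    packets = [get_request(CMD_GETQ, k, i) for i, k in enumerate(keys[:-1])]
    # The final plain get always produces a response, marking the end.
    packets.append(get_request(CMD_GET, keys[-1], len(keys) - 1))
    return b"".join(packets)


buf = multi_get([b"a", b"b", b"c"])
```

The whole buffer can go out in a single write; the client then reads
responses until it sees the one whose cmd byte is the plain get.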

  delete
  set/add/replace

       cmd-specific fixed-width fields for set/add/replace:

           * 4 byte expiration time
           * 4 byte flags
           (the value length is inferred from the total body length,
            subtracting (keylen + the 8 bytes of fixed-width fields))
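
As a rough illustration of the set/add/replace body, the sketch below
assumes the parenthetical means value length = body length - 8 - key
length, with the 8 being the two fixed-width fields.  The helper names
are hypothetical, and packing both fields in network order is an
assumption; the notes only state the order (expiration time, then flags)
and that the key precedes the variable-width field.

```python
import struct

FIXED = struct.Struct("!II")  # 4 byte expiration time, 4 byte flags


def build_set_body(key, value, exptime=0, flags=0):
    """Body of a set/add/replace: fixed-width fields, then key, then value."""
    return FIXED.pack(exptime, flags) + key + value


def split_set_body(body, keylen):
    """Recover the fields; keylen comes from the key length byte in the header."""
    exptime, flags = FIXED.unpack_from(body)
    # No explicit value length is sent: it is whatever remains of the body.
    value_len = len(body) - FIXED.size - keylen
    if value_len < 0:
        raise ValueError("body too short for the declared key length")
    key = body[FIXED.size:FIXED.size + keylen]
    value = body[FIXED.size + keylen:]
    return exptime, flags, key, value


body = build_set_body(b"foo", b"hello", exptime=60, flags=5)
parsed = split_set_body(body, keylen=3)
```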




-- Brad


