binary wire protocol in memcached

Tony Tung ttung at facebook.com
Tue Sep 4 19:04:59 UTC 2007


Hi,

We have a working binary wire protocol implementation (client and server)
though it does not reflect the changes that occurred at the last hackathon
(at Sun).  Some key differences:

1) The magic byte is different, and also unique to request and response.  We
also maintain a notion of binary protocol versions for possible future
(albeit limited) expansion.

// version number and magic byte.
#define BP_VERSION         0x0
#define BP_REQ_MAGIC_BYTE  (0x50 | BP_VERSION)
#define BP_REP_MAGIC_BYTE  (0xA0 | BP_VERSION)
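
To illustrate how a receiver might use these, here is a minimal sketch. It assumes the high nibble of the magic byte marks the direction and the low nibble carries the version; that layout is inferred from the defines above. The mask names and helper functions are hypothetical, not part of our implementation.

```c
#include <stdint.h>

/* Definitions repeated from the message above. */
#define BP_VERSION         0x0
#define BP_REQ_MAGIC_BYTE  (0x50 | BP_VERSION)
#define BP_REP_MAGIC_BYTE  (0xA0 | BP_VERSION)

/* Assumption: high nibble = direction, low nibble = protocol version. */
#define BP_MAGIC_KIND_MASK 0xF0
#define BP_MAGIC_VER_MASK  0x0F

/* Hypothetical helper: nonzero if the byte opens a request packet. */
static int bp_is_request(uint8_t magic)
{
    return (magic & BP_MAGIC_KIND_MASK) == 0x50;
}

/* Hypothetical helper: nonzero if the byte opens a response packet. */
static int bp_is_reply(uint8_t magic)
{
    return (magic & BP_MAGIC_KIND_MASK) == 0xA0;
}

/* Hypothetical helper: protocol version encoded in the low nibble. */
static unsigned bp_magic_version(uint8_t magic)
{
    return magic & BP_MAGIC_VER_MASK;
}
```

Under this reading there is room for at most 16 versions, which is the "limited" expansion mentioned above.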

2) We specified the command IDs a bit differently, grouping them by
request/response type.

#define BIT(x)             (1ULL << (x))
#define FIELD(val, shift)  ((val) << (shift))

#define BP_E_E             FIELD(0x0, 4)
#define BP_E_S             FIELD(0x1, 4)
#define BP_K_V             FIELD(0x2, 4)
#define BP_KV_E            FIELD(0x3, 4)
#define BP_KN_E            FIELD(0x4, 4)
#define BP_KN_N            FIELD(0x5, 4)
#define BP_N_E             FIELD(0x6, 4)
#define BP_S_E             FIELD(0x7, 4)
#define BP_S_S             FIELD(0x8, 4)

#define BP_QUIET           BIT(3)

typedef enum bp_cmd {
    // these commands go as an empty_req and return as an empty_rep.
    BP_ECHO_CMD        = (BP_E_E | FIELD(0x0, 0)),
    BP_QUIT_CMD        = (BP_E_E | FIELD(0x1, 0)),

    // these commands go as an empty_req and return as a string_rep.
    BP_VER_CMD         = (BP_E_S | FIELD(0x0, 0)),
    BP_SERVERERR_CMD   = (BP_E_S | FIELD(0x1, 0)), // this is actually not a
                                                   // command.  this is solely
                                                   // used as a response when
                                                   // the server wants to
                                                   // indicate an error status.

    // these commands go as a key_req and return as a value_rep.
    BP_GET_CMD         = (BP_K_V | FIELD(0x0, 0)),
    BP_GETQ_CMD        = (BP_K_V | BP_QUIET | FIELD(0x0, 0)),

    // these commands go as a key_value_req and return as an empty_rep.
    BP_SET_CMD         = (BP_KV_E | FIELD(0x0, 0)),
    BP_ADD_CMD         = (BP_KV_E | FIELD(0x1, 0)),
    BP_REPLACE_CMD     = (BP_KV_E | FIELD(0x2, 0)),
    BP_APPEND_CMD      = (BP_KV_E | FIELD(0x3, 0)),

    BP_SETQ_CMD        = (BP_KV_E | BP_QUIET | FIELD(0x0, 0)),
    BP_ADDQ_CMD        = (BP_KV_E | BP_QUIET | FIELD(0x1, 0)),
    BP_REPLACEQ_CMD    = (BP_KV_E | BP_QUIET | FIELD(0x2, 0)),
    BP_APPENDQ_CMD     = (BP_KV_E | BP_QUIET | FIELD(0x3, 0)),

    // these commands go as a key_number_req and return as an empty_rep.
    BP_DELETE_CMD      = (BP_KN_E | FIELD(0x0, 0)),
    BP_DELETEQ_CMD     = (BP_KN_E | BP_QUIET | FIELD(0x0, 0)),

    // these commands go as a key_number_req and return as a number_rep.
    BP_INCR_CMD        = (BP_KN_N | FIELD(0x0, 0)),
    BP_DECR_CMD        = (BP_KN_N | FIELD(0x1, 0)),

    // these commands go as a number_req and return as an empty_rep.
    BP_FLUSH_ALL_CMD   = (BP_N_E | FIELD(0x0, 0)),

    // these commands go as a string_req and return as an empty_rep.
    BP_FLUSH_REGEX_CMD = (BP_S_E | FIELD(0x0, 0)),

    // these commands go as a string_req and return as a string_rep.
    BP_STATS_CMD       = (BP_S_S | FIELD(0x0, 0)),
} bp_cmd_t;
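
The grouping above means a command ID can be taken apart with shifts and masks. The following is a rough sketch only: it assumes the command ID travels as a single byte (the largest value above, BP_STATS_CMD, is 0x80), and the mask and helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout, read off the enum above:
 *   bits 7..4  request/response type (BP_E_E .. BP_S_S)
 *   bit  3     quiet flag (BP_QUIET)
 *   bits 2..0  command index within the type
 */
#define BP_TYPE_SHIFT  4
#define BP_QUIET_MASK  0x08
#define BP_INDEX_MASK  0x07

/* Hypothetical helper: request/response type, e.g. 0x2 for BP_K_V. */
static unsigned bp_cmd_type(uint8_t cmd)
{
    return cmd >> BP_TYPE_SHIFT;
}

/* Hypothetical helper: nonzero if the quiet bit is set. */
static int bp_cmd_is_quiet(uint8_t cmd)
{
    return (cmd & BP_QUIET_MASK) != 0;
}

/* Hypothetical helper: index of the command within its type. */
static unsigned bp_cmd_index(uint8_t cmd)
{
    return cmd & BP_INDEX_MASK;
}
```

For example, BP_GETQ_CMD works out to 0x28 (type 0x2, quiet, index 0) and BP_APPENDQ_CMD to 0x3B (type 0x3, quiet, index 3), so a parser can pick the packet format from the type alone.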

3) We have no support for 64-bit arithmetic commands.  We have retained decr
(so far).

4) The return value for arithmetic commands is binary and not a string.  I
wasn't at the hackathon during which this was decided so I'd be interested
in finding out why a string is preferable.
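
To make the comparison concrete, here is a small sketch of both encodings of a counter value. It assumes a 32-bit value sent in network byte order for the binary case; neither the width nor the byte order is specified above, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: binary encoding, always 4 bytes, big-endian (assumed). */
static size_t bp_encode_binary(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
    return 4;
}

/* Hypothetical: decimal string encoding, variable length.
 * Returns the number of characters written, or 0 if cap is too small. */
static size_t bp_encode_string(uint32_t value, char *out, size_t cap)
{
    int n = snprintf(out, cap, "%lu", (unsigned long)value);
    return (n < 0 || (size_t)n >= cap) ? 0 : (size_t)n;
}
```

The binary form has a fixed length and needs no parsing on the client; the string form varies in length and has to be converted back to a number.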

Thanks,
Tony

On 8/27/07 8:39 PM, "Dustin Sallings" <dustin at spy.net> wrote:

> 
> 
>         I just got memcached passing all of my tests with the binary wire
> protocol (tcp).  Sorry it took me so long to get going on it.
> 
>         I've implemented this as a patch stack (using mercurial queues) over
> trunk r608 with Evan Miller's 64-bit counter patch applied (since
> everyone is excited by big numbers).  I've got no idea where (if
> anywhere) active development is going on.  I can pick it up and move
> it if needed.
> 
>         I don't know what clients are ready.  I haven't done a release of my
> java client with binary protocol support yet, but I can roll out a
> pre-release if someone wants to try it (all of my interop tests pass
> between my java client and memcached in binary mode, so I just need
> to clean it up a bit).  My test python client is functional for at
> least interactive use as well.
> 
> 
>         Known issues:
> 
>         I didn't implement a timeout on delete.  I suppose I should, but do
> people actually use that?
> 
>         I didn't implement a timeout on flush.  I'd rather avoid that one if
> possible because the semantics are really confusing.
> 
>         I basically ignored managed buckets because I have no idea how they
> work.
> 
>         My 64-bit ntohl kind of thing might not work on big endian machines
> (I haven't tried it yet).  Moreover, I would hope something like that
> would already exist somewhere, so I didn't bother making it fast.
> 
>         I've only implemented version, flush, noop, set, add, replace, get,
> getq, delete, and incr.  Besides stats (for which we have no clearly
> defined packet format), there seem to be other commands in there that
> don't make much sense to me.
> 
>         There may be more, but I don't know about them yet.
> 
> --
> Dustin Sallings
> 
> 
> 
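
On the 64-bit ntohl question quoted above: one way to sidestep host endianness entirely is to assemble the value byte by byte. This is only a sketch of that general technique, not the code from either implementation; the names bp_ntohll and bp_htonll are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: convert a 64-bit value from network (big-endian) byte
 * order to host order.  It reads the bytes in memory order, so it gives
 * the same result on little- and big-endian hosts. */
static uint64_t bp_ntohll(uint64_t net)
{
    unsigned char bytes[8];
    uint64_t host = 0;
    int i;

    memcpy(bytes, &net, sizeof bytes);
    for (i = 0; i < 8; i++)
        host = (host << 8) | bytes[i];
    return host;
}

/* Hypothetical: the inverse, host order to network byte order. */
static uint64_t bp_htonll(uint64_t host)
{
    unsigned char bytes[8];
    uint64_t net;
    int i;

    for (i = 7; i >= 0; i--) {
        bytes[i] = (unsigned char)(host & 0xFF);
        host >>= 8;
    }
    memcpy(&net, bytes, sizeof net);
    return net;
}
```

This is not the fastest approach, but it does not depend on any platform-specific byte-swap routine.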

