binary wire protocol in memcached

Dustin Sallings dustin at
Tue Sep 4 20:10:05 UTC 2007

On Sep 4, 2007, at 12:04 , Tony Tung wrote:

> 1) The magic byte is different, and also unique to request and  
> response.  We also maintain a notion of binary protocol versions  
> for possible future (albeit limited) expansion.

	It was decided that the request and response magic should be  
different.  I believe I added the definitions but didn't actually  
give them different values.

> 2) We specified the command IDs a bit differently, grouping them by  
> request/response type.

	What is the value of this?  Is it not more valuable to just list  
them sequentially so implementors can easily tell what the commands  
are without computing them?

> #define BP_QUIET           BIT(3)

	I suggested this in the first meeting and it was rejected.  The idea  
was to make commands (including quiet commands) with their specific  
semantics as needed.  For example, the desirable semantics of a quiet  
get are that there is no response body unless there's a success.  For  
a quiet set (if such a thing needs to exist), there should be no  
response body unless there's an error.

>     // these commands go as an empty_req and return as an empty_rep.
>     BP_ECHO_CMD        = (BP_E_E | FIELD(0x0, 0)),

	We had discussed both an echo (BP_S_S) and a noop (BP_E_E).  I  
didn't implement echo, but my noop has your echo semantics, it seems.

>     BP_SERVERERR_CMD   = (BP_E_S | FIELD(0x1, 0)), // this is  
> actually not a
>                                                    // command.   
> this is solely
>                                                    // used as a  
> response when
>                                                    // the server  
> wants to
>                                                    // indicate an  
> error status.

	When does the server indicate an error status that's not in response  
to a command?

>     // these commands go as a string_req and return as an empty_rep.
>     BP_FLUSH_REGEX_CMD = (BP_S_E | FIELD(0x0, 0)),

	Does this implementation exist?  People have been asking for it a lot.

>     // these commands go as a string_req and return as a string_rep.
>     BP_STATS_CMD       = (BP_S_S | FIELD(0x0, 0)),

	We discussed having more structure in the response to a stats  
command since there is structure in the stats result itself.  It  
makes sense to me to have the response be a list of sorts.

> 3) We have no support for 64-bit arithmetic commands.  We have  
> retained decr (so far).

	Decr was killed because it seemed mostly unnecessary and was  
semantically different (in implementation, anyway) from incr.  I  
fixed the latter bug in my tree and made sure the incr amount was  
unsigned with the 64-bit patch.

	Obviously, decr still exists in spirit, as my client interface  
hasn't changed due to binary protocol implementation.

> 4) The return value for arithmetic commands is binary and not a  
> string.  I wasn’t at the hackathon during which this was decided so  
> I’d be interested in finding out why a string is preferable.

	The string is required for the internal representation due to  
``get'' requests expecting ascii representations of the numbers.  As  
the arithmetic operations are already working on string values, it  
made sense to go ahead and return a value from incr the same way that  
the same value would be returned from a get.

	I suppose we could pack the results into a 64-bit number as well, by  
modifying add_delta to provide the counter to the response, but I  
don't think there's a lot to gain here.  As it is, I'm using an  
unmodified do_add_delta (well, with the 64-bit mods) for both text  
and bin.

	The other difference is that we're allowing counters to (optionally)  
be created using incr.  At least one of the users of counters  
indicated he's sending an add and incr together to ensure the counter  
exists.  It seems natural to put these things together.  Based on  
values in your packet, you can either expect a NOT_FOUND sort of  
response, or an auto-created counter when incrementing a missing value.

> Thanks,
> Tony
> On 8/27/07 8:39 PM, "Dustin Sallings" <dustin at> wrote:
>>         I just got memcached passing all of my tests with the  
>> binary wire
>> protocol (tcp).  Sorry it took me so long to get going on it.
>>         I've implemented this as a patch stack (using mercurial  
>> queues) over
>> trunk r608 with Evan Miller's 64-bit counter patch applied (since
>> everyone is excited by big numbers).  I've got no idea where (if
>> anywhere) active development is going on.  I can pick it up and move
>> it needed.
>>         I don't know what clients are ready.  I haven't done a  
>> release of my
>> java client with binary protocol support yet, but I can roll out a
>> pre-release if someone wants to try it (all of my interop tests pass
>> between my java client and memcached in binary mode, so I just need
>> to clean it up a bit).  My test python client is functional for at
>> least interactive use as well.
>>         Known issues:
>>         I didn't implement a timeout on delete.  I suppose I  
>> should, but do
>> people actually use that?
>>         I didn't implement a timeout on flush.  I'd rather avoid  
>> that one if
>> possible because the semantics are really confusing.
>>         I basically ignored managed buckets because I have no idea  
>> how they
>> work.
>>         My 64-bit ntohl kind of thing might not work on big endian  
>> machines
>> (I haven't tried it yet).  Moreover, I would hope something like that
>> would already exist somewhere, so I didn't bother making it fast.
>>         I've only implemented version, flush, noop, set, add,  
>> replace, get,
>> getq, delete, and incr.  Besides stats (for which we have no clearly
>> defined packet format), there seem to be other commands in there that
>> don't make much sense to me.
>>         There may be more, but I don't know about them yet.
>> --
>> Dustin Sallings

Dustin Sallings

-------------- next part --------------
An HTML attachment was scrubbed...

More information about the memcached mailing list