Binary Protocol...

Wed Dec 8 18:55:46 PST 2004

>> In the interests of feature growth and moving away from the  
>> convenient, but rather expensive text protocol, I'd like to propose 
>> the  binary memcache protocol.
>
> I'm not clear that the protocol is that expensive, or that it matters 
> terribly much.

Right now the protocol has no structure other than it's newline 
delimited.  Fantastic for telnet sessions, but it's hard to extend.  In 
the current protocol, the solution is to add new commands or add 
additional flags at the end of the command (ie: 'set foo 0 1 1 2 5 6 1 
2 4 5', etc).  HTTP at least has some structure, memcached at the 
moment does not.  Moving things to a binary protocol gives structure 
and the ability to have arbitrary keys and values.  With the binary 
protocol, you could have newline characters in your keys and it 
wouldn't matter.  That peace of mind is huge, IMHO.

> Are your servers or users CPU-bound?

No, but when profiling, most of the time doing memcache related stuff 
is spent parsing responses.  With a binary protocol, that will be 
reduced to the lowest possible level.

> Is all that much CPU used in the parsing of the protocol?

Well, in my benchmarking routines, 60% of the time of the library is 
spent doing string handling... and libmemcache(3) is pretty quick about 
its parsing.  That said, do I think someone is CPU bound who's using 
libmemcache(3)?  Absolutely not.  But a text protocol is fundamentally 
limited by the characteristics of the agreed upon text protocol (can't 
use colons, newlines, etc...).  A binary protocol only leaves us with 
size limitations, which we had earlier anyway.

> Are they network bound, and if so, is the protocol overhead really 
> that much more then the data you're slinging about?  Remember that all 
> the techniques for forcing things into one packet -- disabling 
> Nangle's Algo, all that jazz -- are available with textual protocols 
> too.

I don't think it's much, but I don't want to see it grow.  My point was 
I'm staying within the single packet per trip paradigm that 
memcached(8) currently enjoys.  Some binary protocols are chatty and I 
was making a statement that I'm explicitly avoiding that.

> Text-based protocols are easier to debug, and they're easier to extend 
> by multiple people without them stepping on each-other's toes.

Heh, easier to debug: not to extend, IMHO.

>> The HELLO Packet:
> I'd rather refer to these as "message", and make explicit that you can 
> have more then one of them in a TCP/IP packet.

This packet only gets sent when a connection is established.  The HELLO 
Packet authenticates the connection, but never gets sent after the 
connection is established.

>>  0                   1                   2                   3
>>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |    Version    |    Options    |  User Length  | Passwd Length |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |                         Key Space ID                          |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> /                           Username                            /
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> /                           Password                            /
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> I'd prefer to see each length field go immediately before the thing 
> that it's counting, and all length fileds be the same size.  (We 
> probably don't need an explicit statement of endianness, but it 
> couldn't hurt.)

I'm trying to keep the headers 32bit aligned where possible that way I 
can do tricks like reading this into a structure.

> This may seem bikeshedish, but it allows for reuse of routines to pack 
> and unpack them into the languge's native strings.  In perl, it even 
> allows using a single pack/unpack function call.

At some point, an XS wrapper around libmemcache(3) will probably spring 
into existence.  Convenience for high level languages isn't my concern.

>> Options (required):
>>     These bits refer to the bits in the Options Byte.
>>     Bit 0:    Connection provides authentication information
>>     Bit 1:    This client connection requires TLS
>>     Bit 2:    Disconnect if TLS can not be negotiated
>>     Bit 3-7:    Not designated
> What's the difference between 1 and 2?

It's the difference between "I want TLS if you offer that service" and 
"I won't talk to you if I can't connect over TLS."

> Why have 0 different from just a 0-length username and passwd?

In closed networks, there's no need to pass authentication information 
around, like what memcached(8) does now.  The username and password are 
optional.  Note the '/' to the sides of the Username/Password fields.

> What are you doing running memcached across a sniffable network, 
> anyway?  Doesn't using TLS add in overhead more then enough to nullify 
> any help that a binary protocal would help?

Absolutely!  Don't think for a second that I'll be caught dead using 
TLS in production, but for those who can dream up a need, at least the 
protocol has support for it.  One example application would be network 
appliances authenticating over wireless.  memcached(8) + TLS + 
pgmemcache(1) to invalidate the auth bits == way more cool than radius 
or ldap.  As I said earlier, just because the protocol has support for 
it doesn't mean there will be a feature to back it up.

>>     If Bit 2    of the Options 1 Byte is set, this value specifies the
>>     expiration of a key in seconds from the Epoch.
> Either "from the UNIX Epoch" or "in seconds since Dec 31, 1969 at 
> 23:59:59 GMT", please.

This lets us specify a relative vs absolute time using relative 
expiration times greater than a month.  Not sure what your concern here 
is...

>> Options 1 (required):
> Auxilury actions:
>>     These bits refer to the bits in the Options 1 Byte.
>>     Bit 0:    If this key exists and has a relative expiration, reset
>>             the expiration to be be relative to the current time.
>>     Bit 1:    Request that the server delete the key after sending the
>>             value to the client.
>>     Bit 2:    After the server has processed this request, close the
>>             connection.
>>     Bit 3:    If the key exists, include the expiration of the key in
>>             the response from the server.
>>     Bit 4:    If the key exists, include the number of fetch requests
>>             left for this key.
>>     Bit 5-7:    Not designated
> Bit 5: do not return the data, only do the other actions in the 
> auxilury actions byte.

This is the HELLO Packet and is only transmitted once.  This bit should 
be added to the options below.

>> Key (required):
>>     The key for the given request.  Keys are not padded by a null  
>> character.
> There is a certian danger in allowing the user to specify keys that 
> cannot be retreived by the normal (textual) protocol.  I'm really not 
> sure if we should say "you get what you deserve, then", or dissallow 
> it.  (For that matter, I can't quite recall if there really is such a 
> beast.)

Well, right now spaces are fatal in keys.  This removes that 
restriction.  Being able to treat keys as blobs of data is handy.

>> The ERROR Packet:
>> The ERROR Packet is one of the ways a server responds to client  
>> requests.  Not all ERROR Packets are fatal errors and indeed, the  
>> server responds with an ERROR Packet after a STORE Packet has been  
>> processed by the server.
> I'm not sure this is a good idea.  Shouldn't we imply good by the lack 
> of an error packet, if we wish to be efficent?

Some messages respond with a RESPONSE Packet (what I'm thinking about 
renaming to the DATA Packet), but all commands give some kind of 
feedback.  A lack of a response is not acceptable.  As I said at the 
bottom, I'm tempted to rename this packet to the RESPONSE Packet, but 
the point remains the same: some kind of acknowledgment packet always 
needs to be sent back.  The client relies on a write(2) then a read(2) 
for any memcache function to succeed and I see no reason to change 
that.

It may be interesting to goto an asynchronous model (from a purely 
academic approach), but I can't see any benefits of such an approach.  
PostgreSQL does that for its pq(4) protocol and in libpq(3), and I find 
it to be only useful for consuming userland CPU cycles.  If you need 
asynchronous behavior, use pthreads and wrap the blocking nature of 
memcache in a condition variable.  Fire and forget would only work for 
setting data, but since most memcache installations are used for 
read's, I can't see a benefit here.

>> Additional Notes:
>> If a client connects and sends an invalid request that is out of 
>> bounds  for the protocol, the server with a plain text error message 
>> and closes  the connection.  The format for the plain text error 
>> response is:
>> ERROR [code]: [message]\n
>> [custom message]\n
>> <server closes connection>
> I hope this just got in this spec by accident -- haven't we already 
> covered this with the error packet?

It is possible for buggy clients or servers to get out of sync with 
what the server thinks should happen.  If that happens, the client 
takes the last bit of data read from the server, searches back until it 
finds the 2nd to last newline and it is able to come up with an error 
message even if things get out of sync.  -sc

-- 
Sean Chittenden