Binary Protocol...

John McCaskey johnm at klir.com
Wed Dec 8 20:19:09 PST 2004


I'm at home using Outlook Web Access, so it refuses to quote properly, but see my inline comments below anyway...

________________________________

From: memcached-bounces at lists.danga.com on behalf of Sean Chittenden
Sent: Wed 12/8/2004 6:55 PM
To: James Mastros
Cc: memcached at lists.danga.com
Subject: Re: Binary Protocol...



>> In the interests of feature growth and moving away from the 
>> convenient, but rather expensive text protocol, I'd like to propose
>> the  binary memcache protocol.
>
> I'm not clear that the protocol is that expensive, or that it matters
> terribly much.
>
> Right now the protocol has no structure other than being newline-delimited.
> Fantastic for telnet sessions, but it's hard to extend.  In
> the current protocol, the solution is to add new commands or add
> additional flags at the end of the command (ie: 'set foo 0 1 1 2 5 6 1
> 2 4 5', etc).  HTTP at least has some structure, memcached at the
> moment does not.  Moving things to a binary protocol gives structure
> and the ability to have arbitrary keys and values.  With the binary
> protocol, you could have newline characters in your keys and it
> wouldn't matter.  That peace of mind is huge, IMHO.
>
> Are your servers or users CPU-bound?

My servers are CPU-bound.  We have a very complex database where we log all kinds of statistical data, and there is a cluster of servers dedicated to performing aggregation and statistical analysis.  We are using memcache to eliminate the database as the bottleneck, and once that is done the actual CPU usage of the servers (quad-processor Xeon boxes at the moment) is the limiting factor.  As such, whatever ways we can lower CPU usage are important to me.
 
Libmemcache is a small percentage of the time compared to the actual computation, but anything helps, and when it's a simple, obviously good step like moving to a binary protocol that's easier to extend and use, I see it as a no-brainer.

>
> No, but when profiling, most of the time doing memcache related stuff
> is spent parsing responses.  With a binary protocol, that will be
> reduced to the lowest possible level.
>
> Is all that much CPU used in the parsing of the protocol?
>
> Well, in my benchmarking routines, 60% of the time of the library is
> spent doing string handling... and libmemcache(3) is pretty quick about
> its parsing.  That said, do I think someone is CPU bound who's using
> libmemcache(3)?  Absolutely not.  But a text protocol is fundamentally
> limited by the characteristics of the agreed upon text protocol (can't
> use colons, newlines, etc...).  A binary protocol only leaves us with
> size limitations, which we had earlier anyway.
>
> Are they network bound, and if so, is the protocol overhead really
> that much more than the data you're slinging about?  Remember that all
> the techniques for forcing things into one packet -- disabling
> Nagle's algorithm, all that jazz -- are available with textual protocols
> too.
>
> I don't think it's much, but I don't want to see it grow.  My point was
> I'm staying within the single packet per trip paradigm that
> memcached(8) currently enjoys.  Some binary protocols are chatty and I
> was making a statement that I'm explicitly avoiding that.
>
> Text-based protocols are easier to debug, and they're easier to extend
> by multiple people without them stepping on each other's toes.
> 
> Heh, easier to debug: not to extend, IMHO.
>

I don't even consider text protocols easier to debug if I'm working in C.  For high-level languages, sure, but those languages are also clearly not geared towards performance.  The entire point of memcached is performance; a lot of users don't need that if they are just using memcached to cache some data for a web server, but that's not all memcached is good for.  As for easier to extend, I think it's a toss-up either way.

>> The HELLO Packet:
> I'd rather refer to these as "message", and make explicit that you can
> have more then one of them in a TCP/IP packet.
>
> This packet only gets sent when a connection is established.  The HELLO
> Packet authenticates the connection, but never gets sent after the
> connection is established.
>
>>  0                   1                   2                   3
>>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |    Version    |    Options    |  User Length  | Passwd Length |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |                         Key Space ID                          |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> /                           Username                            /
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> /                           Password                            /
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> I'd prefer to see each length field go immediately before the thing
> that it's counting, and all length fields be the same size.  (We
> probably don't need an explicit statement of endianness, but it
> couldn't hurt.)
>
> I'm trying to keep the headers 32-bit aligned where possible so that I
> can do tricks like reading them directly into a structure.
>
> This may seem bikeshedish, but it allows for reuse of routines to pack
> and unpack them into the language's native strings.  In Perl, it even
> allows using a single pack/unpack function call.
>
> At some point, an XS wrapper around libmemcache(3) will probably spring
> into existence.  Convenience for high level languages isn't my concern.
>

Given that memcached is designed to be a very fast cache, I think that making it high performance from a language like C should be the highest priority.  It will still be trivial to write wrappers and use memcached from high-level languages, so I don't see any issue here.
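
For what it's worth, here is roughly how I picture that trick in C, going by the HELLO layout quoted above (the field names are mine, and multi-byte fields would still need byte-order conversion after reading):

#include <stdint.h>

/* Hypothetical struct for the fixed portion of the HELLO packet. */
struct mc_hello_hdr {
    uint8_t  version;
    uint8_t  options;
    uint8_t  user_len;
    uint8_t  passwd_len;
    uint32_t key_space_id;    /* convert with ntohl() after reading */
};

/* Because the fixed header is 32-bit aligned and exactly 8 bytes, a server
   can read(2) sizeof(struct mc_hello_hdr) bytes straight into it, then read
   user_len + passwd_len more bytes for the variable-length fields. */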


>> Options (required):
>>     These bits refer to the bits in the Options Byte.
>>     Bit 0:    Connection provides authentication information
>>     Bit 1:    This client connection requires TLS
>>     Bit 2:    Disconnect if TLS can not be negotiated
>>     Bit 3-7:    Not designated
> What's the difference between 1 and 2?
>
> It's the difference between "I want TLS if you offer that service" and
> "I won't talk to you if I can't connect over TLS."
>
> Why have 0 different from just a 0-length username and passwd?
>
> In closed networks, there's no need to pass authentication information
> around, like what memcached(8) does now.  The username and password are
> optional.  Note the '/' to the sides of the Username/Password fields.
>
> What are you doing running memcached across a sniffable network,
> anyway?  Doesn't using TLS add in overhead more than enough to nullify
> any gain that a binary protocol would provide?
>
> Absolutely!  Don't think for a second that I'll be caught dead using
> TLS in production, but for those who can dream up a need, at least the
> protocol has support for it.  One example application would be network
> appliances authenticating over wireless.  memcached(8) + TLS +
> pgmemcache(1) to invalidate the auth bits == way more cool than radius
> or ldap.  As I said earlier, just because the protocol has support for
> it doesn't mean there will be a feature to back it up.
>
>>     If Bit 2 of the Options 1 Byte is set, this value specifies the
>>     expiration of a key in seconds from the Epoch.
> Either "from the UNIX Epoch" or "in seconds since Jan 1, 1970 at
> 00:00:00 GMT", please.
>
> This lets us specify a relative vs absolute time using relative
> expiration times greater than a month.  Not sure what your concern here
> is...
>

Agreed, specifying either an absolute or a relative expiration should be supported; both are useful, and I don't see any issue with having both.
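
To illustrate, here is roughly how I'd expect a server to interpret the expiration field under that scheme.  OPT_EXP_ABSOLUTE is just a name I made up for whichever bit ends up flagging absolute vs relative:

#include <stdint.h>
#include <time.h>

#define OPT_EXP_ABSOLUTE (1 << 2)    /* made-up name for the flag bit */

static time_t
resolve_expiration(uint8_t options, uint32_t exp, time_t now)
{
    if (exp == 0)
        return 0;                    /* no expiration */
    if (options & OPT_EXP_ABSOLUTE)
        return (time_t)exp;          /* absolute: seconds since the Epoch */
    return now + (time_t)exp;        /* relative: seconds from now */
}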


>> Options 1 (required):
> Auxiliary actions:
>>     These bits refer to the bits in the Options 1 Byte.
>>     Bit 0:    If this key exists and has a relative expiration, reset
>>             the expiration to be relative to the current time.
>>     Bit 1:    Request that the server delete the key after sending the
>>             value to the client.
>>     Bit 2:    After the server has processed this request, close the
>>             connection.
>>     Bit 3:    If the key exists, include the expiration of the key in
>>             the response from the server.
>>     Bit 4:    If the key exists, include the number of fetch requests
>>             left for this key.
>>     Bit 5-7:    Not designated
> Bit 5: do not return the data, only do the other actions in the
> auxiliary actions byte.
>
> This is the HELLO Packet and is only transmitted once.  This bit should
> be added to the options below.
>
>> Key (required):
>>     The key for the given request.  Keys are not padded by a null 
>> character.
> There is a certain danger in allowing the user to specify keys that
> cannot be retrieved by the normal (textual) protocol.  I'm really not
> sure if we should say "you get what you deserve, then", or disallow
> it.  (For that matter, I can't quite recall if there really is such a
> beast.)
>
> Well, right now spaces are fatal in keys.  This removes that
> restriction.  Being able to treat keys as blobs of data is handy.
>

Agreed, I would prefer to be able to use arbitrary values as keys rather than having to hash them first to ensure they don't contain an illegal character.
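
Just to spell out why the binary framing makes that safe: with an explicit length in front of the key, the key bytes themselves never have to be inspected for delimiters.  Something like this, where the 2-byte length field is purely my assumption and not part of the draft:

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Copy a length-prefixed key into buf; spaces, newlines, or any other
   bytes in the key are fine because nothing has to delimit it. */
static size_t
pack_key(unsigned char *buf, const void *key, uint16_t key_len)
{
    uint16_t nlen = htons(key_len);           /* length, network byte order */
    memcpy(buf, &nlen, sizeof(nlen));
    memcpy(buf + sizeof(nlen), key, key_len); /* raw key bytes, unescaped */
    return sizeof(nlen) + key_len;
}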


>> The ERROR Packet:
>> The ERROR Packet is one of the ways a server responds to client 
>> requests.  Not all ERROR Packets are fatal errors and indeed, the 
>> server responds with an ERROR Packet after a STORE Packet has been 
>> processed by the server.
> I'm not sure this is a good idea.  Shouldn't we imply good by the lack
> of an error packet, if we wish to be efficient?
>
> Some messages respond with a RESPONSE Packet (what I'm thinking about
> renaming to the DATA Packet), but all commands give some kind of
> feedback.  A lack of a response is not acceptable.  As I said at the
> bottom, I'm tempted to rename this packet to the RESPONSE Packet, but
> the point remains the same: some kind of acknowledgment packet always
> needs to be sent back.  The client relies on a write(2) then a read(2)
> for any memcache function to succeed and I see no reason to change
> that.


How about a DATA packet and a STATUS packet?  I agree with James that calling a good response an ERROR packet is a little odd, but the names don't really matter; I can live with ERROR and RESPONSE easily enough.
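
Either way, the client side stays dead simple: one write(2) for the request, then block on read(2) until the acknowledgment packet's header has arrived.  A rough sketch (nothing here is tied to the actual packet layouts):

#include <unistd.h>

/* Send one request, then block until the response header of whatever
   RESPONSE/ERROR packet the server sends back has arrived. */
static int
mc_roundtrip(int fd, const void *req, size_t req_len,
             unsigned char *resp_hdr, size_t resp_hdr_len)
{
    if (write(fd, req, req_len) != (ssize_t)req_len)
        return -1;

    size_t got = 0;
    while (got < resp_hdr_len) {
        ssize_t n = read(fd, resp_hdr + got, resp_hdr_len - got);
        if (n <= 0)
            return -1;               /* error or unexpected EOF */
        got += n;
    }
    return 0;
}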

> It may be interesting to go to an asynchronous model (from a purely
> academic approach), but I can't see any benefits of such an approach. 
> PostgreSQL does that for its pq(4) protocol and in libpq(3), and I find
> it to be only useful for consuming userland CPU cycles.  If you need
> asynchronous behavior, use pthreads and wrap the blocking nature of
> memcache in a condition variable.  Fire and forget would only work for
> setting data, but since most memcache installations are used for
> reads, I can't see a benefit here.
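
I read the pthreads suggestion as something like the sketch below, where mc_get() is just a stand-in for whatever blocking fetch call the library exposes (it is not a real libmemcache(3) function):

#include <pthread.h>

struct async_get {
    pthread_mutex_t lock;
    pthread_cond_t  done_cv;
    int             done;
    const char     *key;
    void           *result;
};

extern void *mc_get(const char *key);   /* stand-in for the blocking fetch */

static void *
get_worker(void *arg)
{
    struct async_get *ag = arg;
    void *val = mc_get(ag->key);         /* blocks in the worker thread */

    pthread_mutex_lock(&ag->lock);
    ag->result = val;
    ag->done = 1;
    pthread_cond_signal(&ag->done_cv);
    pthread_mutex_unlock(&ag->lock);
    return NULL;
}

/* Kick off the fetch; the caller is free to do other work meanwhile. */
static int
async_get_start(struct async_get *ag, const char *key, pthread_t *tid)
{
    pthread_mutex_init(&ag->lock, NULL);
    pthread_cond_init(&ag->done_cv, NULL);
    ag->done = 0;
    ag->key = key;
    ag->result = NULL;
    return pthread_create(tid, NULL, get_worker, ag);
}

/* Later: wait on the condition variable until the value is ready. */
static void *
async_get_wait(struct async_get *ag)
{
    pthread_mutex_lock(&ag->lock);
    while (!ag->done)
        pthread_cond_wait(&ag->done_cv, &ag->lock);
    pthread_mutex_unlock(&ag->lock);
    return ag->result;
}
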
>
>> Additional Notes:
>> If a client connects and sends an invalid request that is out of
>> bounds for the protocol, the server responds with a plain text error
>> message and closes the connection.  The format for the plain text
>> error response is:
>> ERROR [code]: [message]\n
>> [custom message]\n
>> <server closes connection>
> I hope this just got in this spec by accident -- haven't we already
> covered this with the error packet?
>
> It is possible for buggy clients or servers to get out of sync with
> what the server thinks should happen.  If that happens, the client
> takes the last bit of data read from the server, searches back until it
> finds the 2nd to last newline and it is able to come up with an error
> message even if things get out of sync.  -sc

> --
> Sean Chittenden
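
That resync trick is simple enough on the client side: take the last chunk read from the server, back up to the second-to-last newline, and whatever sits between it and the final newline is the error line.  A rough sketch:

#include <string.h>

/* Returns a pointer to the last complete line in buf, or NULL if the
   buffer does not end with a newline.  Modifies buf in place. */
static char *
last_error_line(char *buf, size_t len)
{
    if (len == 0 || buf[len - 1] != '\n')
        return NULL;                 /* not a complete plain-text error */
    buf[len - 1] = '\0';             /* drop the final newline */
    char *prev = strrchr(buf, '\n'); /* the second-to-last newline, if any */
    return prev ? prev + 1 : buf;
}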


