<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7226.0">
<TITLE>Re: Binary Protocol...</TITLE>
</HEAD>
<BODY>
<DIV id=idOWAReplyText68698 dir=ltr>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>I'm at home using Outlook
Web Access, so it refuses to quote properly, but see my inline comments below
anyway...</FONT></DIV></DIV>
<DIV dir=ltr><BR>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> memcached-bounces@lists.danga.com on
behalf of Sean Chittenden<BR><B>Sent:</B> Wed 12/8/2004 6:55 PM<BR><B>To:</B>
James Mastros<BR><B>Cc:</B> memcached@lists.danga.com<BR><B>Subject:</B> Re:
Binary Protocol...<BR></FONT><BR></DIV>
<DIV>
<P><FONT size=2>>> In the interests of feature growth and moving away from
the <BR>>> convenient, but rather expensive text protocol, I'd like
to propose<BR>>> the binary memcache protocol.<BR>><BR>> I'm
not clear that the protocol is that expensive, or that it matters<BR>>
terribly much.<BR>><BR>> Right now the protocol has no structure other
than it's newline<BR>> delimited. Fantastic for telnet sessions, but
it's hard to extend. In<BR>> the current protocol, the solution is to
add new commands or add<BR>> additional flags at the end of the command (ie:
'set foo 0 1 1 2 5 6 1<BR>> 2 4 5', etc). HTTP at least has some
structure, memcached at the<BR>> moment does not. Moving things to a
binary protocol gives structure<BR>> and the ability to have arbitrary keys
and values. With the binary<BR>> protocol, you could have newline
characters in your keys and it<BR>> wouldn't matter. That peace of mind
is huge, IMHO.<BR>><BR>> Are your servers or users
CPU-bound?</FONT></P><FONT size=2></FONT></DIV>
<DIV><FONT size=2>My servers are CPU bound: we have a very complex database
in which we log all kinds of statistical data, and a cluster of servers
dedicated to performing aggregation and statistical analysis. We are using
memcache to eliminate the database as the bottleneck, and once that is done the
actual CPU usage of the servers (quad-proc Xeon boxes at the moment) is the
limiting factor. As such, whatever ways we can lower CPU usage are important to
me. </FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV><FONT size=2>Libmemcache is a small percentage versus the actual
computation, but anything helps, and when it's a simple, obviously good step
like moving to a binary protocol that is easier to extend and use, I see it as
a no-brainer.</FONT></DIV>
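A minimal sketch of the parsing-cost point above, using a hypothetical length-prefixed frame (not the actual proposed wire format): with explicit lengths, a header is one fixed-width unpack instead of delimiter scanning, and keys containing spaces or newlines are harmless.

```python
import struct

def parse_binary_get_response(buf: bytes):
    """Unpack a fixed-width binary header: one struct.unpack, no byte scanning."""
    key_len, data_len = struct.unpack_from("!HI", buf, 0)  # 2-byte key len, 4-byte value len (assumed widths)
    key = buf[6:6 + key_len]
    value = buf[6 + key_len:6 + key_len + data_len]
    return key, value

# A key containing a space or newline breaks newline-delimited text framing,
# but is harmless once lengths are explicit.
key, value = b"my key\nwith newline", b"some value"
msg = struct.pack("!HI", len(key), len(value)) + key + value
assert parse_binary_get_response(msg) == (key, value)
```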
<P>><BR>> No, but when profiling, most of the time doing memcache related
stuff<BR>> is spent parsing responses. With a binary protocol, that
will be<BR>> reduced to the lowest possible level.<BR>><BR>> Is all
that much CPU used in the parsing of the protocol?<BR>><BR>> Well, in my
benchmarking routines, 60% of the time of the library is<BR>> spent doing
string handling... and libmemcache(3) is pretty quick about<BR>> its
parsing. That said, do I think someone is CPU bound who's using<BR>>
libmemcache(3)? Absolutely not. But a text protocol is
fundamentally<BR>> limited by the characteristics of the agreed upon text
protocol (can't<BR>> use colons, newlines, etc...). A binary protocol
only leaves us with<BR>> size limitations, which we had earlier
anyway.<BR>><BR>> Are they network bound, and if so, is the protocol
overhead really<BR>> that much more then the data you're slinging
about? Remember that all<BR>> the techniques for forcing things into
one packet -- disabling<BR>> Nagle's Algorithm, all that jazz -- are available
with textual protocols<BR>> too.<BR>><BR>> I don't think it's much, but
I don't want to see it grow. My point was<BR>> I'm staying within the
single packet per trip paradigm that<BR>> memcached(8) currently
enjoys. Some binary protocols are chatty and I<BR>> was making a
statement that I'm explicitly avoiding that.<BR>><BR>> Text-based
protocols are easier to debug, and they're easier to extend<BR>> by multiple
people without them stepping on each-other's toes.<BR>> <BR>> Heh, easier
to debug: not to extend, IMHO.<BR>></P>
<P>I don't even consider them easier to debug if I'm working in C. For
high-level languages, sure, but those are also clearly not geared towards
performance. The entire point of memcached is performance; a lot of users
don't need it if they are just using memcached to cache some data for a web
server, but that's not all memcached is good for. As for easier to extend,
I think it's a toss-up either way.<BR><BR>>> The HELLO Packet:<BR>> I'd
rather refer to these as "message", and make explicit that you can<BR>> have
more then one of them in a TCP/IP packet.<BR>><BR>> This packet only gets
sent when a connection is established. The HELLO<BR>> Packet
authenticates the connection, but never gets sent after the<BR>> connection
is established.<BR>><BR></P>
<PRE>
>>   0                   1                   2                   3
>>   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>  |    Version    |    Options    |  User Length  | Passwd Length |
>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>  |                          Key Space ID                         |
>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>  /                            Username                           /
>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>  /                            Password                           /
>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</PRE>
<P>><BR>>
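A sketch of packing and unpacking the HELLO header as diagrammed (field widths are my reading of the diagram: four one-byte fields plus a four-byte Key Space ID, giving an 8-byte, 32-bit-aligned fixed header, followed by the unpadded username and password):

```python
import struct

HELLO_HDR = "!BBBBI"  # network order: Version, Options, User Length, Passwd Length, Key Space ID

def pack_hello(version, options, keyspace, user=b"", passwd=b""):
    hdr = struct.pack(HELLO_HDR, version, options, len(user), len(passwd), keyspace)
    return hdr + user + passwd  # username/password are not null-padded

def unpack_hello(buf):
    version, options, ulen, plen, keyspace = struct.unpack_from(HELLO_HDR, buf, 0)
    off = struct.calcsize(HELLO_HDR)  # 8 bytes
    return version, options, keyspace, buf[off:off + ulen], buf[off + ulen:off + ulen + plen]

pkt = pack_hello(1, 0b001, 42, b"alice", b"secret")
assert unpack_hello(pkt) == (1, 0b001, 42, b"alice", b"secret")
```

Because the fixed part is 32-bit aligned, a C client can read it straight into a packed struct, which is exactly the trick mentioned below.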
I'd prefer to see each length field go immediately before the thing<BR>> that
it's counting, and all length fields be the same size. (We<BR>>
probably don't need an explicit statement of endianness, but it<BR>> couldn't
hurt.)<BR>><BR>> I'm trying to keep the headers 32bit aligned where
possible that way I<BR>> can do tricks like reading this into a
structure.<BR>><BR>> This may seem bikeshedish, but it allows for reuse of
routines to pack<BR>> and unpack them into the language's native
strings. In perl, it even<BR>> allows using a single pack/unpack
function call.<BR>><BR>> At some point, an XS wrapper around
libmemcache(3) will probably spring<BR>> into existence. Convenience
for high level languages isn't my concern.<BR>></P>
<P>Given that memcached is designed to be a very fast cache to improve
performance, I think that making it high-performance in a language like C
should be the highest priority. It will still be trivial to write
wrappers and use memcached from high-level languages. I don't see any
issue here.</P>
<P><BR>>> Options (required):<BR>>> These
bits refer to the bits in the Options Byte.<BR>>>
Bit 0: Connection provides authentication
information<BR>>> Bit 1: This
client connection requires TLS<BR>>> Bit
2: Disconnect if TLS can not be
negotiated<BR>>> Bit 3-7: Not
designated<BR>> What's the difference between 1 and 2?<BR>><BR>> It's
the difference between "I want TLS if you offer that service" and<BR>> "I
won't talk to you if I can't connect over TLS."<BR>><BR>> Why have 0
different from just a 0-length username and passwd?<BR>><BR>> In closed
networks, there's no need to pass authentication information<BR>> around,
like what memcached(8) does now. The username and password are<BR>>
optional. Note the '/' to the sides of the Username/Password
fields.<BR>><BR>> What are you doing running memcached across a sniffable
network,<BR>> anyway? Doesn't using TLS add in overhead more then
enough to nullify<BR>> any help that a binary protocol would
help?<BR>><BR>> Absolutely! Don't think for a second that I'll be
caught dead using<BR>> TLS in production, but for those who can dream up a
need, at least the<BR>> protocol has support for it. One example
application would be network<BR>> appliances authenticating over
wireless. memcached(8) + TLS +<BR>> pgmemcache(1) to invalidate the
auth bits == way more cool than radius<BR>> or ldap. As I said earlier,
just because the protocol has support for<BR>> it doesn't mean there will be
a feature to back it up.<BR>><BR>>> If Bit
2 of the Options 1 Byte is set, this value specifies
the<BR>>> expiration of a key in seconds from the
Epoch.<BR>> Either "from the UNIX Epoch" or "in seconds since Jan 1, 1970
at<BR>> 00:00:00 GMT", please.<BR>><BR>> This lets us specify a
relative vs absolute time using relative<BR>> expiration times greater than a
month. Not sure what your concern here<BR>> is...<BR>></P>
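The Options bits discussed above amount to a small flags byte; a sketch (which end is bit 0 is my assumption, since the draft does not pin down bit ordering):

```python
# Assumed bit assignments, per the list above.
OPT_AUTH         = 1 << 0  # Bit 0: connection provides authentication information
OPT_TLS_WANTED   = 1 << 1  # Bit 1: use TLS if the server offers it
OPT_TLS_REQUIRED = 1 << 2  # Bit 2: disconnect if TLS cannot be negotiated

def tls_policy(options: int) -> str:
    """Distinguish the two TLS bits: preference vs. hard requirement."""
    if options & OPT_TLS_REQUIRED:
        return "require"   # "I won't talk to you if I can't connect over TLS"
    if options & OPT_TLS_WANTED:
        return "prefer"    # "I want TLS if you offer that service"
    return "none"

assert tls_policy(OPT_AUTH | OPT_TLS_WANTED) == "prefer"
assert tls_policy(OPT_TLS_WANTED | OPT_TLS_REQUIRED) == "require"
assert tls_policy(OPT_AUTH) == "none"
```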
<P>Agreed, specifying absolute or relative expirations should both be
supported; both are useful. I don't see any issue with having both.</P>
<P><BR>>> Options 1 (required):<BR>> Auxiliary
actions:<BR>>> These bits refer to the bits in the
Options 1 Byte.<BR>>> Bit 0: If
this key exists and has a relative expiration,
reset<BR>>>
the expiration to be relative to the current
time.<BR>>> Bit 1: Request that
the server delete the key after sending
the<BR>>>
value to the client.<BR>>> Bit
2: After the server has processed this request, close
the<BR>>>
connection.<BR>>> Bit 3: If the
key exists, include the expiration of the key
in<BR>>>
the response from the server.<BR>>> Bit
4: If the key exists, include the number of fetch
requests<BR>>>
left for this key.<BR>>> Bit
5-7: Not designated<BR>> Bit 5: do not return the data,
only do the other actions in the<BR>> auxiliary actions byte.<BR>><BR>>
This is the HELLO Packet and is only transmitted once. This bit
should<BR>> be added to the options below.<BR>><BR>>> Key
(required):<BR>>> The key for the given
request. Keys are not padded by a null <BR>>>
character.<BR>> There is a certain danger in allowing the user to specify
keys that<BR>> cannot be retrieved by the normal (textual) protocol.
I'm really not<BR>> sure if we should say "you get what you deserve, then",
or dissallow<BR>> it. (For that matter, I can't quite recall if there
really is such a<BR>> beast.)<BR>><BR>> Well, right now spaces are
fatal in keys. This removes that<BR>> restriction. Being able to
treat keys as blobs of data is handy.<BR>></P>
<P>Agreed, I would prefer to be able to use arbitrary values as keys, rather
than having to perform hashing on them first to ensure I do not have an illegal
character.</P>
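The hashing workaround mentioned above versus a length-prefixed binary key, in a quick sketch (the 2-byte length prefix is an assumption, not the proposed format):

```python
import hashlib
import struct

def text_safe_key(raw: bytes) -> bytes:
    """Today's workaround: hash arbitrary bytes down to a hex key with no
    spaces or newlines -- at the cost of losing the original key."""
    return hashlib.sha1(raw).hexdigest().encode()

def binary_key_frame(raw: bytes) -> bytes:
    """With a length-prefixed binary protocol, the raw key ships as-is."""
    return struct.pack("!H", len(raw)) + raw

raw = b"user:42 last\nlogin"            # a space and a newline: fatal in the text protocol
safe = text_safe_key(raw)
assert b" " not in safe and b"\n" not in safe
assert binary_key_frame(raw)[2:] == raw  # the key survives byte-for-byte
```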
<P><BR>>> The ERROR Packet:<BR>>> The ERROR Packet is one of the
ways a server responds to client <BR>>> requests. Not all ERROR
Packets are fatal errors and indeed, the <BR>>> server responds with
an ERROR Packet after a STORE Packet has been <BR>>> processed by the
server.<BR>> I'm not sure this is a good idea. Shouldn't we imply good
by the lack<BR>> of an error packet, if we wish to be
efficient?<BR>><BR>> Some messages respond with a RESPONSE Packet (what I'm
thinking about<BR>> renaming to the DATA Packet), but all commands give some
kind of<BR>> feedback. A lack of a response is not acceptable. As
I said at the<BR>> bottom, I'm tempted to rename this packet to the RESPONSE
Packet, but<BR>> the point remains the same: some kind of acknowledgment
packet always<BR>> needs to be sent back. The client relies on a
write(2) then a read(2)<BR>> for any memcache function to succeed and I see
no reason to change<BR>> that.<BR></P>
<P>How about a DATA packet and a STATUS packet? I agree with James that calling
a good response an ERROR packet is a little odd. But the names don't
really matter; I can live with ERROR and RESPONSE easily enough.</P>
<P>> It may be interesting to goto an asynchronous model (from a
purely<BR>> academic approach), but I can't see any benefits of such an
approach. <BR>> PostgreSQL does that for its pq(4) protocol and in
libpq(3), and I find<BR>> it to be only useful for consuming userland CPU
cycles. If you need<BR>> asynchronous behavior, use pthreads and wrap
the blocking nature of<BR>> memcache in a condition variable. Fire and
forget would only work for<BR>> setting data, but since most memcache
installations are used for<BR>> read's, I can't see a benefit
here.<BR>><BR>>> Additional Notes:<BR>>> If a client connects and
sends an invalid request that is out of<BR>>> bounds for the
protocol, the server responds with a plain text error message<BR>>> and
closes the connection. The format for the plain text
error<BR>>> response is:<BR>>> ERROR [code]: [message]\n<BR>>>
[custom message]\n<BR>>> <server closes connection><BR>> I hope
this just got in this spec by accident -- haven't we already<BR>> covered
this with the error packet?<BR>><BR>> It is possible for buggy clients or
servers to get out of sync with<BR>> what the server thinks should
happen. If that happens, the client<BR>> takes the last bit of data
read from the server, searches back until it<BR>> finds the 2nd to last
newline and it is able to come up with an error<BR>> message even if things
get out of sync. -sc<BR><BR>--<BR>Sean
Chittenden<BR><BR></P>
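The out-of-sync recovery described at the end can be sketched as follows, using the "ERROR [code]: [message]\n[custom message]\n" format above (the garbage prefix is illustrative):

```python
def extract_resync_error(buf: bytes):
    """From whatever was last read, search back to the 2nd-to-last newline and
    recover the error line and the custom message, even if earlier bytes in
    the buffer are desynchronized garbage."""
    last = buf.rfind(b"\n")                              # end of "[custom message]\n"
    prev = buf.rfind(b"\n", 0, last)                     # the 2nd-to-last newline
    error_line = buf[buf.rfind(b"\n", 0, prev) + 1:prev] # "ERROR [code]: [message]"
    custom = buf[prev + 1:last]                          # "[custom message]"
    return error_line, custom

data = b"...desynced partial dataERROR 42: bad request\nclient out of sync\n"
err, custom = extract_resync_error(data)
assert err.endswith(b"ERROR 42: bad request")
assert custom == b"client out of sync"
```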
</BODY>
</HTML>