<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7226.0">
<TITLE>Re: Binary Protocol...</TITLE>
</HEAD>
<BODY>
<DIV id=idOWAReplyText68698 dir=ltr>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>I'm at home using Outlook
Web Access, so it refuses to quote properly, but see my inline comments below
anyway...</FONT></DIV></DIV>
<DIV dir=ltr><BR>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> memcached-bounces@lists.danga.com on
behalf of Sean Chittenden<BR><B>Sent:</B> Wed 12/8/2004 6:55 PM<BR><B>To:</B>
James Mastros<BR><B>Cc:</B> memcached@lists.danga.com<BR><B>Subject:</B> Re:
Binary Protocol...<BR></FONT><BR></DIV>
<DIV>
<P><FONT size=2>>> In the interests of feature growth and moving away from
the <BR>>> convenient, but rather expensive text protocol, I'd like
to propose<BR>>> the binary memcache protocol.<BR>><BR>> I'm
not clear that the protocol is that expensive, or that it matters<BR>>
terribly much.<BR>><BR>> Right now the protocol has no structure other
than it's newline<BR>> delimited. Fantastic for telnet sessions, but
it's hard to extend. In<BR>> the current protocol, the solution is to
add new commands or add<BR>> additional flags at the end of the command (ie:
'set foo 0 1 1 2 5 6 1<BR>> 2 4 5', etc). HTTP at least has some
structure, memcached at the<BR>> moment does not. Moving things to a
binary protocol gives structure<BR>> and the ability to have arbitrary keys
and values. With the binary<BR>> protocol, you could have newline
characters in your keys and it<BR>> wouldn't matter. That peace of mind
is huge, IMHO.<BR>><BR>> Are your servers or users
CPU-bound?</FONT></P><FONT size=2></FONT></DIV>
<DIV><FONT size=2>My servers are CPU bound: we have a very complex database
in which we log all kinds of statistical data, and a cluster of servers
dedicated to performing aggregation and statistical analysis. We are using
memcache to eliminate the database as the bottleneck, and once that is done the
actual CPU usage of the servers (quad-proc Xeon boxes at the moment) is the
limiting factor. As such, whatever ways we can lower CPU usage are important to
me. </FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV><FONT size=2>Libmemcache is a small percentage versus the actual
computation, but anything helps, and when it's a simple, obviously good step
like moving to a binary protocol that is easier to extend and use, I see it as
a no-brainer.</FONT></DIV>
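A minimal sketch of the parsing-cost point above, using a hypothetical length-prefixed frame (not the actual proposed wire format): with explicit lengths, a header is one fixed-width unpack instead of delimiter scanning, and keys containing spaces or newlines are harmless.

```python
import struct

def parse_binary_get_response(buf: bytes):
    """Unpack a fixed-width binary header: one struct.unpack, no byte scanning."""
    key_len, data_len = struct.unpack_from("!HI", buf, 0)  # 2-byte key len, 4-byte value len (assumed widths)
    key = buf[6:6 + key_len]
    value = buf[6 + key_len:6 + key_len + data_len]
    return key, value

# A key containing a space or newline breaks newline-delimited text framing,
# but is harmless once lengths are explicit.
key, value = b"my key\nwith newline", b"some value"
msg = struct.pack("!HI", len(key), len(value)) + key + value
assert parse_binary_get_response(msg) == (key, value)
```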
<P>><BR>> No, but when profiling, most of the time doing memcache related
stuff<BR>> is spent parsing responses. With a binary protocol, that
will be<BR>> reduced to the lowest possible level.<BR>><BR>> Is all
that much CPU used in the parsing of the protocol?<BR>><BR>> Well, in my
benchmarking routines, 60% of the time of the library is<BR>> spent doing
string handling... and libmemcache(3) is pretty quick about<BR>> its
parsing. That said, do I think someone is CPU bound who's using<BR>>
libmemcache(3)? Absolutely not. But a text protocol is
fundamentally<BR>> limited by the characteristics of the agreed upon text
protocol (can't<BR>> use colons, newlines, etc...). A binary protocol
only leaves us with<BR>> size limitations, which we had earlier
anyway.<BR>><BR>> Are they network bound, and if so, is the protocol
overhead really<BR>> that much more then the data you're slinging
about? Remember that all<BR>> the techniques for forcing things into
one packet -- disabling<BR>> Nagle's Algorithm, all that jazz -- are available
with textual protocols<BR>> too.<BR>><BR>> I don't think it's much, but
I don't want to see it grow. My point was<BR>> I'm staying within the
single packet per trip paradigm that<BR>> memcached(8) currently
enjoys. Some binary protocols are chatty and I<BR>> was making a
statement that I'm explicitly avoiding that.<BR>><BR>> Text-based
protocols are easier to debug, and they're easier to extend<BR>> by multiple
people without them stepping on each-other's toes.<BR>> <BR>> Heh, easier
to debug: not to extend, IMHO.<BR>></P>
<P>I don't even consider them easier to debug if I'm working in C. For
high-level languages, sure, but those are also clearly not geared towards
performance. The entire point of memcached is performance; a lot of users
don't need it if they are just using memcached to cache some data for a web
server, but that's not all memcached is good for. As for easier to extend,
I think it's a toss-up either way.<BR><BR>>> The HELLO Packet:<BR>> I'd
rather refer to these as "message", and make explicit that you can<BR>> have
more then one of them in a TCP/IP packet.<BR>><BR>> This packet only gets
sent when a connection is established. The HELLO<BR>> Packet
authenticates the connection, but never gets sent after the<BR>> connection
is established.<BR>><BR></P>
<PRE>
>>   0                   1                   2                   3
>>   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>  |    Version    |    Options    |  User Length  | Passwd Length |
>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>  |                          Key Space ID                         |
>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>  /                            Username                           /
>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>  /                            Password                           /
>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</PRE>
<P>><BR>>
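A sketch of packing and unpacking the HELLO header as diagrammed (field widths are my reading of the diagram: four one-byte fields plus a four-byte Key Space ID, giving an 8-byte, 32-bit-aligned fixed header, followed by the unpadded username and password):

```python
import struct

HELLO_HDR = "!BBBBI"  # network order: Version, Options, User Length, Passwd Length, Key Space ID

def pack_hello(version, options, keyspace, user=b"", passwd=b""):
    hdr = struct.pack(HELLO_HDR, version, options, len(user), len(passwd), keyspace)
    return hdr + user + passwd  # username/password are not null-padded

def unpack_hello(buf):
    version, options, ulen, plen, keyspace = struct.unpack_from(HELLO_HDR, buf, 0)
    off = struct.calcsize(HELLO_HDR)  # 8 bytes
    return version, options, keyspace, buf[off:off + ulen], buf[off + ulen:off + ulen + plen]

pkt = pack_hello(1, 0b001, 42, b"alice", b"secret")
assert unpack_hello(pkt) == (1, 0b001, 42, b"alice", b"secret")
```

Because the fixed part is 32-bit aligned, a C client can read it straight into a packed struct, which is exactly the trick mentioned below.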
I'd prefer to see each length field go immediately before the thing<BR>> that
it's counting, and all length fields be the same size. (We<BR>>
probably don't need an explicit statement of endianness, but it<BR>> couldn't
hurt.)<BR>><BR>> I'm trying to keep the headers 32bit aligned where
possible that way I<BR>> can do tricks like reading this into a
structure.<BR>><BR>> This may seem bikeshedish, but it allows for reuse of
routines to pack<BR>> and unpack them into the language's native
strings. In perl, it even<BR>> allows using a single pack/unpack
function call.<BR>><BR>> At some point, an XS wrapper around
libmemcache(3) will probably spring<BR>> into existence. Convenience
for high level languages isn't my concern.<BR>></P>
<P>Given that memcached is designed to be a very fast cache to improve
performance, I think that making it high-performance in a language like C
should be the highest priority. It will still be trivial to write
wrappers and use memcached from high-level languages. I don't see any
issue here.</P>
<P><BR>>> Options (required):<BR>>> These
bits refer to the bits in the Options Byte.<BR>>>
Bit 0: Connection provides authentication
information<BR>>> Bit 1: This
client connection requires TLS<BR>>> Bit
2: Disconnect if TLS can not be
negotiated<BR>>> Bit 3-7: Not
designated<BR>> What's the difference between 1 and 2?<BR>><BR>> It's
the difference between "I want TLS if you offer that service" and<BR>> "I
won't talk to you if I can't connect over TLS."<BR>><BR>> Why have 0
different from just a 0-length username and passwd?<BR>><BR>> In closed
networks, there's no need to pass authentication information<BR>> around,
like what memcached(8) does now. The username and password are<BR>>
optional. Note the '/' to the sides of the Username/Password
fields.<BR>><BR>> What are you doing running memcached across a sniffable
network,<BR>> anyway? Doesn't using TLS add in overhead more then
enough to nullify<BR>> any help that a binary protocol would
help?<BR>><BR>> Absolutely! Don't think for a second that I'll be
caught dead using<BR>> TLS in production, but for those who can dream up a
need, at least the<BR>> protocol has support for it. One example
application would be network<BR>> appliances authenticating over
wireless. memcached(8) + TLS +<BR>> pgmemcache(1) to invalidate the
auth bits == way more cool than radius<BR>> or ldap. As I said earlier,
just because the protocol has support for<BR>> it doesn't mean there will be
a feature to back it up.<BR>><BR>>> If Bit
2 of the Options 1 Byte is set, this value specifies
the<BR>>> expiration of a key in seconds from the
Epoch.<BR>> Either "from the UNIX Epoch" or "in seconds since Jan 1, 1970
at<BR>> 00:00:00 GMT", please.<BR>><BR>> This lets us specify a
relative vs absolute time using relative<BR>> expiration times greater than a
month. Not sure what your concern here<BR>> is...<BR>></P>
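The Options bits discussed above amount to a small flags byte; a sketch (which end is bit 0 is my assumption, since the draft does not pin down bit ordering):

```python
# Assumed bit assignments, per the list above.
OPT_AUTH         = 1 << 0  # Bit 0: connection provides authentication information
OPT_TLS_WANTED   = 1 << 1  # Bit 1: use TLS if the server offers it
OPT_TLS_REQUIRED = 1 << 2  # Bit 2: disconnect if TLS cannot be negotiated

def tls_policy(options: int) -> str:
    """Distinguish the two TLS bits: preference vs. hard requirement."""
    if options & OPT_TLS_REQUIRED:
        return "require"   # "I won't talk to you if I can't connect over TLS"
    if options & OPT_TLS_WANTED:
        return "prefer"    # "I want TLS if you offer that service"
    return "none"

assert tls_policy(OPT_AUTH | OPT_TLS_WANTED) == "prefer"
assert tls_policy(OPT_TLS_WANTED | OPT_TLS_REQUIRED) == "require"
assert tls_policy(OPT_AUTH) == "none"
```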
<P>Agreed, specifying absolute or relative expirations should both be
supported; both are useful. I don't see any issue with having both.</P>
<P><BR>>> Options 1 (required):<BR>> Auxiliary
actions:<BR>>> These bits refer to the bits in the
Options 1 Byte.<BR>>> Bit 0: If
this key exists and has a relative expiration,
reset<BR>>>
the expiration to be relative to the current
time.<BR>>> Bit 1: Request that
the server delete the key after sending
the<BR>>>
value to the client.<BR>>> Bit
2: After the server has processed this request, close
the<BR>>>
connection.<BR>>> Bit 3: If the
key exists, include the expiration of the key
in<BR>>>
the response from the server.<BR>>> Bit
4: If the key exists, include the number of fetch
requests<BR>>>
left for this key.<BR>>> Bit
5-7: Not designated<BR>> Bit 5: do not return the data,
only do the other actions in the<BR>> auxiliary actions byte.<BR>><BR>>
This is the HELLO Packet and is only transmitted once. This bit
should<BR>> be added to the options below.<BR>><BR>>> Key
(required):<BR>>> The key for the given
request. Keys are not padded by a null <BR>>>
character.<BR>> There is a certain danger in allowing the user to specify
keys that<BR>> cannot be retrieved by the normal (textual) protocol.
I'm really not<BR>> sure if we should say "you get what you deserve, then",
or dissallow<BR>> it. (For that matter, I can't quite recall if there
really is such a<BR>> beast.)<BR>><BR>> Well, right now spaces are
fatal in keys. This removes that<BR>> restriction. Being able to
treat keys as blobs of data is handy.<BR>></P>
<P>Agreed, I would prefer to be able to use arbitrary values as keys, rather
than having to perform hashing on them first to ensure I do not have an illegal
character.</P>
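The hashing workaround mentioned above versus a length-prefixed binary key, in a quick sketch (the 2-byte length prefix is an assumption, not the proposed format):

```python
import hashlib
import struct

def text_safe_key(raw: bytes) -> bytes:
    """Today's workaround: hash arbitrary bytes down to a hex key with no
    spaces or newlines -- at the cost of losing the original key."""
    return hashlib.sha1(raw).hexdigest().encode()

def binary_key_frame(raw: bytes) -> bytes:
    """With a length-prefixed binary protocol, the raw key ships as-is."""
    return struct.pack("!H", len(raw)) + raw

raw = b"user:42 last\nlogin"            # a space and a newline: fatal in the text protocol
safe = text_safe_key(raw)
assert b" " not in safe and b"\n" not in safe
assert binary_key_frame(raw)[2:] == raw  # the key survives byte-for-byte
```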
<P><BR>>> The ERROR Packet:<BR>>> The ERROR Packet is one of the
ways a server responds to client <BR>>> requests. Not all ERROR
Packets are fatal errors and indeed, the <BR>>> server responds with
an ERROR Packet after a STORE Packet has been <BR>>> processed by the
server.<BR>> I'm not sure this is a good idea. Shouldn't we imply good
by the lack<BR>> of an error packet, if we wish to be
efficient?<BR>><BR>> Some messages respond with a RESPONSE Packet (what I'm
thinking about<BR>> renaming to the DATA Packet), but all commands give some
kind of<BR>> feedback. A lack of a response is not acceptable. As
I said at the<BR>> bottom, I'm tempted to rename this packet to the RESPONSE
Packet, but<BR>> the point remains the same: some kind of acknowledgment
packet always<BR>> needs to be sent back. The client relies on a
write(2) then a read(2)<BR>> for any memcache function to succeed and I see
no reason to change<BR>> that.<BR></P>
<P>How about a DATA packet and a STATUS packet? I agree with James that calling
a good response an ERROR packet is a little odd. But the names don't
really matter; I can live with ERROR and RESPONSE easily enough.</P>
<P>> It may be interesting to goto an asynchronous model (from a
purely<BR>> academic approach), but I can't see any benefits of such an
approach. <BR>> PostgreSQL does that for its pq(4) protocol and in
libpq(3), and I find<BR>> it to be only useful for consuming userland CPU
cycles. If you need<BR>> asynchronous behavior, use pthreads and wrap
the blocking nature of<BR>> memcache in a condition variable. Fire and
forget would only work for<BR>> setting data, but since most memcache
installations are used for<BR>> read's, I can't see a benefit
here.<BR>><BR>>> Additional Notes:<BR>>> If a client connects and
sends an invalid request that is out of<BR>>> bounds for the
protocol, the server responds with a plain text error message<BR>>> and
closes the connection. The format for the plain text
error<BR>>> response is:<BR>>> ERROR [code]: [message]\n<BR>>>
[custom message]\n<BR>>> <server closes connection><BR>> I hope
this just got in this spec by accident -- haven't we already<BR>> covered
this with the error packet?<BR>><BR>> It is possible for buggy clients or
servers to get out of sync with<BR>> what the server thinks should
happen. If that happens, the client<BR>> takes the last bit of data
read from the server, searches back until it<BR>> finds the 2nd to last
newline and it is able to come up with an error<BR>> message even if things
get out of sync. -sc<BR><BR>--<BR>Sean
Chittenden<BR><BR></P>
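The out-of-sync recovery described at the end can be sketched as follows, using the "ERROR [code]: [message]\n[custom message]\n" format above (the garbage prefix is illustrative):

```python
def extract_resync_error(buf: bytes):
    """From whatever was last read, search back to the 2nd-to-last newline and
    recover the error line and the custom message, even if earlier bytes in
    the buffer are desynchronized garbage."""
    last = buf.rfind(b"\n")                              # end of "[custom message]\n"
    prev = buf.rfind(b"\n", 0, last)                     # the 2nd-to-last newline
    error_line = buf[buf.rfind(b"\n", 0, prev) + 1:prev] # "ERROR [code]: [message]"
    custom = buf[prev + 1:last]                          # "[custom message]"
    return error_line, custom

data = b"...desynced partial dataERROR 42: bad request\nclient out of sync\n"
err, custom = extract_resync_error(data)
assert err.endswith(b"ERROR 42: bad request")
assert custom == b"client out of sync"
```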
</BODY>
</HTML>