mecached - text protocol?

Michal Suszycki mike@wizard.ae.krakow.pl
Fri, 16 Jul 2004 10:39:16 +0200 (CEST)


Hello all,


> > Let's discuss both the format of the binary protocol and its
> > implementation on this list.  It should be very easy to add.

OK, I'll answer to next post and try to give some example.

> I disagree. There's a reason why all successful network protocols happen
> to be text protocols. FTP, SMTP, NNTP, HTTP, you name it.

Just a note: it's hard to imagine person that tried to write HTTP client
or server in C, and still thinks it is a successful protocol.

>
> This in particular:
>
> > > process_command() function is horrorible, all these "if(strncmp...)" takes
> > > some CPU and these are unnecessary.
>
> is just ridiculous. These CPU cycles are are a negligible fraction of
> what is spent reading/writing to the network, looking things up, or, for
> that matter, calculating hash values! I mean, rewriting memcached's hash
> function in optimised x86 assembly would "speed things up" 3-4 times more
> than what we'd win by moving to a binary protocol and eliminating all the
> horrible strcmp() calls. And the speedup, although 3-4 times larger,
> would also be almost as negligible, because it's also far from being the
> bottleneck.

The fact that there are other things in the memcached that needs more CPU
doesn't mean that this program shouldn't be as fast as it can be in every
detail.


> They are, more importantly, easy to maintain, to debug and to extend.
> Adding a new minor command is just a matter of sticking another if
> clause on the server, and putting some text to the socket on the client.

In binary protocol you don't need to change anything in "parsing
function". In fact there is no parsing.


> In a binary format, you have to carefully allocate the command id, check
> that the existing command struct is adequate for your needs, swear and
> extend it for everyone if it isn't, match the numeric id->symbolic
> name mapping on the client side, encode parameters, decode parameters
> (matter of simple sprintf/sscanf or similar functions in a text
> protocol)

What could be more simple that doing:

u_int8_t command = *(u_int8_t *)network_data;
?

That is almost all 'parsing' in simple binary protocol that should be
suitable for memcached.
Maybe scanf is simple for a programmer using it.
Scanf function in glibc takes 6 lines.
But is uses vfscanf() and vfscanf.c has 2500 lines.
It is not simple implementation and it's not simple for CPU in your
program.
Scanf  has to eat character one by one. The same with strncmp or memchr.
And doing

 el = memchr(c->rbuf, '\n', c->rbytes);

on every read it's not good idea.


--------------

Every request or response could have following format:

<u32 len> <u8 code> <DATA>

len - unsigned int - this value hold length of all packet (including
length)

code - unsigned byte that tells what to do (command)

data - _any_ data that is needed for particular command

Function reading from a network is very simple because if you don't have
4 first bytes than you do nothing. If you have them, then you know how
many bytes you have to wait for. No looking for some "control characters".
You can easily cut packets if you have one complete request and half of
another in your network read buffer.


#define PCKT_HEAD 4

/* after successful network read () */
void handle_data(u32 len, struct conn *c)
{
	u32 pktlen;
	u32 *p = c->rbuf;

	/* wait for more data */
	if(c->rbytes < PCKT_HEAD) return;

	for(pktlen = *(u32 *)p; c->rbytes >= pktlen; pktlen = *(u32 *)p) {
                process_command(c, pktlen - 4, p + 4);
                c->rbytes -= pktlen;
                p += pktlen;
                if(c->rbytes < PCKT_HEAD) break;
	}
	/* there is still data unprocessed */
	if(c->rbytes && p != c->rbuf) memmove(c->rbuf, p, c->rbytes);
        c->rcurr = c->rbuf + c->rbytes;
        return;
}

Almost complete example. Function strips len and calls process_command(),
Is is much more faster than try_read_command() becasue it doesn't do any:
el = memchr(c->rbuf, '\n', c->rbytes);

If you have 1k req/sec looking for '\n' in each request isn't good idea.
Now non-strncmp process_command():


static void (*commands[256])(struct conn *, u32 , u8 *);

void process_command(struct conn *c, u32 len, u8 *s)
{
	if(!len || !commands[*s]) return;
        commands[*s](c, len-1, s+1);
}


That's the whole 'parsing' and all you have to do is to add function
pointer
to "commands" table, because <u8 code> is just a index in the table of
function pointers. process_commands() strips <u8 code> so higher level
functions have only len (this is important) and pointer (no data copying,
only pointers).
No looking for special chars, no limitation on bytes in DATA (except those
related to network read buffer size).
Inside DATA you can use some format but it depends what "command"
it is. Command example:


/* all we get in arg is a key */
void get_item(struct conn *c, u32 len, u8 *s)
{
	if(!len) {
		dbg("No key....");
		return;
	}

	it = assoc_find(*s);

	...all staff from memcached.c but without any

	sscanf(start, " %250s%n", key, &next) >= 1

	etc...
}


void process_single_stat(struct conn *c, u32 len, u8 *s)
{
	getrusage(RUSAGE_SELF, &usage);

	/*
         * sending response in binary format. Client takes
	 * care of showing it in human readable format, not server
	 */
	msg_send(c, SERV_STAT, &usage, sizeof usage);

}

commands[0] = get_item;
commands[1] = process_single_stat;


msg_send() it's simple function that composes response in <u32 len><u8
code> format (no copying coz it uses writev()) and sends data to the
client.

Maybe it isn't good idea to write all that code in this post but it's
just an (BTW working) example. I think this is simple and fast enough coz
you just read u32 or u8 values, pass pointers and never look
inside bytes in DATA part in 'low level' protocol functions
(like try_read_command()). You use encapsulation and each function
(handle_data() -> process_commands()) strips data (packet len, code) that
is used by them so higher function doesn't need to know abut protocol
details.


Sincerely,

Michal