memcached - text protocol?

Brad Fitzpatrick brad@danga.com
Thu, 15 Jul 2004 18:02:23 -0700 (PDT)


> If the goal is to save space and time, the following...
>
>       <key>:  1 byte: length of following key, without NULL byte
>                    (1-250, inclusive)
>               the key
>               \0 byte
>
> ...is one byte too many.  If you use the ending null byte to determine
> when the key ends, you can skip using the length.  If you use the
> length, you don't need the null byte.  Plus, it is ambiguous as to if
> you can use null bytes in your keys (if you want to use spaces, then is
> there a reason to exclude null bytes from being allowed too?)

The key length is there so we can check that the null was provided, and
that it comes before the end of the overall command packet.  We'll still
have to verify there's no null byte before the one they said.  Saving CPU
on the server isn't a big goal.  (See below)

The null byte was so we could use &c->readbuf[key_offset] directly as
char* key which is passed to assoc_item() and such, without having to copy
it into a new buffer.
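
To make the checks above concrete, here is a rough sketch of what validating
the <key> field could look like.  The function name, its arguments and the
KEY_MAX_LEN constant are made up for illustration (only the 1-250 range comes
from the quoted draft), and this is not code from the memcached tree.  On
success it returns a pointer straight into the read buffer, so nothing gets
copied:

```c
#include <stddef.h>
#include <string.h>

#define KEY_MAX_LEN 250  /* from the quoted draft: 1-250, inclusive */

/*
 * Hypothetical helper, not actual memcached code.
 *   buf, buf_len : the bytes of the command packet
 *   key_offset   : index of the <key> field's length byte
 * Returns a pointer to the null-terminated key inside buf (no copy),
 * or NULL if the field is malformed.
 */
static const char *parse_key(const unsigned char *buf, size_t buf_len,
                             size_t key_offset, size_t *key_len_out)
{
    if (key_offset >= buf_len)
        return NULL;

    size_t klen = buf[key_offset];
    if (klen < 1 || klen > KEY_MAX_LEN)
        return NULL;

    /* length byte + key bytes + null must all fit inside the packet */
    if (buf_len - key_offset < klen + 2)
        return NULL;

    const unsigned char *key = buf + key_offset + 1;

    /* the null the client promised must be where the length says it is */
    if (key[klen] != '\0')
        return NULL;

    /* ...and there must be no earlier null hiding inside the key */
    if (memchr(key, '\0', klen) != NULL)
        return NULL;

    *key_len_out = klen;
    return (const char *)key;
}
```

The memchr() is the "no null byte before the one they said" check; without it
a client could claim a 10-byte key and have the hash code stop after 3.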

Null bytes were going to be excluded.  Too problematic for lots of
languages.  They don't show up in ordinary UTF-8 or ASCII text anyway.

> If the null byte is there to make passing to the hashing function easier
> (and avoid a memcpy), then perhaps the hashing function could be
> modified to take a length.

Perhaps.  This was less invasive.

> Short of that, this input would still need to be validated to make sure
> that there was a null byte at the proper location in order to avoid a
> buffer overrun or other memory access exploit through the hashing
> function.  Should the clients be trusted in this case?

I never trust the client, even inside the network.  Accidents happen.

> Now, if any of these aspects of text protocols may not apply in this
> specific case, that is up to the implementors to decide.  Security may
> be less of a concern -- does anyone run a public memcached server? :) --
> but it may make it harder to ensure correctness.

I'm aware the peril is more code paths, so I was going to make new
functions so I'm not duplicating the guts.  There would be two
parsing/validation front-ends, each setting a flag on the connection
object about what type of response to send back, and then one shared
piece of logic to do the work and branch at the end to send the responses
(which are almost identical, short of a binary length prefix on the
response).
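
A minimal sketch of the response side of that idea, assuming a per-connection
flag and a 4-byte big-endian length prefix for binary replies.  The struct,
the enum, the function name and the prefix width are all invented here for
illustration; the binary format hasn't been settled:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: which front-end parsed the request on this connection. */
enum proto { PROTO_TEXT, PROTO_BINARY };

struct conn {
    enum proto proto;        /* set by the text or binary front-end */
    unsigned char out[512];  /* pending response bytes */
    size_t out_len;
};

/*
 * Shared response path.  The body is built the same way for both
 * protocols; binary connections just get a length prefix in front.
 * The 4-byte big-endian prefix is an assumption, not a settled format.
 * Returns the number of bytes queued, or 0 if the body doesn't fit.
 */
static size_t write_response(struct conn *c, const char *body, size_t body_len)
{
    size_t off = (c->proto == PROTO_BINARY) ? 4 : 0;

    if (body_len > sizeof c->out - off)
        return 0;

    if (c->proto == PROTO_BINARY) {
        c->out[0] = (unsigned char)((body_len >> 24) & 0xff);
        c->out[1] = (unsigned char)((body_len >> 16) & 0xff);
        c->out[2] = (unsigned char)((body_len >> 8) & 0xff);
        c->out[3] = (unsigned char)(body_len & 0xff);
    }

    memcpy(c->out + off, body, body_len);
    c->out_len = off + body_len;
    return c->out_len;
}
```

Everything between the front-ends and this function would be common code, so
the only protocol-specific parts are the parsing at the top and this one
branch at the bottom.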

My goals are:

 -- keep text protocol
 -- add binary protocol
 -- kill server-side multi-get support for binary
    (the client can just pipeline gets)
 -- never trust clients to send well-formed requests
 -- allow spaces in keys (binary protocol only)
 -- binary responses:  less CPU for clients to parse

Less CPU for the server was never one of my goals.  I agree with you and
Avva that strcmps aren't that slow.

Again, it's not for certain this will go in.  It's a misc side project.
It will have to be demonstrably better and reviewed before I'll commit.

- Brad