UDP and byte ranges

Aaron Stone aaron at serendipity.cx
Tue Dec 18 06:43:57 UTC 2007


Marc from facebook emailed a few comments to me about the binary
protocol documentation that I've been working on, and got me thinking
more about integrating UDP support into the core protocol (much to
Marc's chagrin, I think).

Here are some open issues with UDP (some from Marc today, and some from
the hackathon at Yahoo! when we were talking about the Message ID):

 - Do we need to separate the Request ID from the Message ID?
 - Do we need to be able to request portions of a value starting
   from some offset? (to handle the now-infamous facebook's-udp-mgets
   are-so-fat, your-mamma's-ethernet-take-it-no-more!)
 - Do we need the client to tell the server how big a receive window
   it has?
 - Do we need the server to tell the client how much data is about to
   show up?

Some thoughts:

I _don't_ see a reason to keep request IDs separate from message IDs.
The combination of a message ID and packet number (or byte range, which
I'll get to in a moment) tells us everything we need to know.

If we want the ability to request the n-th byte through the end, why not
just ask for the n-th through m-th byte?

(Yes, this is the byte range feature that we've all acknowledged is a
bad idea, except that it completely subsumes the functionality of the
UDP packet sequence number and does so even more powerfully.)

If the client can discover the size of the data, then it can control the
receive window by asking for byte ranges.

Proposal:

Add a field to the GET response akin to DNS's "there's more data but you
need to ask for it". The first response packet will tell the client how
long the entire value is in an extras field, and the common header will
tell the client how much data it got in the initial response.
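
As a sketch, the extras for such a GET response might look something
like the C struct below; the field names and widths are my guesses,
not anything pinned down yet:

    #include <stdint.h>

    /* Hypothetical extras for the GET response: the usual 32-bit flags
     * plus a new field carrying the total length of the stored value.
     * Names and widths are guesses; on the wire these would be packed
     * big-endian, not copied as raw struct memory. */
    typedef struct {
        uint32_t flags;        /* existing client-opaque flags */
        uint64_t total_length; /* full length of the stored value */
    } get_response_extras;

    /* The common header's body length already says how many bytes came
     * back in this first packet; if that's less than total_length, the
     * client knows to follow up with RGETs. */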

Add a new command, RGET (range-get), that defines a larger extras
section with two additional fields, the offset and the length.
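
A similar sketch of the RGET request extras, with the same caveat that
these names and widths are only placeholders:

    #include <stdint.h>

    /* Hypothetical RGET request extras: the byte range being asked for.
     * The same two fields would ride along in each RGET response
     * packet, describing the chunk it carries. */
    typedef struct {
        uint64_t offset;  /* first byte of the value to return */
        uint64_t length;  /* bytes requested; may be more than fits in
                             a single UDP packet */
    } rget_extras;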

The client is explicitly allowed to ask for more data than can fit in a
single UDP packet.

The server sends as many RGET response packets as it needs to send, with
each one containing enough information (offset and length) to reassemble
the value on the client _without resequencing the packets_!
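
On the client side, reassembly could then be as dumb as copying each
chunk straight to its offset in a preallocated buffer. A rough sketch
(handle_rget_chunk and the buffer management around it are
hypothetical):

    #include <stdint.h>
    #include <string.h>

    /* Sketch of client-side reassembly, assuming every RGET response
     * packet carries the offset and length of the chunk it holds.
     * value_buf was allocated once, sized from the total length given
     * in the initial GET response. Packets may arrive in any order. */
    static void handle_rget_chunk(uint8_t *value_buf, uint64_t total_length,
                                  uint64_t offset, const uint8_t *chunk,
                                  uint64_t chunk_len)
    {
        /* Drop anything that would overrun the buffer. */
        if (offset > total_length || chunk_len > total_length - offset)
            return;

        memcpy(value_buf + offset, chunk, chunk_len);
    }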

If the client needs to rate limit the response, it can send separate
RGET requests with each one asking for some length of data that the
client can handle at that moment.
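
For instance, something along these lines (send_rget and recv_range are
hypothetical helpers wrapping the actual socket I/O):

    #include <stdint.h>

    /* Hypothetical helpers assumed to exist elsewhere in the client. */
    void send_rget(const char *key, uint64_t offset, uint64_t length);
    void recv_range(uint8_t *buf, uint64_t offset, uint64_t length);

    /* Sketch of rate limiting via windowed RGETs: instead of one huge
     * request, ask only for what the client can absorb right now. */
    static void fetch_in_windows(const char *key, uint8_t *value_buf,
                                 uint64_t total_length, uint64_t window)
    {
        for (uint64_t off = 0; off < total_length; off += window) {
            uint64_t remaining = total_length - off;
            uint64_t want = remaining < window ? remaining : window;
            send_rget(key, off, want);
            recv_range(value_buf, off, want);
        }
    }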

Rationale:

By eliminating the packet sequence number, we save the client from
having to buffer and reorder all the pieces before it can return the
value to the application.

By giving offsets in each packet, we avoid the potential problem of
losing the first packet and then being flooded with subsequent packets
that we don't know what to do with.

We also give the client the ability to set up a receive buffer of the
size indicated in the first packet and then blindly stuff the subsequent
packet values into the right locations in that buffer. If one chunk is
missing, it can be specifically re-requested, too.
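
A rough sketch of spotting a missing chunk, under the added assumption
that the server splits its responses at fixed offsets (my assumption,
not part of the proposal):

    #include <stdint.h>

    /* Track which chunks arrived so a missing one can be re-requested.
     * CHUNK_SIZE and the fixed chunking are assumptions about server
     * behaviour, purely for illustration. */
    #define CHUNK_SIZE 1400          /* hypothetical per-packet payload */
    #define MAX_CHUNKS 4096          /* caps the largest value handled */

    static uint8_t seen[MAX_CHUNKS]; /* one flag per expected chunk */

    static void mark_chunk(uint64_t offset) {
        if (offset / CHUNK_SIZE < MAX_CHUNKS)
            seen[offset / CHUNK_SIZE] = 1;
    }

    /* Returns the offset of the first missing chunk, or total_length
     * if everything arrived; the caller would re-issue an RGET for it. */
    static uint64_t first_missing(uint64_t total_length) {
        uint64_t nchunks = (total_length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        for (uint64_t i = 0; i < nchunks && i < MAX_CHUNKS; i++)
            if (!seen[i])
                return i * CHUNK_SIZE;
        return total_length;
    }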

Yes, if the data check turns up a mismatch during this operation, you've
got to start asking all over again, but I think that's a problem
fundamental to requesting a large value over UDP when re-requests may be
needed (as opposed to TCP handling the retransmit for you).

I believe this all to be completely stateless on the server.

Thoughts? Comments?

I'm going to draw up some protocol pictures this week to show what this
might look like.

Aaron


