memcached: UDP and byte ranges

Dustin Sallings dustin at spy.net
Tue Dec 18 08:57:14 UTC 2007


On Dec 17, 2007, at 23:15, Aaron Stone wrote:

> - Do we need to separate the Request ID from the Message ID?

	The purpose of the request ID is effectively to recreate the TCP
sequence number.  This just isn't necessary when your data are
guaranteed to be delivered in order by TCP.
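	For reference, here is a minimal Python sketch of packing and unpacking
the UDP frame header. It assumes the layout described in memcached's
protocol.txt: four 16-bit big-endian fields holding the request ID, the
sequence number, the total number of datagrams in the message, and a
reserved field that must be zero. `pack_frame` and `unpack_frame` are
hypothetical helper names.

```python
import struct

# Assumed 8-byte frame header: four unsigned 16-bit big-endian fields
# (request ID, sequence number, total datagrams, reserved = 0).
UDP_HEADER = struct.Struct("!HHHH")


def pack_frame(request_id, seq, total, payload):
    """Prefix one payload chunk with the UDP frame header."""
    return UDP_HEADER.pack(request_id, seq, total, 0) + payload


def unpack_frame(datagram):
    """Split a datagram into (request_id, seq, total, payload)."""
    if len(datagram) < UDP_HEADER.size:
        raise ValueError("datagram shorter than the 8-byte frame header")
    request_id, seq, total, reserved = UDP_HEADER.unpack_from(datagram)
    if reserved != 0:
        raise ValueError("reserved header field must be zero")
    return request_id, seq, total, datagram[UDP_HEADER.size:]
```

	Over TCP, none of these fields would be needed, because the stream
already arrives whole and in order.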

> - Do we need to be able to request portions of a value starting
>   from some offset? (to handle the now-infamous facebook's-udp-mgets
>   are-so-fat, your-mamma's-ethernet-take-it-no-more!)

	My only concern about this is that you may very well be requesting a  
section from a different value on a subsequent request.

> - Do we need the server to tell the client how much data is about to
>   show up?

	The message header already does that.

> I _don't_ see a reason to have separate request id's from message
> id's. The combination of a message id and packet number (or byte range,
> which I'll get to in a moment) tell us everything we need to know.

	It sounds like Facebook (does anyone else even use the UDP-based
protocol?) already sends multiple messages in a single UDP request.
This same thing happens over TCP.  UDP is just a different transport
and needs the additional information to do what other transports do
automatically.
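	To illustrate why UDP needs that extra information: with several
requests in flight on one socket, datagrams from different responses can
interleave and arrive out of order, and only the request ID and sequence
number let the client sort them out. This is a minimal sketch assuming the
same 8-byte frame header as in memcached's protocol.txt. `demux` is a
hypothetical helper, and it simply drops any message missing a datagram.

```python
import struct
from collections import defaultdict

# Assumed frame header: request ID, sequence number, total datagrams, reserved.
UDP_HEADER = struct.Struct("!HHHH")


def demux(datagrams):
    """Group interleaved datagrams by request ID and reassemble each message.

    Returns {request_id: message_bytes} for messages whose datagrams all
    arrived; incomplete messages are left out.
    """
    parts = defaultdict(dict)  # request_id -> {seq: payload}
    totals = {}                # request_id -> total datagram count
    for dgram in datagrams:
        request_id, seq, total, _reserved = UDP_HEADER.unpack_from(dgram)
        parts[request_id][seq] = dgram[UDP_HEADER.size:]
        totals[request_id] = total

    complete = {}
    for request_id, chunks in parts.items():
        total = totals[request_id]
        if all(seq in chunks for seq in range(total)):
            # Join the chunks in sequence-number order, not arrival order.
            complete[request_id] = b"".join(chunks[seq] for seq in range(total))
    return complete
```

	Over TCP, the same multiplexing comes for free from the ordered byte
stream.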

> If we want the ability to request the n-th byte through the end, why
> not just ask for the n-th through m-th byte?
>
> (yes, this is the byte range feature that we've all acknowledged is a
> bad idea. except that it completely subsumes the functionality of the
> UDP packet sequence number and does it even more powerfully.)

	No, it's not the same.  A UDP get still returns the whole value the  
same way it does in TCP, except you have a bit more control over the  
packetization.  Retrieving a value by asking for a series of parts of  
it can't be done atomically.
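	Here is a toy illustration of the atomicity problem. `ToyStore` is a
hypothetical in-memory stand-in, not memcached, and `get_range` is a
hypothetical ranged get. If another client updates the item between two
range requests, the pieces come from different versions.

```python
class ToyStore:
    """In-memory stand-in for a cache server (hypothetical, not memcached)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        # Whole-value get: one response, one consistent snapshot.
        return self.data[key]

    def get_range(self, key, offset, length):
        # Hypothetical ranged get: each call sees whatever is stored *now*.
        return self.data[key][offset:offset + length]


store = ToyStore()
store.set("k", b"AAAAAAAA")
first_half = store.get_range("k", 0, 4)
store.set("k", b"BBBBBBBB")   # another client updates between the two requests
second_half = store.get_range("k", 4, 4)
torn = first_half + second_half
print(torn)                    # b'AAAABBBB': a value nobody ever stored
```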

> Add a field to the GET response akin to DNS's "there's more data but
> you need to ask for it". The first response packet will tell the client
> how long the entire key is in an extras field, and the common header will
> tell the client how long the data it got in the initial response is.

	It already does that.

> Add a new command, RGET (range-get), that defines a larger extras
> section with two additional fields, the offset and the length.

	If this didn't use the CAS identifier, there'd be no guarantees that  
it'd ever be right.  If it did, you're left with the problem of  
finding out what the CAS identifier is.
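	This is a sketch of what a CAS-guarded range get might look like. All
names are hypothetical, including `CasStore` and `rget`. The check does
catch a concurrent update, but the client needs the CAS identifier before
its first range request. In this toy, the only way to get it is `gets`,
which returns the whole value anyway.

```python
class CasStore:
    """Toy server keeping (value, cas) pairs; all names are hypothetical."""

    def __init__(self):
        self.data = {}
        self.next_cas = 1

    def set(self, key, value):
        # Every write gets a fresh CAS identifier.
        self.data[key] = (value, self.next_cas)
        self.next_cas += 1

    def gets(self, key):
        # Returns the whole value together with its CAS identifier.
        return self.data[key]

    def rget(self, key, offset, length, cas):
        """Ranged get that refuses to answer if the item changed since `cas`."""
        value, current = self.data[key]
        if current != cas:
            return None  # caller must start over and discard earlier parts
        return value[offset:offset + length]


store = CasStore()
store.set("k", b"AAAAAAAA")
_, cas = store.gets("k")       # extra round trip just to learn the CAS
part1 = store.rget("k", 0, 4, cas)
store.set("k", b"BBBBBBBB")    # concurrent update
part2 = store.rget("k", 4, 4, cas)  # None: the item changed underneath us
```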

> The client is explicitly allowed to ask for more data than can fit in a
> single UDP packet.

	It already does, though.  You just can't send more data than will fit  
in a UDP packet.

> The server sends as many RGET response packets as it needs to send,
> with each one containing enough information (offset and length) to
> reassemble the value on the client _without resequencing the packets_!

	You can already do that.  Once you receive the first packet, you know  
how many packets there are, what the total size is, and if you can  
assume all of the packets before the last one will be the same size,  
you can just fill in the value as the packets arrive.
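	Here is a minimal sketch of filling the value in place as packets
arrive. It assumes the total size and datagram count are already known
from the first packet. It also assumes every datagram except the last
carries exactly `chunk_size` payload bytes, so a datagram's offset is
simply `seq * chunk_size`. `Reassembler` is a hypothetical name.

```python
class Reassembler:
    """Fill a message buffer in place as datagrams arrive, in any order."""

    def __init__(self, total_size, total_datagrams, chunk_size):
        self.buf = bytearray(total_size)
        self.total_datagrams = total_datagrams
        self.chunk_size = chunk_size
        self.seen = set()

    def add(self, seq, payload):
        if not 0 <= seq < self.total_datagrams:
            raise ValueError("sequence number out of range")
        offset = seq * self.chunk_size
        if offset + len(payload) > len(self.buf):
            raise ValueError("payload runs past the announced total size")
        # Write straight into place; no need to hold or sort the pieces.
        self.buf[offset:offset + len(payload)] = payload
        self.seen.add(seq)

    def done(self):
        return len(self.seen) == self.total_datagrams

    def value(self):
        return bytes(self.buf) if self.done() else None
```

	For example, a 10-byte value in 4-byte chunks spans three datagrams,
and they can be added in any order. `value()` returns the bytes only
once all three have arrived.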

> Rationale:
>
> By eliminating the packet sequence number, we save the client from
> having to hold all the pieces in order until it can return the value
> to the client application.

	Hopefully that's unnecessary anyway.

> By giving offsets in each packet, we avoid the potential problem of
> losing the first packet and then being flooded with subsequent packets
> that we don't know what to do with.

	If that happens frequently, you should be using TCP and not trying to  
reinvent it.

	Note that an rget is *not* a retransmit.  If you're not very careful,  
you may get part of something unrelated to what the rest of the  
packets represented.  If you are careful, you still may end up having  
to throw away all the other values.

> Thoughts? Comments?


	I really think it's better to either accept the lossiness and general
sloppiness of a thin, dumb UDP transport or just use TCP and get all
of the rest of the features handled for you by your OS vendor.

-- 
Dustin Sallings




