UDP support in the binary protocol

Aaron Stone aaron at serendipity.cx
Mon Dec 17 21:06:21 UTC 2007


On Mon, Dec 17, 2007, Marc <marc at facebook.com> said:

> I wanted to mention a few things I've been thinking about to take UDP to the
> next level (support update requests and large values).  The main things
> missing from the protocol now are offset-to-next-request in the UDP header,
> a flow control mechanism, and the ability to get specific segments of a
> large value:
> 
> Offset-to-next-request would be specified in bytes 6-7 of the UDP header to
> indicate the beginning of the next request boundary or 0xffff if the current
> packet contains no boundary.  This will allow clients to recover from
> dropped packet errors and receive subsequent replies.

This is basically the byte-range request we often see.
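
For reference, a minimal Python sketch of what that header field could look
like. The first six bytes follow the existing 8-byte memcached UDP frame
header (request ID, sequence number, datagram count). Using the reserved
bytes 6-7 as an offset-to-next-request is only the proposal quoted above, and
I am assuming the offset counts from the start of the payload. The names
pack_frame_header, next_request_start and NO_BOUNDARY are made up for
illustration.

```python
import struct

# 8-byte UDP frame header, all fields big-endian unsigned 16-bit:
#   bytes 0-1  request ID
#   bytes 2-3  sequence number of this datagram
#   bytes 4-5  total number of datagrams in the message
#   bytes 6-7  reserved today; used here for the proposed offset-to-next-request
FRAME_HEADER = struct.Struct("!HHHH")
NO_BOUNDARY = 0xFFFF  # proposed sentinel: no request boundary in this packet


def pack_frame_header(request_id, seq, total, next_offset=NO_BOUNDARY):
    """Build the 8-byte frame header for one datagram."""
    return FRAME_HEADER.pack(request_id, seq, total, next_offset)


def next_request_start(datagram):
    """Return the payload from the next request boundary on, or None.

    Assumption: the offset is measured from the first payload byte,
    i.e. from just after the 8-byte frame header.
    """
    _req, _seq, _total, offset = FRAME_HEADER.unpack_from(datagram)
    if offset == NO_BOUNDARY:
        return None
    return datagram[FRAME_HEADER.size + offset:]
```

A client that lost an earlier datagram could call next_request_start() on
each packet that does arrive and resume parsing at the first boundary it
finds.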

> For flow control, the client needs to be able to throttle the server's
> reply to prevent implosion.  The problem now is that the client has no idea
> how much data will be returned from a get command.  It can vary wildly and,
> when sending multigets to multiple servers, the reply can momentarily exceed
> the capacity of upstream switches or the client itself.  The challenge is
> that right now the UDP logic in memcached is very stateless.  That is a good
> thing, and I'm loath to introduce protocol changes that would require
> maintaining state on the server side.

The more I see of how UDP is used, I'm actually leaning towards wanting to
see multiple packet support dropped entirely. We can add an error status
that translates to "Sorry, please use TCP to get this big-ass value."
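
Roughly, in Python, and only as a sketch: the payload budget and the status
values below are placeholders I picked for illustration, not anything that
exists in the protocol today.

```python
MAX_UDP_PAYLOAD = 1400      # assumed budget for a single datagram's payload
STATUS_OK = 0x0000          # placeholder status values for this sketch
STATUS_NOT_FOUND = 0x0001
STATUS_USE_TCP = 0x00F0     # hypothetical "value too large for UDP" status


def udp_get(store, key):
    """Answer a UDP get with one datagram's worth of data, or an error.

    store is any mapping from key to bytes.  Returns (status, body).
    """
    value = store.get(key)
    if value is None:
        return STATUS_NOT_FOUND, b""
    if len(value) > MAX_UDP_PAYLOAD:
        # No multi-packet reply: tell the client to retry over TCP.
        return STATUS_USE_TCP, b""
    return STATUS_OK, value
```

The server keeps no per-request state this way, and the client learns right
away that it has to fall back to TCP for that key.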

> I think the best way to handle this is to again use reserved bytes 6-7 in
> the UDP header to indicate the client's read-buffer segment size.  (Like TCP,
> it would have to be in some multiple.  Maybe we can use bytes 4-5 to
> indicate scale, since memcached never takes messages > 1 packet.)  The reply
> payload cannot exceed this size.  The memcached implementation can record
> this value along with the UDP message id it already records, and, when
> generating the reply, simply track the value and stop processing once the
> limit is reached.
> 
> Lastly, the ability to issue gets of specific segments of a large value
> would allow UDP clients to recover from packet error of large values more
> efficiently.  Currently if any packet is dropped, the entire value must be
> retransmitted.  Even the offset-to-next-request field will not fix this,
> since, for large values, most packets are within a single request.  What I
> have in mind is that if I successfully read 0..M of a value and then get a
> timeout or out of order packet, I'd like to issue the next get for M.
> The data-version checking logic that exists now means I'm never in danger of
> getting the wrong data.  I just need an additional flavor of get to specify
> offset and extent.
> 
> Potentially, similar logic could be done for sets, but given the
> infrequency of sets w.r.t. gets and that this would again require adding a
> lot of state for UDP protocol processing on the server side, I don't think
> it's worth pursuing.

I had this thought in the hackathon. Semantically, such a packet says, "If
you have key Foo, version 0x12345, please set bytes 48-93 to the
contents of this packet. Thanks!"
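
To make those semantics concrete, here is a small Python sketch against an
in-memory dict. The function name set_range and the store layout are
hypothetical, the byte range is treated as inclusive, and I am assuming the
version stays unchanged after a partial write so that the remaining pieces
can still match it.

```python
def set_range(store, key, version, first, last, data):
    """Overwrite bytes first..last (inclusive) of key if the version matches.

    store maps key -> (version, bytearray).  Returns True if the write was
    applied and False if the key is missing, the version differs, or the
    range does not fit the existing value.
    """
    entry = store.get(key)
    if entry is None:
        return False
    current_version, value = entry
    if current_version != version:
        return False
    if first < 0 or last < first or last >= len(value):
        return False
    if len(data) != last - first + 1:
        return False
    # Same-length slice assignment, so the value never grows or shrinks.
    value[first:last + 1] = data
    return True
```

For the example above, set_range(store, "Foo", 0x12345, 48, 93, payload)
expects a 46-byte payload and leaves every other byte of the value alone.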

Sure it'd be kinda neat to send a handful of UDP packets, all completely
out of order, and have the server stuff the values into the correct byte
positions, thus creating the value from pieces. But TCP gives us all of
that for the low low price of a stateful connection.

It's taking the byte-range idea and applying it everywhere. From what I've
seen on the list, the consensus is that we don't want to go there.

Aaron


> On 12/17/07 11:34 AM, "Aaron Stone" <aaron at serendipity.cx> wrote:
> 
>> On Mon, Dec 17, 2007, Dustin Sallings <dustin at spy.net> said:
>> 
>>> 
>>> On Dec 16, 2007, at 19:26, Aaron Stone wrote:
>>> 
>>>> Do we want to add 32 bits to the binary protocol for UDP sequencing?
>>>> Has this been discussed before? If so, please point me in the
>>>> direction of such a thread in the mailing list archives!
>>> 
>>> 
>>> No, UDP support seems to be the minimal wrapping around the
>>> underlying protocol to provide sequencing.  Not sure if I can point
>>> you to archives, but the intention should be somewhat clear.
>>> 
>>> The purpose of a UDP based protocol would be to provide a
>>> connectionless form of the TCP based protocol with less client and
>>> server overhead.
>>> 
>>> When you think about it that way, you're just implementing some of
>>> the parts that the transport doesn't give you, so it makes sense to
>>> not combine them in such a way that provides redundancy with your
>>> transport.  If you're optimistic, you have less overhead in general.
>>> 
>> 
>> Well, ok, but the only thing that the UDP header provides that the binary
>> protocol does not now provide directly is sequence numbers for
>> reassembling a large SET / GET.
>> 
>> Here's an idea: we have a different magic byte that indicates that the
>> common header is four bytes longer, and we use that magic byte for UDP
>> traffic?
>> 
>> Aaron
>> 
>> 
> 
> 
> 
