memory fragmentation issues in 1.2?

Paul T pault12345 at yahoo.com
Thu Dec 7 21:07:15 UTC 2006


See - now this could turn into a very reasonable
conversation. I'm aware of all the points you're
making about univca's brutal HTTP implementation,
and (almost) all of those decisions were made with
eyes open, understanding the implications (and
almost always using memcached as a yardstick).

Some shortcuts are questionable, some are temporary,
etc. - not saving on encoding, betting on a GET not
being split, and so on.

memcached also made some design decisions and took
some shortcuts of its own.

It would be *great* to have this kind of conversation
some time -- pointing to particular places in the code
and the rationale behind particular design decisions,
with numbers and facts, instead of turning it into
a flamewar of any kind.

memcached and univca should be seen as the FreeBSD and
Linux of open source caching, and I would be happy to
have an open technical conversation on any mailing
list you prefer.

I'm not sure the memcached mailing list is the right
place, though. Is it?

Let me know? ... only some time after xmas ... 

Rgds, Paul.

... I am rolling out univca 1.4, which will *not* be
like memcached at all ... there are now no 'puts' from
the scripting layer (the cache fetches the data
directly from MySQL), so the overhead of a
non-keepalive POST no longer matters, etc.
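
Roughly the idea (an illustrative sketch only -- the names here are
made up and this is not the actual 1.4 code): on a miss the cache
runs the MySQL lookup itself, so the scripting layer only ever
issues GETs.

fake_db = {"userInfo/joesmith": "..."}     # stand-in for the real MySQL table

def fetch_from_mysql(key):                 # hypothetical helper; really a SELECT
    return fake_db.get(key)

cache = {}

def get(key):                              # the only operation clients use
    value = cache.get(key)
    if value is None:                      # miss: the cache fills itself
        value = fetch_from_mysql(key)
        if value is not None:
            cache[key] = value
    return value                           # no 'puts' from the scripting layer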


--- Steven Grimm <sgrimm at facebook.com> wrote:

> Warning to readers: This is more about univca than
> memcached, but it 
> does discuss why memcached needs its internal state
> machine, so 
> hopefully it's not too off-topic for the list.
> 
> 
> Paul T wrote:
> > If you want to see how to implement a keepalive HTTP
> > protocol without a state machine, you can look at the
> > univca source code - you were able to grasp the
> > memcached code, so reading the univca code would
> > clearly be no problem.
> >   
> 
> Okay, I just read it, and the way univca handles
> partial incoming 
> requests without a state machine is... it doesn't do
> either. As in, it 
> doesn't handle them, and it *does* have a state
> machine.
> 
> Univca assumes that the entire request will always
> be present as soon as 
> the socket is readable. Try this simple test on a
> local univca instance:
> 
> $ (echo -n "GE"; sleep 1; echo "T /get?foobar
> HTTP/1.0") | nc localhost 1966
> 
> That is a single HTTP request. Univca responds to it
> with *two* HTTP 
> responses. What's more, even though the first
> "request" (a bare "GE" 
> with no line terminator) is clearly not a valid HTTP
> request, nor the 
> second one for that matter, it responds with two
> "OK" responses, not 
> even an indication of a syntax error.
> 
> Now try the same thing against memcached:
> 
> $ (echo -n "ge"; sleep 1; echo "t foobar") | nc
> localhost 11211
> 
> You get one response back, as you should. Why is
> this a big deal? Two 
> reasons. The lesser one: as soon as you send back
> more response headers 
> than a keepalive-enabled client expects, the client
> may read the first 
> header and assume it's done. Then when it sends its
> next request, it 
> will read the bogus header, still sitting in the
> socket buffer, rather 
> than the actual response to its new request. On
> every request after that 
> it will read the response to the previous request,
> and since there's no 
> error indication, it will assume it's getting a
> bunch of cache misses. 
> That's if you're lucky. If you're not, then you hit
> the second problem:
> 
> $ (echo -n "GET /get?userInfo/joe"; sleep 1; echo
> "smith HTTP/1.0") | nc 
> localhost 1966
> 
> If your cache happens to have a "userInfo/joe" item,
> univca will happily 
> send it back to the client -- and since the HTTP
> response doesn't 
> indicate the key of the item that's being returned,
> the client has no 
> way of knowing that it is getting an object other
> than the one it asked 
> for, in this case the user info for user "joe"
> rather than for 
> "joesmith". In an environment where you're streaming
> several requests 
> down a keepalive connection, it is totally
> reasonable and expected for a 
> request's initial line to be split across packet
> boundaries.
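> 
> (Going back to the "lesser" problem for a second, here is a toy
> illustration in Python -- just the bookkeeping, not real client
> code: once one unexpected extra response is sitting in the socket
> buffer, every answer the client reads is shifted by one.)
> 
> # what's queued in the socket buffer after the split "GE" / "T ..." request
> responses = ["OK (bogus reply to the bare 'GE')",
>              "reply to request 1",
>              "reply to request 2"]
> 
> # naive keepalive client: send a request, read exactly one response
> for request in ["request 1", "request 2"]:
>     print(request, "->", responses.pop(0))
> 
> # request 1 -> OK (bogus reply to the bare 'GE')
> # request 2 -> reply to request 1   ...and so on, off by one from here out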
> 
> Finally, univca *does* have a state machine. Each
> connection's state can 
> switch from ReadingHandler (the "waiting for new
> request" state) to 
> LongPostHandler (the "waiting for data for an
> existing request" state) 
> or WritingHandler (the "have to send data now"
> state) which has its 
> *own* internal state machine ("header not sent yet"
> / "header sent" 
> states -- which, BTW, also ignore the possibility of
> a 
> partially-completed write -- and "bytes remaining to
> write" / "all done 
> with response" states). Yeah, sure, it's an
> *implicit* state machine, 
> but it's a state machine all the same; you can
> clearly see where it 
> transitions from one state to another, and the
> transitions are 
> well-defined and deterministic.
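> 
> For contrast, here is a tiny, purely illustrative sketch in Python
> (nothing like the actual C inside memcached) of the "buffer until
> the request line is complete" idea. The buffer plus the "still
> waiting for CRLF" condition *is* the state; the same kind of split
> request (with a proper CRLF terminator) produces exactly one
> request instead of two:
> 
> class RequestLineParser:
>     """Accumulates bytes; yields a request only once CRLF arrives."""
>     def __init__(self):
>         self.buf = b""
> 
>     def feed(self, data):
>         self.buf += data
>         complete = []
>         while b"\r\n" in self.buf:
>             line, self.buf = self.buf.split(b"\r\n", 1)
>             complete.append(line)
>         return complete
> 
> p = RequestLineParser()
> print(p.feed(b"GE"))                            # [] -- nothing to answer yet
> print(p.feed(b"T /get?foobar HTTP/1.0\r\n"))    # one complete request line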
> 
> > If this is *not* a semi-binary protocol then I don't
> > know what else to call it.
> >   
> 
> There are two things there that you might refer to
> as semi-binary: the 
> need to terminate lines with \r\n and the fact that
> client data is 
> stored verbatim, whether it's text or not. I'll
> tackle those in order.
> 
> Line termination: If that's what makes memcached's
> protocol semi-binary, 
> then univca's protocol is semi-binary too, as are
> SMTP and most of the 
> so-called "text-based" protocols on the Internet.
> RFC2616, the HTTP/1.1 
> spec, has this to say (section 5.1):
> 
> "The Request-Line begins with a method token,
> followed by the 
> Request-URI and the protocol version, and ending
> with CRLF. The elements 
> are separated by SP characters. No CR or LF is
> allowed except in the 
> final CRLF sequence."
> 
> Client data: Sometimes clients want to cache binary
> data. Say, a 
> generated CAPTCHA image or a serialized Java object.
> There are two 
> alternatives: send the data "as is" or text-encode
> it somehow. It 
> appears that univca takes the latter approach. That
> means that every 
> single "set" request to univca where there's a
> possibility of 
> non-textual content -- for some applications, this
> will be nearly all 
> the content in the cache -- requires the client to
> scan its values and 
> encode them. What does spending those extra CPU
> cycles and waiting for 
> those extra bytes to go over the network gain you?
> Not much that I can 
> see. A cached serialized object encoded as text is
> every bit as 
> unreadable to a human as the same object in native
> form.
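> 
> For what it's worth, the cost is easy to see. memcached's text
> protocol frames a value with a byte count up front ("set <key>
> <flags> <exptime> <bytes>"), so the value itself goes over the
> wire verbatim. A rough Python comparison (base64 is just one
> possible text encoding, picked for illustration -- I don't know
> exactly which encoding univca expects):
> 
> import base64
> 
> value = bytes(range(256)) * 4              # 1 KB of arbitrary binary data
> 
> # length-prefixed framing: byte count up front, raw bytes follow verbatim
> framed = b"set captcha:42 0 0 %d\r\n" % len(value) + value + b"\r\n"
> 
> # text-encoded alternative: scan and re-encode the value before sending
> encoded = base64.b64encode(value)
> 
> print(len(value), len(encoded))            # 1024 vs 1368 -- about 33% more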
> 
> Furthermore, although univca's requests are required
> to be textual, its 
> responses aren't; it sends binary data back to the
> client in binary form 
> in response to a "get" request. So once again, if
> transmitting binary 
> client-generated data in binary form makes a
> protocol semi-binary, then 
> univca's protocol is semi-binary too.
> 
> Anyway, to everyone's relief I'm sure, that's all I
> have to say on 
> univca vs. memcached for now. Suffice to say both
> pieces of software 
> have ample room for improvement, and for some
> applications one may be a 
> better fit than the other.
> 
> And to all a good night.
> 
> -Steve
> 
> 



 