Read vs. write performance (sloppiness in memcache)
Brad Fitzpatrick
brad@danga.com
Sun, 10 Aug 2003 20:56:17 -0700 (PDT)
Yeah, this patch makes a world of difference. Instead of 20 seconds, it
now takes less than 1.
However, it's even more packet-happy than before.
Dude, we have a more-than-plentiful per-connection buffer on the server side
(DATA_BUFFER_SIZE = 2k, by default). Let's use as much of it as we can
before writing. I'm not advocating growing it to fit more, but if
something else has already grown it (quite likely, since we grow it
whenever sending large values), let's write into it and keep track of
where we're at (and thus how much we have remaining). As long as our
output keeps fitting in the available per-connection memory, let's put it
there until we're out of space, or until we're done (rough sketch below).
Looking at this tcpdump, we're doing:
One packet for "VALUE test:intarray:0 0 1\r\n"
One packet for "0\r\n"
One packet for "END\r\n"
Super lame.
That'd fit in 2k.
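
Something like this, roughly (just a sketch of the idea, not real memcached
code; conn_t, add_to_wbuf, and flush_wbuf are made-up names here):

#include <string.h>
#include <unistd.h>

#define DATA_BUFFER_SIZE 2048

typedef struct {
    int  fd;                        /* client socket */
    char wbuf[DATA_BUFFER_SIZE];    /* per-connection write buffer */
    int  wbytes;                    /* how much of wbuf is filled */
} conn_t;

/* write out whatever has accumulated so far */
static void flush_wbuf(conn_t *c) {
    int off = 0;
    while (off < c->wbytes) {
        ssize_t n = write(c->fd, c->wbuf + off, c->wbytes - off);
        if (n <= 0) break;          /* real code would handle EAGAIN etc. */
        off += n;
    }
    c->wbytes = 0;
}

/* append a fragment, flushing first if it wouldn't fit
 * (assumes len <= DATA_BUFFER_SIZE; we grow the buffer for big values anyway) */
static void add_to_wbuf(conn_t *c, const char *buf, int len) {
    if (c->wbytes + len > DATA_BUFFER_SIZE)
        flush_wbuf(c);
    memcpy(c->wbuf + c->wbytes, buf, len);
    c->wbytes += len;
}

Then the exchange above becomes:

    add_to_wbuf(c, "VALUE test:intarray:0 0 1\r\n", 27);
    add_to_wbuf(c, "0\r\n", 3);
    add_to_wbuf(c, "END\r\n", 5);
    flush_wbuf(c);    /* one write(), one packet */
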
On Mon, 11 Aug 2003, Anatoly Vorobey wrote:
> You wrote on Sun, Aug 10, 2003 at 07:54:57PM -0700:
> > It should be:
> >
> > Client: get foo, bar (PSH)
> > Server: I have foo and bar. here they are. we're done. (ACK,PSH)
> > Client: Okay. (ACK)
>
> Nah. The problem is not many packets, but the fact that the TCP/IP
> stack waits a little bit on each of them before sending, hoping we're
> going to send another one... but when the client requests 500 values
> one after the other, as in Brion's test, we never do:
>
> client: give me key1
> server: sending value1
> OS: [let's wait a bit]
> waiting...
> OS: ok, nothing more has come. sending value1
> client: received value1
> client: give me key2
> ....
>
> By setting TCP_NODELAY on the server socket, we eliminate the waits
> and seem to completely fix the problem, at least for this particular
> kind of test, where we don't have many consecutive writes. The wait time
> for receiving 500 values drops from 20sec. to 1 or 0. Brion, can you
> verify this with your setup? The patch below makes the server set
> TCP_NODELAY.
>
> On the other hand, what if the client requests 100 keys in one GET
> request? It appears to me that until now, we would send as few IP
> packets as possible, because we didn't have TCP_NODELAY set and so the
> OS was waiting a few dozen milliseconds for us to send more, and
> we always send more very fast (we don't let go of a GET request as long
> as the OS accepts our write()'s synchronously). With TCP_NODELAY set,
> we will instead have 200 packets instead of, say, 20. Perhaps we
> should run this under a sniffer, both with and without TCP_NODELAY, to
> figure out whether this is indeed true, and how this option affects
> performance of large multi-key GET requests. If it affects the
> performance badly, but we still want TCP_NODELAY to boost the
> performance of many consecutive single-key requests, we can think of
> buffering on the server side, before sending out write()'s -- since this
> may drastically affect our memory usage, I'd rather not do it unless we
> really have to.
>
>
>
> main -> wcmtools src/memcached/memcached.c
> --- cvs/wcmtools/memcached/memcached.c Tue Jul 29 22:53:49 2003
> +++ src/memcached/memcached.c Sun Aug 10 20:13:20 2003
> @@ -30,6 +30,7 @@
> #include <string.h>
> #include <unistd.h>
> #include <netinet/in.h>
> +#include <netinet/tcp.h>
> #include <arpa/inet.h>
> #include <errno.h>
> #include <time.h>
> @@ -1052,6 +1053,7 @@
> setsockopt(sfd, SOL_SOCKET, SO_REUSEADDR, &flags, sizeof(flags));
> setsockopt(sfd, SOL_SOCKET, SO_KEEPALIVE, &flags, sizeof(flags));
> setsockopt(sfd, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));
> + setsockopt(sfd, IPPROTO_TCP, TCP_NODELAY, &flags, sizeof(flags));
>
> addr.sin_family = AF_INET;
> addr.sin_port = htons(port);
>
> --
> avva