the write strategy of the client implement affects the set
mdounin at mdounin.ru
Wed Mar 5 00:50:41 UTC 2008
On Wed, Mar 05, 2008 at 12:40:25AM +0800, Yin Chen wrote:
> I am involved in the optimization of the performance of a memcached
>client library now(written by c).
> And I found that if I want to set a big value(say: 9000 bytes long),
>prepare all the data and call write once or split the data to two writes
>will cause dramatic performance difference: the latter is about 100 times
>slow than the former.
> I do the experiment on my own machine, a single core T43 notebook. Run
>one memcached server instance with the command "/usr/bin/memcached -m 64 -p
>11211 -u root", and write code to connect to the 11211 port to write the set
>operation data. Iterate 100 times to set a 9000 bytes value each time. The
>client.c code prepares all the data and write the data once. The
>client_multi_write.c code splite the data to two writes. Run the two
>programs to get the above result.
> I think it's partly related to the implement of the memcached? Anybody to
>confirm the above experiment or give me an explanation?
Here is the tcpdump output I got under FreeBSD 6.2:
02:59:16.390023 IP localhost.54085 > localhost.11211: P 72193:80389(8196) ack 65 win 35840 <nop,nop,timestamp 45948689 45948689>
02:59:16.490168 IP localhost.11211 > localhost.54085: . ack 80389 win 35840 <nop,nop,timestamp 45948699 45948689>
02:59:16.490172 IP localhost.54085 > localhost.11211: P 80389:81217(828) ack 65 win 35840 <nop,nop,timestamp 45948699 45948699>
02:59:16.490184 IP localhost.11211 > localhost.54085: P 65:73(8) ack 81217 win 35840 <nop,nop,timestamp 45948699 45948699>
Note that the first packet was ack'ed only after 100ms, which is the
default delayed ack timeout in FreeBSD. Since your next packet is
small, the OS (Nagle's algorithm) holds it back and sends it only
after the previous packet has been ack'ed. And since memcached hasn't
received the whole request yet, it has nothing to answer, so the ack
is sent only after the timeout.
Setting net.inet.tcp.delayed_ack to 0 makes the timings much more realistic:
$ time ./client > /dev/null
$ time ./client_multi_write > /dev/null
I.e. both code variants take almost the same time.
To fix this correctly without switching off delayed ack you should
use TCP_NODELAY in your client.
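
A minimal sketch of setting that option, assuming fd is an already
created TCP socket. The helper name set_nodelay is made up for
illustration and is not part of any memcached client library:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: disable Nagle's algorithm on a TCP socket so
 * that small writes are sent right away instead of waiting for the
 * ack of the previous segment.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_nodelay(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

Calling it once, right after the socket is created or connected, should
be enough. The option stays in effect for the lifetime of the socket.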
You may also consider using writev(2) instead of multiple write()
calls, or setting TCP_NOPUSH (under FreeBSD) / TCP_CORK (under
Linux) and switching them off after all data has been written. This is
not directly related to the problem above, but will help reduce the
number of packets sent over the wire. That is important since
TCP_NODELAY switches off the OS's normal packet aggregation mechanisms.
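
As a rough illustration of the writev(2) variant, the sketch below hands
the command line, the value and the trailing CRLF of a memcached text
protocol "set" to the kernel in one call. The function name send_set,
the fixed flags and exptime of 0, and the header buffer size are
assumptions made for the example:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send "set <key> 0 0 <bytes>\r\n<value>\r\n"
 * with a single writev(2) call instead of several write() calls.
 * Returns 0 if everything was written, -1 on error or short write. */
int send_set(int fd, const char *key, const void *value, size_t value_len)
{
    char header[300];
    int header_len = snprintf(header, sizeof(header),
                              "set %s 0 0 %zu\r\n", key, value_len);
    if (header_len < 0 || (size_t)header_len >= sizeof(header))
        return -1;

    struct iovec iov[3];
    iov[0].iov_base = header;
    iov[0].iov_len  = (size_t)header_len;
    iov[1].iov_base = (void *)value;   /* writev does not modify it */
    iov[1].iov_len  = value_len;
    iov[2].iov_base = (void *)"\r\n";
    iov[2].iov_len  = 2;

    size_t total = (size_t)header_len + value_len + 2;
    ssize_t n = writev(fd, iov, 3);

    /* writev may write less than requested on a socket; a real client
     * would loop over the remainder instead of giving up here. */
    return (n >= 0 && (size_t)n == total) ? 0 : -1;
}
```

Because all three pieces reach the TCP stack together, there is no
small trailing segment left waiting for an ack.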