Does the client's write strategy dramatically affect set performance?

Maxim Dounin mdounin at
Wed Mar 5 00:50:41 UTC 2008


On Wed, Mar 05, 2008 at 12:40:25AM +0800, Yin Chen wrote:

>    I am involved in optimizing the performance of a memcached
>client library (written in C).
>    I found that when setting a big value (say, 9000 bytes long),
>preparing all the data and calling write() once versus splitting the data
>into two writes makes a dramatic performance difference: the latter is about
>100 times slower than the former.
>    I ran the experiment on my own machine, a single-core T43 notebook. I
>started one memcached server instance with the command "/usr/bin/memcached
>-m 64 -p 11211 -u root", and wrote code that connects to port 11211 and
>writes the set operation data, iterating 100 times to set a 9000-byte value
>each time. The client.c code prepares all the data and writes it once; the
>client_multi_write.c code splits the data into two writes. Running the two
>programs gives the above result.
>   Is this partly related to the implementation of memcached? Can anybody
>confirm the above experiment or give me an explanation?

Here is tcpdump I got under FreeBSD 6.2:

02:59:16.390023 IP localhost.54085 > localhost.11211: P 72193:80389(8196) ack 65 win 35840 <nop,nop,timestamp 45948689 45948689>
02:59:16.490168 IP localhost.11211 > localhost.54085: . ack 80389 win 35840 <nop,nop,timestamp 45948699 45948689>
02:59:16.490172 IP localhost.54085 > localhost.11211: P 80389:81217(828) ack 65 win 35840 <nop,nop,timestamp 45948699 45948699>
02:59:16.490184 IP localhost.11211 > localhost.54085: P 65:73(8) ack 81217 win 35840 <nop,nop,timestamp 45948699 45948699>

You may note that the first packet was ACKed only after 100ms, the 
default delayed-ACK timeout in FreeBSD.  Since your next packet is 
small, the OS (following the Nagle algorithm) waits before sending it, 
and sends it only after the previous packet has been ACKed.  And since 
memcached hasn't received the whole request, it has nothing to answer, 
so the ACK was sent only after the timeout.
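The sender-side decision in the dump can be sketched as a simplified model of Nagle's rule (RFC 896, RFC 1122 section 4.2.3.4). This is an illustration under stated assumptions, not FreeBSD's actual tcp_output() logic; the function name and the loopback MSS used in the checks below are hypothetical. If every one of the 100 iterations stalled for up to one 100ms delayed-ACK timeout, the split-write client could spend on the order of 10 s waiting, which would be in line with the reported ~100x slowdown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of Nagle's rule, not real kernel code.
 * A segment may be sent at once if TCP_NODELAY is set, if it is a
 * full-sized segment, or if no previously sent data is still waiting
 * for an ACK.  Otherwise the small segment is held back until the
 * outstanding data is acknowledged. */
static bool nagle_may_send(size_t seg_len, size_t mss,
                           size_t unacked_bytes, bool nodelay)
{
    if (nodelay)
        return true;              /* Nagle disabled */
    if (seg_len >= mss)
        return true;              /* full-sized segment */
    return unacked_bytes == 0;    /* small segment: only when idle */
}
```

With an MSS larger than the request (assumed here, not shown in the dump), the single 9024-byte write goes out immediately, while the 828-byte tail of the split write waits behind the 8196 unacknowledged bytes until the delayed ACK arrives.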

Setting net.inet.tcp.delayed_ack to 0 makes the timings much more realistic:

$ time ./client > /dev/null

real    0m0.077s
user    0m0.032s
sys     0m0.003s

$ time ./client_multi_write > /dev/null

real    0m0.110s
user    0m0.075s
sys     0m0.003s

I.e. both code variants perform about the same, nowhere near a 100x difference.

To fix this correctly without switching off delayed ACK, you should 
set the TCP_NODELAY socket option in your client, which disables the 
Nagle algorithm.
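A minimal sketch of setting TCP_NODELAY with setsockopt(2); `enable_nodelay` is a hypothetical helper name, and the option itself is the standard one available on both FreeBSD and Linux.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable the Nagle algorithm on a TCP socket, so small writes are sent
 * without waiting for previously sent data to be ACKed.
 * Returns 0 on success, -1 with errno set on failure. */
static int enable_nodelay(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}
```

The option can be set right after socket() or after connect(); it stays in effect for the lifetime of the socket.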

You may also consider using writev(2) instead of multiple write() 
calls, or setting TCP_NOPUSH (under FreeBSD) / TCP_CORK (under 
Linux) and switching it off after all the data has been written. This is 
not directly related to the problem above, but will help reduce the 
number of packets sent over the wire. That matters because TCP_NODELAY 
switches off the OS's normal packet aggregation mechanism.
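Both suggestions can be sketched as follows, assuming the memcached text protocol with flags and exptime set to 0; the helper names are hypothetical. `send_set_writev` hands the header, value and trailing CRLF to the kernel in one writev(2) call even though they live in separate buffers. `set_cork` wraps TCP_CORK (Linux) or TCP_NOPUSH (FreeBSD): set it, issue the separate write() calls, then clear it.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Send a memcached "set" request as three pieces with one writev(2).
 * Like write(), writev() on a socket may return a short count; a
 * production client should loop until the whole request is sent. */
static ssize_t send_set_writev(int fd, const char *key,
                               const char *value, size_t len)
{
    char header[300];
    int hlen = snprintf(header, sizeof header, "set %s 0 0 %zu\r\n",
                        key, len);
    if (hlen < 0 || (size_t)hlen >= sizeof header)
        return -1;                     /* key too long for the header */

    char crlf[] = "\r\n";
    struct iovec iov[3] = {
        { .iov_base = header,        .iov_len = (size_t)hlen },
        { .iov_base = (void *)value, .iov_len = len },
        { .iov_base = crlf,          .iov_len = 2 },
    };
    return writev(fd, iov, 3);
}

/* Alternative to writev(): set the option, issue several write() calls,
 * then clear it.  On Linux, clearing TCP_CORK sends any pending partial
 * frame.  Where neither option exists this helper does nothing. */
static int set_cork(int fd, int on)
{
#if defined(TCP_CORK)
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
#elif defined(TCP_NOPUSH)
    return setsockopt(fd, IPPROTO_TCP, TCP_NOPUSH, &on, sizeof on);
#else
    (void)fd;
    (void)on;
    return 0;
#endif
}
```

writev() avoids copying the value into a combined buffer, while corking keeps the existing multi-write code unchanged at the cost of two extra system calls per request.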

Maxim Dounin
