'noreply' over the wire

Sat Nov 10 10:43:33 UTC 2007

On Fri, Nov 09, 2007 at 21:10:18 +0300, Tomash Brechko wrote:
> > But these seem gratuitous.  Are you actually testing them in a
> > production environment?
> 
> The truth is: no, I don't.  I don't have the right environment.  Can
> you help me with that?

Let me bug you a bit more.  Though I don't have access to any
production environment, I still do the testing that I can.  Previously
I posted some measurements over the loopback interface; here comes the
test over the wire.  I have 1.1 machines, 1 being a desktop Pentium 4
at 2.4GHz, and 0.1 being a router with a Broadcom CPU at 264MHz.  The
router is capable enough: it does 100Mb/s over 5 ports.

I can't run serious code on the router, but thankfully it runs Linux,
so I installed the following iptables rule:

  iptables -t mangle -A PREROUTING -p all \
           -s 192.168.254.0/24 -d 192.168.250.254 -j MIRROR

i.e. every packet sent to 192.168.250.254 is mirrored back with source
and destination addresses (but not ports) transposed.  This means that
I can connect two processes on the desktop over TCP/UDP, with the
traffic going out to the wire and back.
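The effect of the rule on a single packet can be sketched in a few
lines of Python.  This only illustrates the address swap described
above, not how netfilter implements MIRROR.  The desktop address
192.168.254.1 and the client port 40000 are made-up values (the rule
only says the source is somewhere in 192.168.254.0/24); 11211 is the
default memcached port.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def mirror(packet: Packet) -> Packet:
    """Swap source and destination addresses, leaving ports untouched."""
    return packet._replace(src_ip=packet.dst_ip, dst_ip=packet.src_ip)


# Hypothetical client on the desktop talking to the mirrored address.
sent = Packet("192.168.254.1", 40000, "192.168.250.254", 11211)
returned = mirror(sent)
print(returned)
```

The returned packet is addressed to the desktop itself on port 11211,
so a server listening there sees a peer at 192.168.250.254, and its
replies take the same round trip in the other direction.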

For the test 04_noreply.t that I added to Cache::Memcached, with count
=> 100_000 iterations in every loop, times were

  Files=1, Tests=7, 19 wallclock secs (16.87 cusr +  1.96 csys = 18.83 CPU)

and network utilization was 1.8-1.9MB/s (B is _byte_, not bit).  As
you can see, 19 wallclock =~ 18.83 CPU, so the case was actually CPU
bound (and xosview showed that too), i.e. it could probably be even
faster.  Cache::Memcached adds quite some overhead, and there are ways
to improve it: for instance, set() copies data _twice_ before sending
it, for no reason.  This may be replaced with a writev binding (I hope
there's one in Perl).
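To illustrate the gather-write idea (not the Cache::Memcached code
itself, which is Perl), here is a minimal Python sketch.  It assumes a
Unix system, where socket.sendmsg() accepts a list of buffers and
hands them to the kernel without first joining them into one string.
The helper name send_set_gathered is hypothetical, and a local
socketpair stands in for the memcached connection.

```python
import socket


def send_set_gathered(sock, key, flags, exptime, value, noreply=True):
    """Send a 'set' request as three separate buffers in one call."""
    suffix = " noreply" if noreply else ""
    header = f"set {key} {flags} {exptime} {len(value)}{suffix}\r\n".encode("ascii")
    # The value is passed as-is: no copy into a combined request string.
    return sock.sendmsg([header, value, b"\r\n"])


# A local socket pair stands in for the client/server connection.
client, server = socket.socketpair()
sent = send_set_gathered(client, "k", 0, 0, b"hello")
received = server.recv(4096)
client.close()
server.close()
print(sent, received)
```

A real client would also have to handle a short write, i.e. the case
where sendmsg() reports fewer bytes than the three buffers hold.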

Then for every request in the test file I added $res = ... to suppress
void context and effectively to wait for the result packet.  Now times
were

  Files=1, Tests=7, 207 wallclock secs (38.78 cusr +  9.00 csys = 47.78 CPU)

CPU usage grew about two times, which is fair given that we have twice
as much traffic now.  But network utilization was only 510-520KB/s,
and what's more important is that wallclock is _ten_ times bigger.

The explanation is simple: network utilization is 4 times smaller, and
we have twice as much traffic, which gives a factor of 8, close to 10.
Such poor network load comes from the fact that when traffic goes in
only one direction, several requests may end up in one network packet
(the kernel doesn't send packets right away, it waits a bit for more
data, and sometimes it also has to wait for the interface to become
ready).  But with a request-reply scheme every request (and reply)
goes in a separate packet.
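The back-of-the-envelope estimate can be redone with the measured
figures.  The sketch below uses only the numbers quoted above; taking
the midpoint of each utilization range and 1MB = 1000KB are
assumptions of this sketch.

```python
# Figures quoted for the two test runs.
noreply_wallclock = 19                 # seconds, requests only
reply_wallclock = 207                  # seconds, waiting for each reply
noreply_rate_kb = (1800 + 1900) / 2    # midpoint of 1.8-1.9MB/s
reply_rate_kb = (510 + 520) / 2        # midpoint of 510-520KB/s

utilization_ratio = noreply_rate_kb / reply_rate_kb
traffic_ratio = 2.0                    # replies double the traffic
predicted_slowdown = utilization_ratio * traffic_ratio
observed_slowdown = reply_wallclock / noreply_wallclock

print(f"utilization ratio:  {utilization_ratio:.1f}")   # 3.6
print(f"predicted slowdown: {predicted_slowdown:.1f}")  # 7.2
print(f"observed slowdown:  {observed_slowdown:.1f}")   # 10.9
```

With midpoints the estimate comes out a little under the rounded
factor of 8, and the observed slowdown a little over 10, so the two
still agree to within the same rough margin.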

Alright, the artificial case looks good, but will this work in
production?  Sure, the mileage depends on how frequent the requests
with 'noreply' are.  The more interesting question is _when_ it is
safe to use 'noreply'.

Let's have a quick tour through the commands.  I will assume that the
request is syntactically correct, and there's no "internal server
error" (like generic ERROR, more on this later).  I will also assume
that client code is semantically correct, for instance, when it issues
'add', it either can't get NOT_STORED, or this would mean a success.
An application that has to handle such semantic inconsistency should
test the result, of course.

  - get, gets, stats, version: do not support 'noreply' for obvious
    reasons.

  - quit: never had a result anyway.

  - add: STORED: success, NOT_STORED: can't happen if semantically
    correct, or also means a success.

  - set: STORED: success.

  - replace, append, prepend, cas: STORED: success, NOT_STORED
    (NOT_FOUND for cas): can't happen if semantically correct, or also
    means a success.

  - delete: DELETED: success, NOT_FOUND: success.

  - incr, decr: <new_value>: success, normally not needed, NOT_FOUND:
    can't happen if semantically correct.

  - flush_all, verbosity: always OK: success.
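On the wire, 'noreply' is simply an extra trailing token on the
command line.  A minimal sketch of a client-side helper follows; the
function name request_line is hypothetical, and it covers only
single-line commands (storage commands are followed by a data block,
which is left out here).

```python
# Commands from the list above whose whole point is the reply.
NO_NOREPLY = {"get", "gets", "stats", "version"}


def request_line(command, *args, noreply=False):
    """Build one memcached text-protocol command line."""
    if noreply and command in NO_NOREPLY:
        raise ValueError(f"'{command}' does not support noreply")
    parts = [command, *map(str, args)]
    if noreply:
        parts.append("noreply")
    return (" ".join(parts) + "\r\n").encode("ascii")


print(request_line("delete", "foo", noreply=True))
print(request_line("incr", "hits", 1, noreply=True))
print(request_line("get", "foo"))
```

Refusing 'noreply' for the read commands on the client side keeps a
programming mistake from turning into a request the server cannot
sensibly honour.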

As you can see, you may ignore the result more often than not.  The
remaining question is: what if there's an internal server error?  A
good candidate is "not enough memory".  The answer is: for some
commands, it can't be helped.  For some, ambiguity of the resulting
state is acceptable.  For some, a minor failure rate is acceptable
(for instance, with a web application the user may always reload the
page).  When it really matters, the client should test the result.
IOW, the client should use its best judgment.

This very moment an idea crossed my mind: we may add a flag,
'noreply_close_on_error', and if it is set the server will close the
connection when 'noreply' was used and there was an internal error.
This way, the client will never miss the error.
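The proposed behaviour can be written down as a small decision
function.  This is purely a sketch of the idea: no such flag exists in
memcached, the function name and the error text are made up, and only
the "SERVER_ERROR <message>" reply format is taken from the protocol.

```python
def server_action(noreply, internal_error, close_on_error):
    """Return (action, reply) for one request under the proposed flag.

    action is "continue" or "close"; reply is None when nothing is sent.
    """
    if internal_error:
        if not noreply:
            # The normal path: the client reads the error itself.
            return ("continue", "SERVER_ERROR hypothetical failure\r\n")
        # With noreply the error would otherwise be silent.
        return ("close" if close_on_error else "continue", None)
    return ("continue", None if noreply else "STORED\r\n")


print(server_action(noreply=True, internal_error=True, close_on_error=True))
print(server_action(noreply=True, internal_error=True, close_on_error=False))
print(server_action(noreply=False, internal_error=True, close_on_error=True))
```

The client side needs no new code path: a closed connection shows up
as an ordinary I/O error on the next request.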

IOW, "What I say three times is true" :).  Let's accept the patches to
the mainline.  Thanks in advance!

-- 
   Tomash Brechko