perl

Mon Dec 4 23:17:50 UTC 2006

There's a reliability issue (minor or not, depending on your application) with the Perl client (I have version 1.18).

Basically, if a host dies, every request to it takes one second to time out.  The client is smart about marking a memcached as down if it can't initially connect to it, but once it already has a connection, it just keeps retrying.

This is because the _dead_sock sub is only passed a $dead_for parameter (20+rand seconds) on initial connect; if an already-connected socket fails, it gets undef and doesn't mark the host bad.

You can simulate this behavior by running memcached in non-daemon mode and hitting ctrl-z; every request to that server will then take a full second to fail (the select timeout is 1s).

Most applications would probably just slow down, but our system can't tolerate average response times of over about 400ms (we fill up all our worker processes), so we kinda fall over.

I've fixed this locally by modifying all the _dead_sock calls to pass an appropriate $dead_for value.
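
To make the difference concrete, here is a minimal sketch of the bookkeeping involved. It is a simplified model, not the actual Cache::Memcached source: the sub names dead_sock, host_is_dead and default_dead_for, the explicit $now argument, and the example address are all made up for illustration, and the exact random jitter added to the 20 seconds is an assumption.

```perl
use strict;
use warnings;

# Simplified model of the client's dead-host bookkeeping; NOT the real
# Cache::Memcached code. %host_dead maps "host:port" to the epoch time
# until which that host should be skipped.
my %host_dead;

# Record a failure. If $dead_for is undef, nothing is recorded, so the
# next request tries the host again (and waits out the timeout again).
sub dead_sock {
    my ($host, $dead_for, $now) = @_;
    $now = time unless defined $now;
    $host_dead{$host} = $now + $dead_for if $dead_for;
    return;
}

# True (1) while the host is still inside its back-off window.
sub host_is_dead {
    my ($host, $now) = @_;
    $now = time unless defined $now;
    return ($host_dead{$host} && $host_dead{$host} > $now) ? 1 : 0;
}

# Back-off in the spirit of the initial-connect case: 20 seconds plus a
# little jitter (the jitter amount here is an assumption).
sub default_dead_for { return 20 + rand() }

# Failure on an already-connected socket, unpatched behavior:
dead_sock('10.0.0.1:11211', undef, 1000);
print host_is_dead('10.0.0.1:11211', 1001) ? "skipped\n" : "retried\n";

# The same failure with a $dead_for value passed in:
dead_sock('10.0.0.1:11211', default_dead_for(), 1000);
print host_is_dead('10.0.0.1:11211', 1001) ? "skipped\n" : "retried\n";
```

In the first case the host is retried on the very next request; in the second it is skipped until the back-off window expires.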

I'm also planning to add support for optionally replacing the %host_dead hash with an IPC::ShareLite object, which would allow the information about which hosts are up or down to be shared between clients on the same host.
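
A rough sketch of what that sharing could look like follows. Nothing here is written yet, so treat every name as hypothetical: load_dead and mark_dead are illustrative helpers, and LocalStore is an in-process stand-in that only mimics the store/fetch methods of IPC::ShareLite so the example runs without a shared memory segment. The table is serialized with Storable, which is one possible choice among several.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Stand-in exposing the same store/fetch method names as IPC::ShareLite.
# In real use this would be replaced by something like
#   IPC::ShareLite->new(-key => 1971, -create => 'yes', -destroy => 'no')
# where the key value is arbitrary.
package LocalStore;
sub new   { my $data = ''; return bless \$data, shift }
sub store { my ($self, $v) = @_; $$self = $v; return 1 }
sub fetch { my ($self) = @_; return $$self }

package main;

# Hypothetical helper: read the dead-host table out of the shared store.
sub load_dead {
    my ($share) = @_;
    my $raw = $share->fetch;
    return (defined $raw && length $raw) ? thaw($raw) : {};
}

# Hypothetical helper: record that $host is down until epoch time $until.
sub mark_dead {
    my ($share, $host, $until) = @_;
    my $dead = load_dead($share);
    $dead->{$host} = $until;
    $share->store(freeze($dead));
    return;
}

my $share = LocalStore->new;
mark_dead($share, '10.0.0.1:11211', time + 20);   # one client notices the failure
my $seen = load_dead($share);                     # another client reads the table
print join(', ', sort keys %$seen), "\n";
```

Note that mark_dead is a read-modify-write, so with real shared memory it would need to be wrapped in the segment's lock and unlock calls to keep two clients from overwriting each other's updates.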

So, a couple questions:
1. Can anyone tell me why _dead_sock isn't always passed a $dead_for time?
2. If not, let's fix it (which isn't actually a question).
3. Would anyone else be interested in the IPC::ShareLite stuff?

Thanks,
-m@
