I never saw a reply to this question on the list, so this is a *bump*...<br><br><div><span class="gmail_quote">On 12/4/06, <b class="gmail_sendername">matt DiMeo</b> &lt;<a href="mailto:mattdimeo@yahoo.com">mattdimeo@yahoo.com

</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">There&#39;s a minor (depending on application) reliability issue with the perl client (I have version 

1.18). Basically, if a host dies, every request to it will take one second to time out.&nbsp; The client is smart about marking a memcached as down if it can&#39;t initially connect to it, but when it already has a connection, it&#39;ll just keep retrying.

This is because the _dead_sock sub is only passed a $dead_for parameter (20+rand seconds) on initial connect; if an already-connected socket fails, it gets undef and doesn&#39;t mark the host bad. You can simulate this behavior by running memcached in non-daemon mode and hitting ctrl-z; every request to that server will then take a full second to fail (the select timeout is 1s).
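
To make the bookkeeping above concrete, here is a minimal, language-neutral sketch in Python. It is not the client's Perl code: the class name, method names, and the 0-10 second jitter range are assumptions made for illustration only. It shows why a failure reported without a dead_for value never causes the host to be skipped, while one reported with a value does.

```python
import random
import time


class DeadHostTracker:
    """Hypothetical stand-in for the client's record of dead hosts."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the behavior can be checked deterministically
        self._dead_until = {}    # host -> timestamp after which it may be retried

    def dead_sock(self, host, dead_for=None):
        # The host is only recorded as dead when a back-off period is supplied.
        # Without one, the failure is forgotten and the next request goes
        # straight back to the same unresponsive host.
        if dead_for is not None:
            self._dead_until[host] = self._clock() + dead_for

    def is_dead(self, host):
        until = self._dead_until.get(host)
        return until is not None and self._clock() < until


def backoff_seconds():
    # "20+rand seconds": a 20 s base plus random jitter.
    # The jitter range used here (0 to 10 s) is an assumption.
    return 20 + random.random() * 10


if __name__ == "__main__":
    now = [1000.0]
    tracker = DeadHostTracker(clock=lambda: now[0])
    host = "10.0.0.1:11211"

    tracker.dead_sock(host)                      # established socket fails, no dead_for
    print(tracker.is_dead(host))                 # host is still tried on every request

    tracker.dead_sock(host, dead_for=backoff_seconds())
    print(tracker.is_dead(host))                 # host is skipped during the back-off

    now[0] += 31                                 # back-off has expired
    print(tracker.is_dead(host))                 # host is eligible for a retry
```

Passing a back-off value on every failure path, not only on the initial connect, is what keeps requests from repeatedly paying the one-second timeout.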

<br><br>Most applications would probably just slow down, but our system can&#39;t tolerate average response times of over about 400ms (we fill up all our worker processes), so we kinda fall over.<br><br>I&#39;ve fixed this locally by modifying all the 

_dead_sock calls to pass an appropriate $dead_for value. I&#39;m also planning to add support for optionally replacing the %host_dead hash with an IPC::ShareLite object, which would allow the information about which hosts are up or down to be shared between clients on the same host.

So, a couple of questions: 1. Can anyone tell me why _dead_sock isn&#39;t always passed a $dead_for time? 2. If not, let&#39;s fix it (which isn&#39;t actually a question). 3. Would anyone else be interested in the IPC::ShareLite stuff?

Thanks, <span class="sg">-m@</span></blockquote></div> --  Ready!! Fire!! Aim!!