I never saw a reply to this question on the list, so this is a *bump*...<br><br><div><span class="gmail_quote">On 12/4/06, <b class="gmail_sendername">matt DiMeo</b> <<a href="mailto:mattdimeo@yahoo.com">mattdimeo@yahoo.com
</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">There's a minor (depending on application) reliability issue with the Perl client (I have version
1.18).<br><br>Basically, if a host dies, every request to it will take one second to time out. The client is smart about marking a memcached as down if it can't initially connect to it, but when it already has a connection, it'll just keep retrying.
<br><br>This is because the _dead_sock sub is only passed a $dead_for parameter (20+rand seconds) on initial connect; if an already-connected socket fails, it gets undef and doesn't mark the host bad.<br><br>You can simulate this behavior by running memcached in non-daemon mode and hitting Ctrl-Z to suspend it; every request to that server will then take a full second to fail (the select timeout is 1s).
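<br><br>The difference between the two failure paths can be illustrated with a minimal, self-contained sketch. This is not the client's actual code: mark_dead, is_dead and backoff are hypothetical names, %host_dead here is only a stand-in for the client's table, and the 10-second jitter range is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the client's per-process dead-host table:
# "ip:port" => epoch time until which the host should be skipped.
my %host_dead;

# Record a host as dead for $dead_for seconds. When $dead_for is undef
# nothing is recorded, which models the behaviour described above for
# a socket that was already connected when it failed.
sub mark_dead {
    my ($host, $dead_for, $now) = @_;
    $host_dead{$host} = $now + $dead_for if $dead_for;
    return;
}

# True while the host is still inside its back-off window.
sub is_dead {
    my ($host, $now) = @_;
    my $until = $host_dead{$host};
    return ($until && $until > $now) ? 1 : 0;
}

# 20 seconds plus random jitter (the 10-second range is an assumption).
sub backoff { return 20 + int(rand(10)) }

my $now = time;
mark_dead('10.0.0.1:11211', undef,     $now);  # live socket failed: not recorded
mark_dead('10.0.0.2:11211', backoff(), $now);  # connect failed: skipped for 20+ s
```

In this sketch, passing a back-off value on every failure path is what keeps later requests from paying the one-second timeout again until the window expires.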
<br><br>Most applications would probably just slow down, but our system can't tolerate average response times of over about 400ms (we fill up all our worker processes), so we kinda fall over.<br><br>I've fixed this locally by modifying all the
_dead_sock calls to pass an appropriate $dead_for value.<br><br>I'm also planning to add support for optionally replacing the %host_dead hash with an IPC::ShareLite object, which would allow the information about which hosts are up or down to be shared between clients on the same host.
<br><br>So, a couple of questions:<br>1. Can anyone tell me why _dead_sock isn't always passed a $dead_for time?<br>2. If not, let's fix it (which isn't actually a question).<br>3. Would anyone else be interested in the IPC::ShareLite stuff?
<br><br>Thanks,<br><span class="sg">-m@</span></blockquote></div><br>-- <br>Ready!!<br>Fire!!<br>Aim!!