PECL memcache extension

Sun Mar 5 19:12:58 UTC 2006

There's now a snapshot ready that should support your use case. I would like 
some feedback on the suitability of the changes before committing to the 
API. Could you have a look at the changes and try them out in your 
environment?

In summary: servers may be added in failed mode, and by setting 
retry_interval to -1 you can prevent them from being used until they are 
explicitly re-enabled. Use setServerParams() with retry_interval = 15, 
status = true to bring a specific server back online. The status of a server 
can be fetched using getServerStatus(). By providing a failure callback, user 
code can mark the server as down in the external database until some async 
job flushes it and flags it as back online. 
ini_set('memcache.allow_failover', 0); will prevent failover from occurring.

Code available at
 http://www.synd.info/extensions/memcache/

 Memcache::addServer(host, port = 11211, persistent = true, weight = 1, 
timeout = 1, retry_interval = 15, status = true, failure_callback = NULL) : 
bool
  * Setting retry_interval to -1 disables automatic reconnection of failed 
hosts
  * Setting status to false marks the server as failed
  * failure_callback is run when an operation on a server fails for some 
reason. Implement it as myCallback(string host, int port) : void. Both 
function callbacks (e.g. 'my_memcache_callback') and OO callbacks (e.g. 
array(new FailureHandler(), 'onfailure')) are supported
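
A minimal sketch of both callback styles, assuming the signatures listed 
here. The host names, the $GLOBALS['down_servers'] array (a stand-in for an 
external database) and the body of FailureHandler are hypothetical and only 
for illustration.

```php
<?php
// Hypothetical stand-in for the external state store (e.g. a database table).
$GLOBALS['down_servers'] = array();

// Function-style failure callback: record when the server was seen failing.
function my_memcache_callback($host, $port) {
    $GLOBALS['down_servers']["$host:$port"] = time();
}

// OO-style failure callback: collect failed servers on the handler object.
class FailureHandler {
    public $failed = array();

    public function onFailure($host, $port) {
        $this->failed[] = "$host:$port";
    }
}

$handler = new FailureHandler();

// Only touch the extension if it is loaded.
if (class_exists('Memcache')) {
    $memcache = new Memcache();

    // Online server, retried every 15 seconds, function callback.
    $memcache->addServer('cache1.example.com', 11211, true, 1, 1, 15, true,
        'my_memcache_callback');

    // Added in failed mode and never retried automatically, OO callback.
    $memcache->addServer('cache2.example.com', 11211, true, 1, 1, -1, false,
        array($handler, 'onFailure'));
}
```

PHP method names are case-insensitive, so 'onfailure' and 'onFailure' refer 
to the same method.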

 Memcache::setServerParams(host, port = 11211, timeout = 1, retry_interval = 
15, status = true, failure_callback = NULL) : bool
  * Allows for changing parameters at runtime, such as bringing the server 
online/offline

 Memcache::getServerStatus(host, port = 11211) : int
  * Returns a non-zero status if the server is online, 0 if the server is 
marked as failed
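
As a rough sketch of how an async job might use these two methods to bring a 
server back, assuming it was added with retry_interval = -1: flush it over a 
separate connection first, then re-enable it in the pool. The function name 
bring_server_online() is hypothetical, not part of the extension.

```php
<?php
// $pool is expected to be a Memcache instance holding the server pool.
function bring_server_online($pool, $host, $port = 11211) {
    if ($pool->getServerStatus($host, $port) != 0) {
        return true; // already marked online, nothing to do
    }

    // Flush over a separate connection so stale entries are gone before
    // the pool starts using the server again.
    $direct = new Memcache();
    $connected = @$direct->connect($host, $port, 1);
    $flushed = $connected && @$direct->flush();
    if ($connected) {
        $direct->close();
    }
    if (!$flushed) {
        return false; // still unreachable, leave it marked as failed
    }

    // timeout = 1, retry_interval = 15, status = true
    return $pool->setServerParams($host, $port, 1, 15, true);
}
```

Each process holds its own view of the pool, so every process would need to 
make the same call (or read the state from the external database) before it 
starts using the server again.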

The suggested Memcache::checkState(host, port, key, val) can be implemented 
in userland by connecting to the host separately and fetching the test key.
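
Something along these lines, as a sketch only: memcache_check_state() is a 
hypothetical userland helper, and the 10 second expiry on the test key is an 
arbitrary choice. It assumes $val is a string.

```php
<?php
// Userland version of the suggested checkState(): connect to the host
// separately, store the test key and read it back.
function memcache_check_state($host, $port, $key, $val) {
    if (!class_exists('Memcache')) {
        return false; // extension not loaded
    }

    $probe = new Memcache();

    // Separate, non-pooled connection with a 1 second timeout.
    if (!@$probe->connect($host, $port, 1)) {
        return false;
    }

    // Store the test key for 10 seconds and verify it round-trips.
    $ok = @$probe->set($key, $val, 0, 10) && @$probe->get($key) === $val;
    $probe->close();

    return $ok;
}
```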

//Mikael

----- Original Message ----- 
From: "Don MacAskill" <don at smugmug.com>
To: "Mikael Johansson" <mikael at synd.info>
Cc: "memcached mail list" <memcached at lists.danga.com>; "Antony Dovgal" 
<antony at zend.com>
Sent: Sunday, February 05, 2006 8:49 PM
Subject: Re: PECL memcache extension

>
> Wow, this sounds great!  Thanks!
>
> My only concern now (pending testing) is with the flushing.  I realize why 
> you backed out the callback, which makes a lot of sense, but I'm still 
> struggling to figure out how to flush my server when it comes back.
>
> If I use an external service monitor, there's still a race condition - if 
> the PECL extension detects that the server is back first with its 
> retries, it will use it for a while before the service monitor realizes 
> it's back and flushes it.
>
> Even a second or two of stale data reads could be fatal for us.
>
> The immediate and obvious answer would be to do external state tracking 
> and somehow let the PECL extension know what's going on.   It may only be 
> immediate and obvious to me since we're doing it already, but I'm 
> certainly open to other suggestions.  Here's what I'd need:
>
> - Access to view and change the state of the server pool so we can 
> externally set a server back to "up" (or "down") rather than automatically 
> checking every retry seconds.  Maybe:
>
> - Memcache::addServer(host, port, persistent, weight, timeout, 
> retry_interval, state)  [I'd set retry_interval to 0 to disable retries, 
> in this case]
>
> - Memcache::setState(host, port, state)
>
> - Memcache::getState(host, port)
>
> - Memcache::checkState(host, port, key, val)  [theoretically, this would 
> connect to the server, set the key and get the key]
>
> - A callback for a server failure would be great, but could maybe be done 
> by checking the return of the various functions instead.
>
> With my current implementation, I already do this for both up/down state 
> changes, and a server isn't marked as "up" again until a successful 
> flush_all() has happened.  Since it's externally tracked in a database, 
> each Apache child is fed the real state each time a script is run or the 
> state needs updating.
>
> If this is getting too messy, or not useful enough to others, I can 
> instead just not use the extension's pooling mechanism and instead wrap my 
> own stuff around Memcache::connect() and/or Memcache::pconnect(). But I 
> can't imagine I'm the only one who's paranoid about data integrity issues. 
> :)
>
> I'd be happy to release our PHP code for external tracking in either 
> event, if anyone's interested.  It's not really very difficult, but it 
> might help those who haven't done it in a decent-sized production 
> environment yet.
>
> And, again, if there's an easier way to do this and preserve data 
> integrity, I'd love to hear it.  :)
>
> Don
>
>
> Mikael Johansson wrote:
>> There's now a "memcache.allow_failover" ini directive in CVS which you 
>> can use to prevent failover and make the client code return false 
>> immediately. It defaults to true.
>>
>> Failover may occur at any stage in any of the methods that talk to the 
>> server (set, get, delete, increment, ..) and as long as there are other 
>> servers available the client code won't notice (other than an E_NOTICE 
>> being triggered.) Causes that would trigger a failover include socket 
>> connect failures, read/write errors or Memcached server errors (other 
>> than out-of-memory.)
>>
>> Each persistent connection struct has its own retry timeout which gets 
>> set when a failure occurs; after it expires the connection will be 
>> retried and possibly marked failed for another retry_interval seconds. 
>> Since each Apache child might have a connection struct of its own, each 
>> child would attempt to reconnect every retry_interval seconds when 
>> serving a request.
>>
>> The changes needed to allow a user to specify a callback to be run on 
>> failback were minor, but since each child on every host might run it when 
>> they reconnect a failed connection struct the results were somewhat 
>> unreliable. There's also the very real possibility that the child creates 
>> a completely new struct even though persistent connect was specified (for 
>> example when the connection pool is exhausted) and thus doesn't run the 
>> callback at all. In any case, I backed out those changes and would 
>> recommend using a real service monitor (such as "mon") instead, to flush 
>> failed servers when they come back online.
>>
>> //Mikael
>>
>> ----- Original Message -----
>> From: "Don MacAskill" <don at smugmug.com>
>> To: "Mikael Johansson" <mikael at synd.info>
>> Cc: "memcached mail list" <memcached at lists.danga.com>; "Antony Dovgal" 
>> <antony at zend.com>
>> Sent: Saturday, February 04, 2006 8:07 PM
>> Subject: Re: PECL memcache extension
>>
>>
>>>
>>> Sounds like we're on the same page as far as understanding the problem. 
>>> And I'd definitely like a flag to be able to automatically flush_all() 
>>> the server which just re-joined the cluster (or even no option, though I 
>>> might be missing a scenario where you wouldn't want this).
>>>
>>> But rather than having to do a flush_all() on every member of the 
>>> cluster when #2 happens, I'd much rather see something like a php.ini 
>>> parameter that lets me tell memcache not to rebalance the cluster when 
>>> one fails:
>>>
>>> memcache.rebalance = false
>>>
>>> I have enough memcache servers that a failure of one of them doesn't 
>>> dramatically affect performance.  But having stale data, or having to 
>>> flush_all() every server would be a Big Deal.
>>>
>>> I suppose I could just write a wrapper for memcache in PHP that handles 
>>> failure scenarios and not use Memcache::addServer() at all if this 
>>> doesn't sound feasible.
>>>
>>> Also, I'd love to get a little insight into exactly what happens when a 
>>> failure occurs.  What causes memcache to consider a server to be a 
>>> failure?  Is it only if a socket connect fails?  Or does a failure of 
>>> some of the commands (delete, for example) also cause a server to be 
>>> marked as failed?
>>>
>>>
>>> And finally, I see that there's a retry timer.  Is that global for the 
>>> entire Apache process?  Or just a thread/fork?  If I set it to be 60 
>>> seconds or something, does that mean there will only be a single retry 
>>> every 60 seconds for the entire physical server running Apache?  Or are 
>>> all the threads/forks going to retry every 60 seconds?  I want to make 
>>> sure we're not retrying so frequently that we're causing it to flap.
>>>
>>> A little bit better documentation in this regard would help, but perhaps 
>>> providing some mechanism where the application can mark a server in the 
>>> cluster as failed at will would be nice, too.  And is there any way to 
>>> notify (via php's error log, or firing a function or something) me when 
>>> a server fails?
>>>
>>> Thanks,
>>>
>>> Don