PECL memcache extension

Mon Mar 6 23:08:45 UTC 2006

Sweet!  I'll definitely set aside some time (soon!) to test this and get 
back to you.

Everything sounds very sane and like I'll be able to do exactly what I 
wanted to do.

Thanks so much!

Don

Mikael Johansson wrote:
> There's now a snapshot ready that should support your use case. I would 
> like some feedback on the suitability of the changes before committing 
> to the API. Could you have a look at the changes and try them out in 
> your environment?
> 
> In summary: servers may be added in failed mode, and by setting 
> retry_interval to -1 you can prevent them from ever being used. Use 
> setServerParams() with retry_interval = 15, status = true to bring a 
> specific server back online. The status of a server can be fetched using 
> getServerStatus(). By providing a failure callback, user code can mark 
> the server as down in the external database until some async job flushes 
> it and flags it as back online. ini_set('memcache.allow_failover', 0); 
> will prevent failover from occurring.
> 
> Code available at
> http://www.synd.info/extensions/memcache/
> 
> Memcache::addServer(host, port = 11211, persistent = true, weight = 1, 
> timeout = 1, retry_interval = 15, status = true, failure_callback = 
> NULL) : bool
>  * Setting retry_interval to -1 disables automatic reconnection of 
> failed hosts
>  * Setting status to false marks the server as failed
>  * failure_callback is run when an operation on a server fails for some 
> reason; implement it as myCallback(string host, int port) : void. Both 
> function callbacks (e.g. 'my_memcache_callback') and OO callbacks (e.g. 
> array(new FailureHandler(), 'onfailure')) are supported
> 
> Memcache::setServerParams(host, port = 11211, timeout = 1, 
> retry_interval = 15, status = true, failure_callback = NULL) : bool
>  * Allows for changing parameters at runtime, such as bringing the 
> server online/offline
> 
> Memcache::getServerStatus(host, port = 11211) : int
>  * Returns a non-zero status if the server is online, 0 if the server is 
> marked as failed
> 
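A minimal sketch of how the three calls above might fit together, assuming the snapshot signatures stay as listed. The helper names add_servers() and bring_online(), the shape of the $servers array, and the body of the callback are illustrative assumptions, not part of the extension.

```php
<?php
// Hypothetical failure handler; the extension would call it with the host
// and port of the server on which an operation failed.
function my_memcache_callback($host, $port)
{
    error_log("memcache server $host:$port failed");
}

// Add every server with automatic reconnects disabled (retry_interval = -1).
// $servers is an assumed structure: a list of array('host', 'port', 'up').
function add_servers($memcache, array $servers)
{
    foreach ($servers as $s) {
        $memcache->addServer(
            $s['host'],
            $s['port'],
            true,                   // persistent
            1,                      // weight
            1,                      // timeout in seconds
            -1,                     // retry_interval: never retry automatically
            $s['up'],               // status: false adds the server as failed
            'my_memcache_callback'  // failure_callback
        );
    }
}

// Bring one server back online with retry_interval = 15 and status = true,
// the values suggested in the summary above.
function bring_online($memcache, $host, $port = 11211)
{
    return $memcache->setServerParams(
        $host, $port, 1, 15, true, 'my_memcache_callback'
    );
}
```

With the extension loaded, add_servers(new Memcache(), $servers) would build the pool, and getServerStatus(host, port) could then be used to confirm which servers are marked as failed.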
> The suggested Memcache::checkState(host, port, key, val) can be 
> implemented in userland by connecting to the host separately and 
> fetching the test key.
> 
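One way the userland check described above could look, as a sketch only: check_state() is a hypothetical function name, the 1 second connect timeout and 30 second expiry are arbitrary choices, and it both sets and fetches the test key, as in the original checkState() suggestion.

```php
<?php
// Hypothetical userland replacement for the suggested checkState().
// It opens its own connection to one server, independent of any pool,
// stores a test value and reads it back.
function check_state($host, $port, $key, $val)
{
    $probe = new Memcache();

    // @ suppresses the notice the extension raises when the connect fails
    if (!@$probe->connect($host, $port, 1)) {
        return false;
    }

    $stored = $probe->set($key, $val, 0, 30);   // flags = 0, expire in 30 s
    $got = $stored ? $probe->get($key) : false;
    $probe->close();

    // Compare as strings, since the value comes back from the server as text
    return $got !== false && (string) $got === (string) $val;
}
```

A monitor script might call check_state() for each host and only report a server as usable when it returns true.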
> //Mikael
> 
> ----- Original Message ----- From: "Don MacAskill" <don at smugmug.com>
> To: "Mikael Johansson" <mikael at synd.info>
> Cc: "memcached mail list" <memcached at lists.danga.com>; "Antony Dovgal" 
> <antony at zend.com>
> Sent: Sunday, February 05, 2006 8:49 PM
> Subject: Re: PECL memcache extension
> 
> 
>>
>> Wow, this sounds great!  Thanks!
>>
>> My only concern now (pending testing) is with the flushing.  I realize 
>> why you backed out the callback, which makes a lot of sense, but I'm 
>> still struggling to figure out how to flush my server when it comes back.
>>
>> If I use an external service monitor, there's still a race condition - 
>> if the PECL extension detects that the server is back first with its 
>> retries, it will use it for a while before the service monitor realizes 
>> it's back and flushes it.
>>
>> Even a second or two of stale data reads could be fatal for us.
>>
>> The immediate and obvious answer would be to do external state 
>> tracking and somehow let the PECL extension know what's going on.   It 
>> may only be immediate and obvious to me since we're doing it already, 
>> but I'm certainly open to other suggestions.  Here's what I'd need:
>>
>> - Access to view and change the state of the server pool so we can 
>> externally set a server back to "up" (or "down") rather than 
>> automatically checking every retry_interval seconds.  Maybe:
>>
>> - Memcache::addServer(host, port, persistent, weight, timeout, 
>> retry_interval, state)  [I'd set retry_interval to 0 to disable 
>> retries, in this case]
>>
>> - Memcache::setState(host, port, state)
>>
>> - Memcache::getState(host, port)
>>
>> - Memcache::checkState(host, port, key, val)  [theoretically, this 
>> would connect to the server, set the key and get the key]
>>
>> - A callback for a server failure would be great, but could maybe be 
>> done by checking the return of the various functions instead.
>>
>> With my current implementation, I already do this for both up/down 
>> state changes, and a server isn't marked as "up" again until a 
>> successful flush_all() has happened.  Since it's externally tracked in 
>> a database, each Apache child is fed the real state each time a script 
>> is run or the state needs updating.
>>
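The external tracking described in the paragraph above could be sketched roughly as follows, using the setServerParams() signature quoted earlier in this thread. Everything else is an assumption made for illustration: the function names, the "host:port" => bool layout of $states, and where that array is loaded from (a database in the setup described) are not taken from any real code.

```php
<?php
// Push externally tracked state into the pool at the start of a request.
// $states maps "host:port" to true (up) or false (down); loading it from
// the database is left out of this sketch.
function apply_states($memcache, array $states)
{
    foreach ($states as $addr => $up) {
        list($host, $port) = explode(':', $addr, 2);
        // retry_interval = -1 so only the external tracker changes the state
        $memcache->setServerParams($host, (int) $port, 1, -1, (bool) $up);
    }
}

// Flush a returning server and only then flag it as up. $probe is assumed
// to be a Memcache object connected to that one server, not to the pool.
function recover_server($probe, array &$states, $addr)
{
    if (!$probe->flush()) {
        return false;          // stays down until a flush succeeds
    }
    $states[$addr] = true;     // a real tracker would also update the database
    return true;
}
```

The ordering in recover_server() is the point of the design: the state only flips to "up" after the flush has succeeded, so no child is told to use the server while it may still hold stale data.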
>> If this is getting too messy, or not useful enough to others, I can 
>> instead just not use the extension's pooling mechanism and instead 
>> wrap my own stuff around Memcache::connect() and/or 
>> Memcache::pconnect(). But I can't imagine I'm the only one who's 
>> paranoid about data integrity issues. :)
>>
>> I'd be happy to release our PHP code for external tracking in either 
>> event, if anyone's interested.  It's not really very difficult, but it 
>> might help those who haven't done it in a decent-sized production 
>> environment yet.
>>
>> And, again, if there's an easier way to do this and preserve data 
>> integrity, I'd love to hear it.  :)
>>
>> Don
>>
>>
>> Mikael Johansson wrote:
>>> There's now a "memcache.allow_failover" ini directive in CVS which 
>>> you can use to prevent failover and make the client code return false 
>>> immediately. It defaults to true.
>>>
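With failover disabled, calling code has to treat a false return as "miss or server failure" and go to the authoritative source itself. A small sketch of that pattern follows; cached_fetch(), the $loader callback and the 60 second default expiry are hypothetical names and values chosen for illustration.

```php
<?php
// Disable failover so a failed server yields false instead of a silent
// switch to another server. Has no effect if the extension is not loaded.
ini_set('memcache.allow_failover', 0);

// Read through the cache. A false from get() can mean a miss or a failed
// server, so in both cases the value is rebuilt from the real data source.
// Note that a stored value of false cannot be told apart from a miss.
function cached_fetch($memcache, $key, $loader, $ttl = 60)
{
    $value = $memcache->get($key);
    if ($value !== false) {
        return $value;
    }

    $value = call_user_func($loader, $key);

    // Best effort: if the server is down this simply returns false
    $memcache->set($key, $value, 0, $ttl);

    return $value;
}
```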
>>> Failover may occur at any stage in any of the methods that talk to 
>>> the server (set, get, delete, increment, ..) and as long as there are 
>>> other servers available the client code won't notice (other than an 
>>> E_NOTICE being triggered). Causes that would trigger a failover might 
>>> be socket connect failures, read/write errors or Memcached server 
>>> errors (other than out-of-memory).
>>>
>>> Each persistent connection struct has its own retry timeout which 
>>> gets set when a failure occurs. After it expires the connection 
>>> will be retried and possibly marked failed for another retry_interval 
>>> seconds. Since each Apache child might have a connection struct of 
>>> its own, each child would attempt to reconnect every retry_interval 
>>> seconds when serving a request.
>>>
>>> The changes needed to allow a user to specify a callback to be run 
>>> on failback were minor, but since each child on every host might run 
>>> it when it reconnects a failed connection struct, the results were 
>>> somewhat unreliable. There's also the very real possibility that the 
>>> child creates a completely new struct even though persistent connect 
>>> was specified (for example when the connection pool is exhausted) and 
>>> thus doesn't run the callback at all. In any case, I backed out those 
>>> changes and would recommend using a real service monitor (such as 
>>> "mon") instead, to flush failed servers when they come back online.
>>>
>>> //Mikael
>>>
>>> ----- Original Message ----- From: "Don MacAskill" <don at smugmug.com>
>>> To: "Mikael Johansson" <mikael at synd.info>
>>> Cc: "memcached mail list" <memcached at lists.danga.com>; "Antony 
>>> Dovgal" <antony at zend.com>
>>> Sent: Saturday, February 04, 2006 8:07 PM
>>> Subject: Re: PECL memcache extension
>>>
>>>
>>>>
>>>> Sounds like we're on the same page as far as understanding the 
>>>> problem. And I'd definitely like a flag to be able to automatically 
>>>> flush_all() the server which just re-joined the cluster (or even no 
>>>> option, though I might be missing a scenario where you wouldn't want 
>>>> this).
>>>>
>>>> But rather than having to do a flush_all() on every member of the 
>>>> cluster when #2 happens, I'd much rather see something like a 
>>>> php.ini parameter that lets me tell memcache not to rebalance the 
>>>> cluster when one fails:
>>>>
>>>> memcache.rebalance = false
>>>>
>>>> I have enough memcache servers that a failure of one of them doesn't 
>>>> dramatically affect performance.  But having stale data, or having 
>>>> to flush_all() every server would be a Big Deal.
>>>>
>>>> I suppose I could just write a wrapper for memcache in PHP that 
>>>> handles failure scenarios and not use Memcache::addServer() at all if 
>>>> this doesn't sound feasible.
>>>>
>>>> Also, I'd love to get a little insight into exactly what happens 
>>>> when a failure occurs.  What causes memcache to consider a server to 
>>>> be a failure?  Is it only if a socket connect fails?  Or does a 
>>>> failure of some of the commands (delete, for example) also cause a 
>>>> server to be marked as failed?
>>>>
>>>>
>>>> And finally, I see that there's a retry timer.  Is that global for 
>>>> the entire Apache process?  Or just a thread/fork?  If I set it to 
>>>> be 60 seconds or something, does that mean there will only be a 
>>>> single retry every 60 seconds for the entire physical server running 
>>>> Apache?  Or are all the threads/forks going to retry every 60 
>>>> seconds?  I want to make sure we're not retrying so frequently that 
>>>> we're causing it to flap.
>>>>
>>>> A little bit better documentation in this regard would help, but 
>>>> perhaps providing some mechanism where the application can mark a 
>>>> server in the cluster as failed at will would be nice, too.  And is 
>>>> there any way to notify (via php's error log, or firing a function 
>>>> or something) me when a server fails?
>>>>
>>>> Thanks,
>>>>
>>>> Don