PECL memcache extension

Fri Mar 10 21:40:35 UTC 2006

Hi Mikael,

I'm getting a segfault when a server fails and I try to set status. 
Example code:

$memCache = new Memcache();

$memCache->addServer("192.168.0.1", "10000", 1, 1, 1, 15, true, 'memcacheFailCallback');
$memCache->add("test", "value");

function memcacheFailCallback($Host, $Port) {
	global $memCache;
	$memCache->setServerParams($Host, $Port, 1, -1, false, 'memcacheFailCallback');
}

Any ideas?  Am I doing something wrong?
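
A possible workaround sketch, under the unconfirmed assumption that the 
crash comes from calling setServerParams() inside the failure callback 
while the failing operation is still in progress: let the callback only 
record the failed server, and apply the change after the operation has 
returned. FailureQueue, onFailure() and applyTo() are hypothetical names; 
only setServerParams() and its argument order come from the snapshot.

```php
<?php
// Hypothetical helper (not part of the extension): collects failures reported
// by the callback so they can be applied once the failing call has returned.
class FailureQueue
{
    private $failed = array();

    // Use as the failure callback: array($queue, 'onFailure').
    // It only records the server and never touches the Memcache object.
    public function onFailure($host, $port)
    {
        $this->failed[$host . ':' . $port] = array($host, (int) $port);
    }

    // Marks every recorded server as failed on $pool and clears the queue.
    // $pool is anything exposing setServerParams(), e.g. the Memcache object.
    // Returns the number of servers that were updated.
    public function applyTo($pool)
    {
        $count = 0;
        foreach ($this->failed as $server) {
            // timeout = 1, retry_interval = -1 (never retry), status = false
            $pool->setServerParams($server[0], $server[1], 1, -1, false,
                array($this, 'onFailure'));
            $count++;
        }
        $this->failed = array();
        return $count;
    }
}

// Usage sketch (requires the snapshot build of the extension):
//   $queue = new FailureQueue();
//   $memCache->addServer("192.168.0.1", 10000, 1, 1, 1, 15, true,
//       array($queue, 'onFailure'));
//   $memCache->add("test", "value");
//   $queue->applyTo($memCache);   // runs outside the callback
```

Whether this actually avoids the segfault is untested; it only moves the 
setServerParams() call out of the callback.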

Thanks,

Don

Mikael Johansson wrote:
> There's now a snapshot ready that should support your use case. I would 
> like some feedback as to the suitability of the changes before committing 
> to the API. Could you have a look at the changes and try them out in 
> your environment?
> 
> In summary: servers may be added in failed mode, and by setting 
> retry_interval to -1 you can prevent them from ever being used. Use 
> setServerParams() with retry_interval = 15, status = true to bring a 
> specific server back online. The status of a server can be fetched using 
> getServerStatus(). By providing a failure callback, user code can mark 
> the server as down in the external database until some async job flushes 
> it and flags it as back online. ini_set('memcache.allow_failover', 0); 
> will prevent failover from occurring.
> 
> Code available at
> http://www.synd.info/extensions/memcache/
> 
> Memcache::addServer(host, port = 11211, persistent = true, weight = 1, 
> timeout = 1, retry_interval = 15, status = true, failure_callback = 
> NULL) : bool
>  * Setting retry_interval to -1 disables automatic reconnection of 
> failed hosts
>  * Setting status to false marks the server as failed
>  * failure_callback is run when an operation on a server fails for some 
> reason. Implement it as myCallback(string host, int port) : void. Both 
> function callbacks (e.g. 'my_memcache_callback') and OO callbacks (e.g. 
> array(new FailureHandler(), 'onfailure')) are supported
> 
> Memcache::setServerParams(host, port = 11211, timeout = 1, 
> retry_interval = 15, status = true, failure_callback = NULL) : bool
>  * Allows for changing parameters at runtime, such as bringing the 
> server online/offline
> 
> Memcache::getServerStatus(host, port = 11211) : int
>  * Returns a non-zero status if the server is online, 0 if the server is 
> marked as failed
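
To make the intended workflow concrete, here is a minimal sketch of 
bringing a failed server back online only after it has been flushed. 
reviveServer() and the $flush callable are hypothetical; the pool methods 
and their argument order are taken from the signatures above, which may 
still change before the API is committed.

```php
<?php
// Hypothetical helper: flush a server that was marked failed, then bring it
// back online in the pool. $pool is the Memcache object (or anything with the
// same setServerParams()/getServerStatus() methods). $flush is a callable
// that flushes the server over a separate connection and returns true on
// success.
function reviveServer($pool, $host, $port, $flush)
{
    if (!call_user_func($flush, $host, $port)) {
        // Flush failed: leave the server marked as failed so that no stale
        // data can be read from it.
        return false;
    }
    // timeout = 1, retry_interval = 15, status = true (back online)
    $pool->setServerParams($host, $port, 1, 15, true);
    return $pool->getServerStatus($host, $port) != 0;
}

// Usage sketch (requires the snapshot build of the extension):
//   $pool = new Memcache();
//   // Added in failed mode and never retried automatically.
//   $pool->addServer('192.168.0.1', 11211, true, 1, 1, -1, false);
//   reviveServer($pool, '192.168.0.1', 11211, function ($host, $port) {
//       $m = new Memcache();
//       return $m->connect($host, $port, 1) && $m->flush();
//   });
```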
> 
> The suggested Memcache::checkState(host, port, key, val) can be 
> implemented in userland by connecting to the host separately and 
> fetching the test key.
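
A userland check along those lines might look roughly like the sketch 
below. The function name checkState() follows the earlier suggestion; 
passing the client in as a parameter and the 10 second expiry are 
assumptions made for illustration, and this version sets the key before 
fetching it.

```php
<?php
// Hypothetical userland checkState(): round-trips a test key through one
// server. $client is a fresh Memcache object (not the pool), or any object
// with the same connect()/set()/get()/close() methods. $val should be a
// string, since the comparison below is strict.
function checkState($client, $host, $port, $key, $val)
{
    // @ hides any notice or warning connect() may raise for an unreachable host.
    if (!@$client->connect($host, $port, 1)) {
        return false;
    }
    // flag = 0, expire after 10 seconds so the test key does not linger.
    $ok = $client->set($key, $val, 0, 10) && $client->get($key) === $val;
    $client->close();
    return $ok;
}

// Usage sketch:
//   checkState(new Memcache(), '192.168.0.1', 11211, 'ping', 'pong');
```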
> 
> //Mikael
> 
> ----- Original Message ----- From: "Don MacAskill" <don at smugmug.com>
> To: "Mikael Johansson" <mikael at synd.info>
> Cc: "memcached mail list" <memcached at lists.danga.com>; "Antony Dovgal" 
> <antony at zend.com>
> Sent: Sunday, February 05, 2006 8:49 PM
> Subject: Re: PECL memcache extension
> 
> 
>>
>> Wow, this sounds great!  Thanks!
>>
>> My only concern now (pending testing) is with the flushing.  I realize 
>> why you backed out the callback, which makes a lot of sense, but I'm 
>> still struggling to figure out how to flush my server when it comes back.
>>
>> If I use an external service monitor, there's still a race condition - 
>> if the PECL extension detects that the server is back first with its 
>> retries, it will use it for a while before the service monitor realizes 
>> it's back and flushes it.
>>
>> Even a second or two of stale data reads could be fatal for us.
>>
>> The immediate and obvious answer would be to do external state 
>> tracking and somehow let the PECL extension know what's going on.   It 
>> may only be immediate and obvious to me since we're doing it already, 
>> but I'm certainly open to other suggestions.  Here's what I'd need:
>>
>> - Access to view and change the state of the server pool so we can 
>> externally set a server back to "up" (or "down") rather than 
>> automatically checking every retry_interval seconds.  Maybe:
>>
>> - Memcache::addServer(host, port, persistent, weight, timeout, 
>> retry_interval, state)  [I'd set retry_interval to 0 to disable 
>> retries, in this case]
>>
>> - Memcache::setState(host, port, state)
>>
>> - Memcache::getState(host, port)
>>
>> - Memcache::checkState(host, port, key, val)  [theoretically, this 
>> would connect to the server, set the key and get the key]
>>
>> - A callback for a server failure would be great, but could maybe be 
>> done by checking the return of the various functions instead.
>>
>> With my current implementation, I already do this for both up/down 
>> state changes, and a server isn't marked as "up" again until a 
>> successful flush_all() has happened.  Since it's externally tracked in 
>> a database, each Apache child is fed the real state each time a script 
>> is run or the state needs updating.
>>
>> If this is getting too messy, or not useful enough to others, I can 
>> instead just not use the extension's pooling mechanism and instead 
>> wrap my own stuff around Memcache::connect() and/or 
>> Memcache::pconnect(). But I can't imagine I'm the only one who's 
>> paranoid about data integrity issues. :)
>>
>> I'd be happy to release our PHP code for external tracking in either 
>> event, if anyone's interested.  It's not really very difficult, but it 
>> might help those who haven't done it in a decent-sized production 
>> environment yet.
>>
>> And, again, if there's an easier way to do this and preserve data 
>> integrity, I'd love to hear it.  :)
>>
>> Don
>>
>>
>> Mikael Johansson wrote:
>>> There's now a "memcache.allow_failover" ini directive in CVS which 
>>> you can use to prevent failover and make the client code return false 
>>> immediately. It defaults to true.
>>>
>>> Failover may occur at any stage in any of the methods that talk to 
>>> the server (set, get, delete, increment, ..) and as long as there are 
>>> other servers available the client code won't notice (other than an 
>>> E_NOTICE being triggered.) Causes that would trigger a failover might 
>>> be socket connect failures, read/write errors or Memcached server 
>>> errors (other than out-of-memory.)
>>>
>>> Each persistent connection struct has its own retry timeout which 
>>> gets set when a failure occurs. After it expires the connection 
>>> will be retried and possibly marked failed for another retry_interval 
>>> seconds. Since each Apache child might have a connection struct of 
>>> its own, each child would attempt to reconnect every retry_interval 
>>> seconds when serving a request.
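
As a rough mental model of that per-connection retry timer, here is an 
illustration in PHP. It is not the extension's actual C code, and 
ConnectionState and its methods are invented for this sketch.

```php
<?php
// Illustrative model only: one instance stands for the connection struct
// held by a single Apache child.
class ConnectionState
{
    private $retryInterval;
    private $retryAt = 0; // 0 means "not currently marked failed"

    public function __construct($retryInterval)
    {
        $this->retryInterval = $retryInterval;
    }

    // Called when an operation fails at time $now (seconds).
    public function markFailed($now)
    {
        $this->retryAt = $now + $this->retryInterval;
    }

    // Called when a (re)connect succeeds.
    public function markOk()
    {
        $this->retryAt = 0;
    }

    // True if the connection is healthy or its retry timeout has expired.
    public function shouldTry($now)
    {
        return $this->retryAt <= $now;
    }
}

// Two children keep independent timers, so each one retries on its own
// schedule rather than once per physical server.
$childA = new ConnectionState(15);
$childB = new ConnectionState(15);
$childA->markFailed(100); // child A saw a failure at t = 100
$childB->markFailed(107); // child B saw one at t = 107
// At t = 115 child A retries while child B still waits until t = 122.
```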
>>>
>>> The changes needed to allow a user to specify a callback to be run 
>>> on failback were minor, but since each child on every host might run 
>>> it when they reconnect a failed connection struct the results were 
>>> somewhat unreliable. There's also the very real possibility that the 
>>> child creates a completely new struct even though persistent connect 
>>> was specified (for example when the connection pool is exhausted) and 
>>> thus doesn't run the callback at all. In any case, I backed out those 
>>> changes and would recommend using a real service monitor (such as 
>>> "mon") instead, to flush failed servers when they come back online.
>>>
>>> //Mikael
>>>
>>> ----- Original Message ----- From: "Don MacAskill" <don at smugmug.com>
>>> To: "Mikael Johansson" <mikael at synd.info>
>>> Cc: "memcached mail list" <memcached at lists.danga.com>; "Antony 
>>> Dovgal" <antony at zend.com>
>>> Sent: Saturday, February 04, 2006 8:07 PM
>>> Subject: Re: PECL memcache extension
>>>
>>>
>>>>
>>>> Sounds like we're on the same page as far as understanding the 
>>>> problem. And I'd definitely like a flag to be able to automatically 
>>>> flush_all() the server which just re-joined the cluster (or even no 
>>>> option, though I might be missing a scenario where you wouldn't want 
>>>> this).
>>>>
>>>> But rather than having to do a flush_all() on every member of the 
>>>> cluster when #2 happens, I'd much rather see something like a 
>>>> php.ini parameter that lets me tell memcache not to rebalance the 
>>>> cluster when one fails:
>>>>
>>>> memcache.rebalance = false
>>>>
>>>> I have enough memcache servers that a failure of one of them doesn't 
>>>> dramatically affect performance.  But having stale data, or having 
>>>> to flush_all() every server would be a Big Deal.
>>>>
>>>> I suppose I could just write a wrapper for memcache in PHP that 
>>>> handles failure scenarios and not use Memcache::addServer() at all if 
>>>> this doesn't sound feasible.
>>>>
>>>> Also, I'd love to get a little insight into exactly what happens 
>>>> when a failure occurs.  What causes memcache to consider a server to 
>>>> be a failure?  Is it only if a socket connect fails?  Or does a 
>>>> failure of some of the commands (delete, for example) also cause a 
>>>> server to be marked as failed?
>>>>
>>>>
>>>> And finally, I see that there's a retry timer.  Is that global for 
>>>> the entire Apache process?  Or just a thread/fork?  If I set it to 
>>>> be 60 seconds or something, does that mean there will only be a 
>>>> single retry every 60 seconds for the entire physical server running 
>>>> Apache?  Or are all the threads/forks going to retry every 60 
>>>> seconds?  I want to make sure we're not retrying so frequently that 
>>>> we're causing it to flap.
>>>>
>>>> A little bit better documentation in this regard would help, but 
>>>> perhaps providing some mechanism where the application can mark a 
>>>> server in the cluster as failed at will would be nice, too.  And is 
>>>> there any way to notify (via php's error log, or firing a function 
>>>> or something) me when a server fails?
>>>>
>>>> Thanks,
>>>>
>>>> Don