PECL memcache extension

Sun Feb 5 19:49:17 UTC 2006

Wow, this sounds great!  Thanks!

My only concern now (pending testing) is with the flushing.  I realize 
why you backed out the callback, which makes a lot of sense, but I'm 
still struggling to figure out how to flush my server when it comes back.

If I use an external service monitor, there's still a race condition: 
if the PECL extension's retries detect that the server is back first, 
the extension will use it for a while before the service monitor 
realizes it's back and flushes it.

Even a second or two of stale data reads could be fatal for us.

The immediate and obvious answer would be to do external state tracking 
and somehow let the PECL extension know what's going on.   It may only 
be immediate and obvious to me since we're doing it already, but I'm 
certainly open to other suggestions.  Here's what I'd need:

- Access to view and change the state of the server pool so we can 
externally set a server back to "up" (or "down") rather than having the 
extension check automatically every retry_interval seconds.  Maybe:

- Memcache::addServer(host, port, persistent, weight, timeout, 
retry_interval, state)  [I'd set retry_interval to 0 to disable retries, 
in this case]

- Memcache::setState(host, port, state)

- Memcache::getState(host, port)

- Memcache::checkState(host, port, key, val)  [theoretically, this would 
connect to the server, set the key and get the key]

- A callback for a server failure would be great, but could maybe be 
done by checking the return of the various functions instead.
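
The proposed calls above might behave roughly as in the following 
sketch. It is written in Python purely as a language-neutral model, not 
as PECL code: ServerPool, FakeServer, and the methods set_state, 
get_state, and check_state are hypothetical stand-ins for the suggested 
Memcache::setState(), getState(), and checkState(), none of which exist 
in the extension. The fake server is an in-memory dict, assumed only so 
the example runs without a memcached instance.

```python
class FakeServer:
    """In-memory stand-in for one memcached instance (illustration only)."""

    def __init__(self):
        self.data = {}
        self.reachable = True

    def set(self, key, val):
        if not self.reachable:
            raise ConnectionError("server unreachable")
        self.data[key] = val

    def get(self, key):
        if not self.reachable:
            raise ConnectionError("server unreachable")
        return self.data.get(key)


class ServerPool:
    """Pool whose up/down state is set by the application, never by retries."""

    def __init__(self):
        self.servers = {}   # (host, port) -> FakeServer
        self.states = {}    # (host, port) -> "up" or "down"

    def add_server(self, host, port, server, state="up"):
        self.servers[(host, port)] = server
        self.states[(host, port)] = state

    def set_state(self, host, port, state):
        if state not in ("up", "down"):
            raise ValueError("state must be 'up' or 'down'")
        self.states[(host, port)] = state

    def get_state(self, host, port):
        return self.states[(host, port)]

    def check_state(self, host, port, key, val):
        """Set a key and read it back; True only if the round trip works."""
        server = self.servers[(host, port)]
        try:
            server.set(key, val)
            return server.get(key) == val
        except ConnectionError:
            return False


pool = ServerPool()
cache1 = FakeServer()
pool.add_server("10.0.0.1", 11211, cache1)

cache1.reachable = False                      # simulate an outage
if not pool.check_state("10.0.0.1", 11211, "probe", "1"):
    pool.set_state("10.0.0.1", 11211, "down")
state_during_outage = pool.get_state("10.0.0.1", 11211)

cache1.reachable = True                       # server comes back
# Nothing flips the state automatically; it stays "down" until the
# application calls set_state() again.
state_after_return = pool.get_state("10.0.0.1", 11211)
```

The point of the model is that a server becoming reachable again does 
not change its state; only an explicit set_state() call does, which is 
what removes the race with the service monitor.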

With my current implementation, I already do this for both up/down state 
changes, and a server isn't marked as "up" again until a successful 
flush_all() has happened.  Since it's externally tracked in a database, 
each Apache child is fed the real state each time a script is run or the 
state needs updating.
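
The rule in the paragraph above (no server returns to "up" until 
flush_all() has succeeded) can be sketched as follows. This is an 
illustrative Python model, not the production PHP code mentioned in 
this message: StateStore stands in for the database of server states, 
FakeServer for a memcached instance, and try_mark_up and usable_servers 
are hypothetical helper names.

```python
class FakeServer:
    """In-memory stand-in for one memcached instance (illustration only)."""

    def __init__(self):
        self.data = {}
        self.reachable = True

    def flush_all(self):
        """Drop everything; report False if the server cannot be reached."""
        if not self.reachable:
            return False
        self.data.clear()
        return True


class StateStore:
    """Stand-in for the shared database that holds each server's state."""

    def __init__(self):
        self._states = {}

    def get(self, name):
        # Unknown servers are treated as down, the conservative default.
        return self._states.get(name, "down")

    def set(self, name, state):
        self._states[name] = state


def try_mark_up(store, name, server):
    """Flush first; record 'up' only if the flush succeeded."""
    if server.flush_all():
        store.set(name, "up")
        return True
    store.set(name, "down")
    return False


def usable_servers(store, servers):
    """Names a request may use, read from the shared state on each run."""
    return [name for name in sorted(servers) if store.get(name) == "up"]


store = StateStore()
servers = {"cache1": FakeServer(), "cache2": FakeServer()}
for name, srv in servers.items():
    try_mark_up(store, name, srv)

# cache2 fails and is marked down while still holding old data.
servers["cache2"].data["user:42"] = "stale"
servers["cache2"].reachable = False
store.set("cache2", "down")
during_outage = usable_servers(store, servers)

# cache2 is reachable again, but stays unusable until it has been flushed.
servers["cache2"].reachable = True
before_flush = usable_servers(store, servers)
try_mark_up(store, "cache2", servers["cache2"])
after_flush = usable_servers(store, servers)
```

Because every request reads the state from the shared store rather than 
from its own connection attempts, no process can start using the 
returning server before the flush has been recorded.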

If this is getting too messy, or not useful enough to others, I can 
just skip the extension's pooling mechanism and wrap my own stuff 
around Memcache::connect() and/or Memcache::pconnect(). 
But I can't imagine I'm the only one who's paranoid about data integrity 
issues.  :)

I'd be happy to release our PHP code for external tracking in either 
event, if anyone's interested.  It's not really very difficult, but it 
might help those who haven't done it in a decent-sized production 
environment yet.

And, again, if there's an easier way to do this and preserve data 
integrity, I'd love to hear it.  :)

Don

Mikael Johansson wrote:
> There's now a "memcache.allow_failover" ini directive in CVS which you 
> can use to prevent failover and make the client code return false 
> immediately; it defaults to true.
> 
> Failover may occur at any stage in any of the methods that talk to the 
> server (set, get, delete, increment, ..) and as long as there are other 
> servers available the client code won't notice (other than an E_NOTICE 
> being triggered.) Causes that would trigger a failover might be socket 
> connect failures, read/write errors or Memcached server errors (other 
> than out-of-memory.)
> 
> Each persistent connection struct has its own retry timeout, which gets 
> set when a failure occurs; after it expires the connection will be 
> retried and possibly marked failed for another retry_interval seconds. 
> Since each Apache child might have a connection struct of its own, each 
> child would attempt to reconnect every retry_interval seconds when 
> serving a request.
> 
> The changes needed to allow a user to specify a callback to be run on 
> failback were minor, but since each child on every host might run it 
> when it reconnects a failed connection struct, the results were somewhat 
> unreliable. There's also the very real possibility that the child 
> creates a completely new struct even though persistent connect was 
> specified (for example when the connection pool is exhausted) and thus 
> doesn't run the callback at all. In any case, I backed out those changes 
> and would recommend using a real service monitor (such as "mon") 
> instead, to flush failed servers when they come back online.
> 
> //Mikael
> 
> ----- Original Message -----
> From: "Don MacAskill" <don at smugmug.com>
> To: "Mikael Johansson" <mikael at synd.info>
> Cc: "memcached mail list" <memcached at lists.danga.com>; "Antony Dovgal" 
> <antony at zend.com>
> Sent: Saturday, February 04, 2006 8:07 PM
> Subject: Re: PECL memcache extension
> 
> 
>>
>> Sounds like we're on the same page as far as understanding the 
>> problem. And I'd definitely like a flag to be able to automatically 
>> flush_all() the server which just re-joined the cluster (or even no 
>> option, though I might be missing a scenario where you wouldn't want 
>> this).
>>
>> But rather than having to do a flush_all() on every member of the 
>> cluster when #2 happens, I'd much rather see something like a php.ini 
>> parameter that lets me tell memcache not to rebalance the cluster when 
>> one fails:
>>
>> memcache.rebalance = false
>>
>> I have enough memcache servers that a failure of one of them doesn't 
>> dramatically affect performance.  But having stale data, or having to 
>> flush_all() every server would be a Big Deal.
>>
>> I suppose I could just write a wrapper for memcache in PHP that 
>> handles failure scenarios and not use Memcache::addServer() at all if 
>> this doesn't sound feasible.
>>
>> Also, I'd love to get a little insight into exactly what happens when 
>> a failure occurs.  What causes memcache to consider a server to be a 
>> failure?  Is it only if a socket connect fails?  Or does a failure of 
>> some of the commands (delete, for example) also cause a server to be 
>> marked as failed?
>>
>>
>> And finally, I see that there's a retry timer.  Is that global for the 
>> entire Apache process?  Or just a thread/fork?  If I set it to be 60 
>> seconds or something, does that mean there will only be a single retry 
>> every 60 seconds for the entire physical server running Apache?  Or 
>> are all the threads/forks going to retry every 60 seconds?  I want to 
>> make sure we're not retrying so frequently that we're causing it to flap.
>>
>> A little bit better documentation in this regard would help, but 
>> perhaps providing some mechanism where the application can mark a 
>> server in the cluster as failed at will would be nice, too.  And is 
>> there any way to notify (via PHP's error log, or firing a function or 
>> something) me when a server fails?
>>
>> Thanks,
>>
>> Don