Asymmetric Failure + Best Practices

Mon Jun 23 22:56:10 UTC 2008

Hey Folks,

I’m partway through a .NET web app integration of memcached using the Enyim client bindings published on CodePlex, and I have a few 101-level questions about how the protocol behaves in a failure scenario.

Since the protocol leaves it to the client bindings to hash each key and map it (modulo the server count) onto the available servers, how does this behave during an asymmetric failure?
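To make the mapping concrete, here is a minimal Python sketch of hash-modulo key distribution. It assumes CRC32 as the hash and a plain `servers[hash % len(servers)]` lookup; `pick_server` is a hypothetical name, and the Enyim client's actual hashing and node-location logic may differ.

```python
import zlib

def pick_server(key: str, servers: list[str]) -> str:
    """Map a key to one server: hash the key, then take the hash modulo the server count."""
    # CRC32 is an illustrative (assumed) hash; a real client may use a different one.
    h = zlib.crc32(key.encode("utf-8"))
    return servers[h % len(servers)]

servers = ["MCA", "MCB"]
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", pick_server(key, servers))
```

Every client that holds the same server list computes the same answer for a given key, which is the only thing keeping the clients in agreement.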

Consider the following example:

I’ve got two app servers, AppA and AppB, and two memcached instances, MCA and MCB. Consider the case where the server running MCB suffers a hardware failure and is universally unroutable. The client bindings on both AppA and AppB should detect this failure and re-jigger the keyspace so that 100% of keys map to MCA.
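A rough Python sketch of this symmetric case, assuming CRC32-plus-modulo key mapping and a hypothetical `Client` class that simply keeps its own list of live servers (not the Enyim API). Because both clients drop MCB, they end up with identical server lists and therefore agree on where every key lives.

```python
import zlib

def pick_server(key: str, servers: list[str]) -> str:
    # Assumed mapping: CRC32 of the key, modulo the number of live servers.
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]

class Client:
    """Hypothetical client that tracks its own view of the live servers."""

    def __init__(self, name: str, servers: list[str]):
        self.name = name
        self.live = list(servers)

    def mark_dead(self, server: str) -> None:
        # On a detected failure, drop the server; keys are spread over the rest.
        if server in self.live:
            self.live.remove(server)

    def locate(self, key: str) -> str:
        return pick_server(key, self.live)

app_a = Client("AppA", ["MCA", "MCB"])
app_b = Client("AppB", ["MCA", "MCB"])

# MCB is unreachable from everywhere, so both clients detect it and drop it.
app_a.mark_dead("MCB")
app_b.mark_dead("MCB")

keys = [f"item:{i}" for i in range(10)]
assert all(app_a.locate(k) == app_b.locate(k) == "MCA" for k in keys)
print("both clients agree: every key maps to MCA")
```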

Now consider an asymmetric failure, where only one of the app servers (say, AppA) is unable to reach MCB. This is substantially less likely than a categorical failure, but it could happen for numerous reasons: a misconfigured firewall, a hosed router, what have you. In this scenario, AppA determines that MCB has failed and remaps the keyspace to MCA. AppB, meanwhile, can still reach MCB and will continue placing keys into BOTH MCA and MCB. Stale reads on both app servers are then inevitable, since AppA and AppB map the same key, and therefore its invalidations, to different nodes.
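The asymmetric case can be simulated with in-memory stand-ins. This Python sketch assumes CRC32-plus-modulo key mapping and uses two dicts in place of MCA and MCB; `cache_set`, `cache_get` and `cache_delete` are hypothetical helpers, not Enyim or memcached APIs. AppA's view contains only MCA, while AppB's still contains both nodes.

```python
import zlib

def pick_server(key: str, servers: list[str]) -> str:
    # Assumed mapping: CRC32 of the key, modulo the number of servers in this client's view.
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]

# In-memory dicts standing in for the two memcached nodes.
nodes: dict[str, dict[str, str]] = {"MCA": {}, "MCB": {}}

# Each app server's own view of which nodes are live.
view_a = ["MCA"]         # AppA cannot reach MCB, so it has dropped it
view_b = ["MCA", "MCB"]  # AppB can still reach both nodes

def cache_set(view, key, value):
    nodes[pick_server(key, view)][key] = value

def cache_get(view, key):
    return nodes[pick_server(key, view)].get(key)

def cache_delete(view, key):
    nodes[pick_server(key, view)].pop(key, None)

# Pick a key that AppB maps to MCB (AppA maps every key to MCA).
key = next(f"item:{i}" for i in range(1000) if pick_server(f"item:{i}", view_b) == "MCB")

cache_set(view_b, key, "v1")   # AppB caches the current value; it lands on MCB
cache_delete(view_a, key)      # the data changes; AppA's invalidation goes to MCA instead
cache_set(view_a, key, "v2")   # AppA caches the new value, also on MCA

print("AppA reads:", cache_get(view_a, key))  # v2, from MCA
print("AppB reads:", cache_get(view_b, key))  # v1, from MCB: stale
```

AppA's invalidation never touches the copy AppB reads, so AppB keeps serving the old value until it expires or is overwritten through AppB's own view.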

My question is this: what’s the best practice people have found here? Should AppA, on detecting the failure, notify all other nodes to ignore MCB? And how would that work for intermittent failures?

Thanks in advance for any insight ☺

Jason McGuirk
SupportSoft, Inc.