libmemcache(3) 1.1.0rc4...

Wed Dec 22 11:11:41 PST 2004

On 21.12., Sean Chittenden wrote:
> Yeah... this is a gaping hole in the API at the moment.  There are two 
> potential solutions to this.  The first being:
> 	*) Move the server lists to shared memory
> 	*) Provide a tool that administrates the lists in shared memory.

In my experience using mmap(2)ed files is easier than dealing with SYSV
shared memory.
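Below is a minimal sketch of the mmap(2) approach. It assumes a hypothetical file layout of one status byte per configured server. The path, `struct server_status`, `MAX_SERVERS` and the function names are illustrative only and are not part of libmemcache. Every process that maps the same file with `MAP_SHARED` sees the same table, and there is no SYSV key or segment lifetime to manage.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: one byte per configured server, 1 = alive, 0 = down. */
#define MAX_SERVERS 64

struct server_status {
    unsigned char alive[MAX_SERVERS];
};

/* Map (creating if needed) the shared status file. Returns NULL on error. */
static struct server_status *status_map(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return NULL;
    /* Make sure the file backs the whole struct; new bytes read as zero. */
    if (ftruncate(fd, sizeof(struct server_status)) == -1) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct server_status),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (struct server_status *)p;
}

static void status_unmap(struct server_status *st)
{
    munmap(st, sizeof *st);
}
```

A checking daemon would write `alive[]` through its mapping, while clients could map the same file (read-only would be enough for them) and consult it before picking a server.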

> This would work fine and dandy, but is a less than perfect solution as 
> it requires manual management of server lists.  The ideal/correct 
> solution is spiffy neato and makes all of this kinda foo automagic.  

Writing a small daemon process that reads a list of servers from a
configuration file, tries to connect to each server, issues a couple of
commands and modifies the list of active servers stored in shared memory
according to the results should be quite easy.  That's the same method
every load balancer uses for availability checks (i.e. doing out-of-band
checks).
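The core of such a daemon could be as small as the pass below. This is a sketch under assumptions: `probe_fn` is a hypothetical callback that would connect to `host:port` and issue a command (for example memcached's `version`), and `alive[]` would live in the shared-memory list of active servers.

```c
#include <stddef.h>

/* Hypothetical probe callback: returns nonzero if the server at host_port
 * answered as expected. A real one would connect over TCP and send a
 * command such as "version\r\n". */
typedef int (*probe_fn)(const char *host_port);

/* One pass of the out-of-band check: probe every configured server and
 * record 1 (alive) or 0 (down) per server. Returns the number alive. */
static size_t check_all(const char *const servers[], size_t n,
                        probe_fn probe, unsigned char alive[])
{
    size_t up = 0;
    for (size_t i = 0; i < n; i++) {
        alive[i] = probe(servers[i]) ? 1 : 0;
        up += alive[i];
    }
    return up;
}
```

The daemon would call `check_all()` in a loop with a `sleep()` between passes; clients never probe anything themselves, they only read `alive[]`.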

Another thing: I think the way requests are distributed to the servers
should be changed. Currently you are doing "hash % number of live
servers". You should do "hash % number of configured servers" instead. If this
hits a server which is down, rehash the request (e.g. append "foo" to the key
and calculate the hash again) and loop until you find a server which is alive
(of course the "all servers down" case should be treated specially,
returning an error without any hashing at all). Otherwise almost every key
maps to a different server as soon as one server goes down, and you lose
almost the whole cache. With rehashing, only the contents of the server that
went down are lost.
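To make the rehashing proposal concrete, here is a sketch in C. It uses 32-bit FNV-1a as a stand-in hash (an assumption; it is not necessarily what libmemcache uses). FNV-1a is computed byte by byte, so continuing the hash with the bytes `foo` gives exactly the hash of the key with `foo` appended. `pick_server()`, `MAX_REHASH` and the final linear walk are hypothetical additions; the walk only guards against a rehash sequence that never lands on a live server.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 2166136261u
#define MAX_REHASH 32 /* hypothetical cap on rehash attempts */

/* 32-bit FNV-1a, streaming: hashing "key" and then continuing with "foo"
 * yields the same value as hashing "keyfoo" in one go. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Pick a server: hash % number of CONFIGURED servers; if that server is
 * down, append "foo" to the key, rehash and try again.
 * Returns the server index, or -1 if every server is down. */
static int pick_server(const char *key, const unsigned char alive[], size_t n)
{
    size_t live = 0;
    for (size_t i = 0; i < n; i++)
        live += alive[i] ? 1 : 0;
    if (live == 0)
        return -1; /* all servers down: fail without hashing */

    uint32_t h = fnv1a(FNV_OFFSET, key, strlen(key));
    for (int attempt = 0; attempt < MAX_REHASH; attempt++) {
        size_t idx = h % n;
        if (alive[idx])
            return (int)idx;
        h = fnv1a(h, "foo", 3); /* same as hashing key + "foo" (+ "foo" ...) */
    }
    /* Safety net (not part of the proposal): walk to the next live server. */
    size_t idx = h % n;
    while (!alive[idx])
        idx = (idx + 1) % n;
    return (int)idx;
}
```

Because the modulus is the number of configured servers, a key whose first hash already points at a live server keeps its server when some other server dies; only the keys that hashed to the dead server move.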

The tricky part is how to avoid losing the contents again when the server
comes back. Maybe it would be possible to keep a flag "server A was offline
during the last 24 hours". If a key isn't found on server A, the client
should then try to fetch the data from server X, i.e. the server that was
used for that key while server A was down.
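One way the fallback read could look, as a sketch: `pick_fn` and `get_fn` are hypothetical callbacks (a server selector following the rehashing scheme described above, and a single-server fetch), and `recently_down[]` stands in for the "offline during the last 24 hours" flag. Server X is found by recomputing the selection with A masked out, which only reproduces the outage-time choice if the other servers' states have not changed since.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SERVERS 64

/* Hypothetical selector: index of the server for key given which of the
 * n configured servers are alive, or -1 if none is. */
typedef int (*pick_fn)(const char *key, const unsigned char alive[], size_t n);
/* Hypothetical single-server fetch: returns 1 and sets *val on a hit. */
typedef int (*get_fn)(int server, const char *key, const char **val);

/* Read key; on a miss on server A that was recently offline, retry on the
 * server X the key would have mapped to while A was down. */
static int get_with_fallback(const char *key, const unsigned char alive[],
                             const unsigned char recently_down[], size_t n,
                             pick_fn pick, get_fn get, const char **val)
{
    int a = pick(key, alive, n);
    if (a < 0)
        return 0;                       /* all servers down */
    if (get(a, key, val))
        return 1;                       /* normal hit on A */
    if (!recently_down[a] || n > MAX_SERVERS)
        return 0;                       /* plain miss */

    unsigned char masked[MAX_SERVERS];
    memcpy(masked, alive, n);
    masked[a] = 0;                      /* pretend A is still down */
    int x = pick(key, masked, n);
    return x >= 0 && get(x, key, val);
}
```

Once the flag expires, the extra lookup disappears and misses on A cost a single round trip again.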

Can you follow me? :)

Another nice idea would be a local memcache proxy on each client machine.
The application talks to this proxy instead of communicating directly with the
memcached servers. The proxy does the distribution to the different
memcached instances. Write requests get special treatment: each is issued
to two memcached instances. If one memcached goes down, the proxy knows the
corresponding backup instance and can request data from that memcached.

For example if you have 8 memcacheds you have the following
primary->secondary groups: 1->2, 2->1, 3->4, 4->3, 5->6, 6->5, 7->8, 8->7.
If 3 goes down, all write requests which are assigned to 3 are rehashed and
subsequently distributed to one of 1, 2, 4, 5, 6, 7. Read requests where the
hash resolves to number 3 are also rehashed and distributed to one of 1, 2,
4, 5, 6, 7. If the key is not found, another attempt is made to read from 4,
which is the backup instance of 3.
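The pairing and the resulting request order can be sketched as follows. The sketch assumes 0-based indices (so 1->2, 3->4, ... becomes 0<->1, 2<->3, ..., and the backup of i is `i ^ 1`) and an even number of servers. `rehashed` is whatever the rehashing step returns for the key with its primary treated as down; `write_plan()` and `read_plan()` are hypothetical helper names.

```c
#include <stddef.h>

/* Pairs 0<->1, 2<->3, ...: the backup of i is i ^ 1 (n must be even). */
static size_t backup_of(size_t i) { return i ^ (size_t)1; }

/* Servers to write to, in order. `rehashed` is only used when the
 * primary is down. Returns how many entries of out[] were filled. */
static size_t write_plan(size_t primary, size_t rehashed,
                         const unsigned char alive[], size_t out[2])
{
    size_t k = 0;
    if (alive[primary]) {
        out[k++] = primary;
        if (alive[backup_of(primary)])
            out[k++] = backup_of(primary); /* mirror copy */
    } else {
        out[k++] = rehashed;
    }
    return k;
}

/* Servers to read from, in order; the caller stops at the first hit. */
static size_t read_plan(size_t primary, size_t rehashed,
                        const unsigned char alive[], size_t out[2])
{
    size_t k = 0;
    if (alive[primary]) {
        out[k++] = primary;
    } else {
        size_t b = backup_of(primary);
        out[k++] = rehashed;            /* writes go here during the outage */
        if (alive[b] && b != rehashed)
            out[k++] = b;               /* mirror from before the crash */
    }
    return k;
}
```

In the numbering used above, server 3 is index 2 and its backup 4 is index 3. With index 2 down, a read for one of its keys goes first to the rehashed server and then to index 3; if the rehash happens to pick the backup itself, it is queried only once.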

If 3 comes back, everything gets a little more complicated. Write
requests are sent only to memcached 3, as if everything were fine. Read
requests that fail on 3 are rehashed and distributed to one of 1, 2, 4, 5, 6, 7.
Now we have two cases: if 3 was down long enough that all data on 4 which
was mirrored from 3 before the crash has expired, nothing else has to be
done. Otherwise another read attempt should be issued to 4 if the key
wasn't found on the servers queried before.
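The two cases hinge on whether the mirror copies on the backup can still be alive. A sketch, assuming the proxy records when the primary crashed (`crashed_at`) and that every item is stored with a finite expiry of at most `max_ttl` seconds (memcached's expiry 0, meaning "never expire", would break this check). Copies mirrored before the crash were written no later than `crashed_at`, so once `now - crashed_at >= max_ttl` they are all gone and the extra read can be skipped. `recovery_read_plan()` is a hypothetical name, and liveness checks are omitted.

```c
#include <stddef.h>
#include <time.h>

/* Read order for a key whose primary has come back after a crash.
 * Pairs are 0-based (backup = primary ^ 1); `rehashed` is the server the
 * key was moved to while the primary was down. Returns entries in out[]. */
static size_t recovery_read_plan(size_t primary, size_t rehashed,
                                 time_t crashed_at, time_t max_ttl,
                                 time_t now, size_t out[3])
{
    size_t k = 0;
    size_t backup = primary ^ (size_t)1;
    out[k++] = primary;   /* new writes land here again */
    out[k++] = rehashed;  /* written while the primary was down */
    /* Mirror copies were written no later than crashed_at, so they have
     * all expired once max_ttl seconds have passed since the crash. */
    if (now - crashed_at < max_ttl && backup != rehashed)
        out[k++] = backup;
    return k;
}
```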

I don't know how to get atomic increment/decrement right, however... And
maybe there are problems with the whole idea which haven't occurred to
me yet? :)

I know that reliability isn't the no. 1 goal of memcached. But using it as
quick and easy session storage is tempting. MySQL Cluster brings licence
issues (a commercial licence is needed in many cases) and has problems of its
own (when I tested a four-node setup with one node crashing, I couldn't bring
the node back up due to "Unable to alloc node id" errors. Somehow the old
connection was "stuck". The only solution was to restart the management
server...).

Sven

PS Sorry for my bad English, I hope you could guess what I meant to say ...
--
--Sven-Paulus----------28-48-32--3-----sven at karlsruhe.org------------
--Karlstr.-55----------30--2-50-29-----http://www.nntp.de------------
--D-76133-Karlsruhe----44-49-10--8-----T#-49-721-9375094---------irc-
--PGP-ID-3C9A6091-------9-12-19-71-----F#-49-721-9375095-----svennie-