Optimising use and using multiple memcache servers in a pool

Alan Jay alan_jay_uk at yahoo.co.uk
Sat Jan 20 17:49:14 UTC 2007



Thanks Jason,

That has cleared up one point (that I didn't know I needed to understand) :)
and made the whole thing more understandable.

I now have a better understanding of the issues with the network but
fortunately we have a private gigabit network between all our servers and in
any case all the database transactions go over the network (so there is no
change there).

The advantage for us is RAM-based speed: offloading the reads from the MySQL
database and serving them from memcached instead.

My one query is: 

Is there an *overhead* in opening up a pool of memcached servers for each
transaction that is significant compared to connecting to a single server?

If (like our MySQL cluster) there were a single point of entry to the "cache",
then the overhead of managing the pool would be handled at the "server".

From what I can see in the limited documentation, creating a pool is
something every client does for every transaction (is that correct?).

I was slightly worried that the whole pool creation and management element
added a level of overhead that (in my application) was not entirely necessary.


At the moment I have taken the simple (basic) approach which is:

        // Check memcache to see if the object is available.
        if ($ds_server != "") {
            $memcache_obj = memcache_connect($ds_server, 11211) or $ds_server = "";
            if ($ds_server != "") {
                $article_row = memcache_get($memcache_obj, $id_str);
            } else {
                // Get the data from MySQL instead.
            }
        }

As I understand it, I need to do this for every client and for every
transaction that needs to access the cache.
 
Now, if I were using a pool of servers, I would have to do something like:

        $memcache_obj = memcache_connect($ds_server1, 11211, 1, 20)
            or (do something to deal with the server being down?);
        memcache_add_server($memcache_obj, $ds_server2, 11211, 1, 20)
            or (do something to deal with the server being down?);
        memcache_add_server($memcache_obj, $ds_server3, 11211, 1, 80)
            or (do something to deal with the server being down?);
        memcache_add_server($memcache_obj, $ds_server4, 11211, 1, 80)
            or (do something to deal with the server being down?);
        memcache_add_server($memcache_obj, $ds_server5, 11211, 1, 80)
            or (do something to deal with the server being down?);
        memcache_add_server($memcache_obj, $ds_server6, 11211, 1, 10)
            or (do something to deal with the server being down?);
        memcache_add_server($memcache_obj, $ds_server7, 11211, 1, 10)
            or (do something to deal with the server being down?);

        if (all the servers are NOT down) {
            $article_row = memcache_get($memcache_obj, $id_str);
        }

But at first sight it looks like you need quite a lot of code to create the
pool and manage any servers that are down. Is this overhead a potential issue?

The "(do something to deal with the server being down?)" step is probably
something using memcache_set_server_params (possibly).
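
If that is right, I imagine the per-server failure handling might look
roughly like this (only a sketch: on_server_failure is a name I made up, and
the extra memcache_add_server arguments follow my reading of the docs -
timeout, retry interval, initial status, failure callback):

```php
<?php
// Sketch: log a failed server; pecl-memcache runs the failure callback
// before failing the request over to another server in the pool.
function on_server_failure($host, $port) {
    error_log("memcached server $host:$port failed");
}

$ds_server1 = "10.0.0.1";   // placeholder addresses
$ds_server2 = "10.0.0.2";

$memcache_obj = memcache_connect($ds_server1, 11211, 1, 20);
// timeout = 1s, retry_interval = 15s, status = true (online),
// plus the failure callback defined above.
memcache_add_server($memcache_obj, $ds_server2, 11211, 1, 20,
                    1, 15, true, 'on_server_failure');
```

(I have not tested this against a down server yet, so treat it as a guess at
the intended usage rather than working code.)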

I assume people have been through this issue, but I haven't seen any good code
examples floating around for the current pecl-memcache code base.

Am I seeing too many potential problems in this :) or should it just work?

> ------------------------------
> Hi Alan,
> 
> After reading your email, I still sense some confusion. I just wanted to
> throw out some examples to clarify things. I apologize if this is
> unnecessary.

Not unnecessary - very happy to get as much guidance as possible as there is
limited user documentation. 
 
> Memcache is a distributed cache.  Given three memcache servers and
> objects number 1..9, the objects may be distributed as follows:
> server1: 1, 4, 7
> server2: 2, 5, 8
> server3: 3, 6, 9
> 
> (I don't know how the hashing algorithm works, this is a simplification)
> 
> When a client asks for object 1, it fetches it from server1, even if the
> client is server2 or server3.

OK that makes sense and explains the advantages.
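
So, if I follow, the mapping is conceptually something like this (purely my
own toy illustration; the real hashing in pecl-memcache is different and
more involved):

```php
<?php
// Toy illustration of key -> server mapping: hash the key, take it
// modulo the number of servers. The real pecl-memcache hashing differs;
// this only shows the idea of distributing objects across servers.
function pick_server($key, $servers) {
    $hash = crc32($key) & 0x7fffffff;  // force non-negative on 32-bit PHP
    return $servers[$hash % count($servers)];
}

$servers = array("server1", "server2", "server3");
foreach (array("object1", "object2", "object3") as $key) {
    echo "$key -> " . pick_server($key, $servers) . "\n";
}
```

Every client that runs the same function over the same server list lands on
the same server for a given key, which is why any client can fetch object 1
from server1.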
 
> The main reason to run multiple memcache servers is to increase the size
> of your cache to store more data or to ensure that only a portion of the
> cache is lost if a server goes down.

OK again that makes a great deal of sense.
 
> The main overhead with memcache vs APC would be the network latency.
> 
> If you have 3 memcache servers, then at worst, each client has a tcp
> connection to each server (assuming one thread on each client).

Sure, but is there an overhead to creating those three connections each time a
client requests a page from the web server?
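
(On re-reading the docs, one thing that may answer my own question:
pecl-memcache offers persistent connections via memcache_pconnect, which as I
understand it keeps the TCP connection open in the PHP process and reuses it
across requests; a sketch with a placeholder address:)

```php
<?php
// Sketch: a persistent connection is kept open by the PHP process and
// reused on later requests, avoiding a TCP handshake per page view.
$ds_server = "10.0.0.1";   // placeholder address
$memcache_obj = memcache_pconnect($ds_server, 11211);
if ($memcache_obj !== false) {
    $article_row = memcache_get($memcache_obj, "article:123");
}
```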
 
> non-memcache workflow is as follows:
> client read: select from db
> client write: update db
> 
> the typical memcache workflow is as follows:
> 
> client read: fetch object from cache if available, else select from db.
> put object in cache if it didn't exist.
>   other clients will fetch the cached copy after it's cached
> 
> client write: fetch object from db/cache, update db, update cache.
> 
> The client write procedure can be a little tricky. Fetching directly
> from the db avoids cache inconsistencies. memcache doesn't have
> transactions, so I'm not sure if some type of "I'm updating the db"
> token might be useful to store, or just have the updater overwrite
> what's in the cache.
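
In pecl-memcache terms I take the read path to be roughly the following
(db_fetch_article is a made-up stand-in for our real MySQL query, and the
5-minute expiry is just an example):

```php
<?php
// Read path: try the cache first, fall back to MySQL on a miss, then
// prime the cache so other clients get the cached copy.
function get_article($memcache_obj, $id_str) {
    $article = memcache_get($memcache_obj, $id_str);
    if ($article === false) {
        $article = db_fetch_article($id_str);  // stand-in: SELECT from MySQL
        // Cache for 5 minutes (0 = no special flags).
        memcache_set($memcache_obj, $id_str, $article, 0, 300);
    }
    return $article;
}
```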

For us I don't think this is an issue, as the client does not update the
content. The editorial tool (which runs on a separate machine) has code that,
on noting a change, resets the expiry time to a few seconds, so that the next
client that asks for the file will get a fresh one once the cached copy has
expired (at least that is the theory, and the tests seem to have worked).
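
In code, I believe our editorial-tool trick amounts to no more than this (a
sketch; the 5-second window is our choice):

```php
<?php
// On edit: re-set the cached copy with a short expiry so that readers
// fall through to MySQL for a fresh copy within a few seconds.
memcache_set($memcache_obj, $id_str, $article_row, 0, 5);
```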

Once again thanks everyone for your comments and thoughts.

Alan Jay
www.digitalspy.co.uk 


