Optimising use and using multiple memcache servers in a pool

Alan Jay alan_jay_uk at yahoo.co.uk
Mon Jan 22 08:28:25 UTC 2007



> -----Original Message-----
> From: Jason Edgecombe [mailto:jedgecombe at carolina.rr.com]
> Sent: Sunday, January 21, 2007 5:19 PM
> To: Alan Jay
> Cc: memcached at lists.danga.com
> Subject: Re: Optimising use and using multiple memcache servers in a pool
> 
> >
> I think that leaving a server down is the best practice for short-term
> outages. If you flush the cache of the down server before it comes up,
> then you won't have stale data.

Thanks for the comments.
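(Noted - I take it that just means issuing a flush against the returning
server before it rejoins.  A sketch with the Python memcache client; the
address is made up:)

    import memcache

    # Flush the server that was down before it rejoins the pool,
    # so it cannot hand back stale data.  Address is illustrative.
    returning = memcache.Client(['10.0.0.2:11211'])
    returning.flush_all()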
 
> > The other issue that I wonder about is in an environment where there are a
> > relatively small number of items of data but lots of views.  Using a pool
> > places all the hits onto a single memcached server rather than
> distributing
> > them around.
> >
> > And again I don't know if this would be a significant issue in our context
> > where although a lot of articles might be looked at in a day the reality
> is
> > that a large proportion of the views will be for a small number of pages.


> I suspect that even a small handful of high-traffic objects will still
> be distributed somewhat evenly over the memcache servers, assuming the
> number of objects is greater than the number of memcache servers.
> Besides, high-traffic objects are what memcache was designed for. :)

Indeed - thinking about these things and reading back through the list
makes me want to weigh up the advantages and disadvantages.  In the case
of digitalspy.co.uk we provide entertainment and media news, which means
that over, say, a week there are a few hundred articles which get most
of the traffic.  There is also an extremely long tail of 50,000+
articles which are viewed infrequently.  The cache available will be (I
think - and one of this week's tests is to see whether these articles
really are in all the caches) big enough on all the machines to store
the 100 or so articles that get most of the traffic.
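(To convince myself of this, here is roughly how I picture the key
distribution - an illustrative sketch only; real clients such as
python-memcached use their own hash function, but the principle is the
same:)

    import zlib

    servers = ['10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211']

    def server_for(key):
        # Hash the key and take it modulo the server count, so each
        # key consistently lands on one server in the pool.
        return servers[zlib.crc32(key.encode()) % len(servers)]

    for key in ['article:101', 'article:102', 'article:103']:
        print(key, '->', server_for(key))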

So my worry (and I will be trying to test for it - though any comments
people have would be gratefully received) is whether having a separate
server on each machine for this small number of highly trafficked items
enhances performance over a single large pooled cache.

I suppose another question is whether setting the "expiry time"
differently on the peak and tail elements would ensure that the peak
items are more likely to stay in the cache than the tail items, which
would expire in preference.
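(Something like this, I imagine - an untested sketch using the Python
memcache client; the keys, values and times are made up:)

    import memcache

    mc = memcache.Client(['10.0.0.1:11211', '10.0.0.2:11211'])

    # A heavily read current article: keep it around for an hour.
    mc.set('article:12345', "<html>this week's big story</html>",
           time=3600)

    # A long-tail archive article: let it expire after five minutes.
    mc.set('article:67', '<html>an article from the archive</html>',
           time=300)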

I wonder if taking this approach has advantages over the overhead of
creating and managing a large pool of servers.  It is curious that the
pool has to be created and managed for each transaction; I suppose a
development down the road might be to have the pool managed persistently
on each of the servers, if this whole thing turns out to be an issue (it
might not be).
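(I had assumed the client could simply be built once per process and
reused for every request, roughly like this - again a sketch with the
Python client, where render_article stands in for our real database
fetch:)

    import memcache

    # Build the pooled client once at process start-up, not per request.
    POOL = memcache.Client(['10.0.0.1:11211', '10.0.0.2:11211',
                            '10.0.0.3:11211'])

    def render_article(article_id):
        # Placeholder for the real database fetch / template render.
        return '<html>article %d</html>' % article_id

    def get_article(article_id):
        key = 'article:%d' % article_id
        page = POOL.get(key)
        if page is None:
            page = render_article(article_id)
            POOL.set(key, page, time=3600)
        return page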

Anyway thanks for the help and comments.

> > I think this next week or so is going to need a little thinking about the
> > implementation as there seem to be lots of ways this could be coded with
> > different implications.  I'm even thinking that one might be best served
> > running a number of copies of "memcached" on each server some as a pool to
> > provide depth and size for the "long tail" and a smaller cache on each
> server
> > for small elements and current articles that are looked at most in any one
> day
> > and where distributing the calls across the servers is advantageous.


> I think that may be more complex than you need. If you have different
> types of data like user data vs articles, then running multiple pools
> might make sense.

I think you might be right.  Then again, the simplest approach - one
server per machine - is simpler still.  Anyway, lots of testing over the
coming weeks.
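(If we did split by data type, I picture it as nothing more than two
client objects over different server lists - a sketch with the Python
client; addresses and keys are made up:)

    import memcache

    # Separate pools for separate kinds of data.
    article_pool = memcache.Client(['10.0.0.1:11211', '10.0.0.2:11211'])
    user_pool = memcache.Client(['10.0.0.3:11211'])

    article_pool.set('article:12345', '<html>...</html>', time=3600)
    user_pool.set('session:42', 'session-data', time=1800)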
 
> In the case of similar data like articles, just use one pool. Set an
> expiration time on each object and do some testing to size the pool
> appropriately. Let memcache manage the heavily used objects. By design,
> lesser used objects will fall out of the cache. Heavily used objects
> will stay in the cache more.

OK - so an object that is used more frequently will not be thrown away
when space is needed for new data; memcached discards the least recently
used data first (is that correct?).
 
> If you have single caches on each server for lesser used objects, that's
> duplicated storage that could be used for more frequently used objects.

Indeed - I suspect the simplest thing is to do some tests to see how the
cache behaves in its current state, and where the highly utilised
objects sit within it.  All our machines have lots of memory, and I
don't think the size of the data objects is an issue.  Having looked at
the cache stats, though, I'm still slightly bewildered as to how you can
see how much of the cache you are using.
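(Having dug a little further, the per-server stats do seem to carry the
numbers I was after: "bytes" is the memory currently holding items and
"limit_maxbytes" is the cap set with -m.  A sketch of reading them with
the Python client; addresses are made up:)

    import memcache

    mc = memcache.Client(['10.0.0.1:11211', '10.0.0.2:11211'])

    # get_stats() returns one (server, stats-dict) pair per server;
    # all the values come back as strings.
    for server, stats in mc.get_stats():
        used = int(stats['bytes'])
        limit = int(stats['limit_maxbytes'])
        print('%s: %d of %d bytes used (%.1f%%)'
              % (server, used, limit, 100.0 * used / limit))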

My thinking was the reverse of Jason's suggestion above: the heavily
used objects from the last 10 days would be duplicated on local servers,
while, if need be, the long tail could be spread across a pool.

But I think it will be easier to do some tests to see how this works in
practice while I learn more about implementing a pool properly and the
actual overhead required to manage it.

> If you want lesser-used objects to be cached less, then you could give
> them a shorter expiration time than heavily used objects, but even
> without that, frequently used objects should push them out of the cache
> or they will expire.

OK excellent that was my hope.
 
> Start with one big pool for all objects. Do some testing to see if you
> have enough cache to save your frequently used objects and adjust the
> size accordingly.

Sounds like a sensible approach.
 
> > All I can say is it is great to have all the options and fantastic that
> this
> > seems to work as well as it does.
> >
> > Thanks for all the input and comments.
> >
> > Regards
> > Alan
> > www.digitalspy.co.uk
> >
> 
> It's nice to have options, but it can also be bewildering.

Indeed :)

> Glad to help.

Thanks once again for the comments and tips.
Alan
 
> Jason


