What expense is there running across servers?

dormando dormando at rydia.net
Wed Nov 7 18:01:41 UTC 2007


Hey,

Just to be clear, are you talking about actively storing multiple copies 
of data into a cluster of memcached servers? The daemon itself doesn't 
do this.

The idea of using memcached in a cluster is that the cluster acts as 
the second layer of one giant hash table: the client hashes each key 
to pick a server, then stores/gets that key directly from that 
individual instance. The instances don't know about each other, so 
there's no inter-server overhead. You then rely on the application 
layer to deal with a cache miss (like any other kind of cache).

That's how you can end up with 70 of them without breaking down your 
network. The other main benefit is that the amount of "active" data 
in your cache can far exceed the RAM of any individual machine (say, 
70 boxes at 4 GB each is roughly 280 GB of cache).

In your case, if it's unreasonable to regenerate a feed on the fly, 
you might still want to toss the results into MySQL and cache _that_ 
with a distributed memcached cluster. And/or store each feed twice in 
memcached under two different keys, one of which you then 'get' back 
at random.
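Reusing the server_for() and caches stand-ins from the sketch above, 
that double-store trick might look like this (the COPIES count and 
the key-suffix scheme are just illustration, not anything memcached 
does for you):

    import random

    COPIES = 2  # hypothetical: how many copies of each feed to keep

    def set_feed(key, value):
        # Distinct keys usually hash to distinct servers, giving
        # crude redundancy with no daemon-side replication at all.
        for i in range(COPIES):
            k = "%s:%d" % (key, i)
            caches[server_for(k)][k] = value

    def get_feed(key):
        # Try the copies in random order; any hit will do.
        for i in random.sample(range(COPIES), COPIES):
            k = "%s:%d" % (key, i)
            value = caches[server_for(k)].get(k)
            if value is not None:
                return value
        return None  # full miss: regenerate, or fall back to MySQL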

-Dormando

Patrick Galbraith wrote:
> There's been some discussion at my company about using memcached, and 
> how best to get the most out of it. One main point is whether or not 
> to run a cluster across machines, even across data centers, vs. each 
> server having its own cache.
> 
> What we will be using memcached for is to serve JSON (a JSON cache) 
> to our grazr widget. Currently this is done with MySQL, using 
> replication to replicate the cache (currently two tables, a lookup 
> table and a blob table, 1:1). It was all originally done with flat 
> files, each differing across servers. With replication, when one 
> server caches a feed (the caching itself is done with expensive 
> mod_perl application code), the feed is stored through a write handle 
> to the master (one master in each data center, with web heads and 
> slave DBs attached). The cached JSON representation of the feed is 
> then replicated out to all slaves, so every slave can serve the 
> 'cached' JSON instead of having to regenerate it with mod_perl code.
> 
> With memcached, the discussion now is whether to have one cluster 
> spanning machines, with all cached JSON available everywhere, or to 
> have a separate cache on each box.
> 
> I'm in favor of a single cluster. Whatever small overhead there is in 
> "replicating" the cached JSON across memcached instances is much 
> smaller than having to re-cache a feed on a box that doesn't already 
> have it, when, as part of one cluster, it could simply look the feed 
> up and serve it from cache.
> 
> What I'm asking in this post is how others would argue that a single 
> cluster is the optimal solution, and what overhead there is in 
> networking that cluster. What are others' experiences out there? Are 
> there any tests that have been done to see how this performs?
> 
> I understand the technical explanation and don't need convincing that 
> libevent makes the "replication" (or "glue") of the cluster fast. 
> Also, everyone I know of who uses memcached uses it as a single 
> cluster. Maybe that's a bit simplistic and monkey-see-monkey-do, but 
> I would suppose people have settled on this architecture for a good 
> reason!
> 
> What do others think, and what would you all contribute to this discussion?
> 
> Thanks in advance!
> 
> Patrick
> 


