Which way is better for running memcached?

Steven Grimm sgrimm at facebook.com
Sat Feb 17 00:56:03 UTC 2007


I'm not sure I understand the basis of the questions. There's no need to 
"deal with" data spread across multiple machines -- in fact, the beauty 
of memcached is that it *encourages* data to spread across multiple 
machines, which serves as a pretty good load balancing mechanism.

If you call the "get keys A, B, and C" function/method in any of the 
client libraries I'm familiar with, and keys A and C map to server 1 
while key B maps to server 2, the library will send "get A C" to 
server 1 and "get B" to server 2, wait for both servers to respond, 
and return the combined result to the application.
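
To make that concrete, here's a rough Python sketch of the 
key-to-server split; the server addresses and the hash-and-modulo 
mapping are just assumptions for the example, not any particular 
client's actual implementation:

    import hashlib

    servers = ["10.0.0.1:11211", "10.0.0.2:11211"]  # hypothetical pool

    def server_for(key):
        # Hash the key and pick a server; many clients do something
        # like this (or consistent hashing) internally.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return servers[h % len(servers)]

    def group_by_server(keys):
        # Group keys so each server gets one multi-key "get" request.
        groups = {}
        for key in keys:
            groups.setdefault(server_for(key), []).append(key)
        return groups

    print(group_by_server(["A", "B", "C"]))
    # e.g. {'10.0.0.1:11211': ['A', 'C'], '10.0.0.2:11211': ['B']}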

Our application then looks up any non-cached data in the database, as 
you surmise, and stores it in the cache. But that logic is independent 
of figuring out which memcached servers to talk to -- it would work 
exactly the same with one huge instance or with a thousand little ones.
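
In rough Python terms (say, with the python-memcached client) the 
pattern looks something like this; load_from_database is a made-up 
placeholder for whatever your database layer does, and the server 
list is arbitrary:

    import memcache

    mc = memcache.Client(["10.0.0.1:11211", "10.0.0.2:11211"])

    def fetch(keys):
        # One multi-get; the client fans it out to the right servers.
        found = mc.get_multi(keys)
        missing = [k for k in keys if k not in found]
        if missing:
            # Hypothetical DB helper: load whatever the cache lacked,
            # then write it back so the next request finds it cached.
            rows = load_from_database(missing)
            for key, value in rows.items():
                mc.set(key, value)
            found.update(rows)
        return found

The miss handling never needs to know which server a given key lived 
on; the client library takes care of that.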

If you can explain what problem you're running into that leads to those 
questions, maybe I'll be able to give you a more meaningful answer -- I 
don't see the context in which those things would be of concern. I'm 
especially unclear on the second question about data slices; what do you 
mean by that?

-Steve


Don MacAskill wrote:
>
> Is your data perfectly divided so that every multi-get never touches 
> more than one instance?
>
> If so, do you just make sure somehow that your data slice never grows 
> past 32GB or whatever your typical memcached instance holds?
>
> If not, how do you deal with data spread across multiple memcached 
> instances?
>
> Do you do a multi-get on one instance, see what's missing, and issue 
> single gets for the remaining data, falling back to some other 
> disk-based store on those failures?
>
> Or issue multi-gets to each memcached instance and combine, then go to 
> disk for non-cached requests?
>
> Or something else entirely?
>
> Thanks,
>
> Don
>
>
> Steven Grimm wrote:
>> We use it a lot. We divide the data for a given page into "stuff we 
>> need immediately for the business logic that will change what other 
>> data we need to fetch," "stuff we need for the business logic that we 
>> can evaluate in isolation," and "stuff we're going to display." The 
>> first gets fetched as needed during the execution of the page. The 
>> second and third, we queue up internally and request all in one big 
>> "get" just before rendering the page at the end of the request; for 
>> the second class of data, we have a callback mechanism wrapped around 
>> the memcached client so that we can run our business logic using some 
>> of the returned data. There are some additional wrinkles but that's 
>> the rough idea.
>>
>> By the way, it's not really any easier or harder in PHP than in any 
>> other language; it's about application structure, not language. If we 
>> were writing our site in Java or Python or C/C++ we'd probably do 
>> exactly the same thing.
>>
>> -Steve
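
To make the batching idea in that older message a bit more concrete, 
here is a rough Python sketch of the queue-then-flush pattern; the 
class and method names are invented for the example and aren't taken 
from any real code:

    class DeferredCache:
        # Queue up keys during the request; fetch them in one big
        # multi-get just before the page renders, and hand each value
        # to any callback registered for it.
        def __init__(self, client):
            self.client = client
            self.pending = {}   # key -> list of callbacks

        def queue(self, key, callback=None):
            callbacks = self.pending.setdefault(key, [])
            if callback is not None:
                callbacks.append(callback)

        def flush(self):
            results = self.client.get_multi(list(self.pending))
            for key, callbacks in self.pending.items():
                for cb in callbacks:
                    cb(results.get(key))
            self.pending = {}
            return results

Page code would call queue() as it discovers what it needs, then call 
flush() once just before rendering.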


