Which way is better for running memcached?

Sat Feb 17 01:36:36 UTC 2007

You know, right after I sent this, I realized I was smoking crack.  :)

We don't currently use multi-get, and one of the reasons is that the 
first library we used (3 years ago?) didn't support multi-get across 
multiple servers.

Only after I hit send did I realize that that ancient library didn't 
even support multiple servers, period, which was the whole reason we 
re-wrote the library ourselves.  :)

It makes perfect sense that you could shove a bunch of keys at your 
memcache library and it would use the hashed values of each key to create 
multiple multi-get requests, one for each instance.  Duh.
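
A minimal sketch of that splitting in Python, under a few assumptions. `server_for` is a hypothetical key-to-server hash (plain MD5 modulo the server count); real client libraries each use their own scheme, often consistent hashing. `fetch(server, keys)` is a hypothetical transport that sends one `get k1 k2 ...` and returns only the hits. Plain dicts stand in for the memcached instances.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def server_for(key, servers):
    # Hypothetical key-to-server mapping: MD5 of the key, modulo server count.
    # Real client libraries each have their own scheme (often consistent hashing).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def plan_multi_get(keys, servers):
    """Group keys by server, so each server gets one 'get k1 k2 ...' request."""
    groups = defaultdict(list)
    for key in keys:
        groups[server_for(key, servers)].append(key)
    return dict(groups)


def multi_get(keys, servers, fetch):
    """Send every per-server request at once, wait for all, merge the hits.

    fetch(server, keys) is a hypothetical transport: it sends one multi-key
    get to `server` and returns a dict holding only the keys it found.
    """
    plan = plan_multi_get(keys, servers)
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        futures = [pool.submit(fetch, server, server_keys)
                   for server, server_keys in plan.items()]
        for future in futures:
            results.update(future.result())  # misses are simply absent
    return results


# Usage: two in-memory dicts stand in for two memcached instances.
servers = ["cache1:11211", "cache2:11211"]
stores = {server: {} for server in servers}
for key in ["A", "B", "C"]:
    stores[server_for(key, servers)][key] = key.lower()


def fake_fetch(server, keys):
    return {k: stores[server][k] for k in keys if k in stores[server]}


found = multi_get(["A", "B", "C", "D"], servers, fake_fetch)
# "D" was never cached, so it is the only key missing from `found`.
```

The sketch sends all the per-server requests before waiting on any of them. The total wait is then roughly that of the slowest server rather than the sum of all of them.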

We're not having any performance problems, so I guess I just haven't 
revisited it yet.  Sounds like a relatively easy way to gain some 
speed, though, so I probably should.

Sorry for wasting the list's time.  :)

Don

Steven Grimm wrote:
> I'm not sure I understand the basis of the questions. There's no need to 
> "deal with" data spread across multiple machines -- in fact, the beauty 
> of memcached is that it *encourages* data to spread across multiple 
> machines, which serves as a pretty good load balancing mechanism.
> 
> If you call the "get keys A, B, and C" function/method in any of the 
> client libraries I'm familiar with, and keys A and C map to server 1 
> while key B maps to server 2, they'll all send "get A C" to server 1 and 
> "get B" to server 2, then wait for both servers to respond and return 
> the combined result to the application.
> 
> Our application then looks up any non-cached data in the database, as 
> you surmise, and stores it in the cache. But that logic is independent 
> of figuring out which memcached servers to talk to -- it would work 
> exactly the same with one huge instance or with a thousand little ones.
> 
> If you can explain what problem you're running into that leads to those 
> questions, maybe I'll be able to give you a more meaningful answer -- I 
> don't see the context in which those things would be of concern. I'm 
> especially unclear on the second question about data slices; what do you 
> mean by that?
> 
> -Steve
> 
> 
> Don MacAskill wrote:
>>
>> Is your data perfectly divided so that every multi-get never touches 
>> more than one instance?
>>
>> If so, you just make sure somehow that your data slice never exceeds 
>> 32GB, or whatever memory your typical memcached instance has?
>>
>> If not, how do you deal with data spread across multiple memcached 
>> instances?
>>
>> Do you do a multi-get on one instance, see what's missing, and issue 
>> single gets for the remaining data, falling back to some other 
>> disk-based store on those failures?
>>
>> Or issue multi-gets to each memcached instance and combine, then go to 
>> disk for non-cached requests?
>>
>> Or something else entirely?
>>
>> Thanks,
>>
>> Don
>>
>>
>> Steven Grimm wrote:
>>> We use it a lot. We divide the data for a given page into "stuff we 
>>> need immediately for the business logic that will change what other 
>>> data we need to fetch," "stuff we need for the business logic that we 
>>> can evaluate in isolation," and "stuff we're going to display." The 
>>> first gets fetched as needed during the execution of the page. The 
>>> second and third, we queue up internally and request all in one big 
>>> "get" just before rendering the page at the end of the request; for 
>>> the second class of data, we have a callback mechanism wrapped around 
>>> the memcached client so that we can run our business logic using some 
>>> of the returned data. There are some additional wrinkles but that's 
>>> the rough idea.
>>>
>>> By the way, it's not really any easier or harder in PHP than in any 
>>> other language; it's about application structure, not language. If we 
>>> were writing our site in Java or Python or C/C++ we'd probably do 
>>> exactly the same thing.
>>>
>>> -Steve
> 
>
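
Steve's first message above describes two steps. He queues up the "business logic in isolation" and "display" data, then requests all of it in one big get at the end of the page, with callbacks wrapped around the client. His later reply adds that misses are looked up in the database and stored back in the cache. The following is a minimal Python sketch of that pattern, under assumptions. `DeferredFetcher`, `want`, `flush`, `load_missing` and `store` are hypothetical names, not part of any memcached client. `multi_get` can be any function that returns a dict of hits. The first kind of data (needed immediately because it changes what else gets fetched) would just use an ordinary get and is not shown.

```python
from collections import defaultdict


class DeferredFetcher:
    """Queue cache keys during a request and fetch them in one batch at the end.

    multi_get(keys) -> dict is any hypothetical client call returning hits only.
    """

    def __init__(self, multi_get):
        self._multi_get = multi_get
        self._callbacks = defaultdict(list)  # queued key -> callbacks waiting on it

    def want(self, key, callback=None):
        """Queue `key`; if given, callback(key, value) runs once it is fetched."""
        waiting = self._callbacks[key]  # touching the defaultdict queues the key
        if callback is not None:
            waiting.append(callback)

    def flush(self, load_missing=None, store=None):
        """Issue one multi-get for all queued keys, fill misses, run callbacks.

        load_missing(keys) -> dict is a hypothetical database loader, and
        store(key, value) a hypothetical cache write for what it found.
        """
        # Swap in a fresh queue so callbacks may queue keys for a later flush.
        pending, self._callbacks = self._callbacks, defaultdict(list)
        keys = list(pending)
        values = dict(self._multi_get(keys)) if keys else {}
        missing = [k for k in keys if k not in values]
        if missing and load_missing is not None:
            loaded = load_missing(missing)
            values.update(loaded)
            if store is not None:
                for key, value in loaded.items():
                    store(key, value)  # write back so the next request hits
        for key in keys:
            if key in values:
                for callback in pending[key]:
                    callback(key, values[key])
        return values


# Usage: one dict plays the cache, another plays the database.
cache = {"user:1": "alice", "user:2": "bob"}
database = {"user:3": "carol"}
seen = []

fetcher = DeferredFetcher(lambda keys: {k: cache[k] for k in keys if k in cache})
fetcher.want("user:1", lambda key, value: seen.append(value))
fetcher.want("user:2")  # plain display data: no callback needed
fetcher.want("user:3", lambda key, value: seen.append(value))
page_data = fetcher.flush(
    load_missing=lambda keys: {k: database[k] for k in keys if k in database},
    store=cache.__setitem__,
)
```

As Steve points out, none of this depends on how many memcached instances sit behind `multi_get`. The batching logic is the same with one instance or many.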