FWIW, I just bought one of these, which can hold 32 gigs of RAM (16 x 2):

http://www.sun.com/servers/x64/x2200/

So that might replace fourteen or fifteen of your two-gig boxes. They say
the box can support 64 gigs, but I am not sure I believe them.

Earl

----- Original Message ----
From: Timo Ewalds <timo@tzc.com>
To: Brad Fitzpatrick <brad@danga.com>
Cc: memcached@lists.danga.com; Richard Jones <rj@last.fm>
Sent: Tuesday, July 3, 2007 11:54:57 AM
Subject: Re: System architecture best practices - memcached + webfarms
One technique that should work and that would make this easier (as you
wouldn't have to pass the userid every time) would be to use a consistent
hash on the key. (If you don't know what that is, look at
http://lists.danga.com/pipermail/memcached/2006-October/002919.html for
how it can be applied to adding/removing servers nicely.) In the example
below, "user17" is the same for the two keys, so if you can put them
close on the number line, they're likely to go to the same server. I
figured the way to do that would be to have the top 24 bits be a hash of
the prefix (i.e. "user17"), and the low 8 bits a hash of the suffix (i.e.
"name" or "location"). Assuming all the requests in your get_multi have
the same prefix (which I've found to be true in my application), you'll
only hit a couple of servers. The problem with this method, of course, is
that it has the potential to be very unbalanced (i.e. some servers will be
hit hard, while others will be empty), since the keys will be grouped
much more strongly. I think the balance vs. grouping issue can be solved
with a bit of experimentation with the hash function and with how many
bits go to the prefix vs. the suffix.

Timo
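A rough sketch of the composite hash Timo describes, in the style of the
Perl client (the 24/8 split and the use of crc32 for both halves are just
one choice of the knobs he suggests experimenting with):

    #!/usr/bin/perl
    # Sketch of the prefix/suffix composite hash. The bit split and the
    # choice of crc32 are illustrative; they are the knobs to tune.
    use strict;
    use warnings;
    use String::CRC32 qw(crc32);

    sub composite_hash {
        my ($key) = @_;
        my ($prefix, $suffix) = split /:/, $key, 2;  # "user17:name" -> ("user17", "name")
        $suffix = '' unless defined $suffix;
        my $hi = crc32($prefix) & 0xFF_FFFF;  # top 24 bits group keys by prefix
        my $lo = crc32($suffix) & 0xFF;       # low 8 bits spread the suffixes
        return ($hi << 8) | $lo;
    }

    # "user17:name" and "user17:location" differ only in the low 8 bits,
    # so they sit next to each other on the number line and a consistent
    # hash will usually place them on the same server.
    printf "%-16s => %08x\n", $_, composite_hash($_)
        for qw(user17:name user17:location user18:name);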
Brad Fitzpatrick wrote:
> In the Perl client, at least, your key can contain an explicit
> numeric hashing value, which has two benefits: 1) it's quicker to do $int
> % $num_servers than to do a crc32, and 2) locality... you tend to hit one
> server if most of the memcache objects you're requesting are, say, from
> the same user.
>
> So instead of (contrived example):
>
>   $memc->get_multi("user17:name", "user17:location")
>
> which would do two crc32s, and likely hit two servers, we do:
>
>   $memc->get_multi([17, "user17:name"], [17, "user17:location"])
>
> No crc32s, and they'll both go to the same node.
>
> That's one trick.
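A toy illustration of how a client can honour that explicit hash value
(this is not the real Cache::Memcached internals, just the idea):

    #!/usr/bin/perl
    # Toy server-selection function honouring an explicit hash value.
    use strict;
    use warnings;
    use String::CRC32 qw(crc32);

    my @servers = ('10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211');

    sub server_for {
        my ($key) = @_;
        # A key may be [$int, "real:key"]: use $int directly, skip the crc32.
        my $hash = ref $key eq 'ARRAY' ? $key->[0] : crc32($key);
        return $servers[$hash % @servers];
    }

    print server_for("user17:name"), "\n";            # crc32 of the key
    print server_for([17, "user17:name"]), "\n";      # 17 % 3 -> same node...
    print server_for([17, "user17:location"]), "\n";  # ...for both keys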
>
> But even without that, your web nodes should already have all the TCP
> connections open to all the memcaches (or most of them), so your
> get_multi implementation should just do a non-blocking write to all of
> them "at once" (actually in serial, but since you're not waiting for a
> reply you don't pay network latency between writes, so it's pretty much
> "immediate"), then select/epoll on them all at once, reading from them in
> order. If your implementation does a serial get_multi to each one, the
> network latency over 100 requests will kill you, and you should fix the
> client API you're using.
>
> So basically you can usually avoid hitting a lot of servers, but even if
> you have to, it shouldn't be _that_ bad.
>
> - Brad
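For the curious, a bare-bones sketch of that fire-all-then-select pattern
against memcached's text protocol, with hypothetical server addresses (no
timeouts on connect, no partial-read handling, no error recovery - a real
client needs all three):

    #!/usr/bin/perl
    # Parallel multi-get: write "get ..." to every server first, then
    # select() over the sockets and read replies as they arrive.
    use strict;
    use warnings;
    use IO::Socket::INET;
    use IO::Select;

    my %keys_by_server = (
        '10.0.0.1:11211' => ['user17:name', 'user17:location'],
        '10.0.0.2:11211' => ['user42:name'],
    );

    my $sel = IO::Select->new;
    for my $addr (keys %keys_by_server) {
        my $sock = IO::Socket::INET->new(PeerAddr => $addr, Proto => 'tcp')
            or next;  # treat a down server as a cache miss
        print $sock "get @{$keys_by_server{$addr}}\r\n";  # fire, don't wait
        $sel->add($sock);
    }

    while ($sel->count) {
        my @ready = $sel->can_read(2) or last;  # give up after 2s of silence
        for my $sock (@ready) {
            local $/ = "END\r\n";  # a get reply is terminated by END
            my $reply = <$sock>;   # slurp the whole reply (sketchy shortcut)
            print $reply if defined $reply;
            $sel->remove($sock);
            close $sock;
        }
    }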
>
> On Mon, 2 Jul 2007, Richard Jones wrote:
>
>> I've been thinking about what changes we may have to make to our memcached
>> installation in future as we continue to grow - our webfarm is approaching
>> 100 servers, each with 4GB of RAM, ~2GB of which is dedicated to memcached
>> on each machine.
>>
>> As we add more nodes, the usefulness of get_multi decreases - it's possible
>> for a single page to hit almost all of the memcached instances. I read
>> somewhere that facebook partition their memcached cluster to improve
>> get_multi performance (e.g. all user data on a subset of mc nodes). Can
>> anyone comment on the effectiveness of this?
>>
>> Are we fighting a losing battle here - perhaps we should try to cram as
>> much RAM as possible into a small number of machines (what do you do?).
>> get_multi would be more useful, but it costs more and doesn't seem as
>> elegant :(
>>
>> Can anyone comment on how many memcache lookups they make per page?
>> Traditionally our developers treated memcache lookups as "free", but when
>> you're doing a few hundred memcache gets per page it soon adds up...
>>
>> Thanks,
>> RJ
>>
>> --
>> Richard Jones
>> Last.fm Ltd. | http://www.last.fm/
>> Office: +44 (0) 207 780 7080
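On the partitioning Richard asks about: one simple way to picture it
(a purely hypothetical layout - no idea how facebook actually do it) is a
separate client pool per data class, so a get_multi for user data can only
ever touch the user subset of nodes:

    #!/usr/bin/perl
    # Hypothetical partitioned layout: one Cache::Memcached pool per data
    # class. A get_multi for user data touches at most the two user nodes,
    # however large the rest of the cluster grows.
    use strict;
    use warnings;
    use Cache::Memcached;

    my %pools = (
        user    => Cache::Memcached->new(
            { servers => ['10.0.1.1:11211', '10.0.1.2:11211'] }),
        session => Cache::Memcached->new(
            { servers => ['10.0.2.1:11211'] }),
    );

    my $user = $pools{user}->get_multi('user17:name', 'user17:location');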