System architecture best practices - memcached + webfarms

Brad Fitzpatrick brad at danga.com
Mon Jul 2 22:16:15 UTC 2007


In the Perl client, at least, your key can contain an explicit
numeric hashing value, which has two benefits: 1) it's quicker to do $int
% $num_servers than do a crc32, and 2) locality... you tend to hit one
server if most the memcache objects you're requesting are, say, from the
same user.

So instead of:  (contrived example:)

   $memc->get_multi("user17:name", "user17:location")

which would do two crc32s, and likely hit two servers, we do:

   $memc->get_multi([17, "user17:name"], [17, "user17:location"])

No crc32s, and they'll both go to the same node.

That's one trick.

But even without that, your web nodes should already have all the TCP
connections open to all the memcaches (or most of them), so then your
get_multi implementation should just do a non-blocking write to all of
them "at once" (actually in serial, but you're not waiting for a reply, so
network latency, so it's pretty much "immediate"), then you select/epoll
on them all at once, reading from them in order.  If your implementation
does a serial get_multi to each one, then the network latency over 100
requests will kill you, and you should fix the client API you're using.

So basically you can avoid hitting a lot of servers usually, but even if
you have to, it's shouldn't be _that_ bad.

- Brad


On Mon, 2 Jul 2007, Richard Jones wrote:

> I've been thinking about what changes we may have to make to our memcached
> installation in future as we continue to grow - our webfarm is approaching
> 100 servers, each with 4GB ram, ~2GB of which is dedicated to memcached on
> each machine.
>
> As we add more nodes, the usefulness of get_multi decreases - it's possible
> for a single page to hit almost all of the memcached instances. I read
> somewhere that facebook partition their memcached cluster to improve
> get_multi performance (eg, all user data on a subset of mc nodes). Can anyone
> comment on the effectiveness of this?
>
> Are we fighting a losing battle here - perhaps we should try and cram as much
> ram as possible into a small number of machines (what do you do?). get_multi
> would be more useful, but it costs more and doesn't seem as elegant :(
>
> Can anyone comment on how many memcache lookups they make per page?
> Traditionally our developers treated memcache lookups as "free", but when
> you're doing a few hundred memcache gets per page it soon adds up..
>
> Thanks,
> RJ
>
>
>
> --
> Richard Jones
> Last.fm Ltd. | http://www.last.fm/
> Office: +44 (0) 207 780 7080
>
>


More information about the memcached mailing list