HTTP ?
Gregory Block
gblock at ctoforaday.com
Tue Nov 30 01:44:39 PST 2004
On 30 Nov 2004, at 05:27, Brad Fitzpatrick wrote:
> Local port exhaustion is an annoying problem, but easy to solve: keep
> your connections to memcached servers open a long time.
>
I have to agree, here. We're running this in production, under heavy
use, and the only way socket exhaustion could ever become an issue is
if we were opening too many connections.
By definition, if you're opening lots of connections, you're wasting
time in connection setup that you could have spent performing the
request itself; so purely from an optimisation standpoint, churning
connections is sub-optimal.
Moreover, you're *still* going to have the issue of the memcached
servers having different contents; in effect, you're no better off than
a local-memcached policy that has a memcached running on each machine
locally (and if you're going to go that far, you might as well just go
with IPC and get it over with; that'll be fun for a libevent i/f),
because you've just elected to store multiple copies of the same
content on multiple servers without any kind of in-memcached
replication system to automate it (which misses the point, IMO).
Client balancing works; it requires efficient access by clients, and in
this case, that means you. Hold your connections open unless you're
required to close them by some kind of failure, either in protocol or
in software; if the client you're using doesn't support concurrent
access, arbitrate that access via a pooling mechanism that can keep a
specific number of connections to memcached open, and can
transparently scale those connections with load, without incurring
construction/disposal overhead.
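A minimal sketch of that arbitration, in Java: a fixed set of clients constructed once and handed out via a blocking queue, so callers wait for a free connection instead of opening a new one. (The class and method names here are hypothetical, not from any particular memcached client library.)

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Blocking pool: a fixed number of expensive-to-create clients,
// constructed once up front and reused for every request.
class ClientPool<T> {
    private final BlockingQueue<T> idle;

    ClientPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay construction cost once
        }
    }

    T acquire() {
        try {
            return idle.take(); // block until a client is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted waiting for a client", e);
        }
    }

    void release(T client) {
        idle.offer(client); // hand the client back for the next caller
    }
}
```

Because acquire/release recycle the same objects, connection setup happens only at pool construction, never per request.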
Fundamentally, what you've got is an object (client) which is
"expensive" to create (cpu, resources, doesn't matter which). The way
you manage that is through object pooling; don't solve in hardware what
is fundamentally a simple software problem.
IMO.
:)
I honestly can't think of a situation where a load balancer would be a
positive step unless you also build in the replication; otherwise, the
load balancer can't possibly improve the efficiency of the hashing
system for cache selection, but could easily decrease the efficiency of
cache use.
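The point about hashing is worth making concrete. Client-side cache selection is just a deterministic hash over the key, so every client independently agrees on which server holds a given key; a load balancer in the middle breaks exactly that property. A rough sketch (server names are made up):

```java
import java.util.zip.CRC32;

// Client-side server selection: hash the key, pick a server.
// Every client computes the same answer, so no coordination is needed.
class ServerSelector {
    private final String[] servers;

    ServerSelector(String[] servers) {
        this.servers = servers;
    }

    String serverFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes());
        // Deterministic mapping from key to server; a load balancer
        // in front of the pool would scatter keys across servers
        // and wreck the hit rate.
        return servers[(int) (crc.getValue() % servers.length)];
    }
}
```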
If one were to take a sideways angle around the problem, and front a
set of memcached clients with an HTTP server that spoke to your load
balancers, and ensured that each server maintained a copy of the
contents, then you could surely implement a poor-man's replication
system; but that creates a raft of other issues which are better solved
through native cache synchronization. Moreover, such a thing could be
handled no less efficiently through parallel client connections and a
thin layer over the memcached client API that managed multiple object
pools, with a worker thread pushing updates to each of the cache APIs,
leaving your app to handle things asynchronously.
The *only* value in the standalone central point is that during a
write, the client could be returned asynchronously: until the async
transaction is done, a local map of 'unsynced' key/value pairs could
serve cached results which haven't yet been updated in memcached,
preserving atomicity while allowing the async put to take place.
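For the record, the shape of that idea looks roughly like this (a sketch only; the inner map stands in for the real memcached connection, and all names are mine, not from any library):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Async put with an 'unsynced' local map: the write is visible to
// local readers immediately, while a worker thread pushes it to the
// (simulated) remote cache in the background.
class AsyncWriter {
    private final Map<String, String> pending = new ConcurrentHashMap<>();
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for memcached
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    void put(String key, String value) {
        pending.put(key, value);           // visible to readers right away
        worker.submit(() -> {
            cache.put(key, value);         // the slow network write
            pending.remove(key, value);    // drop only if still this value
        });
    }

    String get(String key) {
        String v = pending.get(key);       // unsynced values win
        return v != null ? v : cache.get(key);
    }

    void close() {
        worker.shutdown();
    }
}
```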
I played around with that, and in our case it's just not worth it, even
with large object serialization. :)
The only, the *only* thing I can think of to wish for is a
better-performing GzipOutputStream, and that's mostly because I haven't
gone looking for either a faster compression stream in Java, or
something hardware-backed to do that compression; and that's just a
nice-to-have rather than a killer.
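For anyone who hasn't tried it, the stock java.util.zip round trip looks like this; squashing values before they hit the wire trades CPU for network and cache memory:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Compress/decompress a serialized value with the stock streams.
class Gzip {
    static byte[] compress(byte[] data) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(data); // close() flushes the gzip trailer
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] decompress(byte[] gzipped) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```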
- If you've got connection fatigue, hold your connections around.
- If you can't solve that, you'll replace
connection-fatigue-with-memcached with
connection-fatigue-with-loadbalancer.
- Assuming you don't have the latter problem, you've probably got a
solution that could have been applied to fix the former, identical
issue.
- If the cost of losing a cache is too high, distribute the load
across more caches, effectively reducing the impact of a single cache
failure. (And having said this, the uptimes on our caches are
unbelievably long, even under heavy use.)
- If the cost of accessing cache for everything is too high, split
cache into a "long term" cache which is retained in memcached, and a
"short term" cache which, like human short-term memory, is kept within
specific time/usage limits and holds a limited amount of information.
That raises the problem of keeping this content in sync, but we're
purposefully talking about small amounts of content here, where content
is pushed simultaneously into the short-term and long-term memory
systems for later recall/reuse.
- Keep in mind that any service you put in-between your memcached and
your client will increase the amount of latency in accessing content in
long-term memory, forcing you to consider implementing a short-term
memory system anyways.
- If you were watching the latency in the first place, you wouldn't
have the socket fatigue, because you'd have gotten rid of that latency
first. ;)
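The short-term/long-term split above is easy to sketch: a small access-ordered LinkedHashMap as the bounded short-term tier, backed by the long-term store. (Here a plain HashMap stands in for memcached, and the size limit is arbitrary.)

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Two-tier cache: a small in-process LRU "short-term memory" in front
// of a "long-term" store (memcached in practice; a map here).
class TwoTierCache {
    private final Map<String, String> longTerm = new HashMap<>(); // stand-in for memcached
    private final LinkedHashMap<String, String> shortTerm =
        new LinkedHashMap<>(16, 0.75f, true) { // access-order = LRU
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > 128; // short-term memory holds a limited amount
            }
        };

    void put(String key, String value) {
        shortTerm.put(key, value); // push into both tiers simultaneously,
        longTerm.put(key, value);  // as described above
    }

    String get(String key) {
        String v = shortTerm.get(key);
        if (v == null) {
            v = longTerm.get(key);                 // miss: fall back to long-term
            if (v != null) shortTerm.put(key, v);  // promote for later recall
        }
        return v;
    }
}
```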
Sorry. I'm babbling. It's morning, and I haven't quite had my first
coffee yet.