Transparent failover and restore?
Gregory Block
gblock at ctoforaday.com
Fri Dec 17 09:45:27 PST 2004
(thinking out loud, here)
In theory, you could probably achieve this in a pretty straightforward
manner...
1) On put, step through the chain of selected servers for the key; if a
server is marked as down, skip it and move on to the next, continuing
until you either find a live server or exhaust the list. You
essentially need to "step" through the servers, removing each one you
find down from the chain, so that when a server comes back online it
keeps its original position in the ordering. Once you find a server
that's up, put to the live servers from that point to a depth of "n",
where 'n' is the degree of replication you wish to achieve.
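The put step above might look roughly like this sketch. Everything here is a hypothetical stand-in, not a real memcached client: the `Server` class, its `up` flag, and the per-key hash ordering are all assumptions made for illustration.

```python
import hashlib

class Server:
    """Hypothetical stand-in for a memcached connection."""
    def __init__(self, name, up=True):
        self.name, self.up, self.store = name, up, {}
    def set(self, key, value):
        self.store[key] = value

def server_chain(servers, key):
    """Order the servers deterministically per key, so the chain
    stays stable even while some servers are down and come back."""
    return sorted(servers,
                  key=lambda s: hashlib.md5((key + s.name).encode()).hexdigest())

def replicated_set(servers, key, value, n=2):
    """Walk the chain, skipping down servers, and write to the
    first n live servers found; returns how many copies were made."""
    written = 0
    for s in server_chain(servers, key):
        if not s.up:
            continue  # removed from the chain for now; ordering is preserved
        s.set(key, value)
        written += 1
        if written == n:
            break
    return written
```

Because the chain ordering is a pure function of the key and the full server list, a server that comes back online slots back into exactly the position it had before.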
2) On get, step through the chain of servers; if a server is marked as
down, or reports that it doesn't have the content, skip it and try the
next, continuing until you go all the way through the list. When you
locate a server in the chain that has the content, write that content
back to the earlier links in the chain that missed.
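The get step with read-repair could be sketched like so (same caveats as above: `Server`, its `up` flag, and the hash ordering are hypothetical stand-ins):

```python
import hashlib

class Server:
    """Hypothetical stand-in for a memcached connection."""
    def __init__(self, name, up=True):
        self.name, self.up, self.store = name, up, {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value):
        self.store[key] = value

def server_chain(servers, key):
    """Same deterministic per-key ordering used on put."""
    return sorted(servers,
                  key=lambda s: hashlib.md5((key + s.name).encode()).hexdigest())

def replicated_get(servers, key):
    """Walk the chain; on a hit, repair the earlier live links
    that missed, then return the value."""
    missed = []
    for s in server_chain(servers, key):
        if not s.up:
            continue
        value = s.get(key)
        if value is not None:
            for m in missed:
                m.set(key, value)  # read-repair the earlier links
            return value
        missed.append(s)
    return None
```

The repair step is what restores a server's data after it comes back online: each miss on a recovered server is refilled from a deeper link in the chain.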
Assuming you built this in a relatively parallelized fashion (on get,
issue gets to the 'predicted' n servers in parallel, push the repairs
back on asynchronous threads, and return the retrieved results; on
put, parallel-put the update and sync on the return of all puts), you
should get transparent failure handling, with no specially designated
replica (the replica itself is distributed to a depth of 'n').
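A minimal sketch of the parallel-get side, using a thread pool (the `StubServer` class is a made-up stand-in so the example runs on its own; a real client would race the responses rather than wait for all of them):

```python
from concurrent.futures import ThreadPoolExecutor

class StubServer:
    """Minimal dict-backed stand-in for a memcached connection."""
    def __init__(self, store=None):
        self.store = store or {}
    def get(self, key):
        return self.store.get(key)

def parallel_get(servers, key, n=2):
    """Query the first n predicted servers at once and return the
    first hit in chain order. This simple version still waits for
    all n responses; concurrent.futures.as_completed would let you
    return as soon as any server answers with a hit."""
    candidates = servers[:n]
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        for value in pool.map(lambda s: s.get(key), candidates):
            if value is not None:
                return value
    return None
```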
Now, that probably means...
- Changing the way the client deals with the list of servers, to be
able to pull up the n servers in the order that it would attempt to
retrieve from them
- Introducing the parallelism to do asynchronous gets (which could, of
course, be unnecessary if you're willing to live with the time delay of
negative responses forcing you to hop n deep)
- Introducing the parallelism on puts (unnecessary if you're willing
to live with the cost of a put being the cost of putting to n servers).
Buys you the same thing, really, and stays a client-side problem. Of
course, it affects operations like delete too, but that's relatively
easily solvable.
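For completeness, delete has to walk the same chain and remove the key from every live replica (again a sketch with the same hypothetical `Server` and hash ordering; note the wrinkle that makes this only "relatively" easy -- a server that is down during the delete keeps its stale copy until it is repaired or the entry expires):

```python
import hashlib

class Server:
    """Hypothetical stand-in for a memcached connection."""
    def __init__(self, name, up=True):
        self.name, self.up, self.store = name, up, {}
    def delete(self, key):
        self.store.pop(key, None)

def server_chain(servers, key):
    """Same deterministic per-key ordering used on put and get."""
    return sorted(servers,
                  key=lambda s: hashlib.md5((key + s.name).encode()).hexdigest())

def replicated_delete(servers, key, n=2):
    """Delete from the first n live servers in the chain -- the same
    servers a put would have written to."""
    deleted = 0
    for s in server_chain(servers, key):
        if not s.up:
            continue  # a down server keeps its stale copy for now
        s.delete(key)
        deleted += 1
        if deleted == n:
            break
    return deleted
```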
On 17 Dec 2004, at 06:03, Kevin A. Burton wrote:
> If you have nodes in your cluster that contain around 2G of data it
> could take a LONG time and provide unacceptable delay in your
> application should it fail.
>
> Are there any plans to have redundant and load balanced memcached
> nodes that can restore from their clone when they come back online?
>
> Kevin
>
> --
>
> Use Rojo (RSS/Atom aggregator). Visit http://rojo.com. Ask me for an
> invite! Also see irc.freenode.net #rojo if you want to chat.
>
> Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
>
> If you're interested in RSS, Weblogs, Social Networking, etc... then
> you should work for Rojo! If you recommend someone and we hire them
> you'll get a free iPod!
> Kevin A. Burton, Location - San Francisco, CA
> AIM/YIM - sfburtonator, Web - http://peerfear.org/
> GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
>