Suggested structure to backup multiple servers?
lists at wildgooses.com
Sun Apr 13 18:03:11 UTC 2008
Brad Fitzpatrick wrote:
> I recently changed the Filesystem target in svn to do an existence
> check (a size check, actually) on the file-to-be-uploaded before
> actually uploading it (er, "copying it" ... I use sshfs to backup a
> local "filesystem" remotely). I should push that up a layer so all
> targets (S3, etc) get that for free. But that means more round-trips
> in the common case, where the target doesn't actually have
> it, and you're just backing up 1 server to 1 target.
I guess having the equivalent of a read-ahead thread running ahead of the
uploads doing existence checks would help? It sounds like S3 benefits from
a slightly multithreaded approach anyway for best speed. I'm going to be
backing up maildirs, so I might want to work on this a bit if it's
feasible. There may also be an opportunity to reduce the number of "LIST"
calls by batching up some of the tests.
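The read-ahead idea above could be sketched roughly as follows. This is
just a sketch, not brackup's actual code: `DummyTarget` and `has_chunk`
are made-up names standing in for whatever per-chunk round-trip a real
target (S3, sshfs, etc.) would do.

```python
from concurrent.futures import ThreadPoolExecutor

class DummyTarget:
    """Stand-in for a backup target; a real one would do a HEAD/stat call."""
    def __init__(self, existing):
        self._existing = set(existing)

    def has_chunk(self, name):
        # One network round-trip per call in a real target.
        return name in self._existing

def chunks_to_upload(target, names, workers=8):
    """Run the existence checks concurrently, ahead of the uploads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        present = list(pool.map(target.has_chunk, names))
    return [n for n, hit in zip(names, present) if not hit]

target = DummyTarget({"chunk-a", "chunk-c"})
todo = chunks_to_upload(target, ["chunk-a", "chunk-b", "chunk-c", "chunk-d"])
# todo == ["chunk-b", "chunk-d"]
```

Because `pool.map` preserves input order, the remaining uploads still
happen in the original sequence; only the checks are overlapped.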
What about making the name of the chunk a function of its checksum? That
probably doesn't leak any significant info if the hash function is
something strong like MD5 or better, and that way the existence check is
roughly free. (Perhaps you already do it this way, though? I haven't
really dug into the code yet.)
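The digest-as-name idea amounts to something like this minimal sketch
(using SHA-1 here rather than MD5 or a plain CRC; `chunk_name` is a
made-up helper, not anything from brackup itself):

```python
import hashlib

def chunk_name(data: bytes) -> str:
    # Name the chunk after its digest: if the target already holds an
    # object with this name, the bytes are (with overwhelming probability)
    # identical, so the existence check doubles as a content check and no
    # separate comparison round-trip is needed.
    return "chunk-" + hashlib.sha1(data).hexdigest()

a = chunk_name(b"some maildir message")
b = chunk_name(b"some maildir message")
# a == b: same content gives the same name, whichever server uploads it.
```

A collision-resistant hash matters here; with a plain CRC, two different
chunks could easily share a name and one would silently shadow the other.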
What happens when two backups run at the same time against the S3 target?
Is there any atomic ability in the file-create operation? I'm thinking of
what happens if two systems try to upload the same file at the "same
time"; with significantly sized files there is a growing window for race
conditions, perhaps?
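For a local-filesystem target, POSIX does give an atomic create
(`O_CREAT|O_EXCL`), which one could use like the sketch below; S3's plain
PUT, by contrast, is last-writer-wins, though with digest-named chunks a
lost race is harmless since both writers upload identical bytes. The
helper name is hypothetical:

```python
import os
import tempfile

def create_exclusive(path, data: bytes) -> bool:
    """Atomically create `path`; return False if another writer got there first.

    O_CREAT|O_EXCL guarantees exactly one of two racing creators succeeds.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True

d = tempfile.mkdtemp()
p = os.path.join(d, "chunk-abc")
first = create_exclusive(p, b"payload")   # we created it
second = create_exclusive(p, b"payload")  # lost the race, so just skip
```

The loser simply skips its upload, which is the right outcome when chunk
names are derived from content.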
I will have a look at the changes you made and take it from there; I'd
appreciate any pointers, though.