Suggested structure to backup multiple servers?

Ed W lists at wildgooses.com
Sun Apr 13 18:03:11 UTC 2008



Brad Fitzpatrick wrote:
> I recently changed the Filesystem target in svn to do an existence 
> check (a size check, actually) on the file-to-be-uploaded before 
> actually uploading it (er, "copying it" ... I use sshfs to backup a 
> local "filesystem" remotely).  I should push that up a layer so all 
> targets (S3, etc) get that for free.  But that means more round-trips 
> in the common case, where the target doesn't actually have 
> it, and you're just backing up 1 server to 1 target.

I guess having the equivalent of a read-ahead thread doing the existence 
checks in advance would help?  It sounds like S3 prefers a slightly 
multithreaded approach anyway for best speed.  I'm going to be backing 
up maildirs, so I might want to work on this a bit if it's feasible.  
There may also be an opportunity to reduce the number of "LIST" calls 
by batching up some of the tests?
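
To make the read-ahead and batching ideas concrete, here is a rough sketch in Python (not Brackup's Perl).  The `exists` and `list_prefix` callables are made-up stand-ins for whatever check a target offers; they are not real Brackup or S3 calls, and the prefix grouping is just one assumed way to batch.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def find_missing(keys, exists, workers=8):
    """Return the keys for which exists(key) is false, in input order.

    `exists` is a hypothetical per-key existence (or size) check.
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The checks run concurrently; map() still yields results in input order.
        present = list(pool.map(exists, keys))
    return [k for k, p in zip(keys, present) if not p]


def find_missing_batched(keys, list_prefix, prefix_len=2):
    """Same question, but one listing call per key prefix.

    `list_prefix(prefix)` is a hypothetical call returning the keys the
    target already holds under that prefix.
    """
    groups = defaultdict(list)
    for k in keys:
        groups[k[:prefix_len]].append(k)
    missing = []
    for prefix, members in groups.items():
        present = set(list_prefix(prefix))  # one round-trip per prefix
        missing.extend(k for k in members if k not in present)
    return missing
```

The first function hides round-trip latency behind concurrency; the second trades per-file checks for fewer, larger listings, which only pays off when many files share a prefix.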

What about making the name of the chunk a function of its checksum?  This 
probably doesn't leak any significant info if the checksum is a strong 
digest like MD5 or better?  That way the check is roughly free, since a 
chunk with that name already being present implies the same content.  
(Perhaps you already do it this way though...?  I haven't really dug 
into the code yet.)
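
A minimal sketch of what I mean, in Python for brevity.  The `chunk_key` name and the "algorithm:hexdigest" layout are assumptions for illustration only, and I'm not claiming this is how Brackup names chunks today.

```python
import hashlib


def chunk_key(data: bytes, algorithm: str = "sha256") -> str:
    """Derive a storage name from the chunk's content digest.

    Identical content always maps to the same name, so "does this name
    exist on the target?" doubles as "has this content been uploaded?".
    """
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"{algorithm}:{digest}"
```

With names like this, two servers backing up the same file would compute the same key independently, without comparing the data itself.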

What happens when two backups run at the same time against the S3 
target?  Is there any atomic create ability in the file-create function?  
I'm thinking of what happens if two systems try to upload the same file 
at the "same time" (with large files, the window for a race condition 
grows).
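
For a plain local POSIX filesystem, one way to get an atomic "create only if absent" is to write to a temporary file and hard-link it into place.  The sketch below (Python, with `store_once` as a made-up name) shows only that case; whether sshfs or S3 offer an equivalent is exactly the open question, and this does not answer it.

```python
import os
import tempfile


def store_once(directory: str, name: str, data: bytes) -> bool:
    """Store data as directory/name unless that name already exists.

    Returns True if this call created the file, False if another writer
    got there first.  Readers never see a partially written file.
    """
    final = os.path.join(directory, name)
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        try:
            # link() is atomic and fails if the destination already exists.
            os.link(tmp, final)
            return True
        except FileExistsError:
            return False
    finally:
        os.unlink(tmp)  # the temporary name is no longer needed either way
```

If the chunk names are derived from the content, losing this race is harmless: the winner stored the same bytes.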

I will have a look at the changes you made and take it from there.  I'd 
appreciate any pointers, though.

Cheers

Ed W


