some more patches

Kostas Chatzikokolakis kostas at chatzi.org
Tue Dec 29 19:21:10 UTC 2009


Hello,

a few small patches you might be interested in. They are all against the
latest trunk and are independent of the low-memory patch.


http://codereview.appspot.com/183080/show

Avoids downloading a composite chunk multiple times from the target
(pretty bad if you have thousands of small files). Fixes issue #1 on
Google Code.

There's a very simple solution for that: brackup currently merges only
small files, not the tails of large files (a good decision). As a
result, files containing parts of a composite chunk are all single-chunk
files. So if the files are restored ordered by the digest of their first
chunk, and the last seen chunk is kept in memory, all files of a
composite chunk can be restored without downloading it twice (and
without caching more than one chunk).

As a free gift, we get some speedup even for non-composite chunks when
there are many identical single-chunk files (the shared chunk will be
downloaded only once).
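
To illustrate the idea, here's a minimal sketch of the restore loop; the
file/chunk data layout and the load_chunk and write_file_data helpers
are simplified stand-ins for the example, not brackup's actual
internals:

    use strict;
    use warnings;

    sub restore_files {
        my ($target, @files) = @_;

        # Sort by the digest of each file's first chunk, so all files
        # stored in the same composite chunk come up consecutively.
        @files = sort { $a->{chunks}[0] cmp $b->{chunks}[0] } @files;

        my ($cached_digest, $cached_data) = ('', undef);
        for my $file (@files) {
            for my $digest (@{ $file->{chunks} }) {
                # Cache of size one: hit the target only when the
                # digest changes.
                if ($digest ne $cached_digest) {
                    $cached_data   = $target->load_chunk($digest);
                    $cached_digest = $digest;
                }
                write_file_data($file, $cached_data);
            }
        }
    }

Identical single-chunk files sort next to each other too, which is where
the free speedup above comes from.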


http://codereview.appspot.com/183082/show

brackup-verify-inventory (a very handy tool) currently loads every chunk
individually to verify that it exists. This is fine for filesystem
targets (and maybe for ftp/sftp), but on Amazon it takes ages; with a
few hundred thousand files it never finishes. The patch uses a chunk
listing ($target->chunks) to do the same job, which is much faster
because each S3 list request returns up to 1000 keys. The speedup is
huge for Amazon.
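
The core of the patched check is roughly this ($target->chunks is the
existing listing method; @inventory_digests stands in for iterating the
local inventory db):

    # One lookup table built from a single bulk listing, instead of
    # one request per chunk.
    my %on_target = map { $_ => 1 } $target->chunks;

    for my $digest (@inventory_digests) {
        warn "missing chunk: $digest\n" unless $on_target{$digest};
    }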

The only difference is that the patched version doesn't verify the
chunk size. That would be possible by modifying $target->chunks to also
return each chunk's size, but I wanted to keep the patch simple. In any
case, if the chunk is there, it's almost impossible for its size or
contents to be wrong.
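
For the record, the size check could look something like this
hypothetical variant (assuming chunks were changed to return name/size
pairs; S3 list responses already carry the size of each key):

    # Hypothetical: $target->chunks returning { name => ..., size => ... }
    my %size_on_target = map { $_->{name} => $_->{size} } $target->chunks;

    for my $chunk (@inventory_chunks) {   # each with {digest} and {size}
        my $size = $size_on_target{ $chunk->{digest} };
        if (!defined $size) {
            warn "missing chunk: $chunk->{digest}\n";
        }
        elsif ($size != $chunk->{size}) {
            warn "size mismatch: $chunk->{digest}\n";
        }
    }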


http://codereview.appspot.com/181096/show

Tiny patch to use the automatic retry feature of Net::Amazon::S3.
brackup does some retrying itself, but not in all cases (for example,
not when uploading the metafile). Moreover, Net::Amazon::S3 retries at
the HTTP request level (using LWP::UserAgent::Determined). Some commands
(like listing a bucket) involve multiple requests under the hood, so
retrying at the brackup level is not possible.
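
Enabling it is just a constructor flag, something like this (the
credentials are placeholders):

    use Net::Amazon::S3;

    my $s3 = Net::Amazon::S3->new({
        aws_access_key_id     => $ENV{AWS_ACCESS_KEY_ID},
        aws_secret_access_key => $ENV{AWS_SECRET_ACCESS_KEY},
        # retry => 1 makes Net::Amazon::S3 build its user agent with
        # LWP::UserAgent::Determined, which retries failed HTTP
        # requests with increasing delays.
        retry => 1,
    });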


Cheers,
Kostas
