Memory + GPG
robb at canfield.com
Wed Aug 29 03:48:25 UTC 2007
I just finished some tests to verify my assumptions about Brackup memory
consumption. The following components add together, with total RAM
consumption varying based on the chunk size currently being processed:
* Array of files to back up (plus file stats)
* The smaller of chunk_size and the file size
* Size of the GPG chunk data (often much smaller than the file size)
* Overhead of a digest per file (it used to be ALL files)
So for the default chunk size of 64 MB and a file at least that large,
RAM consumption will climb to 64 MB plus the GPG chunk size; in the
worst case of incompressible (random) data that means roughly 128 MB for
a very short while. Consumption then drops to the size of the GPG chunk
(or of the raw file chunk if GPG is not in use) for the time it takes to
write the chunk to the target device.
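As a back-of-the-envelope check, the model above can be sketched like this (illustrative Python only, not Brackup's actual code; the gpg_ratio parameter is my own name for the encrypted-to-raw size ratio):

```python
# Rough model of Brackup's transient per-chunk RAM peak, per the
# numbers discussed above. Purely illustrative.

def peak_chunk_ram(file_size, chunk_size, gpg_ratio=1.0):
    """Estimate peak bytes held in RAM while one chunk is processed.

    gpg_ratio is the encrypted-to-raw size ratio; ~1.0 is the worst
    case (random or already-compressed data), often much smaller.
    """
    raw = min(file_size, chunk_size)   # raw chunk buffer
    gpg = int(raw * gpg_ratio)         # encrypted copy of the chunk
    return raw + gpg                   # both held briefly at once

MB = 1024 * 1024
# Default 64 MB chunks, worst-case random data: ~128 MB transient peak.
print(peak_chunk_ram(500 * MB, 64 * MB) // MB)   # → 128
# A smaller chunk_size shrinks the peak proportionally.
print(peak_chunk_ram(500 * MB, 5 * MB) // MB)    # → 10
```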
If you are running out of RAM, the easiest thing to do is reduce your
chunk_size. Of course, this will cause any file larger than the new
chunk size to be backed up again!
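For reference, chunk_size is set per source in the brackup config file; a sketch along these lines (the section name and path are placeholders, and chunk_size is the only line that matters here):

```ini
[SOURCE:home]
path = /home/robb
chunk_size = 5MB
```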
I have some experimental code that removes this per-chunk overhead and
might even speed things up a tad, but it needs integration into the
existing Target types and some more testing.
Richard Edward Horner wrote:
> Well, yeah, a bigger problem though is in VPS implementations. Most
> seem to be designed to be sold, not to work well. They all seem to
> dynamically scale CPU but not RAM. It's like you're stuck with
> whatever amount of RAM. I know that this is in part because of the
> kernel design but I recall some patches being submitted recently that
> allow for dynamic scaling of RAM in the kernel.
> Later, Rich(ard)
> On 8/29/07, robb <robb at canfield.com> wrote:
>> Agreed and done; --dry-run now effectively disables GPG.
>> I am not sure how much memory leaking was occurring with GPG for
>> multi-gig runs; I may try to test that. But otherwise I found leaking to
>> be in the 20-30 MB range, and that's not enough to explain David's
>> issue. But I have not yet torn into S3 processing.
>> One thing that could be a problem is that Brackup retains the chunk in
>> RAM! For the default of 64MB that adds up to a whole lot of RAM usage
>> VERY quickly. From what I can tell the total would never exceed 2x (so
>> 128MB) since encrypted chunks are not read from disk until needed. But
>> still, 128MB is a LOT of RAM. The easiest way to handle this is setting
>> the chunk size to 5MB or so. Changing the way chunks are handled is
>> tricky but I will probably look at it when I try to add alternate
>> encryption/compression filters later this week (as time allows).
>> Another RAM issue is that the file list is built THEN processed. While
>> it's nice to know a completion estimate, it does chew up a lot of RAM to
>> pre-build the file list. Perhaps an option to choose between an estimate
>> and a combined scan/backup is in order. That would save another 20-40 MB
>> for large backup sets. In addition, the pre-scan has some issues with
>> files going missing, permissions/ownership changing between scan and
>> backup, etc. None of these are huge for small sets, but for 100 GB and
>> 30,000+ files it becomes a bit of a problem.
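The streaming scan/backup combination described above could look roughly like this (a minimal Python sketch of the idea only; the function names are hypothetical and Brackup itself is Perl):

```python
import os

def scan_files(root):
    """Yield (path, size) lazily; nothing accumulates in RAM."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                yield path, os.path.getsize(path)
            except OSError:
                # File vanished or permissions changed between scan
                # and backup -- skip it instead of aborting the run.
                continue

def backup(root, handle_chunk):
    # Process each file as it is discovered, then discard its entry,
    # instead of pre-building the whole list (no completion estimate).
    for path, size in scan_files(root):
        handle_chunk(path, size)
```

The trade-off is exactly the one noted above: per-file memory stays flat, but you give up the up-front completion estimate.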
>> My main issue with RAM is that Brackup is destined for a number of VPS
>> systems I have (local and remote). These systems have tightly optimized
>> RAM usage, and I need something that is as small as reasonable (as my
>> time allows), measurable, and predictable.
>> But with all the changes I have made, I will need to remeasure RAM
>> performance from scratch some day. I know I am using less of it than
>> prior versions of my code, but I have not done an end-to-end comparison yet.
>> Richard Edward Horner wrote:
>>> Awesome work.
>>> I had actually suspected there might have been an issue with this when
>>> David posted his problem, hence my asking if he was using GPG.
>>> On the --dry-run issue, I think ppl expect it to tell you what would
>>> be done but not do things. If you read the man page for many ppl's
>>> favorite package manager, apt-get, which I think would be a good point
>>> for establishing expected behavior, it says:
>>> No action; perform a simulation of events that would occur but do not
>>> actually change the system.
>>> "No action" is pretty clear but "perform a simulation" isn't exactly
>>> the same as "no action". Sorry, I come from a family of lawyers.
>>> Usually when you do a dry run on something, you just want to quickly
>>> see what it would do, so not invoking GPG would be beneficial cuz it
>>> would be faster and also if you're invoking --dry-run cuz you're
>>> trying to back up a failing disk and you're not sure how many more
>>> read/writes you're gonna get, having it do anything is not good. I
>>> would be inclined to say have it really do nothing more than print
>>> messages to the console. If you want further action, there can be
>>> another flag.
>>> Thanks, Rich(ard)
>>> On 8/29/07, robb <robb at canfield.com> wrote:
>>>> I had to tear into the GPG processing to locate some temp-file
>>>> anomalies. I found that temporary files are not always cleaned up when
>>>> GPG is active, and they can accumulate at an alarming rate on large
>>>> backups. That's fixed, along with some minor memory-leak problems.
>>>> I also added improved recovery code for backups, so that an error
>>>> backing up a file/chunk no longer aborts the process. This is set via
>>>> the new --onerror option (the default is to halt).
>>>> An error log is now maintained for the run, so that if
>>>> '--onerror=continue' is given there is still a place to examine for
>>>> errors. The name is the brackup metafile name with '.err' suffixed.
>>>> Logs are maintained for dry runs as well; those are suffixed
>>>> with '-dry.err'.
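The naming rule amounts to the following (a hypothetical helper, just to pin down the scheme described above):

```python
def error_log_name(metafile, dry_run=False):
    # Error log = brackup metafile name with '.err' suffixed,
    # or '-dry.err' for dry runs.
    return metafile + ("-dry.err" if dry_run else ".err")

print(error_log_name("home.brackup"))                # home.brackup.err
print(error_log_name("home.brackup", dry_run=True))  # home.brackup-dry.err
```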
>>>> I also found that --dry-run does NOT disable GPG, so the GPG process
>>>> manager will happily burn CPU and disk creating files that are then
>>>> deleted (via the new cleanup code). Should GPG be disabled during dry
>>>> runs? I suppose testing GPG on every file to see whether it works
>>>> might be useful, but that seems excessive.