problem with concurrent changes to a file
Ed W
lists at wildgooses.com
Wed Nov 4 11:30:00 UTC 2009
Kostas Chatzikokolakis wrote:
>> 2) Implement the hash calculation in the IO access functions which
>> supply the source chunk data. In this way the serialisation of read
>> order is implicit and enforced by your IO layer? This also naturally
>> deals with certain kinds of changing data correctly (ie tail append).
>>
>
> The problem is that with gpg enabled, 5 processes run in parallel with
> different chunks, so they seek at the beginning of the corresponding
> chunk and start reading from there. It's not a serial read.
>
Does GPG access the disk itself, or is it fed via a pipe?
You could arrange for a single "reader" to handle all the seeks/reads.
Obviously, when something requests a random-access read, you need to read
all the data in from the beginning of the file while computing the
rolling hash function, and the earlier data either needs to be stored in
memory or spooled to temp storage for later use... Neither is ideal.
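To make the idea concrete, here is a minimal sketch of such a serializing reader in Python. The class and method names are hypothetical (not brackup's actual API), it keeps earlier data in memory rather than spooling to temp storage, and SHA-1 stands in for whatever digest the backup actually uses:

```python
import hashlib

class SerializedReader:
    """Serve random-access reads from a file while hashing every byte
    exactly once, in file order.  Bytes already read are kept in memory
    so a backwards seek can be satisfied without re-reading; a real
    implementation would spool to temp storage instead."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._buffer = bytearray()      # everything read (and hashed) so far
        self._digest = hashlib.sha1()   # hypothetical choice of digest

    def _fill_to(self, offset):
        # Advance the sequential read/hash position up to `offset`.
        while len(self._buffer) < offset:
            chunk = self._file.read(min(65536, offset - len(self._buffer)))
            if not chunk:
                break
            self._digest.update(chunk)
            self._buffer.extend(chunk)

    def pread(self, offset, length):
        # Random-access read: any data before `offset` gets hashed and
        # buffered as a side effect, so hashing stays strictly sequential.
        self._fill_to(offset + length)
        return bytes(self._buffer[offset:offset + length])

    def whole_file_digest(self):
        # Finish hashing whatever remains, then report the digest.
        while True:
            chunk = self._file.read(65536)
            if not chunk:
                break
            self._digest.update(chunk)
            self._buffer.extend(chunk)
        return self._digest.hexdigest()
```

The point of the sketch is that the parallel GPG workers would call pread() with arbitrary offsets, but the underlying file is still only ever read front-to-back.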
However, I don't really see the requirement for such an algorithm (at
least from a position standing on the sidelines). A digest of each
chunk has to be assumed to be a valid way to verify the data of that
chunk, so it should be possible to verify a file by breaking it back
into the original chunks and checking that each chunk matches the source
digest. As you point out, a digest of a bunch of digests should also
be acceptable (although the cryptographers no doubt have problems with
this), but of course it's a much less convenient number to work with,
because computing it for an arbitrary file is a bit complicated and
requires figuring out how the file was backed up (chunk sizes, etc.) in
order to compute the digest.
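For what it's worth, the "digest of digests" verification might look something like this sketch. The chunk size and the use of SHA-1 are assumptions for illustration, not necessarily what brackup does; the key point is that the chunking parameters from backup time must be known to recompute the number:

```python
import hashlib

CHUNK_SIZE = 1 << 20  # assumed; must match the chunk size used at backup time

def chunk_digests(path, chunk_size=CHUNK_SIZE):
    """Break the file back into its original chunks and digest each one.
    Each per-chunk digest can be compared against the source digest."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digests.append(hashlib.sha1(chunk).hexdigest())
    return digests

def digest_of_digests(digests):
    """The 'less convenient number': a single digest computed over the
    concatenated per-chunk digests."""
    return hashlib.sha1("".join(digests).encode("ascii")).hexdigest()
```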
I think some sort of digest algorithm which can be computed in parallel
is the ideal option. I'm not sure anything suitable for our purposes
actually exists, though? A quick Google search didn't turn up
anything better than plain old CRC32.
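Even without a purpose-built parallel digest, the per-chunk checksums themselves parallelize trivially, since each worker can open its own handle and seek to its own chunk. A rough sketch (CRC32 and the chunk size are illustrative assumptions; real parallel speedup depends on the storage, not just the threads):

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor

def parallel_chunk_crcs(path, chunk_size=1 << 20, workers=4):
    """Checksum fixed-size chunks in parallel.  Each worker opens its own
    file handle and seeks to its chunk, so reads don't contend on a
    shared file offset."""
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)

    def crc_of(offset):
        with open(path, "rb") as f:
            f.seek(offset)
            return zlib.crc32(f.read(chunk_size))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(crc_of, offsets))
```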
Failing that, I don't see how you can do any better than "snapshotting"
the source file: either implement a single-threaded reader which
perhaps spools temp data to disk, simply copy the file somewhere
before you operate on it, or make use of filesystem features to
snapshot the file (XFS, etc.)?
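The copy-it-first variant is crude but easy to get right; something along these lines (names are hypothetical, and a filesystem-level snapshot would avoid the extra I/O and disk space this costs):

```python
import os
import shutil
import tempfile

def with_snapshot(path, operate):
    """Copy the source file to a private temp file, run `operate` on the
    copy, then clean up.  Every reader of the copy sees one consistent
    state no matter what happens to the original meanwhile."""
    fd, snap = tempfile.mkstemp()
    os.close(fd)
    try:
        shutil.copyfile(path, snap)
        return operate(snap)
    finally:
        os.unlink(snap)
```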
Ed W
More information about the brackup mailing list