problem with concurrent changes to a file
Ed W
lists at wildgooses.com
Tue Nov 3 23:45:28 UTC 2009
Hi
> The problem is that the full_digest of each file, stored in the .brackup
> file, is computed directly from the file on disk. As a result, it might
> differ from the chunks (which are loaded from disk separately) if the
> file is modified in the meantime. The restored file then has a
> different digest than the one stored in the .brackup file, leading to
> the above error.
>
I don't know the code, but two ideas spring to mind:
1) Switch to some kind of CRC/hash that can be computed in parallel.
It's always risky to assume, but tentatively I would suggest we don't
need cryptographic-quality hashes here; the goal is to detect
corruption. This could include storing a hash for each chunk rather
than a single hash for the whole file.
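To illustrate the per-chunk idea, here is a minimal Python sketch (not
brackup's actual code; the function name and chunk size are my own
invention): each chunk is read once and hashed as it goes, so a restore
can verify chunk-by-chunk instead of against a whole-file digest that
was computed in a separate pass over a possibly-changing file.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB, an illustrative chunk size


def chunk_digests(path, chunk_size=CHUNK_SIZE):
    """Read a file once, yielding (chunk_bytes, hex_digest) pairs.

    Storing a digest per chunk means a restore can verify each chunk
    against the digest recorded for *that* chunk, rather than against a
    whole-file digest computed in a separate read of the file.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk, hashlib.sha1(chunk).hexdigest()
```

The key point is that the stored digests describe exactly the bytes that
were chunked, because chunking and hashing happen in the same pass.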
2) Implement the hash calculation in the IO access functions that
supply the source chunk data. That way the serialisation of read order
is implicit and enforced by your IO layer. It also naturally handles
certain kinds of changing data correctly (e.g. a tail append). I think
this is acceptable in the sense that snapshotting a file requires
cooperation from the filesystem to be done right, e.g. LVM/XFS/ZFS
snapshots or similar. At least this way the stored digest actually
matches the digest of the data uploaded - which seems to be the main
guarantee the digest is supposed to provide?
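A sketch of what I mean by hashing in the IO layer (again hypothetical
code, not brackup's: the class name and SHA-1 choice are mine): wrap the
file object so the whole-file digest accumulates from exactly the bytes
handed to the chunker/uploader.

```python
import hashlib


class DigestingReader:
    """Wrap a file object so the whole-file digest is built from
    exactly the bytes handed out by read().

    Even if the underlying file changes mid-backup, the recorded digest
    matches the data actually read and uploaded, which is the guarantee
    the stored digest is supposed to provide.
    """

    def __init__(self, fileobj):
        self._f = fileobj
        self._hash = hashlib.sha1()

    def read(self, size=-1):
        data = self._f.read(size)
        self._hash.update(data)  # digest follows read order implicitly
        return data

    def hexdigest(self):
        return self._hash.hexdigest()
```

Because every byte passes through read() before it reaches the chunker,
the hash/read serialisation is enforced by construction rather than by
a second pass over the file.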
Cheers
Ed W
More information about the brackup mailing list