problem with concurrent changes to a file

Ed W lists at wildgooses.com
Tue Nov 3 23:45:28 UTC 2009


Hi

> The problem is that the full_digest of each file, stored in the .brackup
> file, is computed directly from the file on disk. As a result, it might
> differ from the chunks (which are loaded from the disk separately)
> if the file is modified in the meantime. So the restored file has a
> different digest from the one stored in the .brackup file, leading to
> the above error.
>   

I don't know the code, but two ideas spring to mind:

1) Switch to using some kind of CRC/hash which can be computed in 
parallel. It's always risky to assume, but tentatively I would suggest 
we don't need cryptographic-quality hashes - the goal is to detect 
corruption. This could include storing a hash for each chunk, rather 
than a hash for the whole file (a rough sketch follows after these two 
points).

2) Implement the hash calculation in the IO access functions which 
supply the source chunk data. That way the serialisation of read order 
is implicit and enforced by your IO layer. It also naturally deals with 
certain kinds of changing data correctly (i.e. a tail append). I think 
this is acceptable in the sense that snapshotting a file requires 
cooperation from the filesystem to be done right, e.g. LVM/XFS/ZFS 
snapshots or similar. At least this way the stored digest actually 
matches the digest of the data uploaded - that seems to be the main 
guarantee the digest is supposed to provide (see the second sketch 
below).

Cheers

Ed W


