problem with concurrent changes to a file

Kostas Chatzikokolakis kostas at chatzi.org
Tue Nov 3 21:47:28 UTC 2009


Hello,

During the last week I've been testing the low memory usage patch that I
sent in my previous mail. I discovered that if a file is modified while
it is being backed up, restore might fail with the following message:

> restore failed: Digest of restored file (...) doesn't match at
> /home/vagabond/projects/brackup/svn/lib/Brackup/Restore.pm line 228.

The problem is that the full_digest of each file, stored in the .brackup
file, is computed directly from the file on disk. As a result, it might
not match the chunks (which are loaded from the disk separately) if the
file is modified in the meantime. The restored file then has a different
digest than the one stored in the .brackup file, leading to the above
error.
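
To make the race concrete, here is a minimal sketch in Python (not
Brackup's actual Perl code). The chunk size, the helper names and the
use of SHA-1 are assumptions made only for this illustration. The chunks
are read first, the file changes, and only then is the whole-file digest
computed, so the two no longer agree.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 4  # tiny chunk size, for illustration only


def read_chunks(path, chunk_size=CHUNK_SIZE):
    """Read the file as a list of fixed-size chunks (what ends up on the target)."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            chunks.append(data)
    return chunks


def file_digest(path):
    """Digest computed directly from the file on disk, in a separate read."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def demo_mismatch():
    """Return (stored full digest, digest of the restored data)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"original contents")
        chunks = read_chunks(path)        # the backup reads the chunks
        with open(path, "wb") as f:       # the file changes in the meantime
            f.write(b"modified contents")
        full_digest = file_digest(path)   # digest comes from the changed file
        restored = b"".join(chunks)       # restore reassembles the chunks
        return full_digest, hashlib.sha1(restored).hexdigest()
    finally:
        os.remove(path)
```

The two values returned by demo_mismatch() differ, which is the
situation in which the restore check fails.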

I wrote the attached test to reproduce it. It tries to back up a file
that is being constantly modified by a forked sub-process. The test
fails in the current svn version, not just my patched version.

Unfortunately this is hard to fix: in principle the full_digest should
be computed from the same chunk data that is stored on the target. But
this is not easy because the chunks are loaded in parallel, by the
child processes that do the encryption. Fixing it would involve loading
the chunks in the main process and serializing them to compute the
full_digest.
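
As a rough sketch of what "computing from the same chunk data" means
(again in Python, with a hypothetical function name and SHA-1 assumed):
if the chunks pass through one process in file order, the digest can be
updated incrementally from exactly the bytes that are stored.

```python
import hashlib


def digest_from_chunks(chunks):
    """Feed the chunk data, in file order, into a single running digest."""
    h = hashlib.sha1()
    for data in chunks:
        h.update(data)  # same bytes that would be written to the target
    return h.hexdigest()
```

The result equals the digest of the concatenated chunks, but only if
they are fed in order, which is why parallel loading in the child
processes gets in the way.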

On the other hand, there is very little value in computing the digest of
a file that is being modified and checking it on restore. With high
probability the backup of that file will be inconsistent anyway (it
won't correspond to a single point in time). So, IMHO, brackup should do
the following: during backup it should detect that a file has been
modified since its processing started. In this case it should display a
warning to the user, to let them know that they can't trust the backup
for that file. Then it should store full_digest=0000000... in the
.brackup file, because the full_digest is invalid anyway. Then
brackup-restore won't check the digest for such files (and under -v it
can display some message).
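
The proposal could look roughly like the following Python sketch. All
names are hypothetical, and comparing size and mtime before and after
processing is just one assumed way to detect a change (it can miss a
same-size edit within the timestamp granularity). The sentinel length is
likewise an assumption, chosen here to match a SHA-1 hex digest.

```python
import hashlib
import os
import sys

# Assumed sentinel: all zeros, as long as a SHA-1 hex digest.
INVALID_DIGEST = "0" * (hashlib.sha1().digest_size * 2)


def stat_signature(path):
    """Size and modification time, used to notice concurrent changes."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def digest_for_metafile(path, process_chunks):
    """Return the digest to store, or the sentinel if the file changed."""
    before = stat_signature(path)
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    process_chunks(path)  # stands in for chunking, encrypting and storing
    if stat_signature(path) != before:
        print("warning: %s was modified during backup" % path, file=sys.stderr)
        return INVALID_DIGEST
    return digest


def should_verify(stored_digest):
    """On restore, skip the digest check for files marked as invalid."""
    return stored_digest != INVALID_DIGEST
```

On restore, should_verify() would gate the existing digest comparison,
so a file flagged during backup no longer aborts the whole restore.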

I was thinking to implement this in my patch, but I wanted to ask your
opinion (hoping that my patch will eventually be merged).

Cheers,
Kostas


PS1. My patch also has another issue with concurrent updates, causing
the raw digest of a chunk to be different from the key in the inventory
db. This is fixable; I've written a test that catches it and I'm working
on a fix.

PS2. Apart from these concurrency issues, the patch seems to work fine,
finishing the backup in all cases when I was previously getting out of
memory errors.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 05-concurrent-changes.t
Type: text/troff
Size: 2904 bytes
Desc: not available
Url : http://lists.danga.com/pipermail/brackup/attachments/20091103/475d9d95/05-concurrent-changes.bin 

