Replication Oddities

Brian Lynch blynch at
Wed Apr 30 07:36:39 UTC 2008


  There are roughly 58 million entries in file_to_replicate, out of a
total of 71 million files. It seems the Replication Worker is, for some
reason, not deleting completed rows (though the code path exists). Note
that only 570K entries in file_to_replicate have failcount > 0, and only
9 entries have nexttry = ENDOFTIME.

mysql> select count(*) from file_to_replicate;
+----------+
| count(*) |
+----------+
| 58395828 |
+----------+
1 row in set (2 min 6.26 sec)
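
For reference, the three counts quoted above can be gathered in one pass rather than three separate scans. This is only a sketch: it uses Python's sqlite3 with an in-memory table as a stand-in for the real MySQL database, assumes the fid, nexttry and failcount columns, and the names queue_breakdown and make_demo_db are illustrative rather than part of MogileFS.

```python
import sqlite3

ENDOFTIME = 2147483647  # sentinel value for nexttry quoted in this thread


def queue_breakdown(conn):
    """Count total rows, rows with failcount > 0, and rows parked at ENDOFTIME."""
    row = conn.execute(
        "SELECT COUNT(*), "
        "COALESCE(SUM(CASE WHEN failcount > 0 THEN 1 ELSE 0 END), 0), "
        "COALESCE(SUM(CASE WHEN nexttry = ? THEN 1 ELSE 0 END), 0) "
        "FROM file_to_replicate",
        (ENDOFTIME,),
    ).fetchone()
    return {"total": row[0], "failed": row[1], "endoftime": row[2]}


def make_demo_db():
    """Build a tiny in-memory table (assumed columns) for trying the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_to_replicate ("
        "fid INTEGER PRIMARY KEY, nexttry INTEGER, failcount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO file_to_replicate VALUES (?, ?, ?)",
        [(1, 0, 0), (2, 0, 2), (3, ENDOFTIME, 5), (4, 0, 0)],
    )
    return conn
```

The same SELECT should work against MySQL with the placeholder style changed to suit the driver in use.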


-----Original Message-----
From: dormando [mailto:dormando at] 
Sent: Monday, April 28, 2008 12:50 AM
To: Brian Lynch
Cc: mogilefs at
Subject: Re: Replication Oddities

> Would it be possible to purge portions of the file_to_replicate
> table?  I'm currently pulling out known good replications to identify
> bogus entries. 

You should sample rows out of file_to_replicate, see if the nexttry is
set to 2147483647 - and that all of the paths are invalid.
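
A minimal sketch of that sampling step, assuming the fid, nexttry and failcount columns. It uses Python's sqlite3 with an in-memory table as a stand-in for the real MySQL database; sample_rows and make_demo_db are illustrative names, not part of MogileFS. Checking whether the paths themselves are valid is not shown here.

```python
import sqlite3

ENDOFTIME = 2147483647  # nexttry value mentioned above


def sample_rows(conn, limit=100):
    """Return up to `limit` rows, flagging those whose nexttry is ENDOFTIME."""
    cur = conn.execute(
        "SELECT fid, nexttry, failcount FROM file_to_replicate "
        "ORDER BY fid LIMIT ?",
        (limit,),
    )
    return [
        {
            "fid": fid,
            "nexttry": nexttry,
            "failcount": failcount,
            "stuck": nexttry == ENDOFTIME,
        }
        for fid, nexttry, failcount in cur
    ]


def make_demo_db():
    """Build a tiny in-memory table (assumed columns) for trying the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_to_replicate ("
        "fid INTEGER PRIMARY KEY, nexttry INTEGER, failcount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO file_to_replicate VALUES (?, ?, ?)",
        [(1, 0, 0), (2, ENDOFTIME, 3), (3, 1209369000, 1)],
    )
    return conn
```

On a table this size you would probably want to sample from several fid ranges rather than only the lowest fids, since the oldest rows may not be representative.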

I've never outright removed rows from file_to_replicate, _unless_ I have
verified that the fid is gone, i.e.:

- Has no matching 'file' entry.
- Has no matching 'file_on' rows (odd bug, haven't fixed yet).
- Has file row, file_on row(s), but all paths are dead. 404's.

If at least one of those conditions is met, the fid can be removed from
file_to_replicate, and you might want to see why it disappeared to
begin with. Otherwise, do not remove the row.
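
The three checks above could be sketched roughly as follows. This is an illustration, not code from MogileFS: it uses Python's sqlite3 with in-memory tables as a stand-in for MySQL, assumes file has a fid column and file_on has fid and devid columns, and takes a hypothetical path_is_alive(fid, devid) callback (for example, something that issues an HTTP HEAD against the storage node and treats a 404 as dead).

```python
import sqlite3


def removable_reason(conn, fid, path_is_alive):
    """Return which of the three conditions holds for fid, or None to keep the row."""
    has_file = conn.execute(
        "SELECT 1 FROM file WHERE fid = ?", (fid,)
    ).fetchone()
    if has_file is None:
        return "no file row"

    devids = [
        row[0]
        for row in conn.execute("SELECT devid FROM file_on WHERE fid = ?", (fid,))
    ]
    if not devids:
        return "no file_on rows"

    # path_is_alive is a caller-supplied (hypothetical) check per device copy.
    if not any(path_is_alive(fid, devid) for devid in devids):
        return "all paths dead"

    return None  # at least one live copy: do not remove the row


def make_demo_db():
    """Build tiny in-memory tables (assumed columns) for trying the checks."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file (fid INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE file_on (fid INTEGER, devid INTEGER)")
    conn.executemany("INSERT INTO file VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO file_on VALUES (?, ?)", [(1, 10), (1, 11), (3, 12)]
    )
    return conn
```

Only fids for which this returns a reason would be candidates for deletion from file_to_replicate, and it is worth logging the reason so you can look into why they went missing.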

If the nexttry is off in the future but not equal to ENDOFTIME
(2147483647) you can try UPDATE'ing those rows to UNIX_TIMESTAMP() and
see if they get chewed through. If not, you should find out exactly
what's going on. Odds are one of the three conditions listed above has
happened. If not, you should definitely make a best effort at
figuring out what it was.
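
A rough sketch of that reschedule, again using Python's sqlite3 with an in-memory table as a stand-in for MySQL. The names reschedule_future_rows and make_demo_db are illustrative; on the real database the equivalent would be a single UPDATE using UNIX_TIMESTAMP(), as noted in the comment.

```python
import sqlite3
import time

ENDOFTIME = 2147483647  # rows parked at this value are left alone


def reschedule_future_rows(conn, now=None):
    """Set nexttry to `now` for rows scheduled in the future but not at ENDOFTIME.

    Rough MySQL equivalent:
      UPDATE file_to_replicate SET nexttry = UNIX_TIMESTAMP()
      WHERE nexttry > UNIX_TIMESTAMP() AND nexttry <> 2147483647;
    """
    if now is None:
        now = int(time.time())
    cur = conn.execute(
        "UPDATE file_to_replicate SET nexttry = ? "
        "WHERE nexttry > ? AND nexttry <> ?",
        (now, now, ENDOFTIME),
    )
    conn.commit()
    return cur.rowcount  # number of rows that were rescheduled


def make_demo_db(now):
    """Build a tiny in-memory table (assumed columns) for trying the update."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_to_replicate (fid INTEGER PRIMARY KEY, nexttry INTEGER)"
    )
    conn.executemany(
        "INSERT INTO file_to_replicate VALUES (?, ?)",
        [(1, now - 60), (2, now + 3600), (3, ENDOFTIME)],
    )
    return conn
```

With tens of millions of rows it may be gentler to do this in batches (for instance by fid range) and watch whether the count in file_to_replicate actually drops before touching the rest.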

Yeah, this should be way more automatic. We'll get to it someday, and
also accept patches ;)

