Replication Oddities

dormando dormando at rydia.net
Mon Apr 28 05:50:47 UTC 2008


Hey,

> -          Over 7 million files are exhibiting a policy violation on the
> current FSCK run (out of 71.8 million total)
> 
> -          Approximately 6.2 million of these files are perfectly fine,
> but have leftover entries in file_to_replicate (also generating
> policy_no_suggestions messages)

Is the nexttry value for these fids equal to 'ENDOFTIME' or are they
being constantly retried? What do the file and file_on rows look like
for some of these guys? Does devcount agree with the count of rows in
file_on? Are both of their copies on the same host (but different devices)?
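
If it helps, here is a rough sketch of how the last two questions could be checked in bulk. It assumes the stock table and column names (file.fid, file.devcount, file_on.fid, file_on.devid, device.devid, device.hostid); the in-memory sqlite3 database and the make_demo_db helper are hypothetical stand-ins so the example runs on its own, and in practice you would pass a connection to your real database instead.

```python
import sqlite3

# Fids whose cached devcount disagrees with the real number of file_on rows.
DEVCOUNT_MISMATCH_SQL = """
SELECT f.fid, f.devcount, COUNT(fo.devid) AS actual
FROM file f
LEFT JOIN file_on fo ON fo.fid = f.fid
GROUP BY f.fid, f.devcount
HAVING f.devcount <> COUNT(fo.devid)
ORDER BY f.fid
"""

# Fids with more than one copy where every copy sits on a single host.
SAME_HOST_SQL = """
SELECT fo.fid, COUNT(*) AS copies, COUNT(DISTINCT d.hostid) AS hosts
FROM file_on fo
JOIN device d ON d.devid = fo.devid
GROUP BY fo.fid
HAVING COUNT(*) > 1 AND COUNT(DISTINCT d.hostid) = 1
ORDER BY fo.fid
"""


def run_query(conn, sql):
    """Run a read-only query on any DB-API connection and return all rows."""
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


def make_demo_db():
    """Hypothetical toy data: three fids, three devices, two hosts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE file (fid INTEGER PRIMARY KEY, devcount INTEGER);
        CREATE TABLE file_on (fid INTEGER, devid INTEGER);
        CREATE TABLE device (devid INTEGER PRIMARY KEY, hostid INTEGER);
        INSERT INTO device VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO file VALUES (1, 2), (2, 2), (3, 3);
        INSERT INTO file_on VALUES
            (1, 10), (1, 11),           -- fid 1: both copies on host 1
            (2, 10),                    -- fid 2: devcount says 2, only 1 row
            (3, 10), (3, 11), (3, 12);  -- fid 3: spread over two hosts
    """)
    return conn
```

On a table with tens of millions of rows you would probably want to run these against the slave and restrict them to a sample of the affected fids.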

> -          Approximately 450 thousand files are replicated on more than
> 2 devices (the policy limit)

Can you get a breakdown of how far off they are? Is the devcount == 3
for most of them, or is it higher?
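
A breakdown like that could come from a single GROUP BY on the file table. This is only a sketch: it assumes the devcount column is trustworthy enough for a first pass, and the function name, the policy_limit parameter and the sqlite3 demo data are all made up for illustration.

```python
import sqlite3


def devcount_breakdown(conn, policy_limit=2):
    """Return [(devcount, number_of_fids)] for fids above the policy limit."""
    # int() keeps the interpolated value numeric, so the same SQL text works
    # with drivers that use different placeholder styles.
    sql = (
        "SELECT devcount, COUNT(*) FROM file "
        "WHERE devcount > %d "
        "GROUP BY devcount ORDER BY devcount" % int(policy_limit)
    )
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


def make_breakdown_demo_db():
    """Hypothetical toy data standing in for the real file table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE file (fid INTEGER PRIMARY KEY, devcount INTEGER);
        INSERT INTO file VALUES (1, 2), (2, 2), (3, 3), (4, 3), (5, 3), (6, 47);
    """)
    return conn
```

A result dominated by devcount 3 would point at something different from a long tail of higher counts.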

> -          Approximately 4.6 thousand files are replicated on 47 devices

You might want to test out or upgrade to the latest trunk. If you ended
up with fewer than two _writable_ hosts (i.e., hosts with enough disk
space to take a few files), there was a replication bug which would end
up putting a copy on every device on that one writable host.

> -          single master/slave database
> 
> -          3 trackers running off the database
> 
> -          7 mogstored nodes with 46 devices each and lighttpd set up as
> the getport

On a mogadm check, how many hosts are writable? Did you set any to
readonly/etc? What version of mogilefs?

I'd be curious to see why those rows are sticking around in
file_to_replicate. I'm willing to bet you only have a single writable
host. If not, we can work out what it is.

Your best bet would be to even out files / add another host / etc. If
you really wanted to fix those "bumpkus" fids easily, you can either
drain each device down to zero fids one by one (which will remove one
copy but not create a new one), or generate a list of all of the domain
/ key combos and re-upload each file. That will create a new fid with
the proper devcount and delete the old fid and all of its redundant
copies.
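
For the second option, the domain / key list could be pulled with a join along these lines. Again this is a sketch under assumptions: it relies on the stock file (fid, dmid, dkey, devcount) and domain (dmid, namespace) tables, the function name and policy_limit parameter are hypothetical, and the sqlite3 demo data only exists to make the example runnable. The re-upload itself is left to whatever client you already use.

```python
import sqlite3


def keys_to_reupload(conn, policy_limit=2):
    """Return [(namespace, dkey)] for fids with more copies than the policy allows."""
    sql = (
        "SELECT d.namespace, f.dkey FROM file f "
        "JOIN domain d ON d.dmid = f.dmid "
        "WHERE f.devcount > %d "
        "ORDER BY d.namespace, f.dkey" % int(policy_limit)
    )
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


def make_reupload_demo_db():
    """Hypothetical toy data: two domains, one healthy fid, two over-replicated."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE domain (dmid INTEGER PRIMARY KEY, namespace TEXT);
        CREATE TABLE file (fid INTEGER PRIMARY KEY, dmid INTEGER,
                           dkey TEXT, devcount INTEGER);
        INSERT INTO domain VALUES (1, 'images'), (2, 'video');
        INSERT INTO file VALUES
            (1, 1, 'a.jpg', 2),
            (2, 1, 'b.jpg', 3),
            (3, 2, 'c.mp4', 47);
    """)
    return conn
```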

-Dormando

