Replication Oddities
Brian Lynch
blynch at sharpcast.com
Mon Apr 28 07:21:38 UTC 2008
Dormando,
Thanks for the quick response. Answers below...
-----Original Message-----
From: dormando [mailto:dormando at rydia.net]
Sent: Sunday, April 27, 2008 10:51 PM
To: Brian Lynch
Cc: mogilefs at lists.danga.com
Subject: Re: Replication Oddities
Hey,
> - Over 7 million files are exhibiting a policy violation on the current FSCK run (out of 71.8 million total)
>
> - Approximately 6.2 million of these files are perfectly fine, but have leftover entries in file_to_replicate (also generating policy_no_suggestions messages)
Is the nexttry value for these fids equal to 'ENDOFTIME' or are they
being constantly retried? What do the file and file_on rows look like
for some of these guys? Does devcount agree with the count of rows in
file_on? Are both of their copies on the same host (but different
devices)?
>>> I'm trying to pull additional data on the file_to_replicate entries
for these files. What I do know for the moment is that each file has
two distinct copies on two different hosts.
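Queries along these lines will pull that additional data, assuming the stock MogileFS MySQL schema (file, file_on, file_to_replicate, device); the sample fid below is a placeholder:

    -- Are the stuck rows parked at ENDOFTIME (2147483647 in stock
    -- MogileFS) or being actively retried?
    SELECT fid, nexttry, fromdevid, failcount
    FROM file_to_replicate
    ORDER BY nexttry
    LIMIT 20;

    -- For a sample fid (12345 is a placeholder), compare devcount
    -- against the actual file_on rows and count distinct hosts.
    SELECT f.fid, f.devcount,
           COUNT(fo.devid)          AS actual_copies,
           COUNT(DISTINCT d.hostid) AS distinct_hosts
    FROM file f
    JOIN file_on fo ON fo.fid = f.fid
    JOIN device d   ON d.devid = fo.devid
    WHERE f.fid = 12345
    GROUP BY f.fid, f.devcount;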
> - Approximately 450 thousand files are replicated on more than 2 devices (the policy limit)
Can you get a breakdown of how far off they are? Is the devcount == 3
for most of them, or is it higher?
>>> The count varies greatly between 3 and 60 copies.
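For reference, a devcount histogram gives that breakdown directly, assuming the stock file table:

    -- Distribution of over-replicated files by copy count.
    SELECT devcount, COUNT(*) AS files
    FROM file
    WHERE devcount > 2
    GROUP BY devcount
    ORDER BY devcount;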
> - Approximately 4.6 thousand files are replicated on 47 devices
You might want to test out or upgrade to the latest trunk. If you ended
up with fewer than two _writable_ hosts (i.e., with enough disk space to
take a few files), there was a replication bug which would end up putting
a copy on every device on that one writable host.
>>>> It looks like you might be right about that. I figured out that most
of the affected files were from the initial setup of the environment,
which originally had two hosts. The majority of the copies are on the
first host, with one copy on the second. This appears to follow from the
fact that we took down the second host for 3-4 hours for a hardware
repair. We've since expanded the environment to the current 7 storage
nodes.
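One way to confirm that pattern against the database, assuming the standard file_on and device tables: count distinct hosts per over-replicated fid. Files hit by the bug should show many copies spanning only one or two hosts.

    -- Over-replicated fids and how many hosts their copies span.
    SELECT fo.fid,
           COUNT(*)                 AS copies,
           COUNT(DISTINCT d.hostid) AS hosts
    FROM file_on fo
    JOIN device d ON d.devid = fo.devid
    GROUP BY fo.fid
    HAVING copies > 2
    ORDER BY copies DESC
    LIMIT 50;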
> - single master/slave database
>
> - 3 trackers running off the database
>
> - 7 mogstored nodes with 46 devices each and lighttpd set up as the getport
On a mogadm check, how many hosts are writable? Did you set any to
readonly/etc? What version of mogilefs?
I'd be curious to see why those rows are sticking around in
file_to_replicate. I'm willing to bet you only have a single writable
host. If not, we can work out what it is.
>>>> As of now, all devices are writable per mogadm check. We haven't
had any further outages beyond 2 distinct devices. So, 322 devices are
available. I'm curious as well about the number of entries in the
file_to_replicate table and the long-term effects.
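For reference, mogadm's view can be cross-checked directly against the standard device and host tables:

    -- Device counts by host and status; 'alive' devices are writable
    -- unless marked readonly or being drained.
    SELECT h.hostname, d.status, COUNT(*) AS devices
    FROM device d
    JOIN host h ON h.hostid = d.hostid
    GROUP BY h.hostname, d.status
    ORDER BY h.hostname;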
Your best bet would be to even out files / add another host / etc. If
you really wanted to fix those "bumpkus" fids easily, you can either
drain each device down to zero fids one by one, which will remove one
copy but not create a new one, or generate a list of all of the domain
/ key combos and re-upload each file. That will create a new file with
proper devcounts and delete the old fid and all of the redundant copies.
>>> Would it be possible to purge portions of the file_to_replicate
table? I'm currently pulling out known good replications to identify
bogus entries.
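One cautious sketch of such a purge, assuming the stock schema and a policy of two copies on two distinct hosts; it bypasses the tracker's own logic, so test it against the slave or a backup first:

    -- Delete queue rows only for fids that already satisfy policy:
    -- devcount of 2 with the copies on two distinct hosts.
    DELETE ftr
    FROM file_to_replicate ftr
    JOIN file f ON f.fid = ftr.fid
    WHERE f.devcount = 2
      AND 2 = (SELECT COUNT(DISTINCT d.hostid)
               FROM file_on fo
               JOIN device d ON d.devid = fo.devid
               WHERE fo.fid = f.fid);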
-Dormando