fsck hangs when a bad fid is encountered

Del Raco el_draco at yahoo.com
Mon Sep 10 20:13:11 UTC 2007


Strangely enough, I ran fsck over the weekend, and it
was able to transition from SRCH to GONE immediately. 
I tried it on each of my 4 trackers with a "fsck
reset" in between, and each of them was fine.

I noticed that the fsck_log table doesn't get
truncated by a "fsck reset"; only the variables in
server_settings get updated.  Is it okay to truncate
the fsck_log table?

Another problem turned up with fsck.  Not sure if it's
the same problem experienced by other folks on the
list with over-replication.  I'm getting policy
violations (POVI) for FIDs that have the correct
devcount, and fsck is replicating when it doesn't need
to.
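
For anyone who wants to cross-check devcount against the copies actually recorded, here is a rough sketch.  It assumes the tracker database has a file table (fid, devcount) and a file_on table (fid, devid); the helper names are made up for illustration, and an in-memory SQLite database stands in for the real MySQL one.

```python
import sqlite3

# Assumption: file(fid, devcount) and file_on(fid, devid) exist as described.
# Lists every fid whose cached devcount differs from its file_on row count.
MISMATCH_SQL = """
SELECT f.fid, f.devcount, COUNT(fo.devid) AS copies
FROM file f
LEFT JOIN file_on fo ON fo.fid = f.fid
GROUP BY f.fid, f.devcount
HAVING f.devcount <> COUNT(fo.devid)
ORDER BY f.fid
"""


def find_devcount_mismatches(conn):
    """Return (fid, devcount, copies) rows where the two counts disagree."""
    cur = conn.cursor()
    cur.execute(MISMATCH_SQL)
    return cur.fetchall()


def make_demo_db():
    """Build a toy database (hypothetical data) to exercise the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE file (fid INTEGER PRIMARY KEY, devcount INTEGER);
        CREATE TABLE file_on (fid INTEGER, devid INTEGER);
        INSERT INTO file VALUES (1, 2), (2, 2), (3, 1);
        INSERT INTO file_on VALUES (1, 10), (1, 11);
        INSERT INTO file_on VALUES (2, 10), (2, 11), (2, 12);
    """)
    return conn


if __name__ == "__main__":
    for row in find_devcount_mismatches(make_demo_db()):
        print(row)
```

In the toy data, fid 2 has one more copy than its devcount says and fid 3 has no copies at all, so both are reported; fid 1 is consistent and is skipped.  Against a real tracker database the same SELECT could be run from the mysql client, but it scans the whole file table, so it may be slow with tens of millions of fids.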

--- dormando <dormando at rydia.net> wrote:

> Were you looking in syslog, and telnet'ing to a tracker and running
> !watch? The latter's the most useful I think. Getting an error or two to
> go on might give me some luck in figuring it out.
> 
> I'm not up to speed yet on the FSCK job (I'll have to be next week
> though), so we'll have to hope one of the more in tune folks pipes up here..
> 
> -Dormando
> 
> Del Raco wrote:
> > 13 devices (10 readonly, 3 alive), the 10 are readonly
> > because they're about to be full, waiting on some new
> > drives to arrive so the devices can be rebalanced.
> > 
> > 4 trackers (1 fsck worker on each one).
> > 
> > Didn't see any specific errors, but not sure if I were
> > looking at all the right places.  I manually deleted
> > fid 1 and 2 from the file table and did a fsck reset
> > after stopping each time.  The output below are from
> > running fsck for the third time.  I waited around 30
> > minutes after seeing the bad fid 678419 before
> > stopping fsck.  
> > 
> > How long should I wait for SRCH to become GONE?  Also,
> > how do I go about finding how these fids became
> > orphaned?  Any way to prevent this in the future?
> > 
> > Thanks.
> > 
> > ========================================
> > 
> > Output from "mogadm fsck status" (it's not currently
> > running)
> > 
> >     Running: No
> >      Status: 669593 / 41601129 (1.61%)
> >        Time: 37m (297 fids/s; 2294m remain)
> >  Check Type: Normal (check policy + files)
> > 
> >  [num_NOPA]: 331
> >  [num_SRCH]: 331
> > 
> > ========================================
> > 
> > mysql> select fid, evcode, count(*) from fsck_log
> > group by fid, evcode;
> > +--------+--------+----------+
> > | fid    | evcode | count(*) |
> > +--------+--------+----------+
> > |      1 | NOPA   |      119 | 
> > |      1 | SRCH   |      119 | 
> > |      2 | NOPA   |      147 | 
> > |      2 | SRCH   |      147 | 
> > | 678419 | NOPA   |      331 | 
> > | 678419 | SRCH   |      331 | 
> > +--------+--------+----------+
> > 6 rows in set (0.00 sec)
> > 
> > ========================================
> > 
> >  mogadm fsck taillog
> > unixtime             event           fid     devid
> > 1189153502            NOPA        678419         -
> > 1189153502            SRCH        678419         -
> > 1189153508            NOPA        678419         -
> > 1189153508            SRCH        678419         -
> > 1189153513            NOPA        678419         -
> > 1189153513            SRCH        678419         -
> > 1189153519            NOPA        678419         -
> > 1189153519            SRCH        678419         -
> > 1189153524            NOPA        678419         -
> > 1189153524            SRCH        678419         -
> > 1189153530            NOPA        678419         -
> > 1189153530            SRCH        678419         -
> > 1189153535            NOPA        678419         -
> > 1189153535            SRCH        678419         -
> > 1189153541            NOPA        678419         -
> > 1189153541            SRCH        678419         -
> > 1189153546            NOPA        678419         -
> > 1189153546            SRCH        678419         -
> > 1189153551            NOPA        678419         -
> > 1189153551            SRCH        678419         -
> > 
> > ========================================
> > 
> > mysql> select min(utime), max(utime) from fsck_log
> > where fid = 678419;
> > +------------+------------+
> > | min(utime) | max(utime) |
> > +------------+------------+
> > | 1189151747 | 1189153551 | 
> > +------------+------------+
> > 1 row in set (0.00 sec)
> > 
> 
> 
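
As a sanity check on the numbers quoted above, the min/max utime values and the 331 logged events for fid 678419 give the length of the stall and the rough retry interval.  This is only arithmetic on the quoted figures; the function name is made up for illustration.

```python
def retry_stats(first_utime, last_utime, events):
    """Return (span in seconds, average seconds between consecutive events)."""
    span = last_utime - first_utime
    # n events have n - 1 gaps between them
    interval = span / (events - 1) if events > 1 else 0.0
    return span, interval


# min(utime), max(utime) and the per-evcode count from the quoted output
span, interval = retry_stats(1189151747, 1189153551, 331)
print(f"{span} s (~{span / 60:.0f} min), one retry every ~{interval:.1f} s")
# prints: 1804 s (~30 min), one retry every ~5.5 s
```

That lines up with the roughly 30 minutes of waiting described in the quoted message and with the 5 to 6 second gaps visible in the taillog output.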


