fsck hangs when a bad fid is encountered
Del Raco
el_draco at yahoo.com
Sat Sep 8 07:10:29 UTC 2007
13 devices (10 readonly, 3 alive). The 10 are readonly
because they're nearly full; we're waiting on some new
drives to arrive so the devices can be rebalanced.
4 trackers (1 fsck worker on each one).
Didn't see any specific errors, but I'm not sure I was
looking in all the right places. I manually deleted
fids 1 and 2 from the file table and did an fsck reset
after stopping each time. The output below is from
running fsck for the third time. I waited around 30
minutes after first seeing the bad fid 678419 before
stopping fsck.
How long should I wait for SRCH to become GONE? Also,
how do I go about finding how these fids became
orphaned? Any way to prevent this in the future?
Thanks.
========================================
Output from "mogadm fsck status" (it's not currently
running)
Running: No
Status: 669593 / 41601129 (1.61%)
Time: 37m (297 fids/s; 2294m remain)
Check Type: Normal (check policy + files)
[num_NOPA]: 331
[num_SRCH]: 331
========================================
mysql> select fid, evcode, count(*) from fsck_log
group by fid, evcode;
+--------+--------+----------+
| fid | evcode | count(*) |
+--------+--------+----------+
| 1 | NOPA | 119 |
| 1 | SRCH | 119 |
| 2 | NOPA | 147 |
| 2 | SRCH | 147 |
| 678419 | NOPA | 331 |
| 678419 | SRCH | 331 |
+--------+--------+----------+
6 rows in set (0.00 sec)
========================================
mogadm fsck taillog
unixtime event fid devid
1189153502 NOPA 678419 -
1189153502 SRCH 678419 -
1189153508 NOPA 678419 -
1189153508 SRCH 678419 -
1189153513 NOPA 678419 -
1189153513 SRCH 678419 -
1189153519 NOPA 678419 -
1189153519 SRCH 678419 -
1189153524 NOPA 678419 -
1189153524 SRCH 678419 -
1189153530 NOPA 678419 -
1189153530 SRCH 678419 -
1189153535 NOPA 678419 -
1189153535 SRCH 678419 -
1189153541 NOPA 678419 -
1189153541 SRCH 678419 -
1189153546 NOPA 678419 -
1189153546 SRCH 678419 -
1189153551 NOPA 678419 -
1189153551 SRCH 678419 -
========================================
mysql> select min(utime), max(utime) from fsck_log
where fid = 678419;
+------------+------------+
| min(utime) | max(utime) |
+------------+------------+
| 1189151747 | 1189153551 |
+------------+------------+
1 row in set (0.00 sec)
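As a quick sanity check on the numbers above, here is a minimal
sketch that turns the two utime values into an elapsed time and
an average retry interval. The count of 331 is taken from the
num_SRCH line in the status output; treating each NOPA/SRCH pair
as one retry attempt is an assumption, not something fsck
documents.

```python
# Values copied from the fsck_log query and status output above.
first_utime = 1189151747
last_utime = 1189153551
event_pairs = 331  # assumption: one NOPA/SRCH pair per retry attempt

# Total time fsck spent logging events for fid 678419.
elapsed_seconds = last_utime - first_utime
elapsed_minutes = elapsed_seconds / 60

# Average gap between consecutive attempts (331 events, 330 gaps).
seconds_per_attempt = elapsed_seconds / (event_pairs - 1)

print(f"elapsed: {elapsed_seconds}s ({elapsed_minutes:.1f} min)")
print(f"average gap between attempts: {seconds_per_attempt:.2f}s")
```

This matches the roughly 30 minutes of waiting and the 5 to 6
second spacing visible in the taillog output.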
--- dormando <dormando at rydia.net> wrote:
>
> > It appears that whenever a bad fid is encountered (ie
> > no corresponding entry in the file_on table), it just
> > gets stuck logging more and more SRCH and NOPA entries
> > to the fsck_log.
> >
> > If we delete the bad fid from the file table, it will
> > continue until it hits another bad fid.
> >
> > Is this normal? We started fsck by doing "mogadm fsck
> > start".
> >
>
> Can you provide a subset of your fsck log? How long did
> you wait? How many devices do you have? Did you see any
> specific errors?
>
> I'm pretty sure it shouldn't do that :)
>
> -Dormando
>
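
For anyone wanting to list every "bad fid" up front rather than
hitting them one at a time, a LEFT JOIN from file to file_on is
one way to do it. The sketch below is only an illustration: it
uses an in-memory SQLite database as a stand-in for the MySQL
store, and the two-table schema is cut down to the fid and devid
columns (an assumption; the real tables have more columns). The
sample rows are made up apart from the three fids named in this
thread. It finds the orphans but says nothing about how they
became orphaned.

```python
import sqlite3

# Stand-in database; a reduced, assumed version of the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (fid INTEGER PRIMARY KEY);
    CREATE TABLE file_on (fid INTEGER, devid INTEGER);
    -- fid 3 is a made-up healthy file with two replicas.
    INSERT INTO file (fid) VALUES (1), (2), (3), (678419);
    INSERT INTO file_on (fid, devid) VALUES (3, 11), (3, 12);
""")

# Rows in file with no matching row in file_on.
ORPHAN_QUERY = """
    SELECT f.fid
    FROM file AS f
    LEFT JOIN file_on AS fo ON fo.fid = f.fid
    WHERE fo.fid IS NULL
    ORDER BY f.fid
"""

orphan_fids = [row[0] for row in conn.execute(ORPHAN_QUERY)]
print(orphan_fids)
```

The SELECT itself should run unchanged from the mysql prompt,
though on a file table with tens of millions of rows it may take
a while.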