mogadm reports entire host as failed after two drive failures
Jared Klett
jared at blip.tv
Wed Feb 14 20:38:44 UTC 2007
hello mogile folks,
I've been populating my shiny new MogileFS with files for the
last few days, and today I had two drives fail in one of the storage
hosts. bummer! (but fortuitous before going into production)
after the first failure this morning, 'mogadm check' reported what
I would expect, with only the failed device flagged:
Checking trackers...
10.0.0.206:6001 ... OK
Checking hosts...
[ 1] file3 ... OK
[ 2] file4 ... OK
Checking devices...
host device             size(G)    used(G)    free(G)   use%
---- --------------- ---------- ---------- ---------- ------
[ 1] dev1               372.506    139.920    232.585 37.56%
[ 1] dev2               372.506    140.584    231.922 37.74%
[ 1] dev3               372.506    137.208    235.297 36.83%
[ 1] dev4               372.506    137.890    234.616 37.02%
[ 1] dev5               372.506    138.765    233.741 37.25%
[ 1] dev6               372.506    143.041    229.465 38.40%
[ 1] dev7               372.506    142.663    229.842 38.30%
[ 1] dev8               372.506    141.766    230.739 38.06%
[ 1] dev9               372.506    142.891    229.614 38.36%
[ 1] dev10              372.506    143.516    228.989 38.53%
[ 2] dev11              372.506    143.359    229.147 38.48%
[ 2] dev12              372.506    142.172    230.333 38.17%
[ 2] dev13              372.506    144.171    228.335 38.70%
[ 2] dev14              372.506    145.114    227.392 38.96%
[ 2] dev15              372.506    141.289    231.217 37.93%
[ 2] dev16              372.506    140.712    231.794 37.77%
[ 2] dev17              372.506    139.755    232.751 37.52%
[ 2] dev18              372.506    140.640    231.866 37.76%
[ 2] dev19              372.506    142.262    230.243 38.19%
[ 2] dev20           REQUEST FAILURE
---- --------------- ---------- ---------- ---------- ------
total:                 7077.606   2687.718   4389.888 37.97%
but after the second failure, it reports the entire host as
failed:
Checking trackers...
10.0.0.206:6001 ... OK
Checking hosts...
[ 1] file3 ... OK
[ 2] file4 ... REQUEST FAILURE
Checking devices...
host device             size(G)    used(G)    free(G)   use%
---- --------------- ---------- ---------- ---------- ------
[ 1] dev1               372.506    140.997    231.508 37.85%
[ 1] dev2               372.506    141.694    230.812 38.04%
[ 1] dev3               372.506    138.347    234.158 37.14%
[ 1] dev4               372.506    138.629    233.877 37.22%
[ 1] dev5               372.506    140.295    232.211 37.66%
[ 1] dev6               372.506    144.190    228.316 38.71%
[ 1] dev7               372.506    144.503    228.003 38.79%
[ 1] dev8               372.506    143.518    228.988 38.53%
[ 1] dev9               372.506    144.285    228.221 38.73%
[ 1] dev10              372.506    144.897    227.608 38.90%
---- --------------- ---------- ---------- ---------- ------
total:                 3725.056   1421.355   2303.701 38.16%
which is odd, since I'm logged in to file4 right now and it's working
fine. if I run other commands through mogadm, they report as expected:
# mogadm --trackers=10.0.0.206:6001 host list
file3 [1]: alive
IP: 10.0.0.217:7500
file4 [2]: alive
IP: 10.0.0.218:7500
I marked the two devices as dead earlier, which shows up here:
# mogadm --trackers=10.0.0.206:6001 device list
file3 [1]: alive
              used(G)  free(G) total(G)
dev1: alive   140.997  231.508  372.505
dev10: alive  144.894  227.611  372.505
dev2: alive   141.693  230.812  372.505
dev3: alive   138.347  234.158  372.505
dev4: alive   138.628  233.877  372.505
dev5: alive   140.356  232.148  372.505
dev6: alive   144.189  228.315  372.505
dev7: alive   144.502  228.003  372.505
dev8: alive   143.518  228.987  372.505
dev9: alive   144.284  228.221  372.505
file4 [2]: alive
              used(G)  free(G) total(G)
dev11: alive  145.280  227.225  372.505
dev12: alive  143.927  228.578  372.505
dev13: alive  145.359  227.146  372.505
dev14: dead   146.562  225.943  372.505
dev15: alive  142.580  229.925  372.505
dev16: alive  142.285  230.220  372.505
dev17: alive  141.778  230.727  372.505
dev18: alive  141.899  230.605  372.505
dev19: alive  143.648  228.856  372.505
dev20: dead   131.006  241.499  372.505
is this behavior from 'mogadm check' correct, or is there an
issue?
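for anyone who wants to compare the two views mechanically, here is a minimal Python sketch that tallies devices by status per host from 'mogadm device list' output. it assumes the layout pasted above (one "name [id]: status" line per host, followed by one "devN: status used free total" line per device). the function name summarize_device_list and the trimmed SAMPLE text are only illustration, not part of mogadm, and the real output may differ in whitespace.

```python
import re

# Assumed line shapes, taken from the 'device list' output in this message.
HOST_RE = re.compile(r"^(\S+) \[(\d+)\]: (\w+)$")
DEVICE_RE = re.compile(r"^(dev\d+): (\w+)\s+[\d.]+\s+[\d.]+\s+[\d.]+$")

# Trimmed sample; a real run would read the command's output instead.
SAMPLE = """\
file3 [1]: alive
used(G) free(G) total(G)
dev1: alive 140.997 231.508 372.505
file4 [2]: alive
used(G) free(G) total(G)
dev13: alive 145.359 227.146 372.505
dev14: dead 146.562 225.943 372.505
dev20: dead 131.006 241.499 372.505
"""


def summarize_device_list(text):
    """Return {host: {"status": str, "devices": {status: [device names]}}}."""
    summary = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        host = HOST_RE.match(line)
        if host:
            # Start a new host section.
            current = host.group(1)
            summary[current] = {"status": host.group(3), "devices": {}}
            continue
        dev = DEVICE_RE.match(line)
        if dev and current is not None:
            # Group device names under their reported status.
            by_status = summary[current]["devices"]
            by_status.setdefault(dev.group(2), []).append(dev.group(1))
        # Header lines such as "used(G) free(G) total(G)" match neither
        # pattern and are skipped.
    return summary


if __name__ == "__main__":
    for host, info in summarize_device_list(SAMPLE).items():
        counts = {s: len(names) for s, names in info["devices"].items()}
        print(host, info["status"], counts)
```

run against the full listing above, this should show file4 as alive with 8 alive and 2 dead devices, which is what I would have expected 'mogadm check' to reflect as well.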
cheers,
- Jared
--
Jared Klett
Co-founder, Blip.tv
646.526.8948 (cell)
JaredAtWrok (aim)
http://blog.blip.tv