Analysis of the over-replication issue
info at bouncetek.com
Tue Jul 24 12:38:56 UTC 2007
I'm also experiencing excessive replication on a large scale. I've got a mogilefs setup running on 5
hosts with about 49 devices. There are 4 file classes: 3 of them with a mindevcount of 3 and 1 with
a mindevcount of 2. However, a huge number of files get replicated to each and every disk.
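The breakdown below counts, per domain and class, how many fids exist with a given number of copies
(devcount). It was produced with a query along these lines (reconstructed from the column headers,
so the exact statement may have differed slightly):

SELECT dmid, classid, devcount, COUNT(devcount)
FROM file
GROUP BY dmid, classid, devcount;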
+------+---------+----------+-----------------+
| dmid | classid | devcount | COUNT(devcount) |
+------+---------+----------+-----------------+
| 1 | 1 | 1 | 274585 |
| 1 | 1 | 3 | 7454296 |
| 1 | 1 | 4 | 45505 |
| 1 | 1 | 5 | 18600 |
| 1 | 1 | 6 | 10868 |
| 1 | 1 | 7 | 8381 |
| 1 | 1 | 8 | 7414 |
| 1 | 1 | 9 | 6364 |
| 1 | 1 | 10 | 4693 |
| 1 | 1 | 11 | 4066 |
| 1 | 1 | 12 | 3674 |
| 1 | 1 | 13 | 2855 |
| 1 | 1 | 14 | 2975 |
| 1 | 1 | 15 | 3078 |
| 1 | 1 | 16 | 4484 |
| 1 | 1 | 17 | 4453 |
| 1 | 1 | 18 | 11250 |
| 1 | 1 | 19 | 642 |
| 1 | 1 | 20 | 449 |
| 1 | 1 | 21 | 382 |
| 1 | 1 | 22 | 591 |
| 1 | 1 | 23 | 452 |
| 1 | 1 | 24 | 327 |
| 1 | 1 | 25 | 354 |
| 1 | 1 | 26 | 363 |
| 1 | 1 | 27 | 423 |
| 1 | 1 | 28 | 555 |
| 1 | 1 | 29 | 507 |
| 1 | 1 | 30 | 1293 |
| 1 | 1 | 31 | 1003 |
| 1 | 1 | 32 | 85 |
| 1 | 1 | 33 | 85 |
| 1 | 1 | 34 | 67 |
| 1 | 1 | 35 | 91 |
| 1 | 1 | 36 | 70 |
| 1 | 1 | 37 | 98 |
| 1 | 1 | 38 | 145 |
| 1 | 1 | 39 | 175 |
| 1 | 1 | 40 | 360 |
| 1 | 1 | 41 | 657 |
| 1 | 1 | 42 | 1365 |
| 1 | 1 | 43 | 3818 |
| 1 | 1 | 44 | 17986 |
| 1 | 2 | 0 | 1 |
| 1 | 2 | 3 | 2182785 |
| 1 | 2 | 4 | 12143 |
| 1 | 2 | 5 | 9306 |
| 1 | 2 | 6 | 9157 |
| 1 | 2 | 7 | 8194 |
| 1 | 2 | 8 | 6869 |
| 1 | 2 | 9 | 7855 |
| 1 | 2 | 10 | 6130 |
| 1 | 2 | 11 | 6198 |
| 1 | 2 | 12 | 5320 |
| 1 | 2 | 13 | 5994 |
| 1 | 2 | 14 | 16707 |
| 1 | 2 | 15 | 15771 |
| 1 | 2 | 16 | 169202 |
| 1 | 2 | 17 | 1820 |
| 1 | 2 | 18 | 2086 |
| 1 | 2 | 19 | 2115 |
| 1 | 2 | 20 | 2343 |
| 1 | 2 | 21 | 2193 |
| 1 | 2 | 22 | 1172 |
| 1 | 2 | 23 | 1312 |
| 1 | 2 | 24 | 1982 |
| 1 | 2 | 25 | 1468 |
| 1 | 2 | 26 | 1811 |
| 1 | 2 | 27 | 1898 |
| 1 | 2 | 28 | 2119 |
| 1 | 2 | 29 | 30309 |
| 1 | 2 | 30 | 31 |
| 1 | 2 | 31 | 1 |
| 1 | 2 | 42 | 1 |
| 1 | 2 | 43 | 1 |
| 1 | 2 | 44 | 276 |
| 1 | 3 | 3 | 4440 |
| 1 | 3 | 4 | 15 |
| 1 | 3 | 5 | 11 |
| 1 | 3 | 6 | 22 |
| 1 | 3 | 7 | 16 |
| 1 | 3 | 8 | 5 |
| 1 | 3 | 9 | 13 |
| 1 | 3 | 10 | 8 |
| 1 | 3 | 11 | 28 |
| 1 | 3 | 12 | 12 |
| 1 | 3 | 13 | 10 |
| 1 | 3 | 14 | 33 |
| 1 | 3 | 15 | 37 |
| 1 | 3 | 16 | 334 |
| 1 | 3 | 17 | 2 |
| 1 | 3 | 19 | 1 |
| 1 | 3 | 22 | 1 |
| 1 | 3 | 23 | 2 |
| 1 | 3 | 24 | 2 |
| 1 | 3 | 25 | 2 |
| 1 | 3 | 27 | 4 |
| 1 | 3 | 28 | 31 |
| 1 | 3 | 29 | 1268 |
| 1 | 4 | 2 | 161681 |
| 1 | 4 | 3 | 596 |
| 1 | 4 | 4 | 499 |
| 1 | 4 | 5 | 421 |
| 1 | 4 | 6 | 342 |
| 1 | 4 | 7 | 351 |
| 1 | 4 | 8 | 284 |
| 1 | 4 | 9 | 262 |
| 1 | 4 | 10 | 299 |
| 1 | 4 | 11 | 337 |
| 1 | 4 | 12 | 280 |
| 1 | 4 | 13 | 367 |
| 1 | 4 | 14 | 646 |
| 1 | 4 | 15 | 712 |
| 1 | 4 | 16 | 8431 |
| 1 | 4 | 17 | 94 |
| 1 | 4 | 18 | 88 |
| 1 | 4 | 19 | 94 |
| 1 | 4 | 20 | 76 |
| 1 | 4 | 21 | 83 |
| 1 | 4 | 22 | 96 |
| 1 | 4 | 23 | 93 |
| 1 | 4 | 24 | 86 |
| 1 | 4 | 25 | 83 |
| 1 | 4 | 26 | 104 |
| 1 | 4 | 27 | 112 |
| 1 | 4 | 28 | 661 |
| 1 | 4 | 29 | 14789 |
+------+---------+----------+-----------------+
I've begun digging into the mogilefs code and made several observations that might
help us solve this problem:
- Most of the files that are over-replicated still exist in the file_to_replicate table.
- Running !watch shows a lot of 'ran out of suggestions' errors. Upon manually checking
the fids involved, they all appear to have more than mindevcount copies.
- For a long period of time my disks were slow to access and had many timeouts
(which was solved by using lighttpd with multiple workers). This might have something to do
with the initial replication failing and then somehow ending up in a loop.
- A plausible fix, to me, seems to be a simple check at the start of replicate() in Replicate.pm
that counts on how many 'alive' devices the file already exists and whether that matches or exceeds
the mindevcount for its class. If it does, replicate() can return "2 (success, but someone else
replicated it)" so replicate_using_torepl_table() can safely call delete_fid_from_file_to_replicate().
A rough sketch of such a check is below.
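To make the idea concrete, here is a rough, standalone sketch of that check. The names are mine and
purely illustrative (not actual MogileFS internals); in Replicate.pm the list of devices holding the
fid and the class mindevcount would of course come from the database rather than being passed in:

use strict;
use warnings;

# Return 2 ("success, but someone else replicated it") when the fid already
# sits on at least mindevcount alive devices, otherwise undef so the normal
# replication path continues.
sub already_replicated_enough {
    my ($on_devids, $alive_devids, $mindevcount) = @_;

    my %alive = map { $_ => 1 } @$alive_devids;          # set of alive device ids
    my $alive_copies = grep { $alive{$_} } @$on_devids;  # copies on alive devices

    return $alive_copies >= $mindevcount ? 2 : undef;
}

# Example: the fid is on devices 3, 17 and 41, all alive, and its class wants
# 3 copies, so replicate() could bail out with 2 and the caller could then
# safely delete the row from file_to_replicate.
my $rv = already_replicated_enough([3, 17, 41], [3, 17, 41], 3);
print defined $rv ? "already satisfied, rv=$rv\n" : "needs replication\n";

The only point is the early return: if enough alive copies already exist, returning 2 lets
replicate_using_torepl_table() clean up the file_to_replicate row instead of retrying the same
fid over and over.

For reference, here is what mogadm check currently reports for this setup: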
Checking trackers...
192.168.0.100:6001 ... OK
Checking hosts...
[ 1] storage1 ... OK
[ 2] storage2 ... OK
[ 3] storage3 ... OK
[ 4] storage4 ... OK
[ 5] storage5 ... OK
Checking devices...
host device size(G) used(G) free(G) use% ob state I/O%
---- ------------ ---------- ---------- ---------- ------ ---------- -----
[ 1] dev1 136.611 81.343 55.268 59.54% writeable 90.8
[ 1] dev2 136.611 105.411 31.200 77.16% writeable 100.4
[ 1] dev3 136.611 105.901 30.710 77.52% writeable 100.4
[ 1] dev4 136.611 105.905 30.706 77.52% writeable 100.4
[ 1] dev5 136.611 105.970 30.641 77.57% writeable 89.6
[ 1] dev6 136.611 105.478 31.133 77.21% writeable 88.4
[ 1] dev7 136.611 81.648 54.964 59.77% writeable 100.4
[ 1] dev8 136.611 105.486 31.126 77.22% writeable 92.0
[ 1] dev9 136.611 118.235 18.376 86.55% writeable 100.4
[ 1] dev10 136.611 118.128 18.483 86.47% writeable 83.2
[ 1] dev11 136.611 118.091 18.520 86.44% writeable 100.4
[ 1] dev12 136.611 118.468 18.143 86.72% writeable 100.4
[ 1] dev13 136.611 117.953 18.658 86.34% writeable 100.4
[ 1] dev14 136.611 118.154 18.457 86.49% writeable 100.4
[ 1] dev15 136.611 122.528 14.083 89.69% writeable 100.4
[ 1] dev16 136.611 121.794 14.817 89.15% writeable 100.4
[ 1] dev17 136.611 122.161 14.450 89.42% writeable 100.4
[ 1] dev18 136.611 98.298 38.313 71.95% writeable 100.0
[ 1] dev19 136.611 117.070 19.541 85.70% writeable 100.4
[ 1] dev20 136.611 121.825 14.786 89.18% writeable 100.4
[ 1] dev21 136.611 122.386 14.225 89.59% writeable 100.4
[ 1] dev22 136.611 121.994 14.617 89.30% writeable 100.4
[ 1] dev23 136.611 121.900 14.711 89.23% writeable 100.4
[ 1] dev24 136.611 121.897 14.714 89.23% writeable 100.4
[ 1] dev25 136.611 121.664 14.947 89.06% writeable 100.4
[ 1] dev26 136.611 122.278 14.333 89.51% writeable 100.4
[ 1] dev27 136.611 98.187 38.424 71.87% writeable 100.4
[ 1] dev28 136.611 121.952 14.659 89.27% writeable 100.4
[ 2] dev29 698.101 68.175 629.926 9.77% writeable 26.7
[ 2] dev30 698.101 68.105 629.996 9.76% writeable 4.0
[ 2] dev31 698.101 67.369 630.731 9.65% writeable 15.8
[ 2] dev32 698.101 68.895 629.206 9.87% writeable 11.9
[ 2] dev33 698.101 69.219 628.882 9.92% writeable 5.0
[ 2] dev34 698.101 69.215 628.886 9.91% writeable 11.9
[ 2] dev35 698.101 67.500 630.601 9.67% writeable 6.9
[ 2] dev36 698.101 68.141 629.959 9.76% writeable 0.0
[ 2] dev37 698.101 69.401 628.700 9.94% writeable 10.9
[ 2] dev38 698.101 69.019 629.082 9.89% writeable 16.8
[ 2] dev39 698.101 68.304 629.797 9.78% writeable 19.8
[ 2] dev40 698.101 67.813 630.287 9.71% writeable 2.0
[ 2] dev41 698.101 68.758 629.343 9.85% writeable 6.9
[ 2] dev42 698.101 67.699 630.401 9.70% writeable 1.0
[ 2] dev43 698.101 69.285 628.816 9.92% writeable 18.8
[ 3] dev44 229.176 193.427 35.749 84.40% writeable 3.2
[ 3] dev45 222.451 192.966 29.485 86.75% writeable 5.2
[ 4] dev46 229.176 200.376 28.800 87.43% writeable 0.0
[ 4] dev47 222.451 196.413 26.037 88.30% writeable 0.0
[ 5] dev48 182.044 162.701 19.342 89.37% writeable 21.6
[ 5] dev49 232.823 194.586 38.236 83.58% writeable 18.8
---- ------------ ---------- ---------- ---------- ------
total: 15614.739 5329.472 10285.267 34.13%
Regards,
Arjan