MogileFS limits...

dormando dormando at rydia.net
Sat Apr 12 23:27:34 UTC 2008


Nathan Schmidt wrote:
> Dormando,
> 
> The most intense performance hits for us have come from losing storage
> nodes and the resulting rebalance operations putting a lot of
> read/write load on the tables. Worst case we've seen was almost 20
> hours of thrashing. Not really user-affecting but it did peg a lot of
> ops charts.

We have huge storage hosts (bad idea, I know). So losing an entire host, 
while infrequent, can cause thrashing of up to a week. Lowering the 
replicate workers prevents it from actually hurting anything, but it 
does run full tilt for a while. We have some ideas floating around to 
make this less painful / adjustable. Probably won't do them for a while 
though :\

> We're far from optimal - using UTF-8 collations/charsets which seems
> to unconditionally add a great deal of overhead to index sizes. Our
> initial bring-up was on a clean debian box so we went with MyISAM,
> which does have its problems - index corruption, O(n) repairs, ugh.
> Functional but non-optimal. Good point about the optimize table
> routine, we probably haven't done that for a while.

Oh man :)

Switch to InnoDB as soon as possible! The access patterns were designed 
for InnoDB, and you're very likely to get lock thrashing under MyISAM. 
MogileFS also counts on the database being fairly reliable.

Get a box with a decent RAID housing battery backed write cache, get 
more RAM, and switch to InnoDB :) Should be way faster even if the data 
size jumps a bit during the switch. OPTIMIZE TABLE still applies.

> We do a lot of list_keys operations when our caches are cold because
> we're using mogile as our primary datastore for wiki page revisions -
> we didn't really anticipate the rest of our infrastructure handling
> this kind of data size so well so mogile's been left alone while the
> rest of our system has become much more efficient and
> well-distributed. We're on all-commodity hardware, and those boxes max
> out at 8Gb each, which puts a ceiling on where we can go without
> moving to some substantially more expensive systems. Our file.MYI and
> file_on.MYI are 8.5Gb together, which means MySQL is doing some
> interesting tricks to keep things running smoothly. That said, we're
> getting ready to bite the bullet and bring in somewhat more
> substantial boxes for our tracker dbs, oh well.

I think 16-32G are still commodity (see Ask's response)...

I don't really believe you lose out once the index is large than RAM. 
Especially with MyISAM files, you lose out when your key_reads over 
key_read_requests get above nothing. A bulk of your data is unlikely to 
be active, so the entire index is never cached in RAM. Under innodb you 
give both the index _and_ data caches a large sum of memory, which are 
also LRU's.

So while my mogilefs DB's have 50G+ datafiles sometimes, 8-16G of RAM is 
plenty. The writeback cache is a gigantic win due to all of the scratch 
writes mogilefs does. Also switching to deadline elevator for the io 
scheduler... The read is read cache, and you're caching hot paths :)

-Dormando


More information about the mogilefs mailing list