Data splitting

Mark Smith junior at
Tue Aug 7 15:05:55 UTC 2007

> Ok it's making sense to me now! Thanks!
> So basically, a request goes something like this:
> client ( -> Perlbal -> Apache -> Mogilefsd ->
> Apache (returns with something like serverA:1234.fid) -> Perlbal -> client
> Is that correct?  If so, wouldn't that lead to more load on the Apache,
> since it would have to dynamically translate to
> something Mogilefsd can understand and find in its database?
> I'm guessing though, that there's some way for Perlbal to talk directly to
> Mogilefsd, so that it can completely bypass Apache?

Initial HTTP request: Client -> Perlbal -> Apache.
Apache says: "Oh, this is something in MogileFS!"
Apache connects to MogileFSd and asks about it.
MogileFSd returns the path where the item can be found internally.
Apache returns: "Reproxy this!  Available at $url."
Perlbal then connects to mogstored and gets the item, reproxying it.

So yes, that's how it goes.  It feels like a lot of steps, but it's actually
quite fast if caching is working.  The slowest part will be actually
reproxying the file to the user.
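
To make the Apache step concrete, here is a rough Python sketch of what the
backend hands back to Perlbal.  It is only an illustration, not the real
mod_perl handler: backend_response, fake_tracker, the key "photo42" and the
example URL are all made up for this sketch.  The one real detail is the
X-REPROXY-URL response header, which is what Perlbal looks for; listing
several URLs separated by spaces is assumed here.

```python
from typing import Callable, Dict, List, Tuple


def backend_response(
    key: str, get_paths: Callable[[str], List[str]]
) -> Tuple[int, Dict[str, str]]:
    """Build the (status, headers) pair the backend would return for `key`.

    `get_paths` is a hypothetical stand-in for asking the MogileFS tracker
    where the item lives; it returns a list of internal URLs.
    """
    paths = get_paths(key)
    if not paths:
        # Nothing stored under this key, so there is nothing to reproxy.
        return 404, {}
    # Tell the proxy to fetch the body itself from one of these URLs
    # (space-separated list assumed).
    return 200, {"X-REPROXY-URL": " ".join(paths)}


def fake_tracker(key: str) -> List[str]:
    """Hypothetical lookup table standing in for the tracker's database."""
    table = {"photo42": ["http://serverA:7500/dev1/0/000/000/0000001234.fid"]}
    return table.get(key, [])


status, headers = backend_response("photo42", fake_tracker)
print(status, headers)
```

The point is that the backend never sends the file body itself; it only
answers with a header, and the proxy does the heavy lifting.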

Also, there are several layers of caching you can use.  MogileFSd has path
caching you can turn on to reduce load on the database it uses.  Perlbal has
reproxy caching you can turn on, which shortens the trip considerably:
requests then go user -> Perlbal -> mogstored and back.
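
The effect of caching at either layer can be pictured with a small
time-limited cache in front of the path lookup.  This is a generic Python
sketch and not how MogileFSd or Perlbal implement it; PathCache, its
parameters and the lookup function are hypothetical names for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class PathCache:
    """Remember where a key lives for `ttl_seconds`, then ask again."""

    def __init__(
        self,
        lookup: Callable[[str], List[str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup          # the slow path (backend / tracker)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, key: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Cache hit: skip the backend entirely.
            return entry[1]
        # Cache miss or expired entry: take the long trip and remember it.
        paths = self._lookup(key)
        self._entries[key] = (now + self._ttl, paths)
        return paths
```

While an entry is fresh, repeated requests for the same key never reach the
slow lookup, which is the shortened trip described above.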

There are diagrams of this process available here:

Slide 62 shows the internal redirect process and its steps.  (It doesn't
show mod_perl connecting to MogileFSd, but that's easy to imagine in
there.)

Mark Smith / xb95
smitty at