Serving files securely to web clients via mogile
dormando at rydia.net
Thu Nov 8 18:11:55 UTC 2007
> The problem I'm trying to avoid by serving the files directly from the
> storage nodes is overloading some dedicated machines with the work of
> "proxying" data between the storage node and the end user. Also I don't
> want the extra traffic on my local network. Path caching can still be
> done by whoever provides the clients with the url's to the files. I can
> easily avoid accessing the trackers too much by caching these paths, but
> at the point when a client wants to d/l the file accessing the storage
> node directly seems like the most optimized solution to me.
You're prematurely optimizing here. Nodes which do "proxying" can handle
many hundreds (or thousands) of requests per second, which is enough to
overwhelm the hard drives of all of your storage nodes.
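To make the capacity argument concrete, here is a back-of-envelope sketch.
Every figure in it is an illustrative assumption, not a measurement from
this thread: roughly 100 random reads per second per spinning disk, about
one disk seek per uncached GET, a hypothetical cluster of 10 storage nodes
with 4 disks each, and a hypothetical proxy node sustaining 2000 requests/s.

```python
# Back-of-envelope comparison of proxy throughput vs. storage disk throughput.
# All numbers used below are illustrative assumptions, not measurements.


def cluster_random_reads_per_sec(nodes: int, disks_per_node: int,
                                 reads_per_disk: int) -> int:
    """Rough ceiling on uncached GETs/s if each GET costs about one disk seek."""
    return nodes * disks_per_node * reads_per_disk


def proxies_needed(disk_capacity: int, proxy_rps: int) -> int:
    """Proxy nodes needed to keep the disks saturated (ceiling division)."""
    return -(-disk_capacity // proxy_rps)


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes x 4 disks x ~100 random reads/s per disk.
    capacity = cluster_random_reads_per_sec(nodes=10, disks_per_node=4,
                                            reads_per_disk=100)
    # Hypothetical proxy capacity: 2000 requests/s per proxy node.
    print(capacity, proxies_needed(capacity, proxy_rps=2000))  # prints: 4000 2
```

Under these assumptions, two proxy nodes can already keep forty disks
busy, so the proxy tier is unlikely to be the first bottleneck.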
When I've tried optimizing this path in the past, the best thing I've
done is put the dumbest, fastest possible load balancing in front of
_perlbal_, enable path caching in perlbal, and have modules or
lightweight backend nodes handle one more layer of path caching with
It will be a big loss if your backend is no longer tolerant of MogileFS
reorganizing files on its own. As soon as you give a path to a client,
you *must* support it until they no longer have it cached. It would be a
disservice to clients if their HEAD requests all started 302'ing, or,
worse, stopped working at all. The effort it takes perlbal to handle
spoonfeeding will be less than the overhead of any reliability and
security scheme you'd need to support otherwise. I say this in the hope
of being happily surprised if proved wrong ;)
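The point about cached paths surviving reorganization can be sketched as a
small front-end cache. This is a minimal sketch under stated assumptions,
not perlbal's implementation: `PathCache` is a hypothetical name, `lookup`
is a hypothetical stand-in for asking a tracker for a key's paths, `fetch`
is a hypothetical stand-in for an HTTP GET against a storage node (returning
None on failure), and the 60-second TTL is arbitrary. The detail that
matters is the fallback: when every cached path fails because MogileFS has
moved the file, the cache asks the tracker again instead of handing a broken
path to the client.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class PathCache:
    """Hypothetical path cache: key -> list of storage-node URLs, with a TTL.

    lookup: stand-in for a tracker query (an assumption, not a real client API).
    fetch:  returns the file's bytes for a URL, or None if that URL failed.
    """

    def __init__(self, lookup: Callable[[str], List[str]],
                 fetch: Callable[[str], Optional[bytes]],
                 ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lookup = lookup
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        # key -> (time the paths were fetched, list of paths)
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def _paths(self, key: str, refresh: bool = False) -> List[str]:
        """Return cached paths, asking the tracker on a miss, expiry, or refresh."""
        now = self.clock()
        entry = self._cache.get(key)
        if refresh or entry is None or now - entry[0] > self.ttl:
            paths = self.lookup(key)
            self._cache[key] = (now, paths)
            return paths
        return entry[1]

    def get(self, key: str) -> Optional[bytes]:
        # Pass 1 uses the cached paths. If all of them fail (e.g. the file was
        # moved), pass 2 re-asks the tracker once and tries the fresh paths.
        for refresh in (False, True):
            for url in self._paths(key, refresh=refresh):
                data = self.fetch(url)
                if data is not None:
                    return data
        return None
```

A client holding a stale path is never told "gone" just because the
cluster rebalanced; the cost is one extra tracker query on the rare miss.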
> Alternatively I can do everything from the already existing web server
> on each storage node by hacking my way through MogileFS sources to add
> some file name security options when it configures the storage node's
> web server. If this seems logical and a better solution than two web
> servers running on each storage node, then I might actually do it
> (assuming some support from "the experts" will be available).
If you can prove it has real tangible results, it's an open project and
you can do whatever you want with it :)
I'd really rather never do that; why would I give clients the ability to
DoS my storage nodes directly?
> Finally I'd like to ask if there's any preference as to what web server
> to run on the storage nodes (lighttpd/apache/perlbal), and what was the
> original intention behind this flexibility?
perlbal's okay for most loads now. Preferably, let perlbal handle the
writes; the GETs are probably best served by something multithreaded
like Apache. I've heard from numerous people that pseudo-threaded AIO
doesn't work as well as expected under high IO load. I have nothing
empirical myself, though.
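The write/read split suggested above amounts to routing requests by HTTP
method. The sketch below is only an illustration of that idea: the backend
addresses and ports are hypothetical placeholders (not MogileFS or perlbal
defaults), and `pick_backend` is a hypothetical helper, not part of any
MogileFS component.

```python
# Hypothetical backend addresses; the ports are placeholders, not defaults.
WRITE_BACKEND = "http://127.0.0.1:7500"  # perlbal, handling writes
READ_BACKEND = "http://127.0.0.1:7501"   # multithreaded server (e.g. Apache) for reads

READ_METHODS = frozenset({"GET", "HEAD"})
WRITE_METHODS = frozenset({"PUT", "DELETE"})


def pick_backend(method: str, path: str) -> str:
    """Return the backend URL that should serve this request."""
    m = method.upper()
    if m in READ_METHODS:
        base = READ_BACKEND
    elif m in WRITE_METHODS:
        base = WRITE_BACKEND
    else:
        raise ValueError(f"unsupported method: {method}")
    return base + path
```

Keeping the rule this simple means either server can be swapped out
without touching the other, which matches the freedom-of-choice point.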
The motivation is freedom of choice. If something comes along that can
store/serve the files faster, or fit better into someone's setup, they
should be able to use it.