Serving files securely to web clients via mogile

Andy Lo A Foe andy.loafoe at gmail.com
Thu Nov 8 15:16:48 UTC 2007


Hi,

This is what we currently do on our cluster: each storage node has an HTTP
server/Rails app that knows how to resolve the file request down to a FID. If
the FID is not local, it issues an HTTP 302 redirect to a machine which does
have the FID available locally. This avoids proxying traffic across the
internal network at the cost of at most one 302 redirect of the original
request. Once we have more storage nodes (e.g. 10+) we will switch to Perlbal's
X-Reproxy-URL, since maintaining the logic on a lot of machines will be
cumbersome and prone to failure IMHO, even with nice tools like Capistrano.
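
Roughly, the per-node logic looks like this (a sketch in Python rather than
our actual Rails code; resolve_paths() is a hypothetical stand-in for however
you map a request to its FID replica URLs):

    import os
    import socket
    from urllib.parse import urlparse
    from wsgiref.simple_server import make_server

    DOC_ROOT = "/var/mogdata"         # mogstored docroot (made up here)
    LOCAL_HOST = socket.getfqdn()     # this storage node's hostname

    def resolve_paths(key):
        """Hypothetical: return the replica URLs for `key`, e.g. from the
        trackers or a locally cached key -> FID path mapping."""
        raise NotImplementedError

    def app(environ, start_response):
        key = environ["PATH_INFO"].lstrip("/")
        paths = resolve_paths(key)    # e.g. ["http://node2:7500/dev1/...fid"]
        for url in paths:
            parsed = urlparse(url)
            # Naive "is it local?" check by hostname.
            if parsed.hostname == LOCAL_HOST:
                # The FID is on this node: serve the bytes off local disk.
                with open(os.path.join(DOC_ROOT, parsed.path.lstrip("/")), "rb") as f:
                    data = f.read()
                start_response("200 OK", [("Content-Type", "application/octet-stream"),
                                          ("Content-Length", str(len(data)))])
                return [data]
        # Not local: one 302 hop to a node that does have the FID.
        start_response("302 Found", [("Location", paths[0])])
        return [b""]

    if __name__ == "__main__":
        make_server("", 7501, app).serve_forever()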

Gr,
Andy

On Nov 8, 2007 4:07 PM, Yoav Steinberg <yoav at monfort.co.il> wrote:

> Mark Smith wrote:
> >> I want to use mogile to serve files to remote clients via http. Ideally
> >> the files should be accessed directly from the storage nodes by the
> >> clients. Mogile seems good for this since all file access is via http,
> >> and it works nicely with robust http servers (like lighty).
> >>
> >> Problem is that I don't want the remote clients to see the actual
> >> MogileFS file path when accessing the files, and I want some security so
> >> that not just any client can access any file. So instead of providing the
> >> client with something like "http://myserver.com/dev1/0/000/000/000000001.fid",
> >> I want the client to access some name I generate (for example with
> >> lighty's mod_secdownload).
> >>
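> >> To be concrete, something along the lines of lighttpd's mod_secdownload
> >> scheme would do (as I recall it: an MD5 of a shared secret, the path and
> >> a hex timestamp). Just a sketch, not anything MogileFS provides out of
> >> the box; the secret, prefix and internal path below are made up:
> >>
> >>     import hashlib
> >>     import time
> >>
> >>     SECRET = "change-me"        # must match secdownload.secret
> >>     URI_PREFIX = "/dl/"         # must match secdownload.uri-prefix
> >>
> >>     def secdownload_url(rel_path):
> >>         """Time-limited URL for an internal path such as
> >>         '/dev1/0/000/000/0000000001.fid'."""
> >>         ts_hex = "%08x" % int(time.time())
> >>         token = hashlib.md5((SECRET + rel_path + ts_hex).encode()).hexdigest()
> >>         return "%s%s/%s%s" % (URI_PREFIX, token, ts_hex, rel_path)
> >>
> >>     print(secdownload_url("/dev1/0/000/000/0000000001.fid"))
> >>     # -> /dl/<md5-token>/<hex-timestamp>/dev1/0/000/000/0000000001.fid
> >>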
> >> The question is: how would I go about configuring the storage nodes to do
> >> this? Is there any built-in functionality for this in MogileFS?
> >>
> >
> > That's not how MogileFS is traditionally used in a web environment.
> > The idea is that you run your MogileFS network internally and then
> > expose to the user something else that uses the MogileFS trackers to
> > translate your-names to internal-names.  The storage nodes are then
> > never exposed to the end users.
> >
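> > To make that concrete, a tracker "get_paths" lookup boils down to
> > something like this (the wire format below is from memory; in practice
> > you would use one of the MogileFS client libraries rather than raw
> > sockets):
> >
> >     import socket
> >     from urllib.parse import parse_qs, quote
> >
> >     def get_paths(tracker, domain, key):
> >         """Ask a tracker (host, port) for the replica URLs of `key`."""
> >         with socket.create_connection(tracker) as sock:
> >             req = "get_paths domain=%s&key=%s&noverify=1\r\n" % (
> >                 quote(domain), quote(key))
> >             sock.sendall(req.encode())
> >             resp = sock.makefile().readline().strip()
> >         status, _, args = resp.partition(" ")
> >         if status != "OK":
> >             raise RuntimeError("tracker error: %s" % resp)
> >         parsed = parse_qs(args)
> >         count = int(parsed["paths"][0])
> >         return [parsed["path%d" % i][0] for i in range(1, count + 1)]
> >
> >     # print(get_paths(("tracker1", 7001), "my_domain", "photo:1234"))
> >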
> > This setup gives you the advantage of controlling path caching in
> > your application (you know best when certain items should expire, if
> > ever) as well as the flexibility to properly handle fallback in the
> > case of unavailable storage nodes.  It's also safer: you don't have to
> > plan for your storage nodes to have separate upload and download
> > processes.
> >
> > Anyway ... apparently we don't have "best practices" setup information
> > on the site or wiki... that's lame.  Well, typically MogileFS is
> > combined with Perlbal as the latter does most of the heavy lifting for
> > using MogileFS.  You still need your application servers to do a paths
> > lookup and decide on caching policies (if you want to enable path
> > caching), but you don't need to do any file serving there, Perlbal
> > will do it from the storage nodes.
> >
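> > A bare-bones sketch of that application-server side (WSGI is used only
> > for illustration; get_paths() is a hypothetical stand-in for the tracker
> > lookup above, and Perlbal is assumed to sit in front with reproxying
> > enabled):
> >
> >     from wsgiref.simple_server import make_server
> >
> >     DOMAIN = "my_domain"                    # made up
> >
> >     def get_paths(domain, key):
> >         """Hypothetical: replica URLs for `key`, e.g. from a tracker
> >         get_paths query or a MogileFS client library."""
> >         raise NotImplementedError
> >
> >     def app(environ, start_response):
> >         key = environ["PATH_INFO"].lstrip("/")
> >         urls = get_paths(DOMAIN, key)       # internal storage-node URLs
> >         start_response("200 OK", [
> >             # Perlbal fetches the body from one of these URLs and streams
> >             # it to the client; the app server never touches the bytes.
> >             ("X-REPROXY-URL", " ".join(urls)),
> >             ("Content-Type", "application/octet-stream"),
> >         ])
> >         return [b""]
> >
> >     if __name__ == "__main__":
> >         make_server("", 8080, app).serve_forever()
> >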
> > I have a strong feeling that this is just going to be more confusing
> > than helpful, and I know our documentation is horrendous for
> > beginners, so please reply (to the list!) with questions and we'll get
> > you going.  :)
> >
> >
> The problem I'm trying to avoid by serving the files directly from the
> storage nodes is overloading some dedicated machines with the work of
> "proxying" data between the storage node and the end user. I also don't
> want the extra traffic on my local network. Path caching can still be
> done by whoever provides the clients with the URLs to the files. I can
> easily avoid hitting the trackers too often by caching these paths, but
> at the point when a client wants to download the file, accessing the
> storage node directly seems like the most efficient solution to me.
>
> One option is installing a second HTTP server on each storage node; it
> would do whatever translation or security I want and then serve the
> local files when the remote client requests them. The remote client
> would find out which storage node to contact by talking to some web-app
> that uses the trackers to find which storage nodes have the requested
> file; this app could also do path caching if required.
>
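> Roughly, that web-app piece could look like the sketch below (get_paths()
> and secdownload_url() are stubs for the helpers sketched earlier in the
> thread; the hostnames and port are made up):
>
>     from urllib.parse import urlparse
>
>     def get_paths(tracker, domain, key):
>         """As sketched earlier in the thread: replica URLs from a tracker."""
>         raise NotImplementedError
>
>     def secdownload_url(rel_path):
>         """As sketched earlier in the thread: a signed, time-limited path."""
>         raise NotImplementedError
>
>     def client_download_url(tracker, domain, key):
>         paths = get_paths(tracker, domain, key)   # internal replica URLs
>         internal = urlparse(paths[0])             # e.g. http://node2:7500/dev1/...
>         # Hand the client a signed path served by the extra HTTP server on
>         # that storage node (port 8081 is made up).
>         return "http://%s:8081%s" % (internal.hostname,
>                                      secdownload_url(internal.path))
>
>     # print(client_download_url(("tracker1", 7001), "my_domain", "photo:1234"))
>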
> Alternatively, I can do everything from the existing web server on each
> storage node by hacking my way through the MogileFS sources to add some
> filename security options where it configures the storage node's web
> server. If this seems logical, and a better solution than running two web
> servers on each storage node, then I might actually do it (assuming some
> support from "the experts" will be available).
>
> Finally, I'd like to ask whether there's any preference as to which web
> server to run on the storage nodes (lighttpd/Apache/Perlbal), and what the
> original intention behind this flexibility was.
>