Input on best practices for serving files

Thu Apr 10 01:16:01 UTC 2008

Excellent writeup, Mark, thank you!  (Sorry for the double post; I'm
used to mailing lists setting Reply-To :))

On Wed, Apr 9, 2008 at 4:25 PM, Mark Smith <smitty at gmail.com> wrote:
> [snip]
> I have no idea about the MogileFS wiki... we recently put something for
> Perlbal on Google Code, might be useful to do the same thing for MogileFS...
> Brad?
>

This would be awesome.  There are quite a few disjointed sources of
MogileFS info... a centralized location would be grand.
>
> >  To begin with, it is recommended to write an "interpretation" layer
> >  that translates the end-user URL into internal MogileFS terms.  For
> >  this example, a URL like
> >  "http://static.myapp.com/users/1234567-thumb.jpg" would then be
> >  translated to a MogileFS key, along with a class.  To keep
> >  this simple, the key could be "1234567:thumb" and the class would be
> >  "users".
>
> Yes, typically this is in your backend webservers.  See below.
>
>
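To make that concrete, here's a minimal Python sketch of the translation
step.  The /<class>/<id>-<variant>.<ext> URL layout, the regex and the
url_to_mogile_key name are my own assumptions for illustration; nothing
here is part of MogileFS itself.  The extension is matched but
deliberately left out of the key.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical URL layout: /<class>/<id>-<variant>.<ext>, e.g.
# /users/1234567-thumb.jpg -> key "1234567:thumb", class "users".
_PATH_RE = re.compile(
    r"^/(?P<cls>[a-z_]+)/(?P<id>\d+)-(?P<variant>[a-z]+)\.[A-Za-z0-9]+$"
)


def url_to_mogile_key(url: str) -> Optional[Tuple[str, str]]:
    """Translate a public URL into a (key, class) pair, or None if it doesn't match."""
    path = urlparse(url).path
    m = _PATH_RE.match(path)
    if m is None:
        return None
    # The extension is ignored on purpose: it says nothing reliable
    # about what is actually stored under the key.
    key = f"{m.group('id')}:{m.group('variant')}"
    return key, m.group("cls")


print(url_to_mogile_key("http://static.myapp.com/users/1234567-thumb.jpg"))
# ('1234567:thumb', 'users')
```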
> >  The next step is to write some code that takes the key and class,
> >  fetches the paths from a tracker and then, in some fashion, proxies
> >  the image response.
>
> Perlbal is the recommended way of doing this, actually, as it does most of
> the work for you.
>
>

There was a post that mentioned that Perlbal was inferior to
Apache/lighttpd for the front-end bit (I can dig it up, but I'm happy
using anything that works, really).  But, again, Perlbal is an area
I know little about.  I understand what you are saying, but I'm not
sure how to put it into practice... this is where a Recipes wiki node
would be very helpful. :)

> Actually this is something done at the backend.  Perlbal doesn't know what
> kind of file it is, neither does MogileFS.  You're expected to have some
> sort of logic in your application that sends back the proper headers for the
> file you want to reproxy.
>

This is causing me some consternation in the actual deployment.

If I get a URL like http://static.myapp.com/user/123456.jpg, should I
trust that .jpg is the real extension?  What if the actual asset is an
FLV?  Should the step that maps 123456 to the MogileFS key always query
the application for attributes on that image?  I know I saw a
discussion somewhere about a metadata store attached to nodes in
MogileFS... it just seems wasteful to issue a query just for the MIME
type (although it can be cached, so it's probably not _that_ wasteful,
and worrying about it is premature optimization).  I can see other
benefits, but for something like a simple image server it seems like
overkill.
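
One middle ground, sketched in Python: trust a MIME type stored by the
application when there is one, fall back to guessing from the URL
extension, and cache the answer so the query isn't repeated on every
hit.  query_mime_type and its dict are hypothetical stand-ins for
whatever metadata store the application actually has.

```python
import mimetypes
from functools import lru_cache
from typing import Optional

# Hypothetical stand-in for the application's metadata store; in practice
# this would be a database query keyed by asset id.
_ASSET_MIME_TYPES = {
    "123456": "video/x-flv",  # the ".jpg" URL that is really an FLV
}


def query_mime_type(asset_id: str) -> Optional[str]:
    """Pretend DB lookup: return the stored MIME type, or None if unknown."""
    return _ASSET_MIME_TYPES.get(asset_id)


@lru_cache(maxsize=10_000)
def content_type_for(asset_id: str, requested_path: str) -> str:
    """Prefer the stored MIME type; fall back to guessing from the extension."""
    stored = query_mime_type(asset_id)
    if stored is not None:
        return stored
    guessed, _encoding = mimetypes.guess_type(requested_path)
    return guessed or "application/octet-stream"
```

One caveat: lru_cache never expires entries, including the fallback
guesses, so a real version would need invalidation whenever an asset's
type is stored or changed.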
>
> >  The other point that I'd like to talk about is an image manipulation
> >  layer, similar to what most image hosting services do: check the
> >  incoming Referer header.  If the referrer is blank, or the host
> >  is not "myapp.com", then affix a "stamp" onto the top of the
> >  image.  Have there been any thoughts about a MogileFS Cookbook?  I
> >  think it would greatly help newcomers to the product.
>
> Easily done in the traditional setup.
>
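
For the referrer rule, the decision part is small enough to sketch in
Python.  The compositing itself (e.g. with an imaging library) is left
out; needs_stamp is a hypothetical name, and treating subdomains such
as www.myapp.com as on-site is my assumption, not part of the rule as I
first wrote it.

```python
from typing import Optional
from urllib.parse import urlparse

ALLOWED_HOST = "myapp.com"  # the site's own domain, per the example above


def needs_stamp(referer: Optional[str]) -> bool:
    """Return True if the image should get a "stamp" overlay.

    A blank or missing Referer, or one whose host is neither myapp.com
    nor a subdomain of it (an assumption), counts as off-site.
    """
    if not referer:
        return True
    host = (urlparse(referer).hostname or "").lower()
    # Match the exact host or a true subdomain, so "evilmyapp.com" is rejected.
    on_site = host == ALLOWED_HOST or host.endswith("." + ALLOWED_HOST)
    return not on_site
```

Matching on the parsed hostname rather than a substring of the raw
header avoids treating something like evilmyapp.com as on-site.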
> So, enough about that - this is traditionally how MogileFS is used.  There
> are other ways of using it, but it's sort of designed around the idea of
> using Perlbal, so many of the pieces fit more neatly if you do.
>
>                                   /->  [ webserver ] -> [ mogilefsd ]
>                                  /                           |
> { internet } -> [ perlbal ] -> <                            /
>                                  \                          /
>                                  \->  [ mogstored ] <-----/
>
> I don't know if that will translate well.  There are graphics somewhere...
>

Translated well enough.  I haven't seen many graphics, but I'm happy
to draw some up if they've been lost.

> Anyway.  So Perlbal answers all incoming requests.  It talks to your
> webservers and your MogileFS storage nodes (mogstored/lighttpd/whatever you
> use as the storage server component).  Your webserver talks to the MogileFS
> trackers (mogilefsd instances).  The trackers talk to the storage nodes.
>
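
For the webserver-to-tracker hop: as far as I can tell from reading
MogileFS::Client, the trackers speak a simple line-based text protocol,
a command followed by URL-encoded arguments (e.g. get_paths
domain=...&key=...), answered by either OK plus URL-encoded fields
(paths=N&path1=...) or ERR plus an error code.  Below is a Python
sketch of just the encoding and parsing, with no socket handling; the
exact field names are my reading of the client code, so double-check
them before relying on this.

```python
from typing import List
from urllib.parse import parse_qs, urlencode


def build_get_paths_request(domain: str, key: str) -> bytes:
    """Encode a get_paths command as a single CRLF-terminated line."""
    args = urlencode({"domain": domain, "key": key})
    return f"get_paths {args}\r\n".encode("utf-8")


def parse_get_paths_response(line: bytes) -> List[str]:
    """Turn 'OK paths=N&path1=...' into a list of URLs; raise on 'ERR ...'."""
    text = line.decode("utf-8").rstrip("\r\n")
    status, _, rest = text.partition(" ")
    if status != "OK":
        raise LookupError(f"tracker error: {rest}")
    fields = parse_qs(rest)  # also percent-decodes the values
    count = int(fields.get("paths", ["0"])[0])
    return [fields[f"path{i}"][0] for i in range(1, count + 1)]
```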
> Let's walk through the request, something like:
> [snip]

Yeah, I think I should whip up some pretty graphics with plenty of
annotations along the way.  And perhaps start a MogileFS::Manual?
:)

I wrote a simple Catalyst application that handles the translation,
the path fetching and the setting of the reproxy URLs.  It's probably
overkill to use Catalyst, but I'm attached to that project and like
its ease of deployment.  It can be mostly self-contained in a single
file, so I'll work that up as a recipe.
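
The core of that app is small.  Here's a rough Python/WSGI equivalent
rather than the Catalyst code itself: the backend does no file I/O at
all, it just answers with an X-REPROXY-URL header listing storage-node
URLs, and (as I understand Perlbal's reproxy support) Perlbal then
fetches the file and streams it to the client.  FAKE_PATHS stands in
for a real tracker lookup, the storage URLs are made up, the URL
pattern just generalizes the users/1234567-thumb.jpg example from my
first message, and the Content-Type is hard-coded, which sidesteps the
MIME-type question.  This assumes Perlbal is configured with
reproxying enabled for this backend.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for the tracker's get_paths answer; a real backend
# would ask mogilefsd. These storage-node URLs are made up.
FAKE_PATHS: Dict[Tuple[str, str], List[str]] = {
    ("1234567:thumb", "users"): [
        "http://10.0.0.5:7500/dev1/0/000/000/0000000123.fid",
        "http://10.0.0.6:7500/dev2/0/000/000/0000000123.fid",
    ],
}

# Hypothetical URL layout: /<class>/<id>-<variant>.<ext>
PATH_RE = re.compile(r"^/([a-z_]+)/(\d+)-([a-z]+)\.\w+$")


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI backend that hands the actual file transfer back to Perlbal."""
    m = PATH_RE.match(environ.get("PATH_INFO", ""))
    paths: List[str] = []
    if m:
        key, cls = f"{m.group(2)}:{m.group(3)}", m.group(1)
        paths = FAKE_PATHS.get((key, cls), [])
    if not paths:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    start_response("200 OK", [
        # Hard-coded for the sketch; the real type should come from the app.
        ("Content-Type", "image/jpeg"),
        # Space-separated candidate URLs; Perlbal fetches the file from
        # them and sends it to the client in place of this (empty) body.
        ("X-REPROXY-URL", " ".join(paths)),
    ])
    return [b""]
```

To poke at it locally without Perlbal,
wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()
will serve it; you'll just see the header and an empty body.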

>
> Anyway, this is somewhat rambling.  I and others are happy to answer
> questions, just fire away.  If you really want to work on documentation, I'd
> be happy to buy you a beer.  It's something sorely, sorely lacking.
>
>

I'm happy to work on documentation, if only because I'm trundling
through it now.  I'm not typically a doc gnome, but in this case it
seems worth doing.  I was going to just write a series of articles on
one of my blogs, but I'd rather have it be more "official".  For now, I
can use my blogs as a placeholder and submit POD patches against trunk.