quick question on usage

Fri Oct 29 22:57:34 PDT 2004

On Fri, 29 Oct 2004, Adam Michaels wrote:

> I want to make sure I understand how mogilefs works. You have an API
> which sends files to the tracker which gives the files a hash in the
> tracker database and then sends it to the mogfile storage daemon which
> holds the file. Does the file itself actually go to the tracker or
> does it go directly to the storage server?

The tracker just tells the client (the API) where to write.  Once the
client is done writing to the storage node, it tells a tracker, "I'm
done, and I wrote this many bytes."  Then the tracker(s) begin
replicating that file around.
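
The write path described above can be sketched as a toy in-memory model.
This is only an illustration of who talks to whom: every class and method
name below is hypothetical, nothing here is the real MogileFS API, and
real replication is asynchronous rather than immediate.

```python
# Toy model of the upload flow. All names are made up for illustration.

class StorageNode:
    def __init__(self, name):
        self.name = name
        self.files = {}          # path -> bytes held by this node

    def put(self, path, data):
        self.files[path] = data


class Tracker:
    def __init__(self, nodes):
        self.nodes = nodes
        self.locations = {}      # key -> list of (node, path)
        self.sizes = {}          # key -> byte count reported by the client

    def where_to_write(self, key):
        # The tracker only picks a destination; it never sees the data.
        return self.nodes[0], "/" + key

    def done_writing(self, key, node, path, size):
        # The client reports "I'm done, and I wrote this many bytes."
        self.locations[key] = [(node, path)]
        self.sizes[key] = size
        self.replicate(key)

    def replicate(self, key):
        # Copy the file to every other node (a simplification of
        # whatever replication policy the real tracker applies).
        src_node, path = self.locations[key][0]
        for node in self.nodes:
            if node is not src_node:
                node.put(path, src_node.files[path])
                self.locations[key].append((node, path))


def upload(tracker, key, data):
    node, path = tracker.where_to_write(key)           # ask the tracker
    node.put(path, data)                               # write straight to storage
    tracker.done_writing(key, node, path, len(data))   # report back
```

The point of the sketch is that the file bytes only ever pass between the
client and the storage nodes; the tracker handles metadata.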

> When you request a file that request goes to the tracker and then to the
> storage server.

Kinda.  The client asks the tracker, then the client gets the file from
one of the returned locations.  The client can either ask for a list of
all locations, and do its own checking to see what storage nodes are up,
or it can ask the tracker to give it only URLs the tracker has verified
are alive.  The "noverify" option in the "get_paths" protocol request
picks between the two: noverify=1 skips the tracker's check.
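
A rough sketch of the two read strategies follows. The "get_paths"
command and the "noverify" argument come from this message; the exact
wire format (command, space, URL-encoded arguments, an "OK paths=N&path1=..."
reply), the "domain" argument, and the helper names are assumptions and
should be checked against the tracker you run. Here noverify=1 is taken
to mean the tracker skips its own liveness check and returns every
location.

```python
from urllib.parse import urlencode, parse_qs

def build_get_paths(domain, key, noverify):
    # Assumed request format: command name, a space, URL-encoded arguments.
    args = {"domain": domain, "key": key, "noverify": 1 if noverify else 0}
    return "get_paths " + urlencode(args) + "\r\n"

def parse_paths(response_line):
    # Assumed reply shape: "OK paths=N&path1=...&path2=..."
    status, _, body = response_line.strip().partition(" ")
    if status != "OK":
        raise RuntimeError("tracker error: " + body)
    fields = parse_qs(body)
    count = int(fields["paths"][0])
    return [fields["path%d" % i][0] for i in range(1, count + 1)]

def first_alive(paths, is_alive):
    # Client-side checking: is_alive is a caller-supplied probe,
    # for example an HTTP HEAD request with a short timeout.
    for url in paths:
        if is_alive(url):
            return url
    return None
```

With noverify=1 the client would run something like first_alive() itself;
with noverify=0 it could simply use the first path returned.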

> Then the file is retrieved to the API/webserver. Is that
> correct? Also, is there any way to have the storage servers send the file
> instead of sending to the webserver?

That's where Perlbal (our load balancer/proxy/webserver) comes in.

If your webserver (mod_perl/php) returns a magic header that's like:

X-Reproxy-Url: http://..., http://...

Then Perlbal internally fetches one of those resources from a storage
node and streams it to the client, so your mod_perl/php isn't tied up
spoonfeeding a slow modem/dsl/cable user.  (Those are all slower than
your internal network.)

So your app can just do "get_paths" with "noverify=1" set, and then stuff
into an HTTP response header the whole list of URLs the tracker gives the
app.
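
As a minimal sketch of that last step, the app could turn the tracker's
list into the response header like this. The function name is
hypothetical, and the default separator simply follows the example
header shown earlier in this message; your Perlbal version may expect
the URLs separated differently (for instance by whitespace), so check
its documentation.

```python
def reproxy_response_headers(paths, separator=", "):
    # paths: the URLs returned by get_paths with noverify=1.
    # separator: assumption taken from the example header in this message.
    if not paths:
        raise ValueError("tracker returned no paths")
    return [("X-Reproxy-Url", separator.join(paths))]
```

The app then returns these headers with an empty body and is free to
handle the next request while Perlbal does the streaming.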

- Brad