Patch adds ranged data fetches to MogileFS::Client
Arthur Bebak
abebak at fabrikinc.com
Thu Oct 26 23:07:07 UTC 2006
All,
I've created a new method for MogileFS::Client which allows you
to fetch only a portion of a file.
This is mighty useful if you're dealing with large files and don't
want to slurp the entire thing into memory all at once - something
I'm sure many Mogile users are concerned about.
The method is called get_file_data_range.
# Here's how to call it; pick one form of %arg_hash.
# The range arg always overrides length and offset.
# If not provided, offset is assumed to be 0.
# All numbers are in units of bytes.
#
%arg_hash = ( "range" => "1000-1100" ); # bytes 1000-1100 inclusive
%arg_hash = ( "length" => "100", "offset" => "500" ); # bytes 500-599
%arg_hash = ( "length" => "300" ); # bytes 0-299
$content_ref = $mogfs->get_file_data_range( $key, %arg_hash );
print "content = '$$content_ref'";
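Since the point is to avoid slurping a big file in one go, here's a rough sketch of walking a file in consecutive ranges. chunk_ranges is a hypothetical helper (it is not in the patch), and it assumes you already know the file size from somewhere else. It uses nothing outside core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): split a file of $size bytes
# into consecutive "start-end" range strings of at most $chunk bytes each.
sub chunk_ranges {
    my ($size, $chunk) = @_;
    die "chunk size must be positive\n" unless $chunk > 0;
    my @ranges;
    for (my $start = 0; $start < $size; $start += $chunk) {
        my $end = $start + $chunk - 1;
        $end = $size - 1 if $end > $size - 1;   # last chunk may be short
        push @ranges, "$start-$end";
    }
    return @ranges;
}

# Each string would then go in as the "range" key, e.g.:
#   my $ref = $mogfs->get_file_data_range( $key, "range" => $r );
print "$_\n" for chunk_ranges(2500, 1000);   # 0-999, 1000-1999, 2000-2499
```

Both ends of each range are inclusive, which is why the end byte is start + chunk - 1.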
I'm basically constructing a Range: HTTP header to send to
the Mogile storage daemon. You can read about what you can
put in the "range" key here:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35
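To make the mapping from arguments to header concrete, here's a minimal standalone sketch. range_header_value is a made-up name for illustration; it roughly mirrors the argument handling in the patch (it tests with defined where the patch uses exists) and only builds the header value, it doesn't talk to any server.

```perl
use strict;
use warnings;

# Hypothetical standalone helper roughly mirroring the patch's argument
# handling: "range" wins if present, otherwise length (plus an optional
# offset, default 0) is converted to an inclusive byte range.
# Returns the Range header value, or undef when no range was requested.
sub range_header_value {
    my (%args) = @_;
    my $range = $args{range};
    if (!defined $range && defined $args{length}) {
        my $offset = defined $args{offset} ? $args{offset} : 0;
        my $last   = $offset + $args{length} - 1;   # end byte is inclusive
        $range = "$offset-$last";
    }
    return defined $range ? "bytes=$range" : undef;
}

print range_header_value( range  => "1000-1100" ), "\n";          # bytes=1000-1100
print range_header_value( length => 100, offset => 500 ), "\n";   # bytes=500-599
print range_header_value( length => 300 ), "\n";                  # bytes=0-299
```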
The patch below is to MogileFS-Client-1.03/lib/MogileFS/Client.pm
You can apply it like so:
Copy everything between the start/end lines to a patchfile,
then run these commands:
cd MogileFS-Client-1.03/lib/MogileFS
patch Client.pm < patchfile
cd ../..
perl Makefile.PL
make
make install
------------------ start patch ------------------
264a265,360
> #
> # given a key, returns a scalar reference pointing at a string containing
> # the contents of the file. takes two parameters; a scalar key to get the
> # data for the file, and a hash which can have one of several keys:
> # %arg_hash = ( "timeout" => 10, "range" => "0-1024", "length" => "1000", "offset" => "100");
> #
> # See the definition of the HTTP Range: header for details of what
> # the "range" key can look like, but in general assuming a file of size 10000
> # you can do values like this:
> #
> # The first 500 bytes (byte offsets 0-499, inclusive): "range" => "0-499"
> # The second 500 bytes (byte offsets 500-999, inclusive): "range" => "500-999"
> # The final 500 bytes (byte offsets 9500-9999, inclusive): "range" => "-500"
> #                                                      or: "range" => "9500-"
> # The first and last bytes only (bytes 0 and 9999): "range" => "0-0,-1"
> #
> # The other way to get a range is to give an offset into the file, and
> # specify the length. So for example, given "length" => 1000, "offset" => 100,
> # you'd get the equivalent of "range" => "100-1099". Note that the
> # offset byte is included, so in general the formula is:
> # $range = $offset . "-" . ($offset + $length - 1);
> #
> # If offset is not given, then it is assumed that "offset" => 0. This makes
> # it easy to get the first $n bytes of the file:
> # $n = 100;
> # %arg_hash = ( "length" => $n );
> # $content_ref = $mogfs->get_file_data_range( $key, %arg_hash );
> #
> # If the range key is defined, length/offset are ignored.
> #
> sub get_file_data_range {
>     # given a key, load some paths and get data
>     my MogileFS::Client $self = shift;
>     my ($key, %arg_hash) = @_;
>
>     # Let's parse all the optional args
>     my $timeout;
>     if( exists $arg_hash{'timeout'} ) {
>         $timeout = $arg_hash{'timeout'};
>     } # if
>
>     my $range;
>     if( exists $arg_hash{'range'} ) {
>         $range = $arg_hash{'range'};
>     } # if
>
>     my $offset;
>     if( exists $arg_hash{'offset'} ) { $offset = $arg_hash{'offset'} } # if
>     else { $offset = "0"; }
>
>     my $length;
>     if( exists $arg_hash{'length'} && ! exists $arg_hash{'range'} ) {
>         my $num_bytes = $arg_hash{'length'} + $offset - 1;
>         $range = $offset . "-" . $num_bytes;
>     } # if
>
>     my @paths = $self->get_paths($key, 1);
>     return undef unless @paths;
>
>     # iterate over each
>     foreach my $path (@paths) {
>         next unless defined $path;
>         if ($path =~ m!^http://!) {
>             # try via HTTP
>             my $ua = new LWP::UserAgent;
>             $ua->timeout($timeout || 10);
>
>             my $res;
>             if(defined $range) {
>                 #
>                 # This will create an HTTP request header which looks like this:
>                 # Range: bytes=$range
>                 #
>                 $res = $ua->get($path, "Range" => "bytes=$range" );
>             } # if
>             else {
>                 $res = $ua->get($path);
>             } # else
>
>             if ($res->is_success) {
>                 my $contents = $res->content;
>                 return \$contents;
>             }
>
>         } else {
>             # open the file from disk and just grab it all (range is not applied here)
>             open FILE, "<$path" or next;
>             my $contents;
>             { local $/ = undef; $contents = <FILE>; }
>             close FILE;
>             return \$contents if $contents;
>         }
>     }
>     return undef;
> }
>
------------------ end patch --------------------
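One thing a caller may want to check: RFC 2616 lets a server ignore Range and reply 200 with the whole body, while a range that was honored comes back as 206 with a Content-Range header. The patch only tests is_success, so here's a small sketch of parsing that header. parse_content_range is a hypothetical helper, not something in the patch or in MogileFS::Client, and it only handles the single-range form.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): parse a Content-Range header
# value such as "bytes 1000-1100/5000" into (first, last, total).
# Returns an empty list if the value does not match that form.
# A total of "*" (length unknown) comes back as undef.
sub parse_content_range {
    my ($value) = @_;
    return unless defined $value;
    if ($value =~ m{^bytes\s+(\d+)-(\d+)/(\d+|\*)$}) {
        my ($first, $last, $total) = ($1, $2, $3);
        undef $total if $total eq '*';
        return ($first, $last, $total);
    }
    return;
}

# With an HTTP::Response object $res from LWP, the inputs would be
# $res->code (206 when the range was honored) and
# $res->header('Content-Range').
my ($first, $last, $total) = parse_content_range("bytes 1000-1100/5000");
print "got bytes $first-$last of $total\n";
```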
--
Arthur Bebak
abebak at fabrikinc.com