Patch to MogileFS::Client to fetch only a range of bytes from a file

Arthur Bebak abebak at fabrikinc.com
Fri May 11 22:02:17 UTC 2007


I think I sent this in last year, but with all the recent code changes I don't
think it ever made the migration from the old client code to the new one.

In any event, here is a diff -u patch that adds a new method, "get_file_data_range",
to MogileFS::Client, letting you fetch only a range of bytes from a file rather
than the whole thing. Very handy in all sorts of situations.
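
The "offset"/"length" form is just arithmetic on the HTTP Range header. As a
standalone illustration, here is a minimal Perl sketch; the helper name
offset_length_to_range_header is hypothetical and is not part of
MogileFS::Client or of this patch.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of MogileFS::Client: turn "offset"/"length"
# arguments into the value of an HTTP "Range" request header.
sub offset_length_to_range_header {
    my (%arg) = @_;
    my $offset = exists $arg{offset} ? $arg{offset} : 0;    # offset defaults to 0
    my $length = $arg{length};

    die "offset must be a non-negative integer\n"
        unless defined $offset && $offset =~ /^\d+$/;
    die "length must be a positive integer\n"
        unless defined $length && $length =~ /^\d+$/ && $length > 0;

    # The offset byte is included, so the last byte is offset + length - 1.
    # The parentheses matter: "." and "-" have equal precedence in Perl.
    return "bytes=" . $offset . "-" . ($offset + $length - 1);
}

print offset_length_to_range_header(offset => 100, length => 1000), "\n";
print offset_length_to_range_header(length => 100), "\n";
```

It prints bytes=100-1099 and then bytes=0-99, matching the inclusive-offset
rule described in the patch documentation.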

Documentation included in the patch. Enjoy.

Arthur Bebak
abebak at fabrikinc.com


--- Client.orig 2007-05-11 13:32:48.000000000 -0700
+++ Client.pm   2007-05-11 14:59:02.000000000 -0700
@@ -468,6 +468,154 @@
      return undef;
  }

+=head2 get_file_data_range
+
+Given a key and a range, returns a scalar reference to a string containing
+the requested byte range of the file, or undef if no copy could be fetched.
+
+For example, to get 1000 bytes starting at byte 100 you can do something
+like this:
+
+   %arg_hash = ( "length" => "1000", "offset" => "100");
+   $content_ref = $mogfs->get_file_data_range( $key, %arg_hash );
+
+The same example, using a range:
+
+   %arg_hash = ( "range" => "100-1099");
+   $content_ref = $mogfs->get_file_data_range( $key, %arg_hash );
+
+See the definition of the HTTP "Range" header (RFC 2616, section 14.35) for
+details of what the "range" key can look like. In general, assuming a file of
+10000 bytes, you can use range values like these:
+
+=over 2
+
+=item *
+
+The first 500 bytes (byte offsets 0-499, inclusive):     "range" => "0-499"
+
+=item *
+
+The second 500 bytes (byte offsets 500-999, inclusive):  "range" => "500-999"
+
+=item *
+
+The final 500 bytes (byte offsets 9500-9999, inclusive): "range" => "-500"
+
+=item *
+
+Everything from byte 9500 to the end (here 9500-9999):    "range" => "9500-"
+
+=item *
+
+The first and last bytes only (bytes 0 and 9999):        "range" => "0-0,-1"
+
+=back
+
+Be careful: not all web servers support every Range format above, and a
+server that ignores Range sends back the whole file, which is returned as-is.
+Multiple ranges usually come back as a multipart/byteranges body. Test against
+your mogstored nodes; specifying explicit start and end bytes is safest.
+
+The other way to get a range is to give an offset into the file, and
+specify the length.
+
+For example, given "length" => 1000 and "offset" => 100, you'd get the
+equivalent of "range" => "100-1099". Note that the offset byte is included,
+so in general the formula is:
+
+   $range = $offset . "-" . ($offset + $length - 1);
+
+If "offset" is not given, it defaults to 0. This makes it easy to get
+the first $n bytes of the file:
+
+   $n = 100;
+   %arg_hash = ( "length" => $n );
+   $content_ref = $mogfs->get_file_data_range( $key, %arg_hash );
+
+"range" overrides "length" and "offset"; without "range" or "length" you get the whole file.
+
+The file is fetched from the storage nodes using LWP::UserAgent, so
+%arg_hash can also have a "timeout" key (default 10 seconds), which is passed
+on to that module. The request is aborted if no activity on the connection to
+the server is observed for "timeout" seconds:
+
+   %arg_hash = ( "length" => 100, "timeout" => 30 );
+   $content_ref = $mogfs->get_file_data_range( $key, %arg_hash );
+
+=cut
+
+sub get_file_data_range {
+    # given a key, load some paths and get data
+    my MogileFS::Client $self = shift;
+    my ($key, %arg_hash) = @_;
+
+    # Parse the optional arguments.
+    my $timeout = $arg_hash{'timeout'};
+
+    # Work out the byte range to request. An explicit "range" wins;
+    # otherwise build one from "offset" (default 0) and "length".
+    # If neither "range" nor "length" is given, $range stays undef
+    # and the whole file is fetched.
+    my $range;
+    my $offset;
+    my $length;
+    if ( exists $arg_hash{'range'} ) {
+        $range = $arg_hash{'range'};
+    }
+    else {
+        $offset = exists $arg_hash{'offset'} ? $arg_hash{'offset'} : 0;
+
+        if ( exists $arg_hash{'length'} ) {
+            $length = $arg_hash{'length'};
+
+            # The offset byte is included, so the last byte wanted
+            # is at offset + length - 1.
+            my $last_byte = $offset + $length - 1;
+            $range = "$offset-$last_byte";
+        }
+    }
+
+    my @paths = $self->get_paths($key, 1);
+    return undef unless @paths;
+
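+    # iterate over each path until one of them works
+    foreach my $path (@paths) {
+        next unless defined $path;
+        if ($path =~ m!^http://!) {
+            # Try via HTTP; with a range this sends "Range: bytes=$range".
+            # A server that ignores Range replies 200 with the whole file.
+            my $ua = LWP::UserAgent->new;
+            $ua->timeout($timeout || 10);
+            my $res = defined $range
+                ? $ua->get($path, "Range" => "bytes=$range")
+                : $ua->get($path);
+            if ($res->is_success) {
+                my $contents = $res->content;
+                return \$contents;
+            }
+        } else {
+            # Local path: slurp the file, then apply a single "first-last",
+            # "first-" or "-N" range; other forms return the whole file.
+            open my $fh, '<', $path or next;
+            binmode $fh;
+            my $contents = do { local $/; <$fh> };
+            close $fh;
+            next unless defined $contents;
+            if (defined $range && $range =~ /^(\d*)-(\d*)$/ && "$1$2" ne '') {
+                my $size = length $contents;
+                my ($first, $last) = $1 eq ''    # "-N": the final N bytes
+                    ? ($2 > $size ? 0 : $size - $2, $size - 1)
+                    : ($1, $2 eq '' || $2 >= $size ? $size - 1 : $2);
+                next if $first > $last;          # unsatisfiable or reversed
+                $contents = substr $contents, $first, $last - $first + 1;
+            }
+            return \$contents;
+        }
+    }
+    return undef;
+}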
+
  =head2 delete

      $mogc->delete($key);
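
To check what each of the range values in the POD list selects, here is a
minimal Perl sketch that resolves one byte-range spec (the part after
"bytes=") against a known file size, following the RFC 2616 byte-range rules.
resolve_byte_range is a hypothetical helper, not part of MogileFS::Client, and
it deliberately rejects multi-range specs such as "0-0,-1".

```perl
use strict;
use warnings;

# Hypothetical helper: resolve ONE byte-range spec against a file of $size
# bytes. Returns inclusive (first, last) offsets, or an empty list if the
# spec is malformed, reversed, or unsatisfiable. Multiple comma-separated
# ranges are not handled here.
sub resolve_byte_range {
    my ($spec, $size) = @_;
    return unless defined $spec && $spec =~ /^(\d*)-(\d*)$/;
    my ($first, $last) = ($1, $2);
    return if $first eq '' && $last eq '';        # "-" alone is invalid

    if ($first eq '') {
        # "-N": the final N bytes (the whole file if N >= size)
        return if $last == 0;                      # "-0" selects nothing
        $first = $last >= $size ? 0 : $size - $last;
        $last  = $size - 1;
    }
    elsif ($last eq '' || $last >= $size) {
        # "N-" or an end past EOF: clamp to the last byte of the file
        $last = $size - 1;
    }

    return if $first > $last;                      # reversed or past EOF
    return ($first, $last);
}

for my $spec ('0-499', '500-999', '-500', '9500-') {
    my ($first, $last) = resolve_byte_range($spec, 10_000);
    printf "%-8s => bytes %d-%d\n", $spec, $first, $last;
}
```

For the 10000-byte file used in the POD examples, both "-500" and "9500-"
resolve to bytes 9500-9999.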


More information about the mogilefs mailing list