Defining our own Mgd::find_deviceid...

Eric Lambrecht eml at guba.com
Mon Nov 6 18:42:52 UTC 2006


Brad Fitzpatrick wrote:
> I'm slowly trying to kill everything in Mgd:: and make it all much more
> object oriented.
> 
> Could we just add an optional method to the same class that handles
> policy decisions for replication locations, and call it ->initial_devids?
> (Plural, since clients can ask for multiple candidate locations to try in
> order in case of failure.)
> 
> But yeah, done cleanly and ideally not in Mgd::, I'd take a patch.

We've been able to extract that find_deviceid code and throw that 
functionality into the replication policy. It requires an additional 
change to the create_open command handling to use the replication policy 
to figure out where to store things, rather than going straight to 
Mgd::find_deviceid.
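
To make the shape of that change concrete, here is a rough sketch of the
dispatch, written in Python rather than MogileFS's Perl and not taken from
our actual patch. Only the method name initial_devids comes from Brad's
suggestion; Device, ReplicationPolicy, PinnedPolicy, default_initial_devids
and the create_open signature are hypothetical names made up for
illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Device:
    """Hypothetical stand-in for a storage device record."""
    devid: int
    mb_total: int
    mb_used: int

    @property
    def mb_free(self) -> int:
        return self.mb_total - self.mb_used


def default_initial_devids(devices: Dict[int, Device], count: int) -> List[int]:
    """Fallback choice: the devices with the most free space, best first."""
    ranked = sorted(devices.values(), key=lambda d: d.mb_free, reverse=True)
    return [d.devid for d in ranked[:count]]


class ReplicationPolicy:
    """Hypothetical base policy that does not define initial_devids."""


class PinnedPolicy(ReplicationPolicy):
    """Hypothetical policy that prefers a fixed, ordered list of devices."""

    def __init__(self, preferred: List[int]) -> None:
        self.preferred = list(preferred)

    def initial_devids(self, devices: Dict[int, Device], count: int) -> List[int]:
        # Keep only preferred devices that actually exist, in preference order.
        return [devid for devid in self.preferred if devid in devices][:count]


def create_open(policy: ReplicationPolicy, devices: Dict[int, Device],
                count: int = 2) -> List[int]:
    """Return candidate devids for the first copy of a new file.

    The policy is consulted first (the method is optional); if it is
    missing or returns nothing, fall back to the default choice.
    """
    chooser = getattr(policy, "initial_devids", None)
    if callable(chooser):
        candidates = chooser(devices, count)
        if candidates:
            return candidates
    return default_initial_devids(devices, count)
```

A policy without the method gets the default behaviour, so existing
policies would keep working unchanged under this arrangement.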

I'll try to get some diffs up before the end of the week. We ended up 
pushing a lot of things around, and I wanted to generate a bunch of 
smaller patches for submission back to the project...

Eric...


> 
> On Fri, 3 Nov 2006, Saunders, Newton wrote:
> 
>>Hi,
>>
>>
>>
>>My company is investigating MogileFS and we are trying to come up with a
>>good method to rebalance files when new storage nodes are added.
>>Defining our own replication policy solves part of the problem. However,
>>ideally, we'd like to specify where the first copy of a new file is
>>stored in the same way that we are able to specify where subsequent
>>copies are stored (in the replication policy).  We'd like to be able to
>>define our own Mgd::find_deviceid.
>>
>>
>>
>>To show why we'd need to define our own Mgd::find_deviceid, this is how
>>we'd implement rebalancing:
>>
>>
>>
>>- Before existing storage nodes are too full (say at ~60% capacity), add
>>some new empty storage nodes.
>>
>>- Mgd::find_deviceid would direct the initial copy of new files to
>>existing storage nodes only.  (Reason: we assume that new files are
>>accessed more frequently than older ones, and we do not want the
>>majority of new files on the new storage nodes.)
>>
>>- We would run a separate process that simply "rewrites" old files.
>>When the initial copy of one of these old files is rewritten,
>>Mgd::find_deviceid would direct it to the new storage nodes.
>>
>>- For replication, our policy would use the same Mgd::find_deviceid we
>>defined above to replicate new files to existing storage nodes and old
>>files to new storage nodes.
>>
>>- Once all storage nodes are balanced, we will stop "rewriting" old
>>files and new files will be saved as they are now: to the storage node
>>with the most available space.
>>
>>
>>
>>- A few notes:
>>
>>  - We are able to differentiate between new files and rewritten files
>>because we store metadata about each file outside of MogileFS (such as
>>the original creation date).  If the creation date is before the date
>>the storage node was added, we assume it is being rewritten; otherwise,
>>we assume it is new.
>>
>>  - A storage node is considered "new" if its usage is substantially
>>lower than the average usage across all storage nodes.
>>
>>
>>
>>
>>
>>If we submitted a patch that allowed users to define their own version
>>of Mgd::find_deviceid, would it be accepted?
>>
>>
>>
>>
>>
>>Thanks,
>>
>>
>>
>>Newton Saunders
>>
>>
>>
>>
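
The selection rule described in the quoted message could be sketched roughly
as follows. This is Python for illustration only, not MogileFS code. Node,
split_nodes, choose_nodes and the margin parameter are hypothetical; in
particular, the message does not say how far below average a node's usage
must be to count as "new", so margin=0.5 is an assumption, as is using the
earliest date a new node was added as the cutoff for rewritten files.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical storage node record."""
    name: str
    capacity_mb: int
    used_mb: int
    added_on: date

    @property
    def usage(self) -> float:
        return self.used_mb / self.capacity_mb


def split_nodes(nodes: List[Node], margin: float = 0.5) -> Tuple[List[Node], List[Node]]:
    """Split nodes into (new, existing).

    A node counts as "new" when its usage is below margin * average usage.
    The margin value is an assumption, not something the thread specifies.
    """
    average = sum(n.usage for n in nodes) / len(nodes)
    new = [n for n in nodes if n.usage < margin * average]
    existing = [n for n in nodes if n.usage >= margin * average]
    return new, existing


def choose_nodes(nodes: List[Node], file_created: date,
                 margin: float = 0.5) -> List[str]:
    """Return node names for the first copy of a file, best candidate first."""
    if not nodes:
        return []
    new, existing = split_nodes(nodes, margin)
    if not new:
        # Balanced cluster: use every node, as happens today.
        pool = nodes
    else:
        # Assumed cutoff: the earliest date any "new" node was added.
        cutoff = min(n.added_on for n in new)
        # Files created before the cutoff are old files being rewritten.
        pool = new if file_created < cutoff else existing
    ranked = sorted(pool, key=lambda n: n.capacity_mb - n.used_mb, reverse=True)
    return [n.name for n in ranked]
```

With two nodes at roughly 60% usage and one freshly added empty node, a file
created after the new node was added goes to the two older nodes, while a
file with an earlier creation date goes to the new node.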


