Defining our own Mgd::find_deviceid...

Brad Fitzpatrick brad at danga.com
Sun Nov 5 22:46:24 UTC 2006


I'm slowly trying to kill everything in Mgd:: and make it all much more
object-oriented.

Could we just add an optional method to the same class that handles
policy decisions for replication locations, and call it ->initial_devids?
(Plural, since clients can ask for multiple candidate locations to try in
order in case of failure.)

But yeah, done cleanly and ideally not in Mgd::, I'd take a patch.
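
For illustration only, here is a rough sketch of that optional-method idea,
written in Python rather than MogileFS's Perl. The class names, the device
dicts, and the mb_free field are made up for this example; only the method
name initial_devids comes from the suggestion above. The fallback ordering
(most free space first) mirrors the current behavior described in the quoted
message below.

```python
def default_initial_devids(devices, count):
    """Fallback: rank devices by free space, largest first (assumed default)."""
    ranked = sorted(devices, key=lambda d: d["mb_free"], reverse=True)
    return [d["devid"] for d in ranked[:count]]


class PlainPolicy:
    """Hypothetical policy that has no opinion on initial placement."""


class PinnedPolicy:
    """Hypothetical policy that prefers a fixed set of devices for first copies."""

    def __init__(self, preferred_devids):
        self.preferred = set(preferred_devids)

    def initial_devids(self, devices, count):
        # Preferred devices come first, then everything else, each group
        # ordered by free space so the client gets sensible retry candidates.
        preferred = [d for d in devices if d["devid"] in self.preferred]
        others = [d for d in devices if d["devid"] not in self.preferred]
        ordered = (default_initial_devids(preferred, len(preferred))
                   + default_initial_devids(others, len(others)))
        return ordered[:count]


def choose_initial_devids(policy, devices, count=2):
    """Use the policy's optional hook if it defines one, else the default."""
    hook = getattr(policy, "initial_devids", None)
    if callable(hook):
        return hook(devices, count)
    return default_initial_devids(devices, count)
```

Because the method is optional, a policy class that does not define it keeps
the existing placement behavior unchanged.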




On Fri, 3 Nov 2006, Saunders, Newton wrote:

> Hi,
>
>
>
> My company is investigating MogileFS and we are trying to come up with a
> good method to rebalance files when new storage nodes are added.
> Defining our own replication policy solves part of the problem. However,
> ideally, we'd like to specify where the first copy of a new file is
> stored in the same way that we are able to specify where subsequent
> copies are stored (in the replication policy).  We'd like to be able to
> define our own Mgd::find_deviceid.
>
>
>
> To show why we'd need to define our own Mgd::find_deviceid, this is how
> we'd implement rebalancing:
>
>
>
> - Before existing storage nodes are too full (say at ~60% capacity), add
> some new empty storage nodes.
>
> - Mgd::find_deviceid would direct the initial copy of new files to
> existing storage nodes only.  (Reason: we assume that new files are
> accessed more frequently than older ones, and we do not want the majority
> of new files to land on the new storage nodes.)
>
> - We would run a separate process that simply "rewrites" old files.
> When "resaving" the initial copy of these old files, Mgd::find_deviceid
> would direct them to the new storage nodes.
>
> - For replication, our policy would use the same Mgd::find_deviceid we
> defined above to replicate new files to existing storage nodes and old
> files to new storage nodes.
>
> - Once all storage nodes are balanced, we will stop "rewriting" old
> files, and new files will be saved as they are now: to the storage node
> with the most available space.
>
>
>
> - A few notes:
>
>   - We are able to differentiate between new files and rewritten files
> because we store metadata about each file outside of MogileFS (such as
> the original creation date).  If the creation date is before the date
> the storage node was added, we assume the file is being rewritten;
> otherwise, we assume it is new.
>
>   - A storage node is considered "new" if its usage is substantially
> lower than the average usage across all storage nodes.
>
>
>
>
>
> If we submitted a patch that allowed users to define their own version of
> Mgd::find_deviceid, would it be accepted?
>
>
>
>
>
> Thanks,
>
>
>
> Newton Saunders
>
>
>
>
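
The rebalancing rules in the quoted message could be sketched roughly as
follows. This is Python for illustration, not MogileFS code. The node dicts,
their field names, and the 0.20 margin used to decide what "substantially
lower than average" means are all assumptions; the original message gives no
threshold. Sorting by lowest usage stands in for "most available space".

```python
from datetime import date

# Assumption: a node counts as "new" when its usage fraction is at least
# this far below the average. The original message does not give a number.
NEW_NODE_MARGIN = 0.20


def split_nodes(nodes):
    """Partition nodes into (existing, new) by usage relative to the average."""
    average = sum(n["usage"] for n in nodes) / len(nodes)
    cutoff = average - NEW_NODE_MARGIN
    existing = [n for n in nodes if n["usage"] >= cutoff]
    new = [n for n in nodes if n["usage"] < cutoff]
    return existing, new


def least_used_first(nodes):
    """Order node names by usage fraction, emptiest first."""
    return [n["name"] for n in sorted(nodes, key=lambda n: n["usage"])]


def candidate_nodes(nodes, file_created):
    """Return candidate node names, in order, for the first copy of a file."""
    existing, new = split_nodes(nodes)
    if not new:
        # Balanced cluster: fall back to the emptiest node overall.
        return least_used_first(nodes)
    # A file created before the new nodes were added is treated as a rewrite.
    newest_added = max(n["added"] for n in new)
    is_rewrite = file_created < newest_added
    return least_used_first(new if is_rewrite else existing)
```

For example, with two nodes at roughly 60% usage and one freshly added empty
node, a file whose recorded creation date predates the new node is steered to
the new node, while a brand-new file goes to the two older nodes.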

