Using md5 sum of file contents as key

Mark Smith smitty at gmail.com
Tue Apr 15 19:49:56 UTC 2008


>  We've got a number of applications that require scalable storage, that have
>  different front end and business requirements but often end up containing the
>  same files (largely images).
>
>  We're considering using MogileFS as a storage solution, using an MD5 sum (or
>  SHA-xx) of the file's contents as the key.  This key would be stored by each
>  application in their own databases along with all the metainformation which
>  is application dependent.
>
>  This would provide a guarantee* that we were never 'wasting' storage by
>  storing the same file multiple times, without making major changes to our
>  applications.
>
>  Has anybody used a function of the file contents as the key before?  Good
>  idea/bad idea?

The FotoBilder software (powers PicPix.com and LiveJournal's ScrapBook
service) uses this concept.  Although it doesn't use the hash as the
MogileFS key, it uses the hash to reference a record in the database,
where it can then look up the picture's ID.  That helps it keep track
of how many people are using this picture so it can be cleaned up
later, too.
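
A minimal sketch of that kind of indirection, in Python, with an
in-memory SQLite table standing in for the application database.  The
table layout, the function names (add_picture, release_picture) and
the choice of SHA-1 are assumptions for illustration only, not
FotoBilder's actual schema or code.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per distinct file content.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE pictures ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " content_hash TEXT UNIQUE NOT NULL,"
    " refcount INTEGER NOT NULL)"
)

def add_picture(data):
    """Return (picture_id, is_new); is_new means the bytes still need storing."""
    digest = hashlib.sha1(data).hexdigest()
    row = db.execute(
        "SELECT id FROM pictures WHERE content_hash = ?", (digest,)
    ).fetchone()
    if row is not None:
        # Same content already known: just record one more user.
        db.execute(
            "UPDATE pictures SET refcount = refcount + 1 WHERE id = ?", (row[0],)
        )
        return row[0], False
    cur = db.execute(
        "INSERT INTO pictures (content_hash, refcount) VALUES (?, 1)", (digest,)
    )
    return cur.lastrowid, True

def release_picture(picture_id):
    """Drop one reference; return True once no users remain."""
    db.execute(
        "UPDATE pictures SET refcount = refcount - 1 WHERE id = ?", (picture_id,)
    )
    (count,) = db.execute(
        "SELECT refcount FROM pictures WHERE id = ?", (picture_id,)
    ).fetchone()
    if count <= 0:
        db.execute("DELETE FROM pictures WHERE id = ?", (picture_id,))
        return True  # caller can now delete the stored file
    return False
```

Uploading the same bytes twice returns the same ID with is_new set to
False, and release_picture reports True only when the last user lets
go, which is the point at which the stored file can be cleaned up.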

It's a fine idea, and it will work great either way you do it.  :)
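
When the digest itself serves as the storage key, the key can be
derived by hashing the file in chunks so large files never have to sit
in memory.  A rough sketch in Python: the content_key name and the
"algorithm:" prefix on the key are assumptions, and the call that
actually stores the file in MogileFS is deliberately left out.

```python
import hashlib
import io

def content_key(stream, algorithm="sha1", chunk_size=64 * 1024):
    """Build a storage key from the bytes read from a binary stream."""
    h = hashlib.new(algorithm)
    # Read fixed-size chunks until the stream returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    # Prefixing the algorithm name keeps keys distinguishable if the
    # hash function is ever changed (an illustrative convention only).
    return f"{algorithm}:{h.hexdigest()}"

key = content_key(io.BytesIO(b"example image bytes"))
```

Identical contents always yield the same key, so each application can
store that key in its own database next to its own metadata.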


-- 
Mark Smith / xb95
smitty at gmail.com

