Using MD5 sum of file contents as key

Ian Sherratt shez at starfangled.net
Tue Apr 15 11:30:17 UTC 2008


Heya,

We've got a number of applications that require scalable storage.  They have 
different front-end and business requirements but often end up storing the 
same files (largely images).

We're considering using MogileFS as a storage solution, using an MD5 sum (or 
SHA-xx) of the file's contents as the key.  Each application would store this 
key in its own database, along with all the application-dependent 
meta-information.

This would provide a guarantee* that we were never 'wasting' storage by 
storing the same file multiple times, without making major changes to our 
applications.
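
A minimal sketch of the deduplication idea, in Python purely for illustration.
The DedupStore class is a hypothetical in-memory stand-in for the shared
store, not the MogileFS client API, and SHA-256 is an assumed choice of hash:

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory stand-in for the shared store (not MogileFS)."""

    def __init__(self):
        self._blobs = {}  # content key -> file bytes

    def put(self, data: bytes) -> str:
        """Store data under a key derived from its contents; return the key."""
        key = hashlib.sha256(data).hexdigest()
        # Identical contents map to the same key, so a repeat upload
        # leaves the existing entry in place instead of adding a copy.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)
```

Each application would keep only the returned key in its own database, so two
applications uploading the same image end up pointing at a single stored copy.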

Has anybody used a function of the file contents as the key before?  Good 
idea/bad idea?

Cheers!
Shez

* OK, hash collisions are always possible, so filelength:SHA-256 would be a 
better key.
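
One way the filelength:SHA-256 key could be computed, as a rough sketch.  The
function name content_key, the chunk size, and the decimal-length/hex-digest
formatting are all assumptions made for illustration:

```python
import hashlib


def content_key(path, chunk_size=1 << 20):
    """Return '<length in bytes>:<SHA-256 hex digest>' for the file at path."""
    digest = hashlib.sha256()
    length = 0
    with open(path, "rb") as fh:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            length += len(chunk)
    return f"{length}:{digest.hexdigest()}"
```

Two files would then have to match in both length and digest to share a key.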

