Filesystem and directory structure question

robb robb at canfield.com
Tue Aug 28 00:39:26 UTC 2007


The Filesystem code (which I used as a base for SFTP) creates a deeply
nested directory structure based on the SHA1 digest. It creates 3 levels
where each is the next 4 characters from the SHA1 digest.

37123c4781b5ea30244882c157f6f1b4293d7348 -->
  3712/3c47/81b5/

This means that any given directory can hold up to 65,536 (16^4)
entries, which seems high. It also causes a number of performance
issues, especially with SFTP, where creating directories can take a LOT
of time and resources.
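
For illustration, here is a rough sketch of that layout. It is written in
Python rather than Brackup's Perl and is not the actual Filesystem code.
The names nested_dir and max_entries are made up for this example, and
the sketch assumes the levels are consecutive, non-overlapping slices of
the digest.

```python
def nested_dir(digest, levels=3, width=4):
    """Return the nested directory prefix for a hex digest.

    Assumption: each level is the next `width` hex characters of the
    digest, with no overlap between levels.
    """
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(parts) + "/"


def max_entries(width=4):
    """Upper bound on distinct names at one level (16 values per hex char)."""
    return 16 ** width


if __name__ == "__main__":
    print(nested_dir("37123c4781b5ea30244882c157f6f1b4293d7348"))
    print(max_entries(4), max_entries(2))
```

With width=4 each level can hold up to 16^4 names; with width=2 the
bound drops to 16^2.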

I ran some tests using the first 2 characters as the directory name,
and not nesting any deeper. This provides a maximum of 256 root
directories. SHA1 seems fairly random in nature (and it should be). I am
seeing a standard deviation in line with that expected for SHA1 (around
6.5 is the expected value). I am not a statistician nor do I play
one on the Internet, but these results seem reasonable and indicate
that a simple 2-character flat directory should be faster (less to
look for in each directory).
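
A test along these lines can be sketched as follows. This is not the
script I ran: the chunk names are synthetic, the sample size is an
arbitrary assumption, and bucket_counts and expected_stddev are
hypothetical helper names. The expected value assumes each digest lands
in a bucket uniformly at random (a binomial model).

```python
import hashlib
import math
import statistics
from collections import Counter


def bucket_counts(n_chunks, prefix_len=2):
    """Count how many synthetic chunks fall into each prefix bucket."""
    counts = Counter()
    for i in range(n_chunks):
        # Synthetic input; real chunk data would be hashed here instead.
        digest = hashlib.sha1(f"chunk-{i}".encode()).hexdigest()
        counts[digest[:prefix_len]] += 1
    n_buckets = 16 ** prefix_len
    # Include empty buckets so the statistics cover every directory.
    return [counts.get(format(b, f"0{prefix_len}x"), 0)
            for b in range(n_buckets)]


def expected_stddev(n_chunks, n_buckets):
    """Standard deviation of one bucket's count under a uniform model."""
    p = 1 / n_buckets
    return math.sqrt(n_chunks * p * (1 - p))


if __name__ == "__main__":
    n = 10000  # assumed sample size, not the one from my test
    counts = bucket_counts(n)
    print("observed:", statistics.pstdev(counts))
    print("expected:", expected_stddev(n, len(counts)))
```

If the observed figure stays close to the expected one, the files are
spreading evenly across the 256 directories.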

I have no plans to change Filesystem, although it would be TRIVIAL to
create a Filesystem_Flat that does so. But I have changed my SFTP
module. I will also play with 3-character names, but the problem there
is that 4096 possible directories becomes a bit heavy. Maybe a
different calculation is in order.
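
One possible "different calculation", purely as a sketch and not
something I have tested or settled on: reduce the leading digest bits
modulo a chosen bucket count, so the number of directories is not
limited to powers of 16 such as 256 or 4096. The name bucket_name and
the count of 1024 are assumptions made up for this example.

```python
def bucket_name(digest, n_buckets=1024):
    """Map a hex digest to one of n_buckets directory names.

    Assumption: the first 8 hex characters (32 bits) are enough to
    spread digests evenly when reduced modulo n_buckets.
    """
    value = int(digest[:8], 16) % n_buckets
    # Zero-pad so every directory name has the same length.
    width = len(format(n_buckets - 1, "x"))
    return format(value, f"0{width}x")


if __name__ == "__main__":
    print(bucket_name("37123c4781b5ea30244882c157f6f1b4293d7348"))
```

The trade-off is that the directory name is no longer a literal prefix
of the digest, so anything browsing the tree by hand needs the same
calculation to find a file.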

This may be moot for many since Amazon's S3 service does not fall under
this constraint. But I am required to have alternate backup strategies
available (vendors), thus the work on SFTP.

