gc and inventory db

Kostas Chatzikokolakis kostas at chatzi.org
Fri Jan 22 01:07:47 UTC 2010


> The only tricky part is that the @orphaned_chunks are the stored chunk
> hashes, which are used as (part of) the inventory_db *values*, not the
> keys. (And we can't derive the keys because those mappings were in
> metafiles we've already pruned.)
>
> So I think we have to add something like a delete_values_beginning_with()
> method to the Dictionary classes under the InventoryDatabase, which seems
> a bit ugly.

A simple solution is to put the digests of all orphaned chunks into a hash
as keys (even with thousands of chunks it shouldn't take much memory).
Then you iterate over the whole inventory db once, extract the digest from
each value, and delete the entry if that digest is in the hash.

Not optimal, but simple, and it should be quite fast compared to the time
gc takes to run anyway.
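A minimal sketch of the single-pass approach, in Python rather than brackup's Perl (a set stands in for the Perl hash). It assumes, hypothetically, that the inventory db behaves like a key/value mapping and that each value starts with the stored chunk digest followed by a space. `prune_orphaned_entries` is an illustrative name, not an existing brackup method:

```python
def prune_orphaned_entries(inventory_db, orphaned_chunks):
    """Delete every inventory entry whose value references an orphaned chunk.

    Hypothetical value format: "<stored chunk digest> <other metadata...>".
    Returns the number of entries deleted.
    """
    # Build the lookup set once; each membership test is then O(1).
    orphaned = set(orphaned_chunks)
    # One pass over the whole db. Keys are collected first because deleting
    # from a dict while iterating over it raises RuntimeError.
    doomed = [key for key, value in inventory_db.items()
              if value.split(" ", 1)[0] in orphaned]
    for key in doomed:
        del inventory_db[key]
    return len(doomed)


# Usage with made-up data: two entries point at the orphaned chunk.
inv = {"k1": "sha1:aaa 100", "k2": "sha1:bbb 200", "k3": "sha1:aaa 100"}
removed = prune_orphaned_entries(inv, ["sha1:aaa"])
print(removed, inv)
```

Memory grows with the number of orphaned digests, not with the size of the inventory db, which is why keeping them all in memory is acceptable.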

Kostas

