tag proposal

dormando dormando at rydia.net
Sat Oct 6 22:36:17 UTC 2007


>     It's a lot of requests rarely.  Broadcast isn't particularly 
> expensive in my client, but I certainly can see how it is for others.
> 
>     It comes down to measurements, I suppose.  If tags help, then it'll 
> be useful.

Heh, hash by tag lookup? :) No wait, multiple tag support... Expiring a 
tag is also no longer atomic across a cluster, and I hope most folks 
run memcached in a cluster. I have no idea what to do about that, but 
it's worth noting.

>> Not saying the feature isn't worth adding; there are doubtless valid 
>> use cases for it. But whatever implementation finally arrives, IMO, 
>> shouldn't impose any per-object memory overhead on objects that have 
>> no tags at all. Or if it does, it should be surrounded by #ifdef so 
>> that sites that don't need it don't see their available cache memory 
>> drop substantially when they upgrade.
> 
>     I was imagining the overhead being something like 8 bytes per item 
> on a 32-bit system as well as the tag hash table.
> 

So what does this mean alloc-wise?

- Tag hash table
- Tag array per item (supporting multiple tags per item, right?)

And thread-lock wise?

- Global version counter.
- Tag counters
- Tag hash table in general, maybe? You could just biglock on this.
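
To make the locking list above concrete, here's a minimal sketch of the 
"just biglock it" approach. It assumes a version scheme where every 
stored item is stamped from the global counter and a tag remembers the 
counter value at its last expiry; that scheme, the fixed-size pool, and 
every name below (tag_entry, tag_lock, tag_expire, ...) are made up for 
illustration and are not memcached code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define TAG_BUCKETS  64   /* presized table; size is arbitrary here */
#define TAG_POOL     256  /* fixed pool so the sketch never mallocs */
#define TAG_NAME_MAX 32

typedef struct tag_entry {
    char name[TAG_NAME_MAX];
    uint32_t expired_at;       /* global version at last expiry, 0 = never */
    struct tag_entry *next;    /* hash chain */
} tag_entry;

static tag_entry  tag_pool[TAG_POOL];
static size_t     tag_pool_used;
static tag_entry *tag_table[TAG_BUCKETS];
static uint32_t   global_version = 1;   /* wrap-around is ignored here */
static pthread_mutex_t tag_lock = PTHREAD_MUTEX_INITIALIZER; /* the biglock */

static unsigned tag_hash(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % TAG_BUCKETS;
}

/* Caller must hold tag_lock. Returns NULL if the tag is unknown. */
static tag_entry *tag_find(const char *name) {
    assert(name != NULL);
    for (tag_entry *t = tag_table[tag_hash(name)]; t; t = t->next)
        if (strcmp(t->name, name) == 0) return t;
    return NULL;
}

/* Stamp a new item with the current global version. */
uint32_t tag_item_stamp(void) {
    pthread_mutex_lock(&tag_lock);
    uint32_t v = global_version++;
    pthread_mutex_unlock(&tag_lock);
    return v;
}

/* Expire everything carrying this tag; 0 on success, -1 on failure. */
int tag_expire(const char *name) {
    int rc = -1;
    pthread_mutex_lock(&tag_lock);
    tag_entry *t = tag_find(name);
    if (!t && tag_pool_used < TAG_POOL && strlen(name) < TAG_NAME_MAX) {
        unsigned b = tag_hash(name);
        t = &tag_pool[tag_pool_used++];
        strcpy(t->name, name);
        t->next = tag_table[b];
        tag_table[b] = t;
    }
    if (t) { t->expired_at = global_version++; rc = 0; }
    pthread_mutex_unlock(&tag_lock);
    return rc;
}

/* An item is live if it was stored after its tag last expired. */
int tag_item_live(const char *name, uint32_t item_version) {
    pthread_mutex_lock(&tag_lock);
    tag_entry *t = tag_find(name);
    int live = (t == NULL) || item_version > t->expired_at;
    pthread_mutex_unlock(&tag_lock);
    return live;
}
```

Every store and every tagged fetch takes tag_lock in this version, which 
is exactly the contention being worried about here. This only covers a 
single server; it does nothing for expiring a tag across a cluster.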

Structures in the tag hash should definitely be reusable in a free list, 
like most of the other structures. Uhm, having one or more per key could 
be massive suck if you're storing small items. Otherwise the goal should 
still be to avoid malloc/free if at all possible.

Presize the tag table? Free list the tag name/version structs? Good enough.
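
As a rough illustration of the free-list idea, the sketch below carves 
one slab into tag structs at startup and then hands them out and takes 
them back without touching malloc/free again. The struct layout and 
the names tag_node, tag_freelist_init, tag_node_get and tag_node_put 
are hypothetical, and it is not thread-safe on its own: the caller is 
assumed to hold whatever lock protects the tag table.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tag_node {
    struct tag_node *next;   /* free-list link while the node is unused */
    unsigned version;        /* tag version, owned by the tag table */
    char name[32];           /* tag name; length cap is arbitrary */
} tag_node;

static tag_node *tag_freelist;

/* Allocate one slab up front and thread every node onto the free list.
 * Returns 0 on success, -1 if the allocation fails. */
int tag_freelist_init(size_t n) {
    tag_node *slab = calloc(n, sizeof *slab);
    if (!slab) return -1;
    for (size_t i = 0; i < n; i++) {
        slab[i].next = tag_freelist;
        tag_freelist = &slab[i];
    }
    return 0;
}

/* Pop a node, or return NULL when the presized pool is exhausted. */
tag_node *tag_node_get(void) {
    tag_node *t = tag_freelist;
    if (t) {
        tag_freelist = t->next;
        t->next = NULL;
    }
    return t;
}

/* Push a node back for reuse instead of freeing it. */
void tag_node_put(tag_node *t) {
    assert(t != NULL);
    t->next = tag_freelist;
    tag_freelist = t;
}
```

What to do when the pool runs dry (grow it, or refuse the new tag) is a 
policy choice this sketch leaves open.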

Tag array per item? Uck :\ 8 bytes per item for a single tag, then you 
add a second tag and you have to realloc the item header? Or is there 
something more clever that I'm missing? There are currently very few 
mallocs in the code tree, and items usually don't get realloc'ed :) 
Reallocs add latency, and they suck.

Tag support could probably be a runtime config option (not ./configure) 
though, so sites that leave it off avoid that memory overhead.

It's also a good amount of thread locking, again unless someone's more 
clever than I am and has a better idea. It's no worse than the way stats 
are currently handled. So maybe it's not so bad if you don't have a T2000.

-Dormando

