tag proposal

Mon Oct 8 20:41:43 UTC 2007

On Oct 6, 2007, at 15:36 , dormando wrote:

>>     It's a lot of requests rarely.  Broadcast isn't particularly  
>> expensive in my client, but I certainly can see how it is for others.
>>     It comes down to measurements, I suppose.  If tags help, then  
>> it'll be useful.
>
> Heh, hash by tag lookup? :) No wait, multiple tag support... It's  
> also no longer atomic when you expire a tag across a cluster. I  
> hope most folks use memcached in a cluster. Have no idea what to do  
> about that, but it's worth noting.

	Yes, that is interesting.  It would be hard to have both work  
without getting all of the nodes communicating or including tag  
information on the response and doing the tag validation on the  
client side.

>>     I was imagining the overhead being something like 8 bytes per  
>> item on a 32-bit system as well as the tag hash table.
>
> So what does this mean alloc-wise?
>
> - Tag hash table
> - Tag array per item (supporting multiple tags per item, right?)
>
> And thread-lock wise?
>
> - Global version counter.
> - Tag counters
> - Tag hash table in general, maybe? You could just biglock on this.
>
> Structures in the tag hash should definitely be reusable in a free  
> list, like most of the other structures. Uhm, having one or more  
> per key could be massive suck if you're storing small items.  
> Otherwise the goal should still be to avoid malloc/free if at all  
> possible.
>
> Presize the tag table? Free list the tag name/version structs? Good  
> enough.

	A singly linked list should be sufficient.  The structs could be  
reused, but the actual tags themselves do need to be allocated at  
some point.  I don't know how important that allocation is to avoid.

> Tag array per item? Uck :\ 8 bytes per item for a single tag, then  
> you add a second tag and you have to realloc the item header? Or is  
> there something more clever that I'm missing? There're currently  
> very few malloc's in the code tree, and usually items don't get  
> realloc'ed :) They're latent, they suck.

	I meant 8 bytes per item overhead in general.  The counter and the  
pointer.  Whether the pointer is a linked list or an array, the  
overhead is fixed for non-tag users.

> Tag support could probably be a config (not ./configure) option  
> though, and avoid that memory overhead.

	That feels messy, but it could be made to work.

> It's also a good amount of thread locking, again unless someone's  
> more clever than I am and has a better idea. It's no worse than the  
> way stats are currently handled. So maybe it's not so bad if you  
> don't have a T2000.

	When do we need locks?

	I admit I'm a bit weak when it comes to threading in C, but when  
threading, I'd think the global counter should be something that  
doesn't get cached and where the increment is atomic.

	The hashtable itself can at worst have a lock per bucket.  There's  
an implementation of a lock-free hashtable in java that should be  
possible in C as well.  Perhaps we could just leave this optimization  
to those who need it since the overlap between people who really need  
MT and people who really wants tags seems to be quite small so far.  :)

-- 
Dustin Sallings