tag proposal

dormando dormando at rydia.net
Sat Oct 6 22:21:13 UTC 2007

>> 1) Get a list of all active keys associated with a tag (even if it is 
>> lossy). Two optimizations would be:
>>      * Give me the count of objects within this tag
>>      * For a read, take a snapshot version of the list to have read 
>> consistency (aka take an MVCC approach)
> Tags aren't intended to be an index, just a mass invalidation mechanism. 
>  It's *possible* to use them as an index, but not in any cheap way.
> Without indexing, you'd have to scan all of the stored items and for 
> each one scan your tags and see if there is a match.  Without major 
> changes to memcached, I don't see how you could do that without having 
> the server hang.

If memcached had MVCC support, you could do it without blocking the 
server and without being lossy. All or some of the item-related 
structures (maybe just the tags) could have copy-on-write spaces.

Given a 'get_by_tags' call, you would:

- Chain the tags hash local to the connection. You can arrange this so 
that either the initial chaining is expensive, or deleting/invalidating 
tags in the global context is expensive, etc.
- Be able to do chunks of work and reschedule as the write buffer fills, 
since you have a consistent view of the tags hash.
- If keeping a count of all objects, in the bulk of cases you can quit 
the search early (once you've seen as many matches as the count says 
exist).
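
To make the steps above concrete, here's a rough sketch in Python. This 
is a toy model, not memcached code: the TagStore class, its method names 
and the chunk_size parameter are all made up for illustration, and the 
"copy-on-write" is just a whole-dict copy on every write, which is the 
"expensive in the global context" end of the tradeoff.

```python
class TagStore:
    """Toy model of items with tags. Hypothetical, not memcached's design."""

    def __init__(self):
        self._items = {}       # key -> frozenset of tags; replaced, never mutated
        self._tag_counts = {}  # tag -> number of live items carrying that tag

    def delete(self, key):
        old_tags = self._items.get(key)
        if old_tags is None:
            return
        new_items = dict(self._items)  # copy-on-write: readers keep the old dict
        del new_items[key]
        self._items = new_items
        for tag in old_tags:
            self._tag_counts[tag] -= 1

    def set(self, key, tags):
        self.delete(key)
        new_items = dict(self._items)  # copy-on-write, as in delete()
        new_items[key] = frozenset(tags)
        self._items = new_items
        for tag in new_items[key]:
            self._tag_counts[tag] = self._tag_counts.get(tag, 0) + 1

    def get_by_tag(self, tag, chunk_size=2):
        # Capture a consistent view up front; later writes swap in a new
        # dict and never touch this one.
        snapshot = self._items
        remaining = self._tag_counts.get(tag, 0)

        def scan():
            nonlocal remaining
            chunk = []
            for key, tags in snapshot.items():
                if remaining == 0:
                    break  # the count says there is nothing left to find
                if tag in tags:
                    chunk.append(key)
                    remaining -= 1
                    if len(chunk) == chunk_size:
                        yield chunk  # hand back a chunk, resume later
                        chunk = []
            if chunk:
                yield chunk

        return scan()
```

Each chunk the generator yields stands in for "do a chunk of work and 
reschedule": the caller can go do other things between chunks, and 
writes that land in the meantime don't show up in the scan.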

Or do some crazy tags -> item btree index, I guess.
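
For what the index idea amounts to, here's a minimal sketch. It's not a 
btree: a sorted Python list with bisect stands in for one, and the 
TagIndex class and its methods are hypothetical names, not anything in 
memcached.

```python
import bisect
from collections import defaultdict


class TagIndex:
    """Hypothetical tag -> keys index; a sorted list stands in for a btree."""

    def __init__(self):
        self._by_tag = defaultdict(list)  # tag -> sorted list of item keys

    def add(self, tag, key):
        keys = self._by_tag[tag]
        i = bisect.bisect_left(keys, key)
        if i == len(keys) or keys[i] != key:  # skip duplicates
            keys.insert(i, key)

    def remove(self, tag, key):
        keys = self._by_tag.get(tag, [])
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            del keys[i]

    def keys_for(self, tag):
        # Return a copy so callers can't disturb the index.
        return list(self._by_tag.get(tag, []))

    def count(self, tag):
        return len(self._by_tag.get(tag, []))
```

Lookups and counts are cheap here, but every set and delete now has to 
maintain the index as well, which is where the extra cost comes in.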

Now, uh. Both are pretty complicated, and issuing even a small handful 
of these requests at once to memcached would add considerable overhead. 
It just doesn't scale as well. 'get_by_tags' would be 10x as resource 
heavy as a 'get'. 30,000 r/s is something mysql can't do, but at 10x 
the cost you're down to 3,000 r/s, which mysql actually can do.

>> 2) Allow the "set" of an object with its tag name. This will solve the 
>> problem of creating an object and then tagging the object.
> I don't see how there could be a problem here at all.  It's almost the 
> same amount of information over the wire, but with two commands, you 
> don't clutter up existing concepts.

Just for clarity, you (should?) be able to issue the two commands 
without waiting for the first one to return?
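
Roughly what I mean, as a sketch: build both commands and write them in 
one go, then read the two replies afterwards. The 'set' line follows the 
existing text protocol. The 'tag_add' command and its 'TAG_STORED' reply 
are made up here, since the tag proposal has no settled syntax, and the 
helper names are hypothetical too.

```python
def build_set(key, value, flags=0, exptime=0):
    # memcached text protocol: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_tag(tag, key):
    # HYPOTHETICAL command: placeholder syntax for the proposed tag feature.
    return f"tag_add {tag} {key}\r\n".encode("ascii")


def pipeline(*commands):
    # Concatenate so both commands go out in a single write,
    # without waiting for the first reply.
    return b"".join(commands)


def split_replies(buf, expected):
    # Return the first `expected` complete reply lines, or None if the
    # server hasn't sent that many yet.
    complete = buf.split(b"\r\n")[:-1]
    if len(complete) < expected:
        return None
    return [line.decode("ascii") for line in complete[:expected]]


payload = pipeline(build_set("user:1", b"hello"), build_tag("users", "user:1"))
```

With a connected socket you'd do sock.sendall(payload) and then keep 
reading until split_replies(buf, 2) returns something. The server 
handles commands on a connection in order, so the tag command sees the 
item the set just stored.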

