tag proposal

dormando dormando at rydia.net
Sat Oct 6 22:21:13 UTC 2007


>> 1) Get a list of all active keys associated with a tag (even if it is 
>> lossy). Two optimizations would be:
>>      * Give me the count of objects within this tag
>>      * For a read, take a snapshot version of the list to have read 
>> consistency (aka take an MVCC approach)
> 
> Tags aren't intended to be an index, just a mass invalidation mechanism. 
>  It's *possible* to use them as an index, but not in any cheap way.
> 
> Without indexing, you'd have to scan all of the stored items and for 
> each one scan your tags and see if there is a match.  Without major 
> changes to memcached, I don't see how you could do that without having 
> the server hang.

If memcached had MVCC support, you could do it without blocking the 
server or being lossy. All or some of the item-related structures (maybe 
just tags) could have copy-on-write spaces.

Given a 'get_by_tags' call, you would:

- Chain the tags hash local to the connection. You can arrange this so 
that either the initial chaining is expensive, or deleting/invalidating 
tags in the global context is expensive, etc.
- Be able to do chunks of work and reschedule as the write buffer fills, 
since you have a consistent view of the tags hash.
- If you keep a count of the objects in each tag, in the bulk of cases 
you can quit the search early.
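
To make the three steps above concrete, here is a rough sketch in Python 
(not memcached's C, and not an actual implementation). TagStore, 
set_tags, snapshot and scan_by_tag are all made-up names. It takes the 
"expensive to write" side of the trade-off: every write copies the 
tables, so a reader holding the old ones keeps a consistent view. The 
generator stands in for "do a chunk of work and reschedule".

```python
class TagStore:
    """Toy key -> tags table with naive copy-on-write (hypothetical)."""

    def __init__(self):
        self._items = {}   # key -> frozenset of tags
        self._counts = {}  # tag -> number of keys carrying that tag

    def set_tags(self, key, tags):
        # Copy-on-write: build new tables instead of mutating the ones
        # that an in-flight reader may still be scanning.
        items = dict(self._items)
        counts = dict(self._counts)
        for old in items.get(key, frozenset()):
            counts[old] -= 1
        items[key] = frozenset(tags)
        for tag in items[key]:
            counts[tag] = counts.get(tag, 0) + 1
        self._items, self._counts = items, counts

    def snapshot(self):
        # Cheap here because writers never mutate these in place.
        return self._items, self._counts


def scan_by_tag(store, tag, chunk_size=2):
    """Yield matching keys in chunks from one consistent snapshot."""
    items, counts = store.snapshot()
    remaining = counts.get(tag, 0)
    chunk = []
    for key, tags in items.items():
        if remaining == 0:
            break  # the per-tag count says there is nothing left to find
        if tag in tags:
            chunk.append(key)
            remaining -= 1
            if len(chunk) == chunk_size:
                yield chunk  # "write buffer full": hand back and resume later
                chunk = []
    if chunk:
        yield chunk
```

A write that lands between two chunks does not show up in the scan, 
which is the read consistency asked for in the quoted text. The cost is 
the full copy on every write, which is exactly the kind of overhead that 
makes this hard to do cheaply.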

Or do some crazy tags -> item btree index, I guess.
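
As a rough idea of what such an index buys you, here is a sketch in 
Python. It is not a btree: a sorted list of (tag, key) pairs with bisect 
stands in for one, since both give an ordered range scan per tag. 
TagIndex and its methods are made-up names.

```python
import bisect


class TagIndex:
    """Ordered (tag, key) index; a sorted list stands in for a btree."""

    def __init__(self):
        self._pairs = []  # kept sorted by (tag, key)

    def add(self, tag, key):
        pair = (tag, key)
        i = bisect.bisect_left(self._pairs, pair)
        if i == len(self._pairs) or self._pairs[i] != pair:
            self._pairs.insert(i, pair)

    def remove(self, tag, key):
        pair = (tag, key)
        i = bisect.bisect_left(self._pairs, pair)
        if i < len(self._pairs) and self._pairs[i] == pair:
            del self._pairs[i]

    def keys_for(self, tag):
        # (tag,) sorts before every (tag, key), so this finds the start
        # of the tag's range without touching unrelated entries.
        i = bisect.bisect_left(self._pairs, (tag,))
        keys = []
        while i < len(self._pairs) and self._pairs[i][0] == tag:
            keys.append(self._pairs[i][1])
            i += 1
        return keys
```

The lookup no longer scans every stored item, but every set, delete and 
expiry now has to maintain the index as well.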

Now, uh. Both are pretty complicated, and issuing just a small handful 
of these requests at once to memcached would add considerable overhead. 
It just doesn't scale as well. 'get_by_tags' would be 10x as resource 
heavy as a 'get'. 30,000 r/s is something mysql can't already do; 3,000 
r/s is something it actually can.

>> 2) Allow the "set" of an object with its tag name. This will solve the 
>> problem of creating an object and then tagging the object.
> 
> I don't see how there could be a problem here at all.  It's almost the 
> same amount of information over the wire, but with two commands, you 
> don't clutter up existing concepts.

Just for clarity, you (should?) be able to issue the two commands 
without waiting for the first one to return?
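
If that is the case, a client could build both commands into one buffer 
and send them in a single write, then read the two replies in order. A 
small sketch in Python: the "set" line follows the memcached text 
protocol, but "tag_add" and its argument order are only a placeholder 
for whatever the proposed tagging command ends up being, and 
pipelined_set_and_tag is a made-up helper.

```python
def pipelined_set_and_tag(key, value, tag, flags=0, exptime=0):
    """Build one buffer holding a 'set' followed by a tagging command."""
    k = key.encode()
    # Real text-protocol form: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    set_cmd = b"set %s %d %d %d\r\n" % (k, flags, exptime, len(value))
    set_cmd += value + b"\r\n"
    # Hypothetical command name and layout, for illustration only.
    tag_cmd = b"tag_add %s %s\r\n" % (tag.encode(), k)
    return set_cmd + tag_cmd
```

The returned bytes would go out with one sendall() on the connection, 
without waiting for the "STORED" reply in between.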

-Dormando

