tag proposal
dormando
dormando at rydia.net
Sat Oct 6 22:21:13 UTC 2007
>> 1) Get a list of all active keys associated with a tag (even if it is
>> lossy). Two optimizations would be:
>> * Give me the count of objects within this tag
>> * For a read, take a snapshot version of the list to have read
>> consistency (aka take an MVCC approach)
>
> Tags aren't intended to be an index, just a mass invalidation mechanism.
> It's *possible* to use them as an index, but not in any cheap way.
>
> Without indexing, you'd have to scan all of the stored items and for
> each one scan your tags and see if there is a match. Without major
> changes to memcached, I don't see how you could do that without having
> the server hang.
If memcached had MVCC support, you could do it without blocking the
server and without being lossy. All or some of the item-related
structures (maybe just the tags) could have copy-on-write spaces.
Given a 'get_by_tags' call, you would:
- Chain the tags hash local to the connection. You can arrange this so
that either the initial chaining is the expensive part, or deleting/
invalidating tags in the global context is, etc.
- Be able to do chunks of work and reschedule as the write buffer fills,
since you have a consistent view of the tags hash.
- If you keep a count of all objects per tag, in the bulk of cases you
can quit the search early.
Or do some crazy tags -> item btree index, I guess.
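
To make the snapshot-based steps listed above a bit more concrete, here is
a rough Python sketch. It is not memcached code: `Store`, `snapshot` and
this `get_by_tags` generator are made-up names for illustration, and the
"copy-on-write" is just replacing whole dicts, which a real server would
never do on every write. It only shows the shape of the idea: the reader
gets a consistent view, does the work in chunks, and stops early once the
per-tag count is reached.

```python
class Store:
    """Toy item table with a copy-on-write snapshot (illustration only)."""

    def __init__(self):
        self._items = {}   # key -> frozenset of tags
        self._counts = {}  # tag -> number of items carrying that tag

    def set(self, key, tags):
        # Copy, modify the copy, then swap. Old snapshots stay untouched.
        # (Copying everything per write is O(n); fine for a toy only.)
        items = dict(self._items)
        counts = dict(self._counts)
        for old_tag in items.get(key, ()):
            counts[old_tag] -= 1
        items[key] = frozenset(tags)
        for tag in items[key]:
            counts[tag] = counts.get(tag, 0) + 1
        self._items, self._counts = items, counts

    def snapshot(self):
        # A reader keeps these references; later writes build new dicts.
        return self._items, self._counts


def get_by_tags(snapshot, tag, chunk_size=2):
    """Yield matching keys in chunks, quitting once the count is reached."""
    items, counts = snapshot
    wanted = counts.get(tag, 0)
    if wanted == 0:
        return  # nothing carries this tag, so no scan is needed
    found = 0
    chunk = []
    for key, tags in items.items():
        if tag in tags:
            chunk.append(key)
            found += 1
            if len(chunk) == chunk_size:
                yield chunk  # caller can reschedule here (write buffer full)
                chunk = []
            if found == wanted:
                break  # early exit: every tagged item has been seen
    if chunk:
        yield chunk
```

Each `yield` stands in for the point where the server would go off and
service other connections before coming back to the scan.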
Now, uh. Both are pretty complicated, and issuing even a small handful
of these requests at once to memcached would add considerable overhead.
It just doesn't scale as well. 'get_by_tags' would be 10x as resource
heavy as a 'get'. 30,000 r/s is already something mysql can't do; 3,000
r/s is actually doable.
>> 2) Allow the "set" of an object with its tag name. This will solve the
>> problem of creating an object and then tagging the object.
>
> I don't see how there could be a problem here at all. It's almost the
> same amount of information over the wire, but with two commands, you
> don't clutter up existing concepts.
Just for clarity: you should (?) be able to issue the two commands
without waiting for the first one to return, right?
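
What I mean by not waiting, as a rough Python sketch: write both commands
in one go, then read both replies. The `set` line follows the normal text
protocol; the `tag_add` command and its reply are hypothetical, since the
tag syntax is still only a proposal, and the helper names are made up.

```python
import socket


def build_set_then_tag(key, value, tag):
    """Build one buffer holding a 'set' followed by a tag command."""
    # Standard text-protocol set: set <key> <flags> <exptime> <bytes>
    set_cmd = b"set %s 0 0 %d\r\n%s\r\n" % (key, len(value), value)
    # Hypothetical tag command; the real syntax is not settled.
    tag_cmd = b"tag_add %s %s\r\n" % (tag, key)
    return set_cmd + tag_cmd


def send_pipelined(sock, payload, n_replies=2):
    """Send everything first, then collect one reply line per command."""
    sock.sendall(payload)
    with sock.makefile("rb") as reader:
        return [reader.readline().rstrip(b"\r\n") for _ in range(n_replies)]
```

The client pays one round trip for both commands instead of two, which is
why splitting "set" and "tag" shouldn't cost much.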
-Dormando