tag proposal

Thu Oct 4 13:25:50 UTC 2007

I would like to illustrate what I think is the principal use of tags:

Currently memcached is ideal for object-level storage. A blog with
hundreds of articles will wind up with a cached entry for each article
under a key such as article:1, article:2 and so on.

Caching collections, however, is a lot harder, but it's also where the
most performance help is needed. Currently the best way to cache a
collection of items (such as all posts for a given day) is to store
a key such as "articles:2007-10-04" whose content is a list of the
keys of the articles. Once it has that list, the application issues a
multi_get and fills in the blanks with the help of the DB.
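
As a rough sketch of that pattern (the dicts below are hypothetical
stand-ins for memcached and the database, and get_multi and
load_collection are names I made up for illustration, not a real client
API):

```python
# Hypothetical in-memory stand-ins for memcached and the articles table.
cache = {}
db = {1: "first post", 2: "second post", 3: "third post"}


def get_multi(keys):
    """Return only those keys that are currently present in the cache."""
    return {k: cache[k] for k in keys if k in cache}


def load_collection(collection_key):
    """Resolve a cached list of article keys into the articles themselves."""
    keys = cache.get(collection_key, [])
    found = get_multi(keys)
    articles = []
    for key in keys:
        if key not in found:
            # Cache miss: fill in the blank from the DB and re-cache it.
            article_id = int(key.split(":")[1])
            found[key] = db[article_id]
            cache[key] = found[key]
        articles.append(found[key])
    return articles


# The collection entry holds keys only; just article:1 is cached so far.
cache["articles:2007-10-04"] = ["article:1", "article:2", "article:3"]
cache["article:1"] = "first post"
```

Updating an article only means re-setting its own key here, since the
collection entry holds keys rather than article content.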

You cannot cache a collection of articles directly under a single key,
because there is no way to properly expire such a collection.
Should an article contained in a cached collection be updated by the
user, the cached collection would not be informed of the change.

Once tagging is available, however, collections can be cached directly.
Each collection will be tagged with the key of every article it contains.
Even very complex queries can then simply be dumped into memcached,
because they can be invalidated by a single command.
Simply overwrite the cached key for the article by re-setting it and
issue a tag invalidation on the same key:

Cache.set('article:1', a)
Cache.invalidate_tag('article:1')
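
To show the semantics I have in mind, here is a minimal in-memory
sketch. TaggedCache and the tags= parameter are assumptions of mine
for illustration; nothing like this exists in memcached today, and a
real implementation would live in the server.

```python
class TaggedCache:
    """Toy cache where invalidating a tag drops every entry carrying it."""

    def __init__(self):
        self.items = {}  # key -> cached value
        self.tags = {}   # tag -> set of keys tagged with it

    def set(self, key, value, tags=()):
        self.items[key] = value
        for tag in tags:
            self.tags.setdefault(tag, set()).add(key)

    def get(self, key):
        return self.items.get(key)

    def invalidate_tag(self, tag):
        # Drop every entry tagged with `tag`. An entry stored under a key
        # that merely equals the tag name is left alone.
        for key in self.tags.pop(tag, set()):
            self.items.pop(key, None)


Cache = TaggedCache()
Cache.set('article:1', 'old body')
# The collection is cached whole and tagged with each article it contains.
Cache.set('articles:2007-10-04', ['old body'], tags=['article:1'])

# The article changes: re-set its key, then invalidate the tag.
Cache.set('article:1', 'new body')
Cache.invalidate_tag('article:1')
```

After the last two calls the collection is gone and will be rebuilt on
the next request, while article:1 itself holds the new content. The
sketch does not clean up stale tag entries when a key is overwritten,
which a real implementation would have to handle.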

This also means that the number of tags in the system will be quite
large. There will be one or more tags for each row in the articles
table. I expect the number of tags to be vastly larger than the number
of keys in future memcached servers.

> On 10/4/07, Dustin Sallings <dustin at spy.net> wrote:
>
>
> On Oct 4, 2007, at 1:07, BUSTARRET, Jean-francois wrote:
>
> It would be nice to be able to add multiple tags with a single command:
> add_tag(key, tag1, tag2, ..., tagN), to avoid multiple round trips, or add a
> tag list as an optional parameter to the set/add/replace commands, i.e.: set
> <key> <flags> <exptime> <bytes> <tag1>,<tag2>,<tag3>...\r\n
>
>  I don't think there's a problem with multi-tag.  I'd expect people would
> want to have more than one anyway.
>
>  Adding tags to all of the existing commands is a tough one, though.  The
> cost of one more round trip is trivial compared to having to change the
> server processing of mutation commands and updating all of the client APIs
> to be able to handle this.  In a particular client, you can still send both
> commands as a single stream and then just expect two lines of response.
>
> AFAIAC, I think tags need to be released. My use for tags would be to tag
> cache entries by content id/user id/channel id. Having to store every id
> used (even deleted content/content not accessed for a long time) since the
> last start of memcached would be a problem...
>
>  OK, it's worth considering, then.
>
>
>  Each tag will have a reference count that is incremented every time the tag
> is successfully added to an item (i.e. must not increment if an item lookup
> fails or the item already has the tag).
>
>  When the item is deallocated, all tags should be decremented for that item.
>
>  When the tag's reference count hits zero, we'll pull it from the global map.
>
>  Memory churn may be an issue.  Someone who knows allocators better than I
> do can decide what to do here.
>
> --
> Dustin Sallings
>
>
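
To make the reference counting scheme quoted above concrete, here is a
rough sketch of how I read it. TagRegistry and its method names are
hypothetical, and this says nothing about how the server would actually
allocate the memory:

```python
class TagRegistry:
    """Toy model of per-tag reference counts kept in a global map."""

    def __init__(self):
        self.refcounts = {}  # global map: tag -> number of items carrying it
        self.item_tags = {}  # item key -> set of tags on that item

    def add_item(self, key):
        self.item_tags.setdefault(key, set())

    def add_tag(self, key, tag):
        tags = self.item_tags.get(key)
        # Must not increment if the item lookup fails or the item
        # already has the tag.
        if tags is None or tag in tags:
            return False
        tags.add(tag)
        self.refcounts[tag] = self.refcounts.get(tag, 0) + 1
        return True

    def deallocate(self, key):
        # Decrement every tag on the item; pull a tag from the global
        # map once its count hits zero.
        for tag in self.item_tags.pop(key, set()):
            self.refcounts[tag] -= 1
            if self.refcounts[tag] == 0:
                del self.refcounts[tag]
```

With this scheme the global map only ever holds tags that are attached
to at least one live item, so ids of deleted or long-evicted content do
not accumulate.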

-- 
Tobi
http://shopify.com       - modern e-commerce software
http://typo.leetsoft.com - Open source weblog engine
http://blog.leetsoft.com - Technical weblog