[Repost] Key Dependencies

clayton collie gboyah at hotmail.com
Mon Mar 12 23:02:46 UTC 2007


My apologies for the bungled post (thanks to hotmail)

Anyhow...
I need the following in an ongoing project, so i'm posting here to see if 
others see merit in the idea.

Proposal
--------
Add key-based dependencies to memcached.

Motivation
----------
To simplify common caching scenarios (data dependencies and namespaces)
by centralizing the logic at the server.

Example
---------

Assume we have an organization selling widgets to the Americas
and Asia, and that the sales calculations are aggregated and
cached.

    add ns:americas 0 0 nothing\r\n
    add ns:asia 0 0 nothing\r\n

    add americas:sales 0 0 25000\r\n
    add americas:sales.usa 0 0 20000\r\n
    add americas:sales.usa.bolts 0 0 10000\r\n
    add americas:sales.usa.grommets 0 0 10000\r\n
    add americas:sales.colombia 0 0 5000\r\n
    add americas:sales.colombia.bolts 0 0 5000\r\n

    dependency americas:sales.usa americas:sales.usa.bolts 
americas:sales.usa.grommets\r\n
    dependency americas:sales.combia americas:sales.colombia.bolts\r\n

    dependency americas:sales americas:sales.usa americas:sales.colombia\r\n
    dependency americas:sales.usa ns:americas\r\n
    dependency americas:sales.colombia ns:americas\r\n

Here we add keys representing the two sales regions (ns:america, ns:asia). 
The total sales
per region are stored (e.g. americas:sales), as well as the individual 
country and item sales
which constitute the total. The aggregate is made dependent on the country 
sales, and
the country sales are dependent on the region. Any number of dependencies 
can be added,
but of course this consumes memory per key.

 If the sales figures for the USA or Colombia changes, americas:sales is 
automatically
deleted. To delete the aggregate data for the Americas region, we delete 
ns:americas.

Implementing namespaces in this way may cause issues in that a dependency 
would
be one of the first items scavenged if there is no need to retrieve it
- like ns:americas in this example (there is no useful data associated).
The client can handle this with a multi-key get which always includes such a 
key.
PHP example :

    function GetEx($memcache, $key)
    {
        $pos = strpos($key, ":");    // we use ":" as namespace separator
        if (FALSE == $pos) {
            /* not namespaced, use single get */
            return $memcache->get($key);
        }
        $namespace = "ns:" . substr($key, 0, $pos);
        $arr = $memcache->get( array($key, $namespace) );
        return $arr[$key];
    }


Dependency command
--------------------------

dependency <key> <dependency>*\r\n

- <key> is the key under which the dependent item is stored.

- <dependency>* means one or more key strings separated by whitespace.

Each <dependency> must represent a valid undeleted, unexpired item.
Each item specified by the dependency key has <key> added to its list of 
dependents.
An item cannot be added as a dependency to itself.

After sending the command line and the data block the client awaits
the reply, which may be:

- "OK\r\n", to indicate success.

- "NOT_FOUND\r\n" to indicate the item or one of the dependencies was was 
not found.


Needed Changes
------------------------

- Addition of storage for dependent keys per item struct.

- void item_add_dependencies(conn *c, item *node, item **dependencies, int 
count, int check_cycles)
  Add node as a dependent to each item in the dependencies list. This 
optionally performs a
  topological sort to detect possible dependency cycles.

- Add a server setting to allow/disallow dependency cycle detection.

- Modification of the item_unlink routine to recursively unlink dependents 
of
  the item. When the dependency is deleted or modified, all of its 
dependents
  are deleted recursively.

- Modification of item_free to cleanup dependent key storage.

- An addition to the protocol to allow client setting of item dependencies.


Caveats
-------
This implementation assumes that a key and its dependencies are located in a 
single memcached
server instance. Cycle detection has an effect performance, so it is made 
optional by setting.
Dependents of a key are stored per node and consume (2 + strlen(key)) bytes 
each.


Any thoughts ?

clayton collie 



More information about the memcached mailing list