Multiget/intelligent generic PHP wrapper function... thoughts/advice wanted

mike mike503 at gmail.com
Fri Nov 2 19:33:03 UTC 2007


On 11/2/07, Dustin Sallings <dustin at spy.net> wrote:
>        If you needed something more complicated than a simple string
> prefix, you could pass in a function to do it instead.  The first two
> lines would look like this:
>
> def get_cached(keys, cache_miss_func, key_func=lambda k: k,
> timeout=300):
>        found=memcache.get([key_func(k) for k in keys])
>
>        This one would allow you to perform arbitrary transformations on
> your key when going to the cache.

That adds one array iteration at the interpreted level. My general
rule of thumb is that the more work is done in compiled code, the
faster and more efficient it will be.

>        I'm not generally a fan of early returns, but I don't see much of a
> benefit of short-circuiting in this way.  I'm doing an O(n) pass on
> the input keys against the found keys to compute a key set complement.
>
>        If there's nothing in the complement set, we have everything and
> will return.

Fair enough, I suppose.

>        This is a bit off topic, but I'm pretty sure PHP supports parameter
> binding.  I wouldn't trust any code that didn't use it.
>
>        For example, in your first case, are you *sure* there's no way to
> execute that code with an arbitrary string?  Really?

Well, considering that I formulate the cache sets/gets myself, yes. The
key names will always be controlled by me. Actually, the wrapper
functions would look like this:

(modified to be my "ideal" prefix situation)

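function user_get($keys) {
   // cache_get() is the proposed prefix-aware multiget: it returns the
   // cached rows plus the list of keys that missed.
   list($hits, $misses) = cache_get('user:', $keys);
   if (count($misses) > 0) {
       $fetched = array();
       // $misses only ever holds key names generated by my own code.
       $q = db_query("SELECT * FROM users WHERE user_id IN(" . implode(',', $misses) . ")");
       while ($r = db_rows_assoc($q)) {
           $fetched[] = $r;
       }
       db_free($q);
       $hits = array_merge($hits, $fetched);
   }
   return $hits;
}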

I believe that is the cleanest you can get at the PHP level, and even
then it requires a couple of tweaks to the memcached client.
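
For comparison, here is a rough sketch of what the parameter-binding
suggestion could look like for the miss query. It is written in Python
(like the get_cached sketch quoted above) against an in-memory SQLite
database, not against my actual db_* helpers. The fetch_users name and
the "name" column are made up for illustration.

```python
import sqlite3


def fetch_users(conn, user_ids):
    """Fetch rows for the given ids using bound parameters (hypothetical helper)."""
    if not user_ids:
        return []
    # One "?" placeholder per id; the id values never become part of the SQL text.
    placeholders = ",".join("?" for _ in user_ids)
    sql = "SELECT user_id, name FROM users WHERE user_id IN (%s)" % placeholders
    return conn.execute(sql, list(user_ids)).fetchall()


# Throwaway in-memory table so the sketch can be run on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

rows = fetch_users(conn, [2, 3])
print(rows)
```

Only the number of placeholders depends on the input, so a stray string
in the miss list is compared as a value instead of being executed as SQL.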

>        How big are these multi-gets?  Are you sure optimizing out an
> iteration is valuable when you're talking about operations you're
> sending over the network anyway?  Can you even measure the amount of
> time it takes to do one of these?

I look at it this way: currently, each time I hit the cache it takes
3 array iterations, plus the network, plus whatever else the script is
doing. With a prefix add-on, I could skip 2 of those iterations at the
PHP level, since the module would handle adding the prefix and removing
it again before passing the results back to PHP. Also, if it were done
as in method #2, returning an array of missed keys, that is one more
array iteration saved, since the list of missed keys would already
come back in array format.
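
To make the three passes concrete, here is a minimal sketch of the
script-level version, again in Python to match the quoted get_cached
example. A plain dict stands in for the memcached client, and the split
into exactly these three passes is my reading of the description above,
not code from any real client.

```python
def cache_get(cache, prefix, keys):
    """Prefix-aware multiget done entirely at the script level.

    `cache` is a plain dict standing in for the memcached client.
    Returns (hits, misses): hits maps unprefixed key -> value,
    misses lists the requested keys that were not found.
    """
    # Pass 1: add the prefix to every requested key.
    prefixed = [prefix + str(k) for k in keys]

    # Stand-in for the network multiget.
    found = {k: cache[k] for k in prefixed if k in cache}

    # Pass 2: strip the prefix from every key that came back.
    hits = {k[len(prefix):]: v for k, v in found.items()}

    # Pass 3: work out which requested keys are missing.
    misses = [k for k in keys if str(k) not in hits]

    return hits, misses


cache = {"user:1": {"user_id": 1}, "user:3": {"user_id": 3}}
hits, misses = cache_get(cache, "user:", [1, 2, 3])
print(hits)
print(misses)
```

If the client accepted the prefix and returned the missed keys itself,
passes 1 to 3 would all move out of the script and into compiled code.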

I am not sure; the multi-gets could be 1 key or 100. Or, if I am
pre-caching the data ahead of time, I could be sending much larger
batches. I want to design this so that I do not have to care how many
keys are requested...
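
One rough way to put numbers on the cost of a single script-level pass
at different batch sizes is a micro-benchmark like the one below. It is
in Python, not PHP, so the absolute figures will not carry over, and
the add_prefix and time_prefix_pass names are just for illustration.

```python
import timeit


def add_prefix(prefix, keys):
    """One interpreted pass over the keys: prepend the prefix to each."""
    return [prefix + str(k) for k in keys]


def time_prefix_pass(n, repeat=200):
    """Average wall-clock seconds for one prefix pass over n keys."""
    keys = list(range(n))
    total = timeit.timeit(lambda: add_prefix("user:", keys), number=repeat)
    return total / repeat


for n in (1, 100, 10000):
    print("%6d keys: %.1f microseconds per pass" % (n, time_prefix_pass(n) * 1e6))
```

Comparing those figures against the round-trip time of one multiget
would show whether the saved passes matter at a given batch size.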

