Multiget/intelligent generic PHP wrapper function... thoughts/advice wanted

Dustin Sallings dustin at spy.net
Fri Nov 2 17:50:27 UTC 2007


On Nov 2, 2007, at 1:18, mike wrote:

> This is pretty much what I want to do. Part of what makes it complex,
> though, is the key prefixes I prepend. How do you handle it if you have
> user IDs as the keys, and then some other IDs? That would work if
> there were namespaces or prefix-aware functions (see below); otherwise
> it looks great on paper and in pseudocode, but I think there's one
> big detail that is getting missed (which I noticed while actually
> trying to work out a solution right now).

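	I don't think that makes it complex.  How about this revised edition?
The prefix is now applied to every key sent to the cache, stripped back
off the results so they can be compared with the requested keys, and
applied again when storing the misses:

def get_cached(keys, cache_miss_func, cache_key_prefix="", timeout=300):
        # Map each cache key (prefix + id) back to the caller's key.
        cache_keys=dict((cache_key_prefix + str(k), k) for k in keys)
        # memcache is assumed to be a client whose get() takes a list of
        # keys and returns a dict of only the keys that were found.
        cached=memcache.get(list(cache_keys))
        found=dict((cache_keys[ck], v) for ck, v in cached.items())
        missing=[k for k in keys if k not in found]
        if missing:
                found_in_db=cache_miss_func(missing)
                for k,v in found_in_db.items():
                        # Store under the same prefixed key used for lookups.
                        memcache.set(cache_key_prefix + str(k), v, timeout)
                found.update(found_in_db)
        return found

	If you needed something more complicated than a simple string
prefix, you could pass in a function to do it instead.  The signature
and the key mapping would look like this, with key_func(k) also taking
the place of cache_key_prefix + str(k) in the set() call:

def get_cached(keys, cache_miss_func, key_func=str, timeout=300):
        cache_keys=dict((key_func(k), k) for k in keys)

	This one would allow you to perform arbitrary transformations on
your key when going to the cache.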
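	To try the key_func variant without a memcached server, here is a
minimal sketch.  It swaps in a hypothetical dict-backed FakeCache that
offers only the two calls the code above assumes (get() taking a list of
keys and returning a dict of hits, and set(key, value, timeout)).  It
also maps cache keys back to the caller's keys and stores misses under
the same transformed key.  fetch_users and user_key are made-up stand-ins
for the database lookup and the key transformation.

```python
class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client (ignores timeouts)."""

    def __init__(self):
        self.data = {}

    def get(self, keys):
        # Multi-get: return only the keys that are present.
        return {k: self.data[k] for k in keys if k in self.data}

    def set(self, key, value, timeout):
        self.data[key] = value


memcache = FakeCache()


def get_cached(keys, cache_miss_func, key_func=str, timeout=300):
    # Map each cache key back to the caller's original key.
    cache_keys = {key_func(k): k for k in keys}
    cached = memcache.get(list(cache_keys))
    found = {cache_keys[ck]: v for ck, v in cached.items()}
    missing = [k for k in keys if k not in found]
    if missing:
        found_in_db = cache_miss_func(missing)
        for k, v in found_in_db.items():
            # Store under the same transformed key used for the lookup.
            memcache.set(key_func(k), v, timeout)
        found.update(found_in_db)
    return found


db_calls = []


def fetch_users(ids):
    # Hypothetical database lookup; records which ids it was asked for.
    db_calls.append(list(ids))
    return {i: "user%d" % i for i in ids}


def user_key(k):
    # Hypothetical namespacing so user ids can't collide with other ids.
    return "user:%d" % k


print(get_cached([1, 2], fetch_users, user_key))     # both miss, both fetched
print(get_cached([1, 2, 3], fetch_users, user_key))  # only 3 is fetched
print(get_cached([1, 2, 3], fetch_users, user_key))  # all hits, no db call
print(db_calls)
```

	The all-hits case needs no special branch: missing is empty, so
cache_miss_func is never called.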

>>        1) One multi-GET call.
>                 1b) if(count($returned_from_cache) == count($requested)) { return }

	I'm not generally a fan of early returns, and in any case I don't see
much benefit in short-circuiting this way.  I'm already doing an O(n)
pass over the input keys against the found keys to compute the set
complement (the keys that were requested but not found).

	If there's nothing in the complement set, we have everything and  
will return.

	If something is missing, we'll compute it, cache it, combine it into  
our return value, and return.

> #2) return two arrays - hits and misses (with $prefix stripped from
> the above idea)

	Some code has to compute the complement either way.  It seems like
something that would be useful more generally than just in a memcached
interface, but the end result at least makes sense.
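	If you do want the hits-and-misses shape, with the prefix stripped,
the complement is still a single pass.  A minimal sketch, assuming the
multi-get result is a dict keyed by prefix + str(id) that contains only
the keys that were found; split_hits_misses is a hypothetical helper
name, not part of any memcached client:

```python
def split_hits_misses(requested, cached, prefix=""):
    """Split requested ids into (hits, misses) in one pass.

    `cached` is assumed to be a multi-get result: a dict keyed by
    prefix + str(id) containing only the keys that were found.
    Hits are returned keyed by the original, unprefixed id.
    """
    hits = {}
    misses = []
    for k in requested:
        cache_key = prefix + str(k)
        if cache_key in cached:
            hits[k] = cached[cache_key]
        else:
            misses.append(k)
    return hits, misses


hits, misses = split_hits_misses(
    [1, 2, 3], {"user:1": "alice", "user:3": "carol"}, "user:")
print(hits)    # {1: 'alice', 3: 'carol'}
print(misses)  # [2]
```

	The misses list is exactly what you'd hand to the database query.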

> This will allow you to easily do a SELECT * FROM foo WHERE ID
> IN(implode(',', $misses)) for numeric keys, or for string keys (or
> whatever you want quoted)
>
> SELECT * FROM foo WHERE ID IN("'".implode("','", $misses)."'")
>
> (could also use array_walk() and have some callback that checks
> whether it needs mysql_escape_string or not... a quick strpos($string,
> "'") and then escape it - again only for string keys)

	This is a bit off topic, but I'm pretty sure PHP supports parameter  
binding.  I wouldn't trust any code that didn't use it.

	For example, in your first case, are you *sure* there's no way to  
execute that code with an arbitrary string?  Really?
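	PHP's binding API isn't shown here; as an illustration of the same
idea in Python, the standard library's sqlite3 module binds values
through "?" placeholders, so an IN list can be built from one
placeholder per missing key.  The foo table, its columns, and
fetch_by_ids are made up for the example.

```python
import sqlite3


def fetch_by_ids(conn, ids):
    """Return {id: name} for the given ids using bound parameters."""
    if not ids:
        return {}  # many databases reject an empty "IN ()", so skip the query
    # One "?" per value; the values themselves never touch the SQL text.
    placeholders = ",".join("?" for _ in ids)
    sql = "SELECT id, name FROM foo WHERE id IN (%s)" % placeholders
    return dict(conn.execute(sql, list(ids)).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [("a", "first"), ("b'; DROP TABLE foo; --", "sneaky")])

# The hostile-looking key is just a value; it is matched, not executed.
print(fetch_by_ids(conn, ["a", "b'; DROP TABLE foo; --", "zzz"]))
```

	Only the placeholder count is interpolated into the SQL string, and
that comes from len(ids), never from the key values.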

> I don't like this as much because it requires one more array iteration
> on the interpreted level:
>
> foreach($hits as $k => $v) {
>        if($hit === $v) { $needed[] = $k; }
> }

	How big are these multi-gets?  Are you sure optimizing out an  
iteration is valuable when you're talking about operations you're  
sending over the network anyway?  Can you even measure the amount of  
time it takes to do one of these?
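	One way to answer that in Python is the standard timeit module.  This
sketch times only the complement pass from get_cached on a made-up batch
of 100 keys with half of them cached; the batch size and hit ratio are
arbitrary assumptions, and the number it prints depends entirely on your
machine, so compare it against a measured memcached round trip before
deciding it's worth optimizing.

```python
import timeit

keys = list(range(100))                 # hypothetical multi-get batch
found = {k: None for k in keys[::2]}    # pretend every even key was a hit


def complement():
    # The same pass get_cached uses to find the misses.
    return [k for k in keys if k not in found]


runs = 10000
seconds = timeit.timeit(complement, number=runs)
print("%.2f microseconds per pass" % (seconds / runs * 1e6))
```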

> Otherwise, I might try, but I probably would not be writing the  
> most efficient code.

	Efficiency is best achieved by writing the clearest implementation  
of what you want, and then measuring the system as a whole to figure  
out what part is slow.  It's often not what you expect it will be.

-- 
Dustin Sallings