Multiget/intelligent generic PHP wrapper function... thoughts/advice wanted

mike mike503 at gmail.com
Fri Nov 2 08:18:01 UTC 2007


On 11/2/07, Dustin Sallings <dustin at spy.net> wrote:
>        I'll admit I don't know a lot of PHP, but I'd imagine a function that
> looked something like this (I typed this python in my mail client, so
> I don't know that it actually works):
>
> def get_cached(keys, cache_miss_func, timeout=300):
>        found=memcache.get(keys)
>        missing=[k for k in keys if k not in found]
>        if missing:
>                found_in_db=cache_miss_func(missing)
>                for k,v in found_in_db.iteritems():
>                        memcache.set(k, v, timeout)
>                found.update(found_in_db)
>        return found

This is pretty much what I want to do. What makes it complex, though,
is the key prefixes I prepend. How do you handle it if some of the keys
are user IDs and others are some other kind of ID? That would work if
there were namespaces or prefix-aware functions (see below). Otherwise
it looks great on paper and in pseudocode, but I think one big detail
is getting missed (one I ran into while actually trying to work out a
solution right now).

>        1) One multi-GET call.
                1b) if (count($returned_from_cache) == count($requested)) { return; }
>        2) One SQL query for the misses.
>        3) Two memcached sets for the missing records.

It's funny you wrote this email when you did; I was coming to my
computer to scribble down some pseudocode/notes about this very
subject. The list above is the ideal three-step setup I am aiming for,
with one minor change: the added step 1b. Thanks to Brian, I believe,
for pointing out the obvious: there is no need to do additional
processing if the cache had every item.
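
Continuing in the same Python as the quoted sketch, the three steps plus 1b might look roughly like this. FakeCache is a hypothetical in-memory stand-in (not a real client), and its get_multi/set names are assumptions made for illustration only:

```python
class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self.store = {}
        self.sets = 0  # counts writes, so the short-circuit is observable

    def get_multi(self, keys):
        # Return only the keys that are present, like a multi-GET.
        return {k: self.store[k] for k in keys if k in self.store}

    def set(self, key, value, timeout=0):
        self.sets += 1
        self.store[key] = value


def get_cached(cache, keys, cache_miss_func, timeout=300):
    found = cache.get_multi(keys)            # step 1: one multi-GET
    if len(found) == len(set(keys)):         # step 1b: everything was cached
        return found
    missing = [k for k in keys if k not in found]
    from_db = cache_miss_func(missing)       # step 2: one query for the misses
    for k, v in from_db.items():             # step 3: one set per missing record
        cache.set(k, v, timeout)
    found.update(from_db)
    return found
```

The count comparison uses set(keys) so that a duplicated key in the request does not defeat the early return.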

Anyway, I was thinking about this a few minutes ago in the shower...
the place where all great (or crazy?) ideas come from.

I think what I am looking for could actually be accomplished by a
couple minor tweaks to the memcache client itself.

*** PSEUDO CODE ALERT ***

#1) First, add a "key prefix" parameter. This string (or whatever)
will be prepended to the requested keys before they are sent to
memcached; on the way back, it will be stripped. That leaves no need
for the interpreted PHP level to assemble and disassemble the key
names, which I believe is only workable by rebuilding a new array item
by item.
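
For illustration, here is roughly what that prefix handling amounts to when done outside the client, written in Python like the quoted sketch. The function name and the multi_get callable are hypothetical; multi_get is assumed to take a list of full keys and return a dict containing only the ones it found:

```python
def get_multi_prefixed(multi_get, keys, prefix):
    # Remember which original key each prefixed key came from.
    original_for = {prefix + str(k): k for k in keys}
    raw = multi_get(list(original_for))
    # "Strip" the prefix on the way back by mapping to the original key.
    return {original_for[full]: value for full, value in raw.items()}


# Usage with a plain dict standing in for the cache:
store = {"user:1": "alice", "user:3": "carol", "group:1": "admins"}
fetch = lambda full_keys: {k: store[k] for k in full_keys if k in store}
users = get_multi_prefixed(fetch, [1, 2, 3], "user:")
```

This is exactly the item-by-item rebuild I would like to push down into the module; the caller only ever sees its own IDs.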

#2 and #3 are actually different methods of accomplishing the same
thing. my favorite (I think) is #2....

#2) return two arrays - hits and misses (with $prefix stripped from
the above idea)

list($hits, $misses) = memcache_get($keys, $prefix) ...
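
A rough Python sketch of that hits/misses split, assuming the multi-get result is a dict holding only the keys that were found (split_hits_misses is a hypothetical name, not an existing client function):

```python
def split_hits_misses(keys, found):
    # Test membership, not truthiness: a cached False or 0 is still a hit.
    hits = {k: found[k] for k in keys if k in found}
    misses = [k for k in keys if k not in found]
    return hits, misses


hits, misses = split_hits_misses([1, 2, 3], {1: "alice", 3: False})
```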

This will allow you to easily build

"SELECT * FROM foo WHERE ID IN (" . implode(',', $misses) . ")"

for numeric keys, or for string keys (or whatever you want quoted)

"SELECT * FROM foo WHERE ID IN ('" . implode("','", $misses) . "')"

(could also use array_walk() and have some callback that checks
whether a key needs mysql_real_escape_string() or not... a quick
strpos($string, "'") !== false and then escape it - again only for
string keys)

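Then just merge the two arrays: $hits + $dbhits. (Not array_combine(),
which uses one array's values as keys for the other's values, and not
array_merge(), which renumbers numeric keys.)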
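
The query-and-merge step could be sketched like this in Python, using the standard sqlite3 module as a stand-in database. The "name" column and the sample rows are made up for the example, and placeholders are swapped in for the manual quoting so that string keys need no escaping at all:

```python
import sqlite3


def fetch_missing(conn, misses):
    if not misses:
        return {}  # avoid an invalid empty "IN ()" clause
    # One "?" per missing key; the driver handles quoting and escaping.
    placeholders = ",".join("?" for _ in misses)
    sql = "SELECT id, name FROM foo WHERE id IN (%s)" % placeholders
    return {row[0]: row[1] for row in conn.execute(sql, list(misses))}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "o'neil")])

hits = {1: "alice"}                    # what the cache returned
db_hits = fetch_missing(conn, [2, 3])  # one query for the misses
merged = {**hits, **db_hits}           # keys are preserved in the merge
```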

#3) Or add a parameter to the get function specifying what to fill in
on a cache miss. It could be anything. I say a parameter since that
lets the end user decide. A generic "false" might actually be a
legitimate cache hit, so we can't just blindly use that.

I don't like this as much because it requires one more array iteration
on the interpreted level:

foreach($hits as $k => $v) {
       // $miss_value is whatever was passed in as the fill-in for misses
       if($v === $miss_value) { $needed[] = $k; }
}

... do the db call for $needed here, combine the two arrays again ...
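
For comparison, a small Python sketch of the fill-in variant. The names are hypothetical, and it assumes the fill value is a unique marker object so that it can never collide with a legitimately cached False, 0, or empty string:

```python
MISS = object()  # unique marker; no cached value can be this object


def fill_misses(keys, found, fill=MISS):
    # Every requested key appears in the result; misses carry the marker.
    return {k: found.get(k, fill) for k in keys}


def keys_needing_db(filled, fill=MISS):
    # Identity comparison, so cached False/0/None/"" still count as hits.
    return [k for k, v in filled.items() if v is fill]


filled = fill_misses(["a", "b", "c"], {"a": False, "c": None})
needed = keys_needing_db(filled)
```

The second function is the extra pass over the array that makes me like this option less than #2.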

I'm thinking moving some of the logic into the module tier would speed
up things quite a bit. I could be over-engineering this though. Does
anyone have any feedback? If it sounds like a stupid idea, I won't
even bother trying to hack the module source. Otherwise, I might try,
but I probably would not be writing the most efficient code. If it
sounds like a quite sane idea, I'd be willing to pay someone who could
produce something properly efficient and reusable and push it to the
actual PECL module itself...

Thanks!

