Vpaths performance?

Jeremy James jbj at forbidden.co.uk
Thu Feb 21 14:00:59 UTC 2008


dormando wrote:
> Kevin Olson wrote:
>> Has anyone tested Vpaths on a high volume site?
>>
>> I don't see any glaring problems in the code, but the warning at the top,
>> "this has also not been optimized for huge volume sites" seems somewhat
>> ominous.
>>
>> Kevin
>>
> I think we use something similar and it performs okay.
> 
> For gaiaonline I had a plugin with the paths hardcoded and did direct
> string matching for the most possible speed. In the grand scheme of
> things that was probably stupid, as my entire perlbal cluster never got
> above 20% CPU usage during extreme peaks.

Likewise, we've been using our own matcher (Urlmatch) on moderately
loaded production systems since late 2006 without issue. It's slightly
less efficient than Vpaths when it comes to matching URLs.

Given the chained selectors went in with r739, is it worth removing the
"does not play well with vhosts" comment at the top?

I'm not sure what one would do for a 'huge volume site' to increase
performance (i.e. reduce the number of regexps tried per request) -
perhaps you could assume that several of your regexps start with similar
prefixes, then parse them into a tree to short-circuit having to match
them all? e.g.:

##
# Take a particular month from dynamic (it has some auto xmas content)
VPATH /blog/2007/12/ = dynamic_server

# Take other monthly archives from pre-generated server(s)
VPATH /blog/[0-9]{4}/(0[1-9]|1[0-2])/ = static_server

# But take latest from cgi
VPATH /blog/latest = dynamic_server

VPATH /blog/ = static_server

# Other stuff
VPATH /wiki/.* = wiki_server
VPATH /about/ = static_server
VPATH /images/profile/.* = dynamic_server
VPATH /images/.* = static_server

SET default_service = dynamic_server

##

You could imagine pulling this out into a tree (I'm sure you get the
idea):

/blog/(.*) =>
  2007/12/ = dynamic_server
  [0-9]{4}/(0[1-9]|1[0-2])/ = static_server
  latest = dynamic_server
   = static_server
/wiki/.* = wiki_server
/about/ = static_server
/images/(.*) =>
  profile/.* = dynamic_server
  .* = static_server

Now matching something at /images/funny_cat_picture.jpg goes through 6
regexp matches to get a hit, not 8 (and it skips some of the more
complicated ones entirely).
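
As a rough sketch of the tree idea above (in Python rather than as
Perlbal plugin code, purely to make the counting concrete): the rule
tables mirror the example config, with the month pattern written to
cover 01-12. The names match_flat and match_tree are made up for
illustration, and anchoring each regexp at the start of the path is an
assumption, not a statement about how Vpaths itself applies them.

```python
import re

# Flat rule list in config order; first match wins.
FLAT = [
    (r"/blog/2007/12/", "dynamic_server"),
    (r"/blog/[0-9]{4}/(0[1-9]|1[0-2])/", "static_server"),  # months 01-12
    (r"/blog/latest", "dynamic_server"),
    (r"/blog/", "static_server"),
    (r"/wiki/.*", "wiki_server"),
    (r"/about/", "static_server"),
    (r"/images/profile/.*", "dynamic_server"),
    (r"/images/.*", "static_server"),
]

# Same rules grouped by shared prefix. A list as the target means
# "strip the matched prefix and descend into these sub-rules".
TREE = [
    (r"/blog/", [
        (r"2007/12/", "dynamic_server"),
        (r"[0-9]{4}/(0[1-9]|1[0-2])/", "static_server"),
        (r"latest", "dynamic_server"),
        (r"", "static_server"),  # empty pattern: catch-all for /blog/
    ]),
    (r"/wiki/.*", "wiki_server"),
    (r"/about/", "static_server"),
    (r"/images/", [
        (r"profile/.*", "dynamic_server"),
        (r".*", "static_server"),
    ]),
]

DEFAULT = "dynamic_server"  # stands in for default_service


def match_flat(path):
    """Return (service, number of regexps evaluated) for the flat list."""
    tried = 0
    for pattern, service in FLAT:
        tried += 1
        if re.match(pattern, path):  # assumption: anchored at the start
            return service, tried
    return DEFAULT, tried


def match_tree(path, rules=TREE):
    """Return (service, number of regexps evaluated) for the prefix tree."""
    tried = 0
    for pattern, target in rules:
        tried += 1
        m = re.match(pattern, path)
        if not m:
            continue
        if isinstance(target, list):
            # Prefix matched: only the sub-rules see the rest of the path.
            service, sub_tried = match_tree(path[m.end():], target)
            return service, tried + sub_tried
        return target, tried
    return DEFAULT, tried


if __name__ == "__main__":
    for p in ("/images/funny_cat_picture.jpg", "/blog/2007/10/", "/nope"):
        print(p, match_flat(p), match_tree(p))
```

The second element of each result is the number of regexps evaluated,
including the one that hit. Note the trade-off: paths that hit one of
the first few flat rules (anything under /blog/) cost one extra prefix
test in the tree, so the gain is mostly for rules near the bottom of
the list and for misses.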

Would automating this really gain much, or are you likely to get better
performance and fewer headaches by either writing your own custom plugin
or chaining multiple vpaths together (i.e. use
VPATH /blog/.* = blog_selector)?


However, if you have particularly complicated and processor-intensive
regexps to match on a heavily loaded site, you're doing it wrong.
It's probably better to send requests off to the dynamic server by
default, then have it return an X-REPROXY-* header (e.g. X-REPROXY-URL)
instead if the request could have gone somewhere else.
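
To make the reproxy suggestion concrete, here is a minimal sketch of
what the dynamic server's side could look like, written as a small
Python WSGI app. Everything specific in it is an assumption for
illustration: the static host name, the choose_reproxy_url helper and
the rule for which paths are "static". The only Perlbal-specific part
is the X-REPROXY-URL response header, which (as far as I know) Perlbal
only honours when reproxying is enabled on the service.

```python
# Hypothetical internal address of the static server.
STATIC_BASE = "http://static.example.internal"


def choose_reproxy_url(path):
    """Return a static-server URL for paths it can serve, else None.

    Plain string tests keep the decision cheap; the rule here (images
    except profile images) is only an example.
    """
    if path.startswith("/images/") and not path.startswith("/images/profile/"):
        return STATIC_BASE + path
    return None


def app(environ, start_response):
    """WSGI app: reproxy static paths, answer everything else directly."""
    path = environ.get("PATH_INFO", "/")
    url = choose_reproxy_url(path)
    if url is not None:
        # Empty body; the proxy is expected to fetch `url` and send
        # that response to the client instead.
        start_response("200 OK", [
            ("X-REPROXY-URL", url),
            ("Content-Length", "0"),
        ])
        return [b""]
    body = b"dynamic content\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The point of the design is that the routing decision moves out of the
load balancer's regexp list and into application code, which usually
already knows whether a request needs dynamic handling.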

Best wishes,
Jeremy

