Securing HTML vs securing HTTP

Jens Alfke jens at
Mon Jan 23 16:11:44 UTC 2006

On 23 Jan '06, at 12:23 AM, Martin Atkins wrote:

> If you're letting arbitrary people inject arbitrary HTML into your
> site you're already in danger.

Most of my examples didn't involve arbitrary people, but rather the
authors of plug-in software I install on my website: Drupal
extensions, WordPress plugins, Typo themes, etc. And of course
installing software that runs on your web server always implies a
degree of trust; my point was that with an HTML-based identity system
such as OpenID, that trust now extends to your global identity as well.

So the list of bad things a plug-in could do now includes identity  
theft, which is an order of magnitude nastier than simply defacing or  
erasing my website.

Admittedly on the far-fetched side, but I am trying to be paranoid  
here, which on past evidence isn't a bad idea.

> If you're displaying any user-supplied content on your site you
> need to be running it through an HTML cleaner.

That may not help, frankly. It depends on the quality of the software
that detects the <link> tags, which runs on someone else's machine
that I don't control. Extrapolating from other <link>-scraping
software, it may not be very good. For example, take the Pingback
spec <http://>, which looks for a <link rel="pingback"...> tag. The
spec explicitly tells implementations that they don't have to use a
real HTML parser, but SHOULD instead:
> search the entity body for the first match of the following regular  
> expression:
> <link rel="pingback" href="([^"]+)" ?/?>

Bad news for CMSs that wrap displayed content in CDATA sections
rather than escaping every metacharacter -- the above regexp will
find a pingback in that CDATA, allowing a user of the site to inject
pingback tags into a page.
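
To make that concrete, here is a minimal Python sketch of the regexp
matching inside a CDATA section. The regular expression is the one
quoted from the spec above; the sample page, the attacker.example URL
and the find_pingback helper are made up for illustration and are not
taken from any real CMS or Pingback client.

```python
import re

# The regular expression quoted from the Pingback spec above.
PINGBACK_RE = re.compile(r'<link rel="pingback" href="([^"]+)" ?/?>')

# Hypothetical XHTML page: the CMS wrapped a user's comment in a CDATA
# section instead of escaping "<" and ">".
PAGE = """<html>
<head><title>Example blog</title></head>
<body>
<div class="comment"><![CDATA[
Nice post! <link rel="pingback" href="http://attacker.example/xmlrpc" />
]]></div>
</body>
</html>"""


def find_pingback(body):
    """Return the href of the first regexp match in body, or None."""
    match = PINGBACK_RE.search(body)
    return match.group(1) if match else None


# The regexp knows nothing about CDATA, so it reports the injected URL.
print(find_pingback(PAGE))  # http://attacker.example/xmlrpc
```

The page has no real <link> element at all, yet the scan still
returns the URL the commenter supplied.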

Of course in the case of Pingback the damage that could be done is  
minimal. Not so for OpenID. I haven't looked into the source code of  
the various OpenID client implementations; are they smart enough to  
recognize only real <link> tags, not CDATA content?
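
As a rough sketch of what "smart enough" could look like, the Python
below walks the parsed element tree instead of scanning the raw text,
so markup inside a CDATA section is treated as character data. It
assumes the page is well-formed XHTML, which real pages often are not;
the sample page, the example URLs and the find_links helper are
hypothetical, and "openid.server" is used only as an illustrative rel
value. This is not how any particular OpenID implementation works.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed XHTML page with one real <link> in the head
# and one injected inside a CDATA-wrapped comment.
PAGE = """<html>
<head>
<title>Example blog</title>
<link rel="openid.server" href="http://real.example/openid" />
</head>
<body>
<div class="comment"><![CDATA[
<link rel="openid.server" href="http://attacker.example/openid" />
]]></div>
</body>
</html>"""


def find_links(xhtml, rel):
    """Return href values of real <link> elements with the given rel."""
    root = ET.fromstring(xhtml)
    hrefs = []
    for element in root.iter():
        # Drop an XML namespace prefix such as "{http://...}link".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "link" and element.get("rel") == rel:
            hrefs.append(element.get("href"))
    return hrefs


# Only the real element is found; the CDATA content is plain text.
print(find_links(PAGE, "openid.server"))  # ['http://real.example/openid']
```

An XML parser rejects malformed pages outright, so a real client
would more likely need a tolerant HTML parser; the point here is only
that element-level parsing does not see tags inside CDATA.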
