Securing HTML vs securing HTTP

Jens Alfke jens at
Mon Jan 23 18:39:12 UTC 2006

On 23 Jan '06, at 8:39 AM, Johannes Ernst wrote:

> On Jan 23, 2006, at 8:11, Jens Alfke wrote:
>> I haven't looked into the source code of the various OpenID client  
>> implementations; are they smart enough to recognize only real  
>> <link> tags, not CDATA content?
> For the uninitiated, could you expand on what this would look like?

Sure. CDATA sections are kind of like the long-quote syntax in Perl,
Ruby, Lua, etc. They allow a document to contain a block of text that
the parser will not interpret as markup. Basically, anything between
"<![CDATA[" and "]]>" is treated as raw text, even if it contains
metacharacters like "<" or "&". The only character sequence it can't
contain is, of course, "]]>". It's very useful for escaping stuff like
blocks of program code or user-entered text.
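
As a minimal sketch of the behaviour described above, the snippet below feeds a small XML fragment to Python's standard xml.etree.ElementTree parser. The fragment and the variable names are made up for illustration; the point is only that the markup-looking text inside the CDATA section comes back as character data, not as a child element.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: the CDATA section contains "<", "&" and
# something that looks like a <link> tag.
doc = '<div class="comment"><![CDATA[if (a < b && c) { <link rel="x"> }]]></div>'

div = ET.fromstring(doc)

# The CDATA content is exposed as plain text on the element...
print(div.text)

# ...and the parser created no child elements for the "<link" inside it.
print(len(div))
```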

But this can trip up simple-minded scanners like the one described in
the Pingback spec. Finding the string "<link rel=" in an HTML or XML
document does not mean you've found a valid <link> tag. It could be
literal text in a CDATA section.

Example of a hacked identity page, a blog on which an attacker has  
posted a comment:

<title>Bob's Home Page</title>
<link href="" rel='openid.server'>
<div class="comment">
<![CDATA[
<link rel="openid.server" href="">
<link rel="openid.delegate" href="">]]>

An OpenID client site using an algorithm like the one specified for
Pingback would find the link "tags" (actually plain text) added by
doctor_evil instead of bob's actual OpenID link. First it misses the
real link because the href attribute comes before the rel, and its
regexp isn't expecting that. Then it finds the bogus link tags in
doctor_evil's comment because it doesn't notice they are inside a
CDATA block and therefore not actual tags. The result is that the
client redirects to the server named in the bogus link and lets
doctor_evil authenticate himself as bob.
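
The failure can be reproduced with a rough sketch of such a scanner. Everything here is an assumption made for illustration: the regexp is not taken from the Pingback spec or from any real OpenID client, and the bob.example and evil.example URLs are placeholders, since the URLs of the original example are not preserved.

```python
import re

# Hypothetical identity page; all URLs are placeholders.
PAGE = """<html><head>
<title>Bob's Home Page</title>
<link href="https://bob.example/openid" rel='openid.server' />
</head><body>
<div class="comment"><![CDATA[
<link rel="openid.server" href="https://evil.example/openid" />
<link rel="openid.delegate" href="https://evil.example/bob" />
]]></div>
</body></html>"""

# Naive pattern: assumes rel comes first, then href, both double-quoted.
NAIVE_LINK = re.compile(r'<link rel="([^"]+)" href="([^"]*)"')


def naive_discover(page: str) -> dict:
    """Collect {rel: href} by scanning raw text, with no real parsing."""
    return {rel: href for rel, href in NAIVE_LINK.findall(page)}


found = naive_discover(PAGE)

# The real link is missed (href precedes rel, single quotes), while the
# text inside the CDATA section is accepted as if it were markup.
print(found)
```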

Lesson: The link auto-discovery MUST be done by a real HTML/XML  
parser. It would be good to call that out in the spec so no  
implementor gets the bright idea of just pasting in some sloppy code  
that was used for Pingback or RSS autodiscovery.
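
For comparison, here is a hedged sketch of discovery done with a real parser, using Python's standard xml.etree.ElementTree. It assumes the identity page is well-formed XHTML (real-world HTML often is not and would need an HTML parser instead), and the function name discover_links and the placeholder URLs are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical identity page, assumed to be well-formed XHTML.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Bob's Home Page</title>
<link href="https://bob.example/openid" rel='openid.server' />
</head><body>
<div class="comment"><![CDATA[
<link rel="openid.server" href="https://evil.example/openid" />
]]></div>
</body></html>"""


def discover_links(xhtml: str) -> dict:
    """Return {rel: href} for actual <link> elements in the parsed tree."""
    root = ET.fromstring(xhtml)
    links = {}
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip any non-element nodes
        # Drop a namespace prefix such as "{http://www.w3.org/1999/xhtml}".
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name != "link":
            continue
        rel, href = elem.get("rel"), elem.get("href")
        # Attribute order and quote style no longer matter; keep the first hit.
        if rel and href and rel not in links:
            links[rel] = href
    return links


links = discover_links(PAGE)
print(links)
```

Because the parser hands back the CDATA section as text belonging to the div element, the bogus link never shows up as an element, and the real link is found regardless of attribute order.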

