Securing HTML vs securing HTTP
jens at mooseyard.com
Mon Jan 23 18:39:12 UTC 2006
On 23 Jan '06, at 8:39 AM, Johannes Ernst wrote:
> On Jan 23, 2006, at 8:11, Jens Alfke wrote:
>> I haven't looked into the source code of the various OpenID client
>> implementations; are they smart enough to recognize only real
>> <link> tags, not CDATA content?
> For the uninitiated, could you expand on how this would look like?
Sure. CDATA sections are kind of like the long-quote syntax in Perl,
Ruby, Lua, etc. They allow a document to contain a block of text that
the parser will not interpret as markup. Basically, anything between
"<![CDATA[" and "]]>" is treated as raw text, even if it contains
metacharacters like "<" or "&". The only character sequence it can't
contain is, of course, "]]>". It's very useful for escaping stuff like
blocks of program code or user-entered text.
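To make that concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The <comment> wrapper element and the variable names are made up for illustration; the point is only that a real XML parser reports the CDATA content as text, not as a child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document: the "<link>" lives inside a CDATA section.
doc = (
    '<comment><![CDATA[<link rel="openid.server" '
    'href="http://evil.net/openid/">]]></comment>'
)

root = ET.fromstring(doc)

child_count = len(root)  # number of real child elements under <comment>
text = root.text         # the CDATA content, delivered as plain character data

print(child_count)
print(text)
```

The parser sees no <link> element at all, just a string that happens to look like one.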
But this can trip up simple-minded scanners, such as the one defined in
the Pingback spec. Finding the string "<link rel=" in an HTML or XML
document does not mean you've found a valid <link> tag. It could be
literal text in a CDATA section.
Example of a hacked identity page, a blog on which an attacker has
posted a comment:
<title>Bob's Home Page</title>
<link href="http://bob.com/openid-server.app" rel='openid.server'>
<![CDATA[
<link rel="openid.server" href="http://evil.net/openid/">
<link rel="openid.delegate" href="http://evil.net/doctor_evil/">]]>
An OpenID client site using an algorithm like the one specified for
Pingback would find the link "tags" (actually plain text) added by
doctor_evil instead of bob's actual OpenID link. First it misses the
real link because the href attribute comes before the rel, and its
regexp isn't expecting that. Then it finds the bogus link tags in
doctor_evil's comment because it doesn't notice they are inside a
CDATA block and therefore not actual tags. The result is that the
client redirects to evil.net and lets doctor_evil authenticate
himself as bob.
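A rough sketch of how such a scanner gets fooled, in Python. The regexp and the naive_discover function are hypothetical stand-ins for "sloppy autodiscovery code", not taken from any actual OpenID or Pingback implementation, and the page is my reconstruction of the example above with the surrounding markup filled in as an assumption.

```python
import re

# Hypothetical naive pattern: assumes double quotes and rel before href.
NAIVE_LINK = re.compile(r'<link rel="openid\.server" href="([^"]*)"')

# Assumed shape of the hacked page; the attacker's comment is the CDATA block.
page = """<html><head>
<title>Bob's Home Page</title>
<link href="http://bob.com/openid-server.app" rel='openid.server'>
</head><body>
<![CDATA[
<link rel="openid.server" href="http://evil.net/openid/">
<link rel="openid.delegate" href="http://evil.net/doctor_evil/">]]>
</body></html>"""


def naive_discover(html):
    """Return the first href the regexp matches, or None."""
    match = NAIVE_LINK.search(html)
    return match.group(1) if match else None


found = naive_discover(page)
print(found)
```

The scan skips bob's real link (href first, single-quoted rel) and returns the evil.net URL from the comment text.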
Lesson: The link auto-discovery MUST be done by a real HTML/XML
parser. It would be good to call that out in the spec so no
implementor gets the bright idea of just pasting in some sloppy code
that was used for Pingback or RSS autodiscovery.
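For comparison, a minimal sketch of parser-based discovery, again with Python's xml.etree.ElementTree. It assumes the identity page is well-formed XHTML with no namespace declaration; real-world tag-soup HTML would need a lenient HTML parser instead. The discover function and the <p> wrapper around the comment are illustrative, not from any spec.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed XHTML version of the hacked page from the example.
page = """<html><head>
<title>Bob's Home Page</title>
<link href="http://bob.com/openid-server.app" rel="openid.server" />
</head><body>
<p><![CDATA[
<link rel="openid.server" href="http://evil.net/openid/">
<link rel="openid.delegate" href="http://evil.net/doctor_evil/">]]></p>
</body></html>"""


def discover(xhtml, rel):
    """Return the href of the first real <link> element carrying this rel."""
    root = ET.fromstring(xhtml)
    for link in root.iter("link"):
        # rel may hold several space-separated values; attribute order is irrelevant
        if rel in link.get("rel", "").split():
            return link.get("href")
    return None


server = discover(page, "openid.server")
delegate = discover(page, "openid.delegate")
print(server)
print(delegate)
```

Because the parser only hands back actual elements, the attribute order on bob's link doesn't matter and the text inside the CDATA block is never considered.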