<HTML><BODY style="word-wrap: break-word; -khtml-nbsp-mode: space; -khtml-line-break: after-white-space; "><BR><DIV><DIV>On 23 Jan '06, at 8:39 AM, Johannes Ernst wrote:</DIV><BR class="Apple-interchange-newline"><BLOCKQUOTE type="cite"><P style="margin: 0.0px 0.0px 0.0px 0.0px"><FONT face="Verdana" size="3" style="font: 11.0px Verdana">On Jan 23, 2006, at 8:11, Jens Alfke wrote:</FONT></P> <P style="margin: 0.0px 0.0px 0.0px 0.0px; font: 11.0px Verdana; min-height: 13.0px"><BR></P> <BLOCKQUOTE type="cite"><P style="margin: 0.0px 0.0px 0.0px 10.0px"><FONT face="Verdana" size="3" style="font: 11.0px Verdana">I haven't looked into the source code of the various OpenID client implementations; are they smart enough to recognize only real &lt;link&gt; tags, not CDATA content?</FONT></P> </BLOCKQUOTE><P style="margin: 0.0px 0.0px 0.0px 0.0px; font: 11.0px Verdana; min-height: 13.0px"><BR></P> <P style="margin: 0.0px 0.0px 0.0px 0.0px"><FONT face="Verdana" size="3" style="font: 11.0px Verdana">For the uninitiated, could you expand on how this would look like?</FONT></P> </BLOCKQUOTE></DIV><BR><DIV>Sure. CDATA sections are kind of like the long-quote syntax in Perl, Ruby, Lua, etc. They allow a document to contain a block of text that the parser will not interpret as markup. Basically, anything between "&lt;![CDATA[" and "]]&gt;" is treated as raw text, even if it contains metacharacters like "&lt;" or "&amp;". The only character sequence it can't contain is, of course, "]]&gt;". It's very useful for escaping stuff like blocks of program code or user-entered text.</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN><A href="http://www.w3schools.com/xml/xml_cdata.asp">http://www.w3schools.com/xml/xml_cdata.asp</A></DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>But this can trip up simple-minded scanners, such as the one defined in the Pingback spec. Finding the string "&lt;link rel=" in an HTML or XML document does not mean you've found a valid &lt;link&gt; tag.
It could be literal text in a CDATA section.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Example of a hacked identity page, a blog on which an attacker has posted a comment:</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>&lt;html&gt;</DIV><DIV>&lt;head&gt;</DIV><DIV>&lt;title&gt;Bob's Home Page&lt;/title&gt;</DIV><DIV>&lt;link href="<A href="http://bob.com/openid-server.app">http://bob.com/openid-server.app</A>" rel='openid.server'&gt;</DIV><DIV>&lt;/head&gt;</DIV><DIV>...</DIV><DIV>&lt;div class="comment"&gt;</DIV><DIV>&lt;![CDATA[Muahaha!</DIV><DIV>&lt;link rel="openid.server" href="<A href="http://evil.net/openid/">http://evil.net/openid/</A>"&gt;</DIV><DIV>&lt;link rel="openid.delegate" href="<A href="http://evil.net/doctor_evil/">http://evil.net/doctor_evil/</A>"&gt;]]&gt;</DIV><DIV>&lt;/div&gt;</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>An OpenID client site using an algorithm like the one specified for Pingback would find the link "tags" (actually plain text) added by doctor_evil instead of bob's actual OpenID link. First it misses the real link because the href attribute comes before the rel, and its regexp isn't expecting that. Then it finds the bogus link tags in doctor_evil's comment because it doesn't notice that they are inside a CDATA block and therefore not actual tags. The result is that the client redirects to evil.net and lets doctor_evil authenticate himself as bob.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Lesson: The link auto-discovery MUST be done by a real HTML/XML parser. It would be good to call that out in the spec so no implementor gets the bright idea of just pasting in some sloppy code that was used for Pingback or RSS autodiscovery.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>--Jens</DIV></BODY></HTML>