Broken HTML Support

Martin Atkins mart at degeneration.co.uk
Sat Feb 11 01:29:15 UTC 2006


Joseph Holsten wrote:
> I'd really like to start collecting best practices to help pin down how
> the spec is being used, which SHOULDs and recommendations are being
> ignored, etc. In particular, what's the position about handling broken
> html, because Josh thinks it should be handled, I think it doesn't meet
> the spec, and should be ignored.

I've lost track. Do we support "parsing" HTML by regex for YADIS?

OpenID allows (and the reference consumer implementation employs) regex
for the HTML "parsing" so long as only the head section is considered.
In the regex case it is impossible to reject invalid HTML.
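
To make that concrete, here is a rough sketch of head-only regex
"parsing". The patterns and the function name find_openid_server are
illustrative assumptions, not the ones the reference consumer uses. It
looks for an OpenID 1.x style <link rel="openid.server"> inside the head
section and has no notion of whether the document around it is valid.

```python
import re

# Illustrative patterns only; NOT the reference consumer's actual regexes.
HEAD_RE = re.compile(r"<head\b[^>]*>(.*?)</head\s*>", re.IGNORECASE | re.DOTALL)
LINK_RE = re.compile(r"<link\b([^>]*)>", re.IGNORECASE)
ATTR_RE = re.compile(r"""([a-zA-Z_:][\w:.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def find_openid_server(html):
    """Return the href of the first openid.server link in the head, or None."""
    head = HEAD_RE.search(html)
    if head is None:
        # No explicit </head>: this sketch gives up, which is exactly the
        # problem with old-fashioned HTML where the closing tag is optional.
        return None
    for link in LINK_RE.finditer(head.group(1)):
        attrs = {}
        for m in ATTR_RE.finditer(link.group(1)):
            name, double_quoted, single_quoted = m.groups()
            value = double_quoted if double_quoted is not None else single_quoted
            attrs[name.lower()] = value
        if "openid.server" in attrs.get("rel", "").lower().split():
            return attrs.get("href")
    return None


# Unclosed <title> and a stray <p> in the head: invalid, but still accepted.
broken = ('<html><head><title>x<link rel="openid.server" '
          'href="http://example.com/server"><p></head><body>')
print(find_openid_server(broken))
```

The regex never sees the document as a tree, so it has nothing to base a
rejection on; it either finds the link or it doesn't.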

There's not much point in allowing some relying parties to fail on
invalid HTML: either they all must or they all mustn't. Otherwise everyone
using invalid HTML will find that their identity works erratically and
not understand why.

So the YADIS spec must make some decision on this. Parsing old-fashioned
HTML is a lot harder since the closing </HEAD> is optional, but at the
same time outright rejecting HTML that isn't well-formed XML locks out a
lot of people. It's a tough one. I'm considering — but haven't yet
decided on — the conclusion that the YADIS spec should just specify the
exact regex a relying party should use, much like the pingback specs do.
That way everyone's parsers are just as lame as everyone else's and
everyone works the same; users will do what they are told because it
won't work otherwise, and everyone is happy.
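
As a sketch of what "the spec specifies the exact regex" could look like:
every relying party would carry the same single pattern and nothing else.
The pattern below is loosely modeled on the pingback approach; the rel
value "example.location" and the function name discover are placeholders
I made up, not anything the YADIS spec defines.

```python
import re

# Hypothetical spec-mandated pattern. "example.location" is a placeholder
# rel value, not a real YADIS name.
SPEC_PATTERN = re.compile(r'<link rel="example\.location" href="([^"]+)" ?/?>')


def discover(html):
    """Return the advertised URL if the document matches the one pattern."""
    match = SPEC_PATTERN.search(html)
    return match.group(1) if match else None


# The exact form the pattern dictates is found...
ok = '<head><link rel="example.location" href="http://example.com/yadis" /></head>'
# ...while an equally reasonable variant (single quotes, swapped attributes)
# fails for every relying party in the same way.
variant = "<head><link href='http://example.com/yadis' rel='example.location'></head>"
print(discover(ok), discover(variant))
```

The point is not that the pattern is good, only that a document either
works with every relying party or with none of them.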


