LJ not correctly parsing <link... > tags.
Martin Atkins
mart at degeneration.co.uk
Fri Jul 1 16:30:35 PDT 2005
Carl Howells wrote:
> It looks like LJ isn't correctly parsing the <link..> tags to extract
> the server URL. In particular, it isn't processing entities in the
> href. Example:
>
> <link rel="openid.server"
> href="http://www.schtuff.com/?action=openid_server" />
>
> That's a valid HTML link tag that uses two entities in the value of
> its href attribute. Those entities should be processed before the
> attribute is used, meaning the actual value extracted from that
> attribute should be "http://www.schtuff.com/?action=openid_server".
>
> At the moment, LJ's consumer code is not extracting that properly. Will
> that be fixed?
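Here is a minimal Python sketch of the decoding step Carl describes, using the standard library's `html.unescape`. The archived copy of the message shows the href already decoded. The entities in the encoded string below (`&#61;` and `&#95;`) are therefore an assumption, chosen only because they decode to the quoted value.

```python
import html

# Hypothetical encoded form: the exact entities in the original tag are
# not preserved in the archive. "&#61;" is "=" and "&#95;" is "_".
raw_href = "http://www.schtuff.com/?action&#61;openid&#95;server"

# Decode character references before using the attribute value.
server_url = html.unescape(raw_href)
print(server_url)  # http://www.schtuff.com/?action=openid_server
```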
Ha ha ha. I'd never looked very closely at the consumer code until now.
Extracting the URLs with regexes? Come on Brad! Surely you know better
than that?
You can't parse HTML with regexes. Something will always go wrong.
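To illustrate what goes wrong, here is a hypothetical regex-based extractor in Python. It is not LJ's actual code, which the post doesn't show, and the URLs are placeholders. The extractor returns the raw, still-encoded attribute value. It also misses tags that are valid HTML but are written slightly differently.

```python
import re

# Hypothetical pattern in the style being criticised. It assumes rel
# comes before href and that both attributes use double quotes.
LINK_RE = re.compile(
    r'<link\s+rel="openid\.server"\s+href="([^"]*)"', re.IGNORECASE
)

def find_server_regex(page):
    m = LINK_RE.search(page)
    return m.group(1) if m else None

encoded = '<link rel="openid.server" href="http://id.example.com/?a&#61;1" />'
swapped = '<link href="http://id.example.com/" rel="openid.server" />'
single = "<link rel='openid.server' href='http://id.example.com/' />"

print(find_server_regex(encoded))  # http://id.example.com/?a&#61;1 (entity not decoded)
print(find_server_regex(swapped))  # None (attributes in a different order)
print(find_server_regex(single))   # None (single-quoted attributes)
```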
The hack solution here would be to use HTML::Entities on the string,
decoding any entities. That'll only fix this problem, though. No doubt
others will arise. The only way to do this properly is with an HTML
parser. Another dependency perhaps, but not an especially harsh one
given that it already depends on some crazy crypto/hash modules, which
must be much rarer than HTML::Parser and friends.
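The following sketches the parser-based approach. It is written in Python using the standard library's `html.parser` rather than Perl's HTML::Parser, and the class and function names are hypothetical. The parser lower-cases tag and attribute names and accepts either quoting style and any attribute order. It also passes attribute values to `handle_starttag` with character references already decoded, so no separate entity pass is needed.

```python
from html.parser import HTMLParser

class OpenIDServerFinder(HTMLParser):
    """Record the href of the first <link> whose rel includes openid.server."""

    def __init__(self):
        super().__init__()
        self.server = None

    def handle_starttag(self, tag, attrs):
        # Names arrive lower-cased; values arrive with entities decoded.
        # A self-closing <link ... /> also reaches this method, because the
        # default handle_startendtag calls handle_starttag.
        if tag != "link" or self.server is not None:
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "openid.server" in rels and attrs.get("href"):
            self.server = attrs["href"]

def find_server(page):
    finder = OpenIDServerFinder()
    finder.feed(page)
    finder.close()
    return finder.server

page = """<html><head>
<link href='http://id.example.com/?a&#61;1&amp;b=2' REL="openid.server">
</head></html>"""
print(find_server(page))  # http://id.example.com/?a=1&b=2
```

The parser works from the tag structure rather than from a text pattern. As a result, the reordered, single-quoted and entity-laden variants are all handled by the same code path.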
I suppose one final solution, which I don't really like to admit exists,
is to just require that entities and character references are not used
in these attributes. That'll all go horribly wrong as soon as an ID
server URL has an ampersand in it, though, since an ampersand in an
attribute value should itself be written as &amp;.
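Here is a small Python illustration of the ampersand problem, using a hypothetical server URL. Escaping the URL for use in markup turns `&` into `&amp;`. A consumer that refused to decode entities would therefore use the wrong URL.

```python
import html

# Hypothetical ID server URL with two query parameters.
server_url = "http://id.example.com/server?a=1&b=2"

# Written correctly into an HTML attribute, the ampersand becomes &amp;.
markup_value = html.escape(server_url)
print(markup_value)  # http://id.example.com/server?a=1&amp;b=2

# A consumer that ignores entities would use markup_value verbatim,
# so "amp;b=2" would reach the server instead of "b=2".
# Decoding restores the intended URL.
print(html.unescape(markup_value))  # http://id.example.com/server?a=1&b=2
```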