Same URL issue, different problem

Fri Apr 28 07:29:42 UTC 2006

Steven Roussey wrote:
> We are doing everything in UTF-8 for our next few site launches, and want to
> be able to support international domain names and unicoded user names (with
> their own subdomain or url directory). 
> 
> I remember there being issues with browsers being tricked by similar
> *looking* domain names. How do we deal with this when we use URLs as
> identifiers? Surely someone has already dealt with it, so I just need a
> heads up on what to do.
> 

I do remember discussing this previously, though I can't remember if it
was for OpenID or for something else.

The best solution identified was for someone to write a library that,
given a Unicode string, flags any unusual mixing of character ranges
(scripts). For example, a Cyrillic character in the middle of a run of
Latin characters would be red-flagged. The app can then display the
identity with the "alien" character highlighted.
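
As a rough illustration of what such a library might do, here is a
minimal Python sketch. It is not an existing library: the function names
are made up, and because Python's standard unicodedata module has no
script property, it assumes the first word of each character's Unicode
name ("LATIN", "CYRILLIC", ...) is a good enough stand-in for the script.

```python
import unicodedata

def script_of(ch):
    """Rough script guess (assumption): first word of the Unicode name."""
    if not ch.isalpha():
        return None  # digits, dots and hyphens are treated as script-neutral
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def flag_aliens(text):
    """Return (index, char, script) for letters outside the majority script."""
    scripts = [script_of(ch) for ch in text]
    counts = {}
    for s in scripts:
        if s is not None:
            counts[s] = counts.get(s, 0) + 1
    if len(counts) <= 1:
        return []  # zero or one script in use: nothing to flag
    majority = max(counts, key=counts.get)
    return [(i, ch, s)
            for i, (ch, s) in enumerate(zip(text, scripts))
            if s is not None and s != majority]

def highlight(text):
    """Wrap each flagged character in brackets for display."""
    alien = {i for i, _, _ in flag_aliens(text)}
    return "".join("[%s]" % ch if i in alien else ch
                   for i, ch in enumerate(text))

# "p\u0430ypal" contains CYRILLIC SMALL LETTER A in place of Latin "a".
print(highlight("p\u0430ypal"))
```

A real implementation would want proper script data (for example from
the Unicode Scripts.txt file) rather than character names.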

Not really sure how users are supposed to know what that means, though.

For international domain names, a stricter proposal was made: if the
library described above flags *anything*, render the domain name in its
raw ASCII (Punycode) form rather than in its Unicode form. This allows
the common case, where the entire domain name is from one alphabet, to
go through cleanly, but avoids the problem of similar-looking letters
being substituted into English words.
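
A sketch of that display rule in Python, under a few assumptions: the
script check is the same character-name heuristic as before (not real
script data), the check is applied per label so that an all-Cyrillic
name under ".com" still passes, and the ASCII form comes from Python's
built-in "idna" codec, which implements the older IDNA 2003 rules. The
function names are hypothetical.

```python
import unicodedata

def label_scripts(label):
    """Set of (approximate) scripts used by the letters in one label."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in label if ch.isalpha()}

def display_form(domain):
    """Unicode form if every label is single-script, else the ASCII form."""
    if all(len(label_scripts(label)) <= 1 for label in domain.split(".")):
        return domain
    # Anything suspicious: fall back to the raw "xn--" encoding.
    return domain.encode("idna").decode("ascii")

print(display_form("example.com"))
print(display_form("p\u0430ypal.com"))  # mixed Latin/Cyrillic label
```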

Of course, you can spell "Вапк" and other words entirely with Cyrillic
letters, so this doesn't solve everything. I understand that browsers
also make use of the user's locale: a browser on an English system will
flag non-Latin characters as suspicious, for example. Not so easy to do
on an internationally available website, though. The best we've got is
guesses based on the Accept-Language header, and that can hardly be
considered reliable.
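
For what it's worth, the Accept-Language guess could look something like
the Python sketch below. The language-to-script table is a made-up,
deliberately tiny example and the function names are hypothetical; the
header parsing only handles the common "tag;q=value" form.

```python
def parse_accept_language(header):
    """Return language tags from the header, highest q-value first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a tag without an explicit q-value defaults to 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # unparseable weight: treat the tag as unwanted
        prefs.append((tag.strip().lower(), q))
    prefs.sort(key=lambda p: p[1], reverse=True)  # stable sort keeps ties in order
    return [tag for tag, q in prefs if q > 0]

# Hypothetical mapping from primary language tag to scripts the user
# can probably read. A real table would be far larger.
EXPECTED_SCRIPTS = {
    "en": {"LATIN"},
    "ru": {"LATIN", "CYRILLIC"},
    "el": {"LATIN", "GREEK"},
}

def expected_scripts(header):
    """Guess which scripts should NOT be treated as suspicious."""
    scripts = set()
    for tag in parse_accept_language(header):
        scripts |= EXPECTED_SCRIPTS.get(tag.split("-")[0], set())
    return scripts or {"LATIN"}  # nothing recognised: assume Latin only

print(expected_scripts("ru,en;q=0.5"))
```

This only tells you what the browser was configured to ask for, not
what the person in front of it can actually read.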

ΒΥΕ!