IRIs and URIs; or: Internet Explorer does not decode encoded non-ASCII characters in its address bar

by Oliver 24. October 2013 23:03

Some facts about IE and its address bar

IE can display non-ASCII characters in the address bar if you put them there by hand or click a link that contains such in unencoded form, e.g. http://marinas.info/marina/fürther-wassersportclub.

IE sends a request for the correctly encoded URL, which is http://marinas.info/marina/marina/f%C3%BCrther-wassersportclub.

Now, if you're in IE and click on the second link above, IE will not decode the URL back to the unencoded version – it will just keep the encoded URL in the address bar. If, instead, you're reading this page in FF or Chrome, the encoded URL above will be gracefully decoded into its unencoded counterpart.

URIs and IRIs

Disclaimer

First off, let me tell you that I'm by no means an expert in this field. I'm trying to get my around URIs, IRIs, encodings and beautiful web sites and URLs just like probably half of the web developer world out there. So please, verify what you read here and correct me where I am mistaken.

What the RFCs have to say

By today, more than a handful of RFC documents have been published concerning URIs:

RFC 3986 states the following about a URI:

A URI is an identifier consisting of a sequence of characters matching the syntax rule named <URI> in Section 3.

See the examples section, or refer to Appendix A for the ABNF for URIs.

RFC 3987 states the following about an IRI:

An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646).

In short, IRIs may contain Unicode characters while URI must not. Moreover, every URI is a valid IRI and every IRI can be encoded into a valid URI. Let's see an example again:

A great read on IRIs and their relationship to URIs can be found here by the W3C.

Support for IRIs

IRIs are not supported in HTTP as per RFC 2616. This implies that before requesting a resource identified by an IRI over HTTP it must be encoded as a URI first. This is what all mainstream browsers seem to do correctly – when you click on http://marinas.info/marina/marina/fürther-wassersportclub and inspect the request sent from your browser you will see that it actually requests http://marinas.info/marina/marina/f%C3%BCrther-wassersportclub.

HTML5 support IRIs as URLs: http://www.w3.org/html/wg/drafts/html/CR/infrastructure.html#urls.

Use IRIs today

It looks like you can safely use IRIs in your HTML pages today already. And doing so will actually persuade IE into displaying the correct non-ASCII characters. So why don't we?

Comments are closed

About Oliver

shades-of-orange.com code blog logo I build web applications using ASP.NET and have a passion for javascript. Enjoy MVC 4 and Orchard CMS, and I do TDD whenever I can. I like clean code. Love to spend time with my wife and our children. My profile on Stack Exchange, a network of free, community-driven Q&A sites

About Anton

shades-of-orange.com code blog logo I'm a software developer at teamaton. I code in C# and work with MVC, Orchard, SpecFlow, Coypu and NHibernate. I enjoy beach volleyball, board games and Coke.