IRIs and URIs; or: Internet Explorer does not decode encoded non-ASCII characters in its address bar

by Oliver 24. October 2013 23:03

Some facts about IE and its address bar IE can display non-ASCII characters in the address bar if you put them there by hand or click a link that contains such in unencoded form, e.g.ürther-wassersportclub. IE sends a request for the correctly encoded URL, which is Now, if you're in IE and click on the second link above, IE will not decode the URL back to the unencoded version – it will just keep the encoded URL in the address bar. If, instead, you're reading this page in FF or Chrome, the encoded URL above will be gracefully decoded into its unencoded counterpart. URIs and IRIs Disclaimer First off, let me tell you that I'm by no means an expert in this field. I'm trying to get my around URIs, IRIs, encodings and beautiful web sites and URLs just like probably half of the web developer world out there. So please, verify what you read here and correct me where I am mistaken. What the RFCs have to say By today, more than a handful of RFC documents have been published concerning URIs: RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax, which is the current Internet Standard (using IETF vocabulary) on URIs it updates RFC 1738, and it obsoletes RFCs 2732, 2396, and 1808 RFC 3987 - Internationalized Resource Identifiers (IRIs), which adds Unicode support for resource identifiers RFC 3986 states the following about a URI: A URI is an identifier consisting of a sequence of characters matching the syntax rule named <URI> in Section 3. See the examples section, or refer to Appendix A for the ABNF for URIs. RFC 3987 states the following about an IRI: An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646). In short, IRIs may contain Unicode characters while URI must not. Moreover, every URI is a valid IRI and every IRI can be encoded into a valid URI. Let's see an example again: IRI:ürther-wassersportclub URI: A great read on IRIs and their relationship to URIs can be found here by the W3C. Support for IRIs IRIs are not supported in HTTP as per RFC 2616. This implies that before requesting a resource identified by an IRI over HTTP it must be encoded as a URI first. This is what all mainstream browsers seem to do correctly – when you click onürther-wassersportclub and inspect the request sent from your browser you will see that it actually requests HTML5 support IRIs as URLs: Use IRIs today It looks like you can safely use IRIs in your HTML pages today already. And doing so will actually persuade IE into displaying the correct non-ASCII characters. So why don't we?

About Oliver code blog logo I build web applications using ASP.NET and have a passion for javascript. Enjoy MVC 4 and Orchard CMS, and I do TDD whenever I can. I like clean code. Love to spend time with my wife and our children. My profile on Stack Exchange, a network of free, community-driven Q&A sites

About Anton code blog logo I'm a software developer at teamaton. I code in C# and work with MVC, Orchard, SpecFlow, Coypu and NHibernate. I enjoy beach volleyball, board games and Coke.