Using Non-ASCII / Unicode URLs on Your Web Site

by Oliver 24. August 2012 16:22

We’re still working on and were wondering whether we should change any of the URL behavior we use in There, we allow all kinds of Unicode characters, from the letters of Eastern European languages such as Polish to the Cyrillic letters of the Russian alphabet, but we encode them properly using HttpServerUtility.UrlPathEncode(). This is the percent-encoding that RFC 3986 on URIs defines in section 2.1. It also means that all links rendered on pages on are correctly encoded and work in all browsers, even text-based ones.
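For readers outside .NET, here is a rough Python analogue of that encoding step. It is a sketch, not a port of UrlPathEncode(): urllib.parse.quote turns each non-ASCII character into its UTF-8 bytes and percent-encodes them, keeping / as the segment separator. The sample path is the Polish one discussed below, encode_path is a hypothetical helper name, and the two functions may treat some reserved ASCII characters differently.

```python
from urllib.parse import quote


def encode_path(path: str) -> str:
    """Hypothetical helper: percent-encode a URL path.

    Non-ASCII characters become their UTF-8 bytes as %XX escapes;
    '/' is left alone so the path segments stay intact.
    """
    return quote(path, safe="/")


path = "śląskie/camping-pod-dębowcem-20149/położenie"
print(encode_path(path))
# %C5%9Bl%C4%85skie/camping-pod-d%C4%99bowcem-20149/po%C5%82o%C5%BCenie
```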

Problems with Internet Explorer

The drawback of encoded URLs is that Internet Explorer does not decode them in the address bar the way all other major browsers do (Firefox, Chrome, Opera; I haven’t checked Safari), so all those Internet Explorer users out there will see something like – which is, to put it mildly, unreadable. Go ahead and copy that URL into any of the other browsers and they will display it as śląskie/camping-pod-dębowcem-20149/położenie, while behind the scenes they still use the encoded URL.
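To make the display-versus-wire distinction concrete, here is a small Python sketch, assuming UTF-8 percent-encoding as described above: decoding the escapes gives the readable form that Firefox, Chrome and Opera show, and re-encoding that form yields exactly the string the browser still sends.

```python
from urllib.parse import quote, unquote

encoded = "%C5%9Bl%C4%85skie/camping-pod-d%C4%99bowcem-20149/po%C5%82o%C5%BCenie"

# What the other browsers show in the address bar: UTF-8 escapes decoded
display = unquote(encoded)
print(display)  # śląskie/camping-pod-dębowcem-20149/położenie

# What still goes over the wire: the encoded form, recoverable from the display form
assert quote(display, safe="/") == encoded
```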

Another problem occurs when you enter a URL containing non-ASCII characters directly into IE’s address bar. The initial page load succeeds because IE properly encodes the URL. But as soon as you POST back to the same page, IE switches to what in my eyes is an inconsistent encoding, and the request fails with an error.

A deeper look behind the IE scenes

This is what the error looks like in Fiddler:


As you can see, IE replaced the Polish special characters in the first three URL segments and percent-encoded only the last segment, which is also what’s inside the form’s action attribute:
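Assuming the action attribute holds a relative URL containing just the encoded last segment, which is how we read the screenshot, the POST target is resolved against the current page address. The Python sketch below uses urllib.parse.urljoin to illustrate standard relative-URL resolution (not IE’s internal code): the leading segments come from however the browser holds the current address, and only the last one comes from the action. www.example.com is a placeholder host.

```python
from urllib.parse import urljoin

# Placeholder page address, with the leading segments kept as raw Unicode
page_url = "http://www.example.com/śląskie/camping-pod-dębowcem-20149/położenie"

# Relative action: only the last segment, percent-encoded
form_action = "po%C5%82o%C5%BCenie"

# Relative resolution replaces the last segment of the base path with the action
post_url = urljoin(page_url, form_action)
print(post_url)
# http://www.example.com/śląskie/camping-pod-dębowcem-20149/po%C5%82o%C5%BCenie
```

The result mixes two representations in one URL, which matches the kind of request shown in Fiddler.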


Who would have thought! As you can see in the first screenshot, something similar happens for the URL österreich/niederösterreich, with the difference that IE replaces the ö with the single byte F6 in hex (246 in decimal). To make the following screenshots, I saved the whole request (as Fiddler intercepted it) to a file and inspected it with the hex editor HxD:


According to the table, F6 is the Unicode code point for the small letter ö (U+00F6, which is also the letter’s single-byte value in ISO-8859-1 and Windows-1252). Its UTF-8 representation is C3 B6, which we can find, e.g., in the Referer header of the same request:
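A quick way to double-check these byte values is a few lines of Python. The utf8_two_bytes function is a name made up for this sketch; it applies the UTF-8 rule for code points U+0080 to U+07FF, putting the upper five bits into a 110xxxxx lead byte and the lower six bits into a 10xxxxxx continuation byte.

```python
def utf8_two_bytes(cp: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point outside the two-byte UTF-8 range")
    lead = 0xC0 | (cp >> 6)    # 110xxxxx: upper 5 bits
    cont = 0x80 | (cp & 0x3F)  # 10xxxxxx: lower 6 bits
    return bytes([lead, cont])


cp = ord("ö")
print(f"U+{cp:04X}")                   # U+00F6
print("ö".encode("latin-1").hex(" "))  # f6
print(utf8_two_bytes(cp).hex(" "))     # c3 b6
```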


So it turns out IE uses three(!) different encodings to transmit the same letter ö: its Unicode code point (the single byte F6), the percent-encoded version proposed by RFC 3986 (%C3%B6), and the raw UTF-8 bytes (C3 B6). Wow! Unfortunately, IIS and our application don’t play well with that.
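To see why this trips up the server, here is a hypothetical Python sketch of what a more tolerant application would have to do: fold all three forms of ö back into one canonical, percent-encoded UTF-8 path. This is not what we implemented. normalize_path_bytes is a made-up name, and the Latin-1 fallback is an assumption (IE may well use the system’s ANSI code page, such as Windows-1252, which agrees with Latin-1 for ö).

```python
from urllib.parse import quote, unquote_to_bytes

# The three forms of "ö", as they could appear in raw request bytes
SINGLE_BYTE = b"\xf6"        # code point value, i.e. one Latin-1/Windows-1252 byte
PERCENT_ENCODED = b"%C3%B6"  # RFC 3986 percent-encoding of the UTF-8 bytes
RAW_UTF8 = b"\xc3\xb6"       # UTF-8 bytes sent without escaping


def normalize_path_bytes(raw: bytes) -> str:
    """Hypothetical helper: return one canonical, percent-encoded UTF-8 path."""
    data = unquote_to_bytes(raw)       # resolve %XX escapes into raw bytes
    try:
        text = data.decode("utf-8")    # well-formed UTF-8, escaped or not
    except UnicodeDecodeError:
        text = data.decode("latin-1")  # assumption: anything else is Latin-1
    return quote(text, safe="/")       # re-encode consistently, keep '/'


for form in (SINGLE_BYTE, PERCENT_ENCODED, RAW_UTF8):
    print(form, "->", normalize_path_bytes(form))  # all three map to %C3%B6
```

The heuristic has limits: a path that mixes raw F6 bytes with raw UTF-8 bytes fails UTF-8 decoding as a whole and is then decoded entirely as Latin-1, which garbles the UTF-8 parts.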

Conclusion - Support Encoded URLs anyway

We decided to support those encoded URLs anyway for our new portals including to be able to optimize our pages for search engines based on their content, right down to their URLs. Maybe IE 10 will decode those URLs in the address bar and get a grip on handling URLs and form actions uniformly – for their users’ sake!

Happy encoding!

About Oliver: I build web applications using ASP.NET and have a passion for JavaScript. I enjoy MVC 4 and Orchard CMS, and I do TDD whenever I can. I like clean code. I love to spend time with my wife and our children.

About Anton: I'm a software developer at teamaton. I code in C# and work with MVC, Orchard, SpecFlow, Coypu and NHibernate. I enjoy beach volleyball, board games and Coke.