7. February 2017 02:19
For much too long, this bug annoyed Camping.Info users with Microsoft's Edge browser.

Encoding problem with non-ASCII URLs
When redirecting the browser via the Location header to a URL such as https://www.camping.info/österreich/tirol/camping-sölden-19387, Edge would request this URL from our server:
https://www.camping.info/Ã¶sterreich/tirol/camping-sÃ¶lden-19387
whereas the correctly encoded version of the above URL looks like this:
https://www.camping.info/%C3%B6sterreich/tirol/camping-s%C3%B6lden-19387
Here are two bug reports at Microsoft that complain about this exact problem: "EDGE: URLs with Unicode characters do not load and become garbled" and "EDGE: Does not cope with UTF Location headers".
Using a hex editor to look at those weird characters Ã¶, I found out that they actually are the characters with hex values C3 and B6, which are exactly the two bytes of ö in UTF-8. So instead of percent-encoding the UTF-8 representation of ö as %C3%B6, Edge apparently treated each of its two UTF-8 bytes as a character of its own. That's surely a bug.

How to fix this behavior
The only viable fix I could come up with was to send Edge an encoded Location header value. Strictly speaking, that is how it should be done anyway, but all major browsers, including IE, have learned to understand and correctly load even unencoded UTF-8 URLs. And with IE there was always the problem that an encoded URL would not be shown in its friendly unencoded form but would stay full of those rather ugly %AB%CD blocks. That's why, for a long time, we've been returning unencoded Location header values in redirects. Well, Edge, you'll get an encoded version then, if you need it. In a central spot, where all of our redirects are handled, we now have this:
var redirectLocation = IsBrokenEdge(app.Request)
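The byte-level story above can be reproduced in a few lines. This is an illustrative sketch only: `LocationEncoding`, `EncodePath` and `MisreadAsLatin1` are hypothetical names, and reading the UTF-8 bytes back as ISO-8859-1 is an assumption that merely reproduces the garbled path Edge requested; it is not a claim about how Edge works internally. `EncodePath` shows one possible way to build an encoded Location value, by percent-encoding each path segment with `Uri.EscapeDataString`.

```csharp
using System;
using System.Linq;
using System.Text;

public static class LocationEncoding
{
    // Hypothetical helper: percent-encodes every path segment (UTF-8 based)
    // while keeping the '/' separators, so "ö" becomes "%C3%B6".
    public static string EncodePath(string path) =>
        string.Join("/", path.Split('/').Select(segment => Uri.EscapeDataString(segment)));

    // Assumption for illustration: reading each UTF-8 byte as one ISO-8859-1
    // character turns "ö" (bytes C3 B6) into "Ã¶", the garbling described above.
    public static string MisreadAsLatin1(string text) =>
        Encoding.GetEncoding("ISO-8859-1").GetString(Encoding.UTF8.GetBytes(text));
}
```

For the path /österreich/tirol/camping-sölden-19387, `MisreadAsLatin1` yields /Ã¶sterreich/tirol/camping-sÃ¶lden-19387, while `EncodePath` yields /%C3%B6sterreich/tirol/camping-s%C3%B6lden-19387. The encoded value is pure ASCII, so there should be nothing left for a browser to reinterpret.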
And we detect a broken Edge version by its build number, knowing that at least build 14905 contains an official fix:

using System.Text.RegularExpressions;
using System.Web;

private static readonly Regex _regexEdge = new Regex(@"Edge\/\d+\.(?'build'\d+)", RegexOptions.Compiled);

private bool IsBrokenEdge(HttpRequest request)
{
    // Edge's user agent contains "Edge/<major>.<build>", e.g. "Edge/14.14393"
    var match = _regexEdge.Match(request.UserAgent ?? "");
    if (!match.Success) return false;
    if (!int.TryParse(match.Groups["build"].Value, out var buildNumber)) return false;
    // at least build number 14905 is fixed
    // see: https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/8477507/
    return buildNumber < 14905;
}
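Because `IsBrokenEdge` needs a live `HttpRequest`, a variant that takes the raw User-Agent string makes the check easy to try out. This is a sketch, not our production code: the class name `EdgeDetection` is hypothetical, and the sample user agents below are shortened versions of typical Edge and Chrome strings.

```csharp
using System.Text.RegularExpressions;

public static class EdgeDetection
{
    // Same pattern as above: captures the number after the dot in "Edge/<major>.<build>".
    private static readonly Regex EdgeRegex =
        new Regex(@"Edge\/\d+\.(?'build'\d+)", RegexOptions.Compiled);

    // Returns true only for Edge builds older than 14905; non-Edge user agents
    // (and a missing user agent) are treated as not broken.
    public static bool IsBrokenEdge(string userAgent)
    {
        var match = EdgeRegex.Match(userAgent ?? "");
        if (!match.Success) return false;
        return int.TryParse(match.Groups["build"].Value, out var build) && build < 14905;
    }
}
```

With this, a user agent ending in "Edge/14.14393" is reported as broken, one ending in "Edge/15.15063" is not, and a Chrome user agent without an "Edge/" token never matches the pattern at all.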
Thanks to http://stackoverflow.com/questions/31223702/how-to-detect-new-microsoft-browser-edge-in-asp-net for some hints on user agent detection for Edge.
End of story.
24. October 2013 23:03
Some facts about IE and its address bar
IE can display non-ASCII characters in the address bar if you type them in by hand or click a link that contains them in unencoded form, e.g. http://marinas.info/marina/fürther-wassersportclub. IE then sends a request for the correctly encoded URL, which is http://marinas.info/marina/f%C3%BCrther-wassersportclub. However, if you're in IE and click on the second link above, IE will not decode the URL back to its unencoded version; it just keeps the encoded URL in the address bar. If, instead, you're reading this page in Firefox or Chrome, the encoded URL above will be gracefully decoded into its unencoded counterpart.

URIs and IRIs

Disclaimer
First off, let me tell you that I'm by no means an expert in this field. I'm trying to get my head around URIs, IRIs, encodings, and beautiful web sites and URLs, just like probably half of the web developer world out there. So please verify what you read here and correct me where I'm mistaken.

What the RFCs have to say
By today, more than a handful of RFC documents concerning URIs have been published, among them:
RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax, which is the current Internet Standard (in IETF vocabulary) on URIs; it updates RFC 1738 and obsoletes RFCs 2732, 2396, and 1808
RFC 3987 - Internationalized Resource Identifiers (IRIs), which adds Unicode support for resource identifiers
RFC 3986 states the following about a URI: "A URI is an identifier consisting of a sequence of characters matching the syntax rule named <URI> in Section 3." See its examples section, or refer to its Appendix A for the ABNF for URIs.
RFC 3987 states the following about an IRI: "An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646)."
In short, IRIs may contain Unicode characters, while URIs are limited to a subset of ASCII. Moreover, every URI is a valid IRI, and every IRI can be encoded into a valid URI.
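To make "every IRI can be encoded into a valid URI" concrete, here is a minimal sketch of the IRI-to-URI mapping described in RFC 3987, section 3.1, applied to the marina example. Assumptions: the host name is plain ASCII (real implementations may convert host names with IDNA instead), and `IriMapping.IriToUri` is a hypothetical name, not a framework API.

```csharp
using System.Text;

public static class IriMapping
{
    // Sketch of the IRI-to-URI mapping (RFC 3987, section 3.1): ASCII characters are
    // copied unchanged; each non-ASCII character is converted to UTF-8 and every
    // resulting byte is written as %XX. Assumes an ASCII host name (no IDNA).
    public static string IriToUri(string iri)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < iri.Length)
        {
            if (iri[i] < 0x80)
            {
                sb.Append(iri[i]);
                i++;
                continue;
            }

            // Take the whole run of non-ASCII chars so surrogate pairs are encoded together.
            int start = i;
            while (i < iri.Length && iri[i] >= 0x80) i++;
            foreach (byte b in Encoding.UTF8.GetBytes(iri.Substring(start, i - start)))
            {
                sb.Append('%').Append(b.ToString("X2"));
            }
        }
        return sb.ToString();
    }
}
```

Mapping http://marinas.info/marina/fürther-wassersportclub gives http://marinas.info/marina/f%C3%BCrther-wassersportclub, and an address that is already a plain ASCII URI comes back unchanged, which matches the "every URI is a valid IRI" half of the statement.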
Let's look at the example again:
IRI: http://marinas.info/marina/fürther-wassersportclub
URI: http://marinas.info/marina/f%C3%BCrther-wassersportclub
The W3C has published a great read on IRIs and their relationship to URIs.

Support for IRIs
IRIs are not supported in HTTP as per RFC 2616. This implies that before a resource identified by an IRI can be requested over HTTP, the IRI must first be encoded as a URI. All mainstream browsers seem to do this correctly: when you click on http://marinas.info/marina/fürther-wassersportclub and inspect the request sent by your browser, you will see that it actually requests http://marinas.info/marina/f%C3%BCrther-wassersportclub.

HTML5 support
HTML5 supports IRIs as URLs: http://www.w3.org/html/wg/drafts/html/CR/infrastructure.html#urls.

Use IRIs today
It looks like you can already safely use IRIs in your HTML pages today. And doing so will actually persuade IE to display the correct non-ASCII characters. So why don't we?
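Going the other way is roughly what Firefox and Chrome do when they show the friendly form of an encoded address. The sketch below loosely follows RFC 3987, section 3.2: a run of percent-encoded bytes of 0x80 and above is decoded when it forms valid UTF-8, while ASCII escapes such as %2F and invalid runs stay encoded. It simplifies the RFC (for example, it does not keep bidi or other problematic characters encoded), `IriDisplay.UriToIri` is a hypothetical name, and browsers apply their own, stricter rules.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class IriDisplay
{
    // Throws DecoderFallbackException on invalid or truncated UTF-8.
    private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Hypothetical helper: decodes runs of %XX bytes >= 0x80 that form valid UTF-8
    // and leaves everything else (ASCII escapes, invalid runs) untouched.
    public static string UriToIri(string uri)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < uri.Length)
        {
            int start = i;
            var bytes = new List<byte>();
            while (TryReadHighByte(uri, i, out byte b))
            {
                bytes.Add(b);
                i += 3; // skip "%XX"
            }

            if (bytes.Count == 0)
            {
                sb.Append(uri[i]); // ordinary character or ASCII escape: copy as-is
                i++;
                continue;
            }

            try
            {
                sb.Append(StrictUtf8.GetString(bytes.ToArray()));
            }
            catch (DecoderFallbackException)
            {
                sb.Append(uri, start, i - start); // not valid UTF-8: keep it encoded
            }
        }
        return sb.ToString();
    }

    // True if uri has "%XX" at index and the hex value XX is 0x80 or above.
    private static bool TryReadHighByte(string uri, int index, out byte value)
    {
        value = 0;
        if (index + 2 >= uri.Length || uri[index] != '%') return false;
        return byte.TryParse(uri.Substring(index + 1, 2), NumberStyles.AllowHexSpecifier,
                             CultureInfo.InvariantCulture, out value)
               && value >= 0x80;
    }
}
```

Applied to http://marinas.info/marina/f%C3%BCrther-wassersportclub, this returns http://marinas.info/marina/fürther-wassersportclub, whereas a reserved escape like %2F or a truncated sequence like a lone %C3 is left exactly as it was.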