Creating a RESTful Web Service Using ASP.Net MVC Part 16 – XML Encoding

February 21, 2009 01:04 by admin

Jim Solderitsch wrote to me asking for help making the last version of the web service return XML encoded using UTF-8 rather than UTF-16. He couldn’t load the XML returned from the web service into an XmlDocument without an exception being thrown.

I thought this would be simple, but then I’ve not paid much attention to encoding before!

If you point Internet Explorer directly at Part 15’s version of the web service you’ll get this message:

The XML page cannot be displayed

Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.


Switch from current encoding to specified encoding not supported. Error processing resource 'http://localhost/ShouldersOfGi...

<?xml version="1.0" encoding="utf-16"?>

If you throw together a console application along the lines of:

using System.Xml;

namespace RestfulMvcClient
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the XML representation directly from the web service.
            XmlDocument xmlDocument = new XmlDocument();
            xmlDocument.Load("http://localhost/…/Products?format=xml");
        }
    }
}

You’ll find, just like Jim, that you get an exception thrown:

There is no Unicode byte order mark. Cannot switch to Unicode.

So what is going on? The XML looks fine! Well, the Internet Explorer error message is complaining about a switch in encoding. If we look at the full response we return, it is something along the lines of:

Cache-Control: private
Allow: GET, POST
Content-Type: application/xml; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNetMvc-Version: 1.0
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Fri, 20 Feb 2009 00:45:57 GMT
Content-Length: 1462

<?xml version="1.0" encoding="utf-16"?>
<ProductCollection>
...
</ProductCollection>
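One way to spot this kind of contradiction without a browser is to compare the two declarations directly. The sketch below is illustrative only: `EncodingsAgree` is a hypothetical helper (not part of the web service download) that reads the charset with `System.Net.Mime.ContentType` and pulls the encoding out of the XML declaration with a simple regular expression, which assumes the declaration is the first thing in the body. It is written as a C# 9 top-level program.

```csharp
using System;
using System.Net.Mime;
using System.Text.RegularExpressions;

// Hypothetical helper: true if the Content-Type charset and the XML declaration's
// encoding agree, false if they contradict each other, null if either is missing.
static bool? EncodingsAgree(string contentTypeHeader, string xmlBody)
{
    // ContentType parses "type/subtype; charset=..." and exposes the charset parameter.
    var charset = new ContentType(contentTypeHeader).CharSet;

    // Assumes the XML declaration (optionally preceded by U+FEFF) starts the body.
    Match match = Regex.Match(xmlBody, "^\uFEFF?<\\?xml[^>]*encoding=[\"']([^\"']+)[\"']");
    if (charset == null || !match.Success)
    {
        return null;
    }

    // Encoding names are case-insensitive ("utf-16" and "UTF-16" are the same).
    return string.Equals(charset, match.Groups[1].Value, StringComparison.OrdinalIgnoreCase);
}

string body = "<?xml version=\"1.0\" encoding=\"utf-16\"?>\n<ProductCollection />";

// The Part 15 response: charset=utf-8 in the header, utf-16 in the declaration.
Console.WriteLine(EncodingsAgree("application/xml; charset=utf-8", body)); // False

// With the charset dropped from the header there is nothing left to contradict.
Console.WriteLine(EncodingsAgree("application/xml", body) == null);        // True
```

Returning `null` rather than `true` when the charset is absent is a deliberate choice here: it keeps "no contradiction because nothing was declared" distinct from "both sides agree".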

The charset parameter and the encoding declaration contradict each other. One tells the browser that the content is encoded as UTF-8, the other that the XML is encoded as UTF-16. In their book, RESTful Web Services, Leonard Richardson and Sam Ruby recommend that you avoid sending the character encoding as part of the Content-Type header. The charset parameter is added automatically by ASP.Net, but it is simple to drop by adding the following to the XsltResult class:

context.HttpContext.Response.Charset = "";

Whilst that cleans up the ambiguity in the response, it doesn’t actually fix the problem. The issue is that we are returning a response that is actually encoded using UTF-8 (the default for my system) but claiming it is UTF-16. So we also need to add:

context.HttpContext.Response.ContentEncoding = Encoding.Unicode;
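Both halves of that diagnosis can be checked offline, without IIS or a browser. This minimal sketch (the `TryLoad` helper is hypothetical) feeds `XmlDocument.Load` two byte arrays: UTF-8 bytes whose declaration claims UTF-16, roughly what the Part 15 response looked like on the wire, and genuine UTF-16 bytes preceded by a byte order mark. The first is expected to fail with the same kind of "no Unicode byte order mark" exception Jim saw; the second should load.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

const string Xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><ProductCollection />";

// Case 1: UTF-8 bytes (no BOM) whose declaration claims UTF-16.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(Xml);

// Case 2: genuine UTF-16 little-endian bytes, preceded by the 0xFF 0xFE byte order mark.
byte[] preamble = Encoding.Unicode.GetPreamble();
byte[] text = Encoding.Unicode.GetBytes(Xml);
byte[] utf16Bytes = new byte[preamble.Length + text.Length];
preamble.CopyTo(utf16Bytes, 0);
text.CopyTo(utf16Bytes, preamble.Length);

Console.WriteLine(TryLoad(utf8Bytes));  // fails: XmlException about the missing byte order mark
Console.WriteLine(TryLoad(utf16Bytes)); // loaded

// Hypothetical helper: loads raw bytes into an XmlDocument and returns "loaded",
// or the XmlException message if parsing fails.
static string TryLoad(byte[] bytes)
{
    try
    {
        var document = new XmlDocument();
        document.Load(new MemoryStream(bytes));
        return "loaded";
    }
    catch (XmlException ex)
    {
        return ex.Message;
    }
}
```

Loading from a `MemoryStream` rather than a URL is an assumption made to keep the example self-contained; the reader only sees bytes either way, detects their encoding from the first few of them, and then objects when the declaration asks for a switch to UTF-16 that those bytes cannot support.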

Now we are returning content that is encoded using UTF-16 and that also claims to be UTF-16, so the xmlDocument.Load call works without error and FireFox works OK. Unfortunately Internet Explorer is now complaining in a slightly different way:

The XML page cannot be displayed

Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.


A name was started with an invalid character. Error processing resource 'http://localhost/ShouldersOfGi...

<

The test harness (which you can load using the URI Products?format=help) also has problems within Internet Explorer, only finding the first character of the response. Out of exasperation I tried lots of things. When I removed the override of the charset property (see above) the test harness started working within Internet Explorer, but browsing to the Products?format=xml URI directly still failed.

At this point I decided to see if I could get UTF-8 working instead. Surely all that was needed was to change the XSLT used in the view to have a tag like this:

<xsl:output method="xml" indent="yes" encoding="utf-8" />

But Jim had already tried this and found that the “utf-8” appeared to be ignored when the transform was applied. He had found an entry on Kirk Evans’ blog that explained the problem. Basically the transform was being applied using a StringBuilder, which forces the encoding back to UTF-16 (because it is writing to a .Net string, which uses UTF-16 internally). To get around the problem the transform should write to a MemoryStream instead.
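The effect can be reproduced in a few lines. In this sketch the stylesheet and input document are made up for illustration; the point is only that the same compiled transform, with `encoding="utf-8"` in its `xsl:output` element, produces a UTF-16 declaration when written to a `StringWriter` (which wraps a `StringBuilder`) and a UTF-8 declaration when written to a `MemoryStream`.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// A made-up stylesheet that asks for UTF-8 output via xsl:output.
const string Xslt =
    "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
    "<xsl:output method=\"xml\" indent=\"yes\" encoding=\"utf-8\" />" +
    "<xsl:template match=\"/\"><Products /></xsl:template>" +
    "</xsl:stylesheet>";
const string Input = "<ProductCollection />";

var transform = new XslCompiledTransform();
transform.Load(XmlReader.Create(new StringReader(Xslt)));

// 1. Into a StringWriter (backed by a StringBuilder): the declaration follows the
//    writer's own Encoding, which is UTF-16, and xsl:output's encoding is ignored.
var stringWriter = new StringWriter();
transform.Transform(XmlReader.Create(new StringReader(Input)), null, stringWriter);
string viaString = stringWriter.ToString();

// 2. Into a MemoryStream: the bytes and the declaration follow xsl:output (UTF-8).
var memStream = new MemoryStream();
transform.Transform(XmlReader.Create(new StringReader(Input)), null, memStream);
string viaStream = Encoding.UTF8.GetString(memStream.ToArray());

Console.WriteLine(viaString);
Console.WriteLine(viaStream);
```

When an XmlWriter targets a `TextWriter`, the encoding it declares comes from the `TextWriter`, and a `StringWriter` always reports UTF-16; only when writing to a `Stream` does the stylesheet get to choose.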

After implementing the memory stream approach and setting the encoding in the XSLT to UTF-8 all seemed fine. The XML generated by the web service could be consumed by XmlDocument.Load(), by FireFox and by Internet Explorer (both directly and from the test harness). So would the memory stream approach solve the UTF-16 issue? No! No change at all.

I was guessing that the issue was to do with Byte Order Marks (BOM). At the start of a Unicode file, the first few bytes can be used to indicate the type of encoding and the byte order of the characters. For example, if the data starts with the bytes 0xFF 0xFE this indicates UTF-16 (little endian), whereas 0xFE 0xFF indicates UTF-16 (big endian). As far as I could tell, this information was missing from the web service response.
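Checking which byte order mark (if any) a response starts with only needs a look at its leading bytes. `DescribeBom` is a hypothetical helper for illustration; it only knows the UTF-8 and UTF-16 marks, and it uses each .NET encoding's `GetPreamble()` to show the mark that encoding writes.

```csharp
using System;
using System.Text;

// Hypothetical helper: names the encoding signalled by a leading byte order mark.
// Only UTF-8 and UTF-16 are recognised; a UTF-32 LE mark (FF FE 00 00) would be
// reported as UTF-16 LE.
static string DescribeBom(byte[] data)
{
    if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16 (little endian)";
    if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16 (big endian)";
    return "no BOM";
}

// GetPreamble() returns the byte order mark each encoding writes at the start of a file.
Console.WriteLine(DescribeBom(Encoding.Unicode.GetPreamble()));          // UTF-16 (little endian)
Console.WriteLine(DescribeBom(Encoding.BigEndianUnicode.GetPreamble())); // UTF-16 (big endian)
Console.WriteLine(DescribeBom(Encoding.UTF8.GetPreamble()));             // UTF-8

// GetBytes() on its own never emits the mark.
Console.WriteLine(DescribeBom(Encoding.UTF8.GetBytes("<ProductCollection />"))); // no BOM
```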

From Rick Strahl’s Web Log I got an inkling of the issue. I was converting from the memory stream to a string using the following code:

TextReader textReader = new StreamReader(memStream);
memStream.Seek(0, SeekOrigin.Begin);
return textReader.ReadToEnd();

Whereas I should have been using:

return xmlWriterSettings.Encoding.GetString(memStream.ToArray());

This uses the encoding class itself to perform the conversion. With this magic incantation in place suddenly everything worked! I could even go back to removing the charset parameter from the Content-Type header!
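Why would such a small change matter? One plausible explanation, offered as an assumption rather than a confirmed diagnosis: `StreamReader` detects a byte order mark, switches to the matching encoding and then consumes the mark, whereas `Encoding.GetString` decodes every byte, so the mark survives as a leading U+FEFF character. When that string is later written out using the same encoding, the character becomes the byte order mark again. This sketch builds a UTF-16 MemoryStream by hand (mark first, then text) and compares the two conversions.

```csharp
using System;
using System.IO;
using System.Text;

// Build a MemoryStream as a UTF-16 XmlWriter would typically leave it: BOM, then text.
byte[] bom = Encoding.Unicode.GetPreamble();
byte[] text = Encoding.Unicode.GetBytes("<?xml version=\"1.0\" encoding=\"utf-16\"?><ProductCollection />");
var memStream = new MemoryStream();
memStream.Write(bom, 0, bom.Length);
memStream.Write(text, 0, text.Length);

// Original conversion: StreamReader spots the BOM, switches to UTF-16 and skips it.
memStream.Seek(0, SeekOrigin.Begin);
TextReader textReader = new StreamReader(memStream);
string viaReader = textReader.ReadToEnd();

// Replacement: Encoding.GetString decodes every byte, so the BOM becomes a leading U+FEFF.
string viaEncoding = Encoding.Unicode.GetString(memStream.ToArray());

Console.WriteLine(viaReader[0] == '\uFEFF');               // False
Console.WriteLine(viaEncoding[0] == '\uFEFF');             // True
Console.WriteLine(viaEncoding.Length - viaReader.Length);  // 1
```

Apart from that one character, the two strings are identical; the whole difference is whether the mark is kept.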

So this latest version of the web service encodes XML representations correctly, using the encoding specified by the Internal to External XSLT. This download also contains a small console application that loads an XmlDocument directly from a resource: RESTfulMVCWebService16RTM.zip (94.50 kb) (this version has been built against Version 1.0 RTM).

Update: Jonas pointed out an issue when sending XML or XHTML back in to the web service. See his comments below and my response. The link above includes the fix.
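As the reply to Jonas in the comments below explains, the fix comes down to stripping the BOM when a transform's output is used internally rather than served. A minimal sketch, assuming the transformed text arrives as a string with a leading U+FEFF (as `Encoding.GetString` leaves it): the hypothetical `StripBom` helper removes that character before the string reaches a parser. `XmlDocument.LoadXml` stands in here for the deserializer mentioned in the comments; the code in the download may do this differently.

```csharp
using System;
using System.Xml;

// Hypothetical helper: drop a single leading U+FEFF (a decoded byte order mark) so the
// string can be parsed as plain XML text.
static string StripBom(string xml) =>
    xml.Length > 0 && xml[0] == '\uFEFF' ? xml.Substring(1) : xml;

// Assumed shape of a transform result decoded with Encoding.GetString: BOM, then markup.
string transformed = "\uFEFF<Product><Name>Widget</Name></Product>";

var document = new XmlDocument();
document.LoadXml(StripBom(transformed));
Console.WriteLine(document.DocumentElement?.Name); // Product
```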

An Aside: Above we set the encoding of the response using ContentEncoding = Encoding.Unicode. An alternative way to set the encoding across the entire web service is to use the globalization element in the web.config file:

<system.web>
    <globalization responseEncoding="utf-16"/>
    …
</system.web>

I didn’t rely on this approach as we need to change the encoding as dictated by each XSLT and these may differ from one transform to another. The final version reads the encoding from the compiled XSLT transform. Whilst looking into this I realised that I should include the globalization element anyway… without it, the encoding of the XHTML and JSON representations will use the server machine’s default… which may be different from machine to machine (which could cause no end of trouble).
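Reading the encoding from the compiled transform might look something like the following. This is a sketch of the idea, not the code in the download: `ApplyTransform` and the stylesheet are hypothetical. It takes `XslCompiledTransform.OutputSettings.Encoding` (derived from the stylesheet's `xsl:output` element), transforms into a `MemoryStream`, and decodes with that same encoding; in the web service the returned encoding would presumably then be assigned to `Response.ContentEncoding`.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: apply a compiled transform into a MemoryStream and decode the
// bytes with the encoding named by the stylesheet's xsl:output element.
static (string Text, Encoding TextEncoding) ApplyTransform(XslCompiledTransform xslt, string inputXml)
{
    // OutputSettings is derived from xsl:output when the stylesheet is compiled.
    Encoding encoding = xslt.OutputSettings.Encoding;

    using var memStream = new MemoryStream();
    using (XmlReader reader = XmlReader.Create(new StringReader(inputXml)))
    {
        xslt.Transform(reader, null, memStream);
    }

    // GetString keeps any BOM the writer emitted as a leading U+FEFF character.
    return (encoding.GetString(memStream.ToArray()), encoding);
}

// Illustrative stylesheet asking for UTF-16; any xsl:output encoding would do.
var transform = new XslCompiledTransform();
transform.Load(XmlReader.Create(new StringReader(
    "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
    "<xsl:output method=\"xml\" encoding=\"utf-16\" />" +
    "<xsl:template match=\"/\"><Products /></xsl:template>" +
    "</xsl:stylesheet>")));

var result = ApplyTransform(transform, "<ProductCollection />");
Console.WriteLine(result.TextEncoding.WebName); // utf-16
Console.WriteLine(result.Text.TrimStart('\uFEFF'));
```

Because each stylesheet carries its own `xsl:output`, this keeps the per-transform choice described above, while the `globalization` element still covers representations that are not produced by an XSLT.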

Another Aside: I wasted lots of time trying to solve the encoding issue by testing against the Guids resource. Every time I put the URI of the Guids?format=xml resource into Internet Explorer’s address bar I would get an error:

Internet Explorer cannot download Guids from localhost.

Internet Explorer was not able to open this Internet site. The requested site is either unavailable or cannot be found. Please try again later.

FireFox didn’t have the same problem. It turns out to be a “feature” of Internet Explorer when “no-cache” requests for a file are made over SSL (even though I’m not using SSL!!!) by typing a URL into the address bar. If you want to understand this, Microsoft’s explanation is a good place to start. The problem does not show up with Products?format=xml because the Products controller does not call:

HttpContext.Response.Cache.SetCacheability(HttpCacheability.NoCache);

From reading the explanation you should glean that clicking on a hyperlink to the document should work. Sure enough, if you create a little test page with a link to the same Guids?format=xml resource, load that page into Internet Explorer and click on the link, the XML from the web service is displayed correctly. Press F5 and the XML refreshes perfectly. Go back to the test page and click the link again, the XML is displayed correctly and it hasn’t been cached. Put the cursor at the end of the URI in the address bar and press RETURN… you are back at the error message. Nice “feature”… especially as it seems to affect me regardless of SSL when talking to localhost!



Comments (6)

February 27. 2009 12:14

Jonas

Glad to see that you are continuing with this series.

With this version however, any attempt to create a product by passing xml or xhtml fails with the error:
The Xml representation was badly formed. Data at the root level is invalid. Line 1, position 1.

Anyone else getting this error?

Regards
Jonas


February 27. 2009 15:15

Piers

I know what the problem is and will post an updated version very soon... Byte Order Marks are a pain! Basically I have changed the function that applies transforms so that it obeys encodings correctly and hence puts a Byte Order Mark in. This is fine when serving a file, but when applying a transform to create a string for internal use the BOM must not be present.

So when an external representation is received and transformed using ItemXmlExternalToInternal.xslt, the BOM must be stripped from the result before it is used internally (e.g. in the deserializer which is where your exception is being thrown).

Thanks for pointing it out!


March 3. 2009 06:38

Matt Mason

Useful thread. Thanks.


August 27. 2009 16:34

Miguel Barrientos

I just wanted to thank you for writing a great series of posts about restful web services. The way you explain the options that you considered before settling on the most elegant one is what I like the most about your writing.

I'm looking forward to the next installment. It would be great if you could comment on any changes that you would want to make to your code once ASP.NET MVC v2 is released.


August 28. 2009 09:28

Piers

Thanks Miguel... I'm currently working on an update that considers a new sample SDK from the WCF team to add REST to MVC. It should be finished in the next week.


March 17. 2010 13:27

tj

I enjoy reading posts like this. Thanks

