XML, Web Services and Special Characters

If you’ve worked with XML technologies for any reasonable amount of time, you’re aware of the considerations when dealing with “special” characters. This recently came up at work, so I thought I’d share a few quick thoughts.

One of the developers was doing an HTTP post of XML content to a .NET web service. However, we discovered that a few of the records coming across had invalid characters.

Now you probably know that the following message is considered invalid XML:

<Person>
	<Name>Richard</Name>
	<Nickname>Thunder & Lightning</Nickname>
</Person>

The ampersand (“&”) isn’t allowed as a literal character within a node’s text, and neither is “<”. A literal “>” is technically legal in text (except in the sequence “]]>”), and quotes only need escaping inside attribute values, but they are conventionally escaped as well.

Now if you call a web service by first doing an “Add Web Reference” in Visual Studio.NET, you are using a proxy class that covers up all the XML/SOAP stuff going on underneath. The proxy class (Reference.cs) inherits System.Web.Services.Protocols.SoapHttpClientProtocol, which you can see (using Reflector) takes care of proper serialization using the XmlWriter object. So setting my web service parameters like so …

When this actually goes across the wire to my web service, the payload has been appropriately encoded and the ampersand has been replaced with &amp; …

However, if I decided to do my own HTTP post to the service and bypass the proxy, this is NOT the way to do it …

// Needs: System.IO, System.Net, System.Text, System.Windows.Forms
HttpWebRequest webRequest =
    (HttpWebRequest)HttpWebRequest.Create("http://localhost/bl/sv.asmx");
webRequest.Method = "POST";
webRequest.ContentType = "text/xml";

using (Stream reqStream = webRequest.GetRequestStream())
{
    // The payload is built by plain string concatenation, so nothing escapes it.
    string body = "<soap:Envelope xmlns:soap=" +
        "\"http://schemas.xmlsoap.org/soap/envelope/\">" +
        "<soap:Body><Operation_1 xmlns=\"http://tempuri.org/\">" +
        "<ns0:Person xmlns:ns0=\"http://testnamespace\">" +
        "<ns0:Name>Richard & Amy</ns0:Name>" +  // raw "&": not well-formed XML
        "<ns0:Age>10</ns0:Age>" +
        "<ns0:Address>411 Broad Street</ns0:Address>" +
        "</ns0:Person>" +
        "</Operation_1></soap:Body></soap:Envelope>";

    byte[] bodyBytes = Encoding.UTF8.GetBytes(body);
    reqStream.Write(bodyBytes, 0, bodyBytes.Length);
}

HttpWebResponse webResponse =
    (HttpWebResponse)webRequest.GetResponse();
MessageBox.Show("submitted, " + webResponse.StatusCode);

webResponse.Close();

Why is this bad? This may work for most scenarios, but in the case above, I have a special character (“&”) that is about to go unmolested across the wire …

Instead, the code above should be augmented to use an XmlTextWriter to build up the XML payload. These types of errors are such a freakin’ pain to debug since no errors actually get thrown when the receiving service fails to deserialize the bad XML into a .NET object. In a BizTalk world, this means no SOAP exception to the caller, no suspended message, no error in the Event Log. Virtually no trace (outside of the IIS logs). Not good.
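Here is a minimal sketch of that idea. It uses XmlWriter.Create rather than XmlTextWriter directly, on the assumption that either will escape text values for you. The SoapPayloadBuilder class and BuildPersonRequest method are hypothetical names I made up for illustration; the namespaces and element names are copied from the snippet above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class SoapPayloadBuilder
{
    // Hypothetical helper: builds the same envelope as the hand-concatenated
    // string, but lets XmlWriter escape the text values.
    public static byte[] BuildPersonRequest(string name, int age, string address)
    {
        const string soapNs = "http://schemas.xmlsoap.org/soap/envelope/";
        const string operationNs = "http://tempuri.org/";
        const string personNs = "http://testnamespace";

        // UTF-8 without a byte order mark, matching the POST in the original code.
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };

        using (var buffer = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(buffer, settings))
            {
                writer.WriteStartElement("soap", "Envelope", soapNs);
                writer.WriteStartElement("soap", "Body", soapNs);
                writer.WriteStartElement(string.Empty, "Operation_1", operationNs);
                writer.WriteStartElement("ns0", "Person", personNs);

                // Text passed here is escaped by the writer, so "&" becomes "&amp;".
                writer.WriteElementString("ns0", "Name", personNs, name);
                writer.WriteElementString("ns0", "Age", personNs, XmlConvert.ToString(age));
                writer.WriteElementString("ns0", "Address", personNs, address);

                writer.WriteEndElement(); // ns0:Person
                writer.WriteEndElement(); // Operation_1
                writer.WriteEndElement(); // soap:Body
                writer.WriteEndElement(); // soap:Envelope
            }

            // The writer has been flushed by its using block at this point.
            return buffer.ToArray();
        }
    }
}
```

The returned bytes could be written to the request stream in place of bodyBytes, e.g. SoapPayloadBuilder.BuildPersonRequest("Richard & Amy", 10, "411 Broad Street").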

BizTalk itself doesn’t like poorly constructed XML either. The XmlReceive pipeline, in addition to “typing” the message (http://namespace#root), also parses the message. So while everyone says that the default XmlReceive pipeline doesn’t validate the structure (meaning XSD structure) of the message, it DOES check that the message is well-formed XML. Keep that in mind. If I try to pass an invalid XML document (special characters, unclosed tags), it WILL bomb out in the pipeline layer.
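The difference between “parses” and “validates against an XSD” is easy to see in plain .NET, outside of BizTalk. This is only a sketch of the general idea, not what the pipeline does internally; WellFormedCheck and IsWellFormed are hypothetical names. No schema is involved anywhere, yet the bad documents are still rejected by the parser.

```csharp
using System;
using System.Xml;

public static class WellFormedCheck
{
    // Hypothetical helper: returns true when the text parses as XML.
    // No XSD is loaded, so this checks well-formedness only.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // Raised for things like a raw "&" or an unclosed tag.
            return false;
        }
    }
}
```

With this helper, the Person message with the raw “Thunder & Lightning” nickname and a message with an unclosed tag both come back false, while the same message with the ampersand written as &amp; comes back true.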

If you try to cheat, and do pass-through pipelines and use XmlDocument as your initial orchestration message (thus bypassing any peeking at the message by BizTalk), you will still receive errors when you try to interact with the message later on. If you set the XmlDocument to the actual message variable in the orchestration, the message gets parsed at that time and fails if the structure is invalid.

So, this is probably elementary for you smart people, but it’s one of those little things that you might forget about. Be careful about generating XML content via string building and instead consider using XmlDocuments or XmlWriters to make sure that your content passes XML parsing rules.
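For the XmlDocument route, a small sketch using the Person message from the top of the post might look like this. PersonDocumentBuilder and BuildPerson are hypothetical names; the point is only that setting InnerText lets the DOM do the escaping when the document is serialized.

```csharp
using System;
using System.Xml;

public static class PersonDocumentBuilder
{
    // Hypothetical helper: builds the Person message through the DOM
    // instead of concatenating strings.
    public static string BuildPerson(string name, string nickname)
    {
        var doc = new XmlDocument();

        XmlElement person = doc.CreateElement("Person");
        doc.AppendChild(person);

        XmlElement nameNode = doc.CreateElement("Name");
        nameNode.InnerText = name;          // stored as text, escaped on output
        person.AppendChild(nameNode);

        XmlElement nicknameNode = doc.CreateElement("Nickname");
        nicknameNode.InnerText = nickname;  // "&" is written out as "&amp;"
        person.AppendChild(nicknameNode);

        return doc.OuterXml;
    }
}
```

Calling BuildPerson("Richard", "Thunder & Lightning") should give back a message whose Nickname node contains Thunder &amp; Lightning.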


Author: Richard Seroter

Richard Seroter is Director of Developer Relations and Outbound Product Management at Google Cloud. He’s also an instructor at Pluralsight, a frequent public speaker, the author of multiple books on software design and development, and a former InfoQ.com editor plus former 12-time Microsoft MVP for cloud. As Director of Developer Relations and Outbound Product Management, Richard leads an organization of Google Cloud developer advocates, engineers, platform builders, and outbound product managers that help customers find success in their cloud journey. Richard maintains a regularly updated blog on topics of architecture and solution design and can be found on Twitter as @rseroter.

2 thoughts

  1. Good post. Anything working with XML assumes it is well-formed, which unfortunately is still not always the case. Hopefully after reading this post you’ll decrement the number of stragglers that are still doing silly things like: myString = myString.replaceAll("\\<", "&lt;"); (Java)

  2. This post brings up an old interesting issue I faced recently.
    When I read a text file having special characters and try to write it to a SQL table column using the SQL Adapter, it bombs because of special characters in the updategram or in the insert statement.

    Do you have any idea how to overcome this problem?

    Thanks in Advance
    Ashith Raj
