Presents your
XML E-NEWSLETTER for October 23, 2002
<------------------------------------------->
DISCOVER A BETTER METHOD FOR USING CHARACTER DATA IN XSL TEMPLATES
Sometimes when working with XML documents, you come across some data
that doesn't make sense as XML. A common solution to this problem is to
define the data as character data--or CDATA. This instructs the XML parser to
leave the information as is. Unfortunately, this doesn't always work as
expected when you are working with XSL. Let's examine this problem in
more detail and develop an easy solution.
CHARACTER DATA IN XML
In order to understand this problem, let's put it in context. Suppose
you have some data that isn't XML but uses some XML features--like
angle brackets:
<here is an example>
There are two ways to specify this data in XML. The first is to use
escape sequences for the angle brackets, like this:
&lt;here is an example&gt;
This solution works fine for small pieces of data; however, you may not
want to escape every angle bracket in a larger set of data. Instead,
you might use an alternate solution:
<![CDATA[<here is an example>]]>
Here you use a CDATA section to tell the parser that the enclosed
character data should be read as text rather than parsed as markup. This
allows you to put data that uses many angle brackets (or ampersands) in a
single spot without having to escape each instance. But what happens
when you want to use CDATA sections in an XSL template?
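
To make the two forms described above concrete, here is a minimal sketch of a complete document that carries the same text both ways. The element names samples, escaped, and cdata are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical wrapper elements; both children hold the same text. -->
<samples>
  <!-- Form 1: every angle bracket is escaped by hand. -->
  <escaped>&lt;here is an example&gt;</escaped>
  <!-- Form 2: the text sits in a CDATA section and needs no escaping. -->
  <cdata><![CDATA[<here is an example>]]></cdata>
</samples>
```

A conforming parser reports the same character data for both child elements; the difference is only in how the text is written in the file.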
AN XSL EXAMPLE
For our XSL example, we'll start with a simple XML document, shown in
LISTING 1. This document contains just a couple of elements. We want to
use XSL to add an additional element.
Listing 1: nonxmldata.xml
Some generic data
This is some generic data
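
A source document of the kind described might look like the following sketch. The element names doc, title, and description are assumptions made for illustration; only the two text values are taken from the listing.

```xml
<?xml version="1.0"?>
<!-- Assumed element names; the text values are the ones in Listing 1. -->
<doc>
  <title>Some generic data</title>
  <description>This is some generic data</description>
</doc>
```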
Our first attempt at adding new data might look like LISTING 2.
Listing 2: nonxmldata1.xsl
does not conform to
standards and therefore is in a CDATA section.
The problem with this approach is that the stylesheet itself must be
well-formed XML, and the parser cannot determine what is happening with
the word "of" in the tag. It looks like an attribute, but there's no
value assigned to the attribute called "of". Therefore, our second
approach might look like LISTING 3.
Listing 3: nonxmldata2.xsl
does not conform to
standards and therefore is in a CDATA section.
]]>
Here, we've added a CDATA section around the non-XML data, so the
stylesheet now parses. Unfortunately, the data comes back with each
bracket individually escaped, because the XSL engine treats the contents
of the CDATA section as ordinary text and escapes it on output. This may
not necessarily be a problem, but in our case, we want the resulting XML
to contain a CDATA section, rather than escape sequences.
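
As a rough sketch of this second approach, the stylesheet below wraps the non-XML text in a CDATA section. It assumes a source document whose root element is doc, and it uses a hypothetical nonxmldata element and hypothetical bracketed text (<Data of this sort> and <XML>); none of these names are confirmed by the listings.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Rebuild the assumed root element and copy its existing children. -->
  <xsl:template match="/doc">
    <doc>
      <xsl:copy-of select="*"/>
      <!-- The CDATA section keeps the stylesheet well-formed, but the
           processor sees only plain text here and escapes it on output. -->
      <nonxmldata><![CDATA[<Data of this sort> does not conform to <XML>
standards and therefore is in a CDATA section.]]></nonxmldata>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

With a stylesheet like this, the text inside nonxmldata is written to the result with its angle brackets escaped instead of inside a CDATA section.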
SOLUTION
To solve the problem, we'll use a little-known attribute of the
<xsl:output> element called cdata-section-elements. This attribute
tells the XSL engine that the text of certain elements should be output
as CDATA sections in order to preserve the character data they contain.
Because the data is not valid XML, it must still be enclosed in a CDATA
section in the stylesheet; however, now we'll add the new attribute to
the <xsl:output> element to see the final solution, shown in LISTING 4.
Listing 4: nonxmldata.xsl
does not conform to
standards and therefore is in a CDATA section.
]]>
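
A minimal sketch of this technique follows, under the same assumptions as before: the root element doc, the nonxmldata element, and the bracketed text are hypothetical names used for illustration. The cdata-section-elements attribute itself is part of XSLT 1.0 and takes a whitespace-separated list of element names.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the serializer to write the text children of every
       nonxmldata element as CDATA sections. -->
  <xsl:output method="xml" indent="yes"
              cdata-section-elements="nonxmldata"/>

  <xsl:template match="/doc">
    <doc>
      <xsl:copy-of select="*"/>
      <!-- Still wrapped in CDATA so the stylesheet stays well-formed. -->
      <nonxmldata><![CDATA[<Data of this sort> does not conform to <XML>
standards and therefore is in a CDATA section.]]></nonxmldata>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

The attribute only affects serialization, so it has an effect when the processor writes the result as XML text; a result handed on as an in-memory tree carries no CDATA markers.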
The result of running this template against our source XML is shown in
LISTING 5.
Listing 5: result.xml
Some generic data
This is some generic data
does not conform to
standards and therefore is in a CDATA section.
]]>
Brian Schaffner is a senior consultant for Fujitsu Consulting. He
provides architecture, design, and development support for Fujitsu's
Telcom360 group.
----------------------------------------