Presents your
XML E-NEWSLETTER for September 25, 2002
------------------------------------------
SEND BINARY DATA IN XML
XML is generally thought of as a method for describing data using text.
For example, elements are given text names, and element contents are
usually text-based. There are times, however, when you want to put data
other than text into your XML documents. Let's examine some of your
options.
THE PROBLEM
You might think that you can just drop some binary data between a start
tag and an end tag and you're done. Unfortunately, this can lead to
several problems:
* The parser may normalize newline and space characters, which
  corrupts the binary data.
* Binary data may contain null characters, which XML does not allow.
* Binary data may contain sequences that the parser reads as markup,
  such as a "<" that appears to start a tag.
These problems affect both the binary data and the XML parsing. If the
parser can't figure out what's going on, then you won't be able to
extract any data. If the data is "formatted" by the parser, then you
won't be able to process the binary data correctly.
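
To make these failure modes concrete, here is a minimal Python sketch
that uses the standard library's xml.etree.ElementTree parser. The
element name "picture" and the payload bytes are assumptions made up for
illustration, and other parsers may report the same problems differently.

```python
import xml.etree.ElementTree as ET


def parses_cleanly(document: bytes) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# "picture" is a hypothetical element name; the payloads are made-up bytes.
null_ok = parses_cleanly(b"<picture>\x00\x01\x02</picture>")  # contains a null byte
markup_ok = parses_cleanly(b"<picture>ab<cd</picture>")       # "<cd<" reads as a broken tag

# A parser that accepts the input may still alter it: XML requires CR LF
# line endings to be normalized to LF before the application sees the data.
normalized = ET.fromstring(b"<picture>a\r\nb</picture>").text

print(null_ok, markup_ok, repr(normalized))
```

The first two documents are rejected outright, and the third one parses
but no longer contains the bytes that were put in.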
THE SOLUTIONS
There are at least three solutions to this problem:
* Embed the binary data directly in the XML document in a CDATA
  section.
* Refer to the binary data using a URL.
* Encode the binary data into a text-based format that can be set as
  the contents of an XML element.
BINARY EMBEDDING
This solution allows you to put binary data directly into an XML
document. Using this method, you won't have to pull the file from a
remote source or decode it before using it. The data is available for
immediate processing.
To employ this method, use an XML CDATA section, a special construct
for data that isn't parsed as markup during XML processing. Essentially,
you use a start delimiter and an end delimiter to signify where the
binary data begins and ends. The value of the element containing the
CDATA section will be the binary data. Here's an example:
99238
Super Gidgetidoo
A CDATA section uses the sequence <![CDATA[ as its start delimiter and
the sequence ]]> as its end delimiter. The XML parser does not interpret
the data between the two delimiters as markup.
Unfortunately, this method has some problems. First, you may run into
issues with the character set used by the XML document, the parser, and
your binary data: bytes that are not legal characters in XML, such as
null characters, are rejected even inside a CDATA section. Second, your
binary data may contain the ]]> sequence, which would indicate the end
of the nonparsed data to the XML parser even though it's not the end of
the binary data. That is a very messy situation.
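
The ]]> problem is easy to reproduce. The following Python sketch uses
the standard library's xml.etree.ElementTree; the "picture" element and
the helper names wrap_in_cdata and split_cdata_end are hypothetical and
exist only for this illustration.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(payload: str) -> str:
    """Naively wrap the payload in a CDATA section inside a <picture> element."""
    return "<picture><![CDATA[" + payload + "]]></picture>"


def split_cdata_end(payload: str) -> str:
    """Split every "]]>" across two CDATA sections so it never appears whole."""
    return payload.replace("]]>", "]]]]><![CDATA[>")


# Markup characters pass through a CDATA section untouched.
safe_text = ET.fromstring(wrap_in_cdata("a < b && c > d")).text

# A payload that happens to contain "]]>" ends the section early.
try:
    ET.fromstring(wrap_in_cdata("abc]]>def"))
    early_end_rejected = False
except ET.ParseError:
    early_end_rejected = True

# A common workaround for text payloads. It does nothing for null bytes
# or other characters that XML forbids outright.
repaired_text = ET.fromstring(wrap_in_cdata(split_cdata_end("abc]]>def"))).text

print(repr(safe_text), early_end_rejected, repr(repaired_text))
```

Splitting the sequence rescues text that merely contains ]]>, but it
is not a general answer for arbitrary binary data.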
BINARY REFERENCE
Probably the easiest solution is to put the binary file on a
network-accessible server and just refer to it by URL. With a
reference, you don't have to worry about encoding the file or sending
large files across the network with the XML. It also allows you to
update the file dynamically without having to send a new XML document.
Here's an example:
99238
Super Gidgetidoo
http://www.mysupergidgets.com/pictures/99238.jpg
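
On the receiving side, the reference approach might be handled along
the lines of this Python sketch. The element names (gidget, id, name,
picture) and the helper functions are assumptions for illustration; only
the values are taken from the example above. The fetch_picture function
is defined but not called, since it needs the server to be reachable.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.request import urlopen

# Element names are hypothetical; the values come from the example above.
DOCUMENT = """\
<gidget>
  <id>99238</id>
  <name>Super Gidgetidoo</name>
  <picture>http://www.mysupergidgets.com/pictures/99238.jpg</picture>
</gidget>"""


def picture_url(document: str) -> str:
    """Return the URL stored in the (hypothetical) <picture> element."""
    url = ET.fromstring(document).findtext("picture", default="").strip()
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("unexpected picture reference: %r" % url)
    return url


def fetch_picture(document: str, timeout: float = 10.0) -> bytes:
    """Download the referenced file; this needs network access to the server."""
    with urlopen(picture_url(document), timeout=timeout) as response:
        return response.read()


url = picture_url(DOCUMENT)
print(url)
```

The trade-off is that whoever processes the document needs network
access to the server at the time the binary data is wanted.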
BINARY ENCODING
There are a handful of methods you can use to encode binary data as
text data. Essentially, the process converts the binary bytes into
ASCII characters using a relatively simple algorithm. The two most
popular binary encoding algorithms are uuencode and base64.
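
As a rough illustration of the base64 route, this Python sketch encodes
a payload, stores it as element text, and recovers it again. The element
names (gidget, picture), the embed and extract helpers, and the payload
bytes are all made up for the example.

```python
import base64
import xml.etree.ElementTree as ET


def embed(payload: bytes) -> str:
    """Build a small document whose <picture> element holds base64 text."""
    root = ET.Element("gidget")
    ET.SubElement(root, "picture").text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def extract(document: str) -> bytes:
    """Recover the original bytes from the <picture> element."""
    return base64.b64decode(ET.fromstring(document).findtext("picture"))


# Made-up payload with the usual troublemakers: a null byte, "]]>",
# "<", "&", and a CR LF pair.
payload = b"\x00\xff]]><&\r\n"
document = embed(payload)
restored = extract(document)

print(document)
print(restored == payload)
```

Because base64 output uses only letters, digits, "+", "/", and "=",
none of the parsing problems described earlier can occur. The cost is
size: every 3 bytes of input become 4 characters of text.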
MIME builds on these encodings by adding header information about the
encoded file (such as the filename and content type). Encoding programs
are easy to find as shareware and programming tools. Here's an example
that embeds a binary-encoded file in an XML document:
99238
Super Gidgetidoo
Content-Description: File encoded with ENCODE64.EXE.
Content-Disposition: attachment; filename="foo.zip"
Content-Transfer-Encoding: BASE64
Content-Type: application/octet-stream
UEsDBBQAAAAIAGtaMS2/u6RnIAAAAIYAAAAKAAAAYml0bWFwLmJtcHPybWOAADMg1gBiVihm
ZJAAiwcA8RE+CIaB/6iAYj4AUEsBAhQAFAAAAAgAa1oxLb+7pGcgAAAAhgAAAAoAAAAAAAAA
AAAgALaBAAAAAGJpdG1hcC5ibXBQSwUGAAAAAAEAAQA4AAAASAAAAAAA
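
Once the element's text has been pulled out of the XML document, the
MIME headers and payload above can be decoded with ordinary tools. This
Python sketch uses the standard library's email and zipfile modules. It
assumes the element content is available as a string, and it adds the
blank line that MIME expects between the headers and the body.

```python
import email
import io
import zipfile

# Headers and payload copied from the example above. The blank line
# between them is an addition required by the MIME format.
MIME_TEXT = """\
Content-Description: File encoded with ENCODE64.EXE.
Content-Disposition: attachment; filename="foo.zip"
Content-Transfer-Encoding: BASE64
Content-Type: application/octet-stream

UEsDBBQAAAAIAGtaMS2/u6RnIAAAAIYAAAAKAAAAYml0bWFwLmJtcHPybWOAADMg1gBiVihm
ZJAAiwcA8RE+CIaB/6iAYj4AUEsBAhQAFAAAAAgAa1oxLb+7pGcgAAAAhgAAAAoAAAAAAAAA
AAAgALaBAAAAAGJpdG1hcC5ibXBQSwUGAAAAAAEAAQA4AAAASAAAAAAA
"""

message = email.message_from_string(MIME_TEXT)
filename = message.get_filename()        # taken from Content-Disposition
raw = message.get_payload(decode=True)   # base64 text decoded back to bytes

# Only the archive's directory is read here; nothing is decompressed.
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    members = archive.namelist()

print(filename, len(raw), members)
```

The headers carry the filename and content type, so the receiver does
not have to guess what the decoded bytes are.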
Brian Schaffner is a senior consultant for Fujitsu Consulting. He
provides architecture, design, and development support for Fujitsu's
Telcom360 group.
----------------------------------------