Builder
http://builder.com.com
Presents your
XML E-NEWSLETTER for August 14, 2002
<------------------------------------------->
CONTROLLING THE PHP XML PARSER WITH XML PROCESSING INSTRUCTIONS
PHP is a popular Web application tool, and XML is a popular Web data
format. Together, the two provide a rich set of functionality for
building dynamic Internet applications. When you work with XML, and
particularly with XML parsers, you may need to adjust how the parser
behaves.
CASE FOLDING IN PHP
In PHP's XML parsing, one setting you may need to adjust is case
folding. With case folding enabled, the XML parser translates all
element names to uppercase before sending them to the element handler
functions. This may not matter for your application, or it might be a
huge problem, depending on how you handle your element names.
Fortunately, PHP lets you turn this feature on or off. Because the
switch that controls the feature isn't really part of your XML data,
it's better expressed as an XML processing instruction.
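To make the switch concrete, here is a minimal sketch that reads the case folding option on a fresh parser, turns it off, and reads it again. It only assumes PHP's standard XML extension is available; the variable names are illustrative.

```php
<?php
// Minimal sketch: inspect and toggle case folding on a fresh parser.
$parser = xml_parser_create();

// Case folding is enabled by default.
$default = xml_parser_get_option($parser, XML_OPTION_CASE_FOLDING);

// Turn it off so element names reach the handlers unchanged.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
$after = xml_parser_get_option($parser, XML_OPTION_CASE_FOLDING);

// Cast to bool because the return type differs between PHP versions.
var_dump((bool) $default, (bool) $after);
```

The first value should report that folding is on and the second that it is off.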
XML PROCESSING INSTRUCTIONS
A processing instruction (PI) is a piece of the XML document that
carries instructions for the software processing it. In other words, it
contains instructions about how to handle the data, rather than any
data itself. The parser hands each PI to your application, so you can
use PIs to alter the way the parser works. For example, you might want
to inform the parser that case folding should be turned off.
XML processing instructions have a special format within the XML
document. Here's a familiar line that uses the same format:
<?xml version="1.0"?>
This line tells the parser that this is an XML document and that it
conforms to the XML 1.0 standard. (Strictly speaking, the XML 1.0
specification treats this XML declaration as a separate construct and
reserves the target name "xml", but the syntax is the same.) As you can
see, it starts with the <? characters, ends with the ?> characters, and
contains two space-separated pieces of data. The first piece of data is
called the target, the second the value.
What if we want a processing instruction that describes whether case
folding should be turned on or off? We could use something like this:
<?casefolding on?>
This creates a new PI with a target of "casefolding" and a value of
"on".
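To see how a target and value reach your code, the following sketch records whatever PHP's processing-instruction handler receives. The sample document and the $seen array are assumptions made for illustration only.

```php
<?php
// Sketch: record what a processing-instruction handler receives.
$seen = [];

$parser = xml_parser_create();
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$seen) {
        // $target is the PI name, $data is everything after it.
        $seen[] = [$target, $data];
    }
);

// Hypothetical sample document used only for this illustration.
$xml = '<?xml version="1.0"?><?casefolding on?><Foo/>';
xml_parse($parser, $xml, true);

print_r($seen);
```

The handler should be called once, with the target "casefolding" and the data "on". The XML declaration is consumed by the parser itself and is not passed to the handler.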
A SIMPLE EXERCISE
Now that we understand what a PI is, let's build a scenario that uses
one. We'll start with a simple HTML form that lets us type in (or cut
and paste) an XML document, as shown in Listing 1.
Listing 1: xmltest.html
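<html>
<head><title>PHP XML Parser Test</title></head>
<body>
<form action="test.php" method="post">
<textarea name="xml" rows="10" cols="60">
<?xml version="1.0"?>
<?casefolding on?>
<Foo>
<Bar></Bar>
<Bar></Bar>
</Foo>
</textarea>
<br>
<input type="submit" value="Parse">
</form>
</body>
</html>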
Next, we'll need the test.php document referenced by our form. Listing
2 shows the test.php script.
Listing 2: test.php
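<html>
<head><title>XML Parser Output</title></head>
<body>
<?php
// Called when the parser encounters an element's start tag.
function start_handler($parser, $elementName, $attributes) {
    echo "Got element $elementName<br>\n";
}

// Called when the parser encounters an element's end tag.
function end_handler($parser, $elementName) {
    echo "Done with element $elementName<br>\n";
}

// Called for each processing instruction; toggles case folding.
function pi_handler($parser, $target, $data) {
    if ($target == "casefolding") {
        if ($data == "on") {
            xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 1);
        } else {
            xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        }
    }
}

$parser = xml_parser_create();
xml_set_element_handler($parser, "start_handler", "end_handler");
xml_set_processing_instruction_handler($parser, "pi_handler");
// trim() matters: the XML declaration must be the first thing in the input.
if (!xml_parse($parser, trim($_REQUEST["xml"]), true)) {
    echo "XML error: " . xml_error_string(xml_get_error_code($parser));
}
?>
</body>
</html>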
In the PHP script, only a few things happen. Once you skip past the
function definitions, there are only four steps. First, we create a new
XML parser object. Then, we set the element start and end handlers
(which get called when the start and end tags for each element are
encountered). Next, we set the handler for our processing instructions.
Finally, we parse the document, which in turn calls our handlers.
In the sample document, we have set case folding on. When you submit
the form to the test.php script, the parser starts processing the XML
data. Upon reaching the processing instruction, the parser calls the
pi_handler function we've defined and passes it the information
contained in the processing instruction. Our pi_handler function
examines the PI target to determine whether it concerns the casefolding
feature. Then it turns the feature on or off depending on the value of
the processing instruction. The PI should appear before the first
element, so that the setting takes effect before any element handler is
called.
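The handler in Listing 2 treats every value other than "on" as "off". If that is too forgiving for your application, a stricter variant could look like the sketch below. The name strict_pi_handler, the $names array, and the "maybe" value are hypothetical and exist only for this illustration.

```php
<?php
// Sketch of a stricter handler: only "on" and "off" change the option.
function strict_pi_handler($parser, $target, $data) {
    if ($target !== "casefolding") {
        return; // Not our instruction; leave the parser alone.
    }
    $value = strtolower(trim($data));
    if ($value === "on") {
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 1);
    } elseif ($value === "off") {
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    }
    // Any other value is ignored, so the current setting is kept.
}

$names = [];
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$names) {
        $names[] = $name; // Remember each start tag as the parser reports it.
    },
    function ($parser, $name) {
        // End tags are not needed for this illustration.
    }
);
xml_set_processing_instruction_handler($parser, "strict_pi_handler");

// "maybe" is not recognized, so the default (folding on) stays in effect.
xml_parse($parser, '<?casefolding maybe?><Foo><Bar/></Foo>', true);
print_r($names);
```

Because the unrecognized value is ignored, the element names should still arrive in uppercase.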
RUNNING THE EXAMPLE
If you run the example as-is, you'll get output similar to the
following:
Got element FOO
Got element BAR
Done with element BAR
Got element BAR
Done with element BAR
Done with element FOO
As you can see, with case folding turned on (which is the default
setting), all of the element names are converted to uppercase. If you
go back and change the processing instruction so that case folding is
turned off, you should get output like this:
Got element Foo
Got element Bar
Done with element Bar
Got element Bar
Done with element Bar
Done with element Foo
Again, you can see that the element names are now processed exactly as
they appear in the XML document, without any case folding.
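If you'd rather compare the two runs without a browser, the same logic can be exercised from the command line. This is only a sketch: trace_elements is a hypothetical helper, and the document template is an assumption modeled on the output shown above.

```php
<?php
// Hypothetical helper: parse $xml and return the messages as an array.
function trace_elements($xml) {
    $lines = [];
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$lines) {
            $lines[] = "Got element $name";
        },
        function ($p, $name) use (&$lines) {
            $lines[] = "Done with element $name";
        }
    );
    xml_set_processing_instruction_handler(
        $parser,
        function ($p, $target, $data) {
            // Same rule as the article's handler: "on" enables, anything else disables.
            if ($target == "casefolding") {
                xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, $data == "on" ? 1 : 0);
            }
        }
    );
    xml_parse($parser, trim($xml), true);
    return $lines;
}

// Assumed sample document; %s is replaced with "on" or "off".
$doc = '<?xml version="1.0"?><?casefolding %s?><Foo><Bar/><Bar/></Foo>';

echo implode("\n", trace_elements(sprintf($doc, "on"))), "\n\n";
echo implode("\n", trace_elements(sprintf($doc, "off"))), "\n";
```

Returning the messages as an array, rather than echoing them from the handlers, makes it easy to compare the two runs in a script or test.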
Brian Schaffner is a senior consultant for Fujitsu Consulting. He
provides architecture, design, and development support for Fujitsu's
Telcom360 group.
----------------------------------------