Navigating documents with XPath

There are many approaches to navigating XML documents. Some solutions parse a document into a Document Object Model (DOM); others use the Simple API for XML (SAX). Yet others use Extensible Stylesheet Language Transformations (XSLT). Each of these techniques approaches the problem of processing an XML document differently. In this Tech Mail, we'll look at the XML Path Language (XPath) and how it can be used within XSLT to navigate an XML document.

XPath basics
XPath is a language for addressing pieces of an XML document. An XML document is made up of nodes such as elements, attributes, and text. Using XPath, you can easily select one or more pieces of a document. You identify what to select with an XPath expression. An expression can be:

  • A path to a specific element.
  • A wildcard that selects multiple elements.
  • A predicate or function that narrows a selection to zero or more elements, or computes a value from it, such as a count.
For example, the path /CustomerOrder/OrderInformation/LineItems/Item[1]/PricePer selects the unit price of the first line item in the customer order in Listing 1. (XPath positions start at 1, not 0, so Item[0] would select nothing.) This path deliberately mimics file-system paths, because a file system, like an XML document, is a tree of structured data. A path describes how to navigate from one point in the tree to another; each piece of data is a node in the tree, reachable along a particular path.
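To try these kinds of expressions outside a stylesheet, the Java sketch below (not part of the original Tech Mail) evaluates them with the XPath 1.0 engine in the JDK's javax.xml.xpath package. It assumes a trimmed copy of the line items from Listing 1 and shows a path with a position predicate, why Item[0] selects nothing, and a wildcard combined with the count() function. The class and helper names (PathDemo, text, number) are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PathDemo {
    // Trimmed copy of the line items in Listing 1 (other elements omitted).
    static final String ORDER =
        "<CustomerOrder><OrderInformation><LineItems>"
        + "<Item><SKU>19923</SKU><Quantity>12</Quantity><PricePer>500</PricePer></Item>"
        + "<Item><SKU>34888</SKU><Quantity>1</Quantity><PricePer>48.50</PricePer></Item>"
        + "<Item><SKU>887324</SKU><Quantity>50</Quantity><PricePer>25</PricePer></Item>"
        + "</LineItems></OrderInformation></CustomerOrder>";

    // Parses ORDER into a DOM tree.
    static Document load() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(ORDER)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an expression and returns its XPath string value.
    static String text(String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, load());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an expression and returns its XPath number value.
    static double number(String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expr, load(), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String items = "/CustomerOrder/OrderInformation/LineItems/Item";
        // A path with a position predicate: positions start at 1.
        System.out.println(text(items + "[1]/PricePer"));           // 500
        // Item[0] matches no node, so its string value is empty.
        System.out.println(text(items + "[0]/PricePer").isEmpty()); // true
        // Wildcard: * matches every child element of every Item;
        // count() is a function that returns a number rather than nodes.
        System.out.println(number("count(" + items + "/*)"));       // 9.0
    }
}
```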

By example
Consider a simple XML document, as shown in Listing 1. This document illustrates some of the essential information used in creating a customer order that might drive an invoicing system.

Listing 1: simple.xml

<CustomerOrder>
  <CustomerInformation>
    <CustomerNumber>18823</CustomerNumber>
    <Addresses>
      <Address type="shipping">
        <CustomerName>Widget Corp.</CustomerName>
        <Address>1234 Michigan Ave</Address>
        <City>Chicago</City>
        <State>IL</State>
        <Zip>60614</Zip>
      </Address>
      <Address type="billing">
        <CustomerName>Widget Corp.</CustomerName>
        <Address>1234 Michigan Ave</Address>
        <City>Chicago</City>
        <State>IL</State>
        <Zip>60614</Zip>
      </Address>
    </Addresses>
  </CustomerInformation>
  <OrderInformation>
    <OrderNumber>900128844</OrderNumber>
    <LineItems>
      <Item>
        <SKU>19923</SKU>
        <Description>Carburating widget</Description>
        <Quantity>12</Quantity>
        <PricePer>500</PricePer>
        <Subtotal>6000</Subtotal>
      </Item>
      <Item>
        <SKU>34888</SKU>
        <Description>Ionized filter</Description>
        <Quantity>1</Quantity>
        <PricePer>48.50</PricePer>
        <Subtotal>48.50</Subtotal>
      </Item>
      <Item>
        <SKU>887324</SKU>
        <Description>Flange grommet</Description>
        <Quantity>50</Quantity>
        <PricePer>25</PricePer>
        <Subtotal>1250</Subtotal>
      </Item>
    </LineItems>
    <Subtotal>7298.50</Subtotal>
    <ShippingAmount>200</ShippingAmount>
    <TaxAmount>500</TaxAmount>
    <Total>7998.50</Total>
    <OrderDate>20010422</OrderDate>
  </OrderInformation>
</CustomerOrder>


We'll use this CustomerOrder data to drive our fulfillment operation. The shipping center uses an antiquated system that can only receive data in comma-delimited format. Using XPath within an Extensible Stylesheet Language Transformations (XSLT) stylesheet, we will create a file that meets the shipping center's specifications.

The format for the shipping system is:

OrderNumber, Name, Address, City, State, Zip, SKU, Quantity


Unfortunately, the shipping system is not designed to ship multiple items in the same box, so each line item in the order is shipped separately. We therefore first need to loop through each Item element in the XML document and produce one output line per item for the shipping system.

The XSLT xsl:for-each element makes it easy to process every Item in the XML document. Its select attribute contains an XPath expression that identifies the nodes the loop applies to. In this case, we want all of the Items, so the for-each element is as follows:

<xsl:for-each select="/CustomerOrder/OrderInformation/LineItems/Item">
</xsl:for-each>


Within this loop, each Item element under /CustomerOrder/OrderInformation/LineItems becomes the current node in turn. Notice that the XPath expression resembles the path used to address a directory in a file system.
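Outside XSLT, the same loop can be sketched with the JDK's javax.xml.xpath API: evaluate the for-each expression as a node-set, then visit each Item in document order. This Java sketch is illustrative only; the ItemLoop class and skuAndQuantity method are hypothetical names, and the document is a trimmed copy of the line items in Listing 1.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ItemLoop {
    // Trimmed copy of the line items in Listing 1.
    static final String ORDER =
        "<CustomerOrder><OrderInformation><LineItems>"
        + "<Item><SKU>19923</SKU><Quantity>12</Quantity></Item>"
        + "<Item><SKU>34888</SKU><Quantity>1</Quantity></Item>"
        + "<Item><SKU>887324</SKU><Quantity>50</Quantity></Item>"
        + "</LineItems></OrderInformation></CustomerOrder>";

    // Mirrors xsl:for-each: select every Item, then visit each one in turn.
    static List<String> skuAndQuantity() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(ORDER)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Same expression as the select attribute of the for-each element.
            NodeList items = (NodeList) xp.evaluate(
                "/CustomerOrder/OrderInformation/LineItems/Item", doc, XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                // Relative paths are evaluated with this Item as the context node.
                rows.add(xp.evaluate("SKU", items.item(i)) + ","
                    + xp.evaluate("Quantity", items.item(i)));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints one "SKU,Quantity" line per Item, in document order.
        skuAndQuantity().forEach(System.out::println);
    }
}
```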

Now that we can access each Item, we need to pull the rest of the data from the XML file. The shipping address is a major piece of the output file. We could assume that the shipping address is always the first address in the XML file, but that may not always be the case. Instead, XPath lets us write an expression that selects the correct address explicitly.

To get the CustomerName, Address, City, State, and Zip for the shipping address, we need an XPath query. Here, a query is an XPath expression with a predicate: a condition in square brackets that keeps only the nodes that satisfy it. To find the shipping address, we select the Address element whose type attribute is "shipping": /CustomerOrder/CustomerInformation/Addresses/Address[@type='shipping']. We can place this query directly in the select attribute of an xsl:value-of element to output the customer's shipping City:

<xsl:value-of select="/CustomerOrder/CustomerInformation/Addresses/Address[@type='shipping']/City"/>
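The Java sketch below, again using the JDK's javax.xml.xpath API, shows why the predicate matters. It assumes a hypothetical variant of the Addresses element in which the billing address comes first and has a made-up city ("Springfield"), so selecting by position returns the wrong City while the [@type='shipping'] predicate still returns the right one. It also shows that a path which skips the Addresses step matches nothing. The ShippingAddress class and text helper are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ShippingAddress {
    // Hypothetical variant of the Addresses element in Listing 1: billing comes
    // first, and "Springfield" is a made-up city so the two addresses differ.
    static final String ORDER =
        "<CustomerOrder><CustomerInformation><Addresses>"
        + "<Address type='billing'><City>Springfield</City></Address>"
        + "<Address type='shipping'><City>Chicago</City></Address>"
        + "</Addresses></CustomerInformation></CustomerOrder>";

    // Evaluates an expression against ORDER and returns its string value.
    static String text(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(ORDER)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String addresses = "/CustomerOrder/CustomerInformation/Addresses/Address";
        // By position: picks whichever address happens to come first.
        System.out.println(text(addresses + "[1]/City"));                // Springfield
        // By predicate: picks the address whose type attribute is "shipping".
        System.out.println(text(addresses + "[@type='shipping']/City")); // Chicago
        // Without the Addresses step, the path matches no node at all.
        System.out.println(
            text("/CustomerOrder/CustomerInformation/Address[@type='shipping']/City").isEmpty()); // true
    }
}
```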


Now we're ready to put all of this together into the complete XSLT stylesheet shown in Listing 2.

Listing 2: order2shipping.xsl

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- The shipping system expects plain comma-delimited text, not XML -->
  <xsl:output method="text"/>
  <xsl:template match="/CustomerOrder">
    <!-- One output line per line item -->
    <xsl:for-each select="/CustomerOrder/OrderInformation/LineItems/Item">
      <xsl:value-of select="/CustomerOrder/OrderInformation/OrderNumber"/><xsl:text>,</xsl:text>
      <xsl:value-of select="/CustomerOrder/CustomerInformation/Addresses/Address[@type='shipping']/CustomerName"/><xsl:text>,</xsl:text>
      <xsl:value-of select="/CustomerOrder/CustomerInformation/Addresses/Address[@type='shipping']/Address"/><xsl:text>,</xsl:text>
      <xsl:value-of select="/CustomerOrder/CustomerInformation/Addresses/Address[@type='shipping']/City"/><xsl:text>,</xsl:text>
      <xsl:value-of select="/CustomerOrder/CustomerInformation/Addresses/Address[@type='shipping']/State"/><xsl:text>,</xsl:text>
      <xsl:value-of select="/CustomerOrder/CustomerInformation/Addresses/Address[@type='shipping']/Zip"/><xsl:text>,</xsl:text>
      <!-- Relative paths: evaluated against the current Item -->
      <xsl:value-of select="SKU"/><xsl:text>,</xsl:text>
      <xsl:value-of select="Quantity"/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>


Notice that the SKU and Quantity fields use relative paths rather than absolute paths that begin at the document root. They are evaluated against the current context node, which the for-each loop sets to each Item in turn. The output from this transformation is shown in Listing 3.
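The difference between relative and absolute paths inside a loop can be checked with a short Java sketch using javax.xml.xpath (the ContextDemo class and perItem method are illustrative names, and the document is a trimmed copy of Listing 1's line items). It evaluates an expression once per Item, with that Item as the context node. The relative path SKU yields each item's own SKU, whereas an absolute path ignores the context and, when converted to a string, returns only the first SKU in document order; xsl:value-of in XSLT 1.0 converts a node-set to a string the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContextDemo {
    // Trimmed copy of the line items in Listing 1.
    static final String ORDER =
        "<CustomerOrder><OrderInformation><LineItems>"
        + "<Item><SKU>19923</SKU></Item>"
        + "<Item><SKU>34888</SKU></Item>"
        + "<Item><SKU>887324</SKU></Item>"
        + "</LineItems></OrderInformation></CustomerOrder>";

    // Evaluates expr once per Item, using that Item as the context node.
    static List<String> perItem(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(ORDER)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xp.evaluate(
                "/CustomerOrder/OrderInformation/LineItems/Item", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                values.add(xp.evaluate(expr, items.item(i)));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Relative path: resolved against each Item in turn.
        System.out.println(perItem("SKU")); // [19923, 34888, 887324]
        // Absolute path: same node-set every time; its string value is the first SKU.
        System.out.println(perItem("/CustomerOrder/OrderInformation/LineItems/Item/SKU")); // [19923, 19923, 19923]
    }
}
```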

Listing 3: The comma-delimited file for the shipping system

900128844,Widget Corp.,1234 Michigan Ave,Chicago,IL,60614,19923,12
900128844,Widget Corp.,1234 Michigan Ave,Chicago,IL,60614,34888,1
900128844,Widget Corp.,1234 Michigan Ave,Chicago,IL,60614,887324,50
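To reproduce Listing 3 without a separate XSLT processor, the Java sketch below runs an equivalent stylesheet through the JDK's javax.xml.transform API. It assumes a trimmed copy of Listing 1 (only the elements the stylesheet reads) and rebuilds the stylesheet as a Java string with xsl:output method="text", so the result is plain text with no XML declaration. The RunOrder2Shipping class and its helpers are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RunOrder2Shipping {
    // Fields shared by both addresses in Listing 1.
    static final String ADDRESS_FIELDS =
        "<CustomerName>Widget Corp.</CustomerName><Address>1234 Michigan Ave</Address>"
        + "<City>Chicago</City><State>IL</State><Zip>60614</Zip>";

    // Trimmed copy of Listing 1: only the elements the stylesheet reads.
    static final String ORDER =
        "<CustomerOrder><CustomerInformation><CustomerNumber>18823</CustomerNumber><Addresses>"
        + "<Address type=\"shipping\">" + ADDRESS_FIELDS + "</Address>"
        + "<Address type=\"billing\">" + ADDRESS_FIELDS + "</Address>"
        + "</Addresses></CustomerInformation><OrderInformation>"
        + "<OrderNumber>900128844</OrderNumber><LineItems>"
        + "<Item><SKU>19923</SKU><Quantity>12</Quantity></Item>"
        + "<Item><SKU>34888</SKU><Quantity>1</Quantity></Item>"
        + "<Item><SKU>887324</SKU><Quantity>50</Quantity></Item>"
        + "</LineItems></OrderInformation></CustomerOrder>";

    // Predicate path to the shipping address, as in Listing 2.
    static final String SHIP =
        "/CustomerOrder/CustomerInformation/Addresses/Address[@type='shipping']";

    // Builds an xsl:value-of element for the given XPath expression.
    static String valueOf(String select) {
        return "<xsl:value-of select=\"" + select + "\"/>";
    }

    // Equivalent of Listing 2, using text output.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/CustomerOrder\">"
        + "<xsl:for-each select=\"/CustomerOrder/OrderInformation/LineItems/Item\">"
        + valueOf("/CustomerOrder/OrderInformation/OrderNumber") + ","
        + valueOf(SHIP + "/CustomerName") + ","
        + valueOf(SHIP + "/Address") + ","
        + valueOf(SHIP + "/City") + ","
        + valueOf(SHIP + "/State") + ","
        + valueOf(SHIP + "/Zip") + ","
        // Relative paths, evaluated against the current Item
        + valueOf("SKU") + "," + valueOf("Quantity")
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each></xsl:template></xsl:stylesheet>";

    // Applies the stylesheet to the document and returns the serialized result.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(XSL, ORDER));
    }
}
```

Running main should print the three comma-delimited lines of Listing 3. Building the stylesheet from a small valueOf helper is only a convenience for this sketch; in practice you would keep order2shipping.xsl as a file and pass it to newTransformer as a StreamSource.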

Summary
The XPath language is a great tool for accessing data in XML documents. Combined with XSLT, it provides a concise way to query an XML document, loop over the selected nodes, and extract their values. In this article, we've illustrated some basic concepts of XPath and shown a simple example of using XPath expressions to access XML data.
