Presents your XML E-NEWSLETTER for July 2, 2003
<------------------------------------------->

RESOLVING CUSTOM XML ENTITIES WITH SAX AND JAVA

You may find that your XML documents and their various grammars take on a vernacular of their own. As this vernacular finds its way into your documents, you might need to describe it in an abstract and reusable way. One solution is to use custom XML entities to represent common, reusable XML components.

XML ENTITIES

The term "entity" sounds ambiguous, but in XML it refers to a specific category of artifacts: an entity is a piece of named data found within an XML document. There are several entities you may already be familiar with, such as &amp; (which is an entity that represents an ampersand). Ampersands that aren't escaped with this entity cause all sorts of problems.

PARSING ENTITIES

There are three ways to categorize XML entities, as shown in TABLE A. Entities are referred to within XML documents using entity references, each of which consists of an ampersand (&), followed by the entity name, followed by a semicolon (;). In the example above, we cited &amp; as an entity: the name of the entity is amp, and the data it refers to is an ampersand character.

Table A: Entity categorizations

Parsed      Unparsed
Internal    External
General     Parameter

The XML parser will try to resolve entity references to their replacement text. Part of this process uses the DTD to find the definitions for internal parsed entities. An entity may also be declared in the DTD without its replacement text, pointing instead to content stored elsewhere through a public or system identifier; such entities are external entities. As SAX engines parse your documents, they may need to resolve these external entities, and you can intercept this process using the EntityResolver interface.

ENTITYRESOLVER

The EntityResolver interface is remarkably simple and easy to use.
The interface consists of a single method called resolveEntity, which takes two parameters: the public identifier and the system identifier for the entity. These identifiers are supplied by the entity definition in the DTD, as shown in our example in LISTING A.

Listing A: entity.dtd

Our sample XML document is shown in LISTING B. This document illustrates the use of the DTD for validation and shows the MyCustomEntity being used as the value for the CustomEntity element.

Listing B: entity.xml

&MyCustomEntity;

In order to process this entity using our custom resolver, we'll need to code a SAX parser, a handler for the SAX parser, and an EntityResolver. The EntityResolver class is shown in LISTING C.

Listing C: CustomResolver.java

import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class CustomResolver implements EntityResolver {
    // System identifier of the one entity this resolver knows about.
    private static final String SYSTEM_ID =
        "http://www.builder.com/xml/entities/MyCustomEntity";

    public InputSource resolveEntity(String publicId, String systemId) {
        // Comparing from the constant keeps this safe if systemId is null.
        if (SYSTEM_ID.equals(systemId)) {
            System.out.println("Resolving entity: " + publicId);
            // Supply the replacement text from an in-memory string.
            return new InputSource(new StringReader("This is a custom entity"));
        }
        // Returning null lets the parser resolve the entity itself.
        return null;
    }
}

The resolveEntity method looks at the supplied public and system identifiers and returns an InputSource that supplies the value for the entity; returning null tells the parser to fall back to its default resolution. Using an InputSource allows you to provide a simple string value via StringReader (as we've done), or to use something more sophisticated.

Our handler is called MySAXHandler and is shown in LISTING D. LISTING E shows our example run program called EntityResolverExample, which also drives our SAX parser through the XMLReader interface. We've dramatically simplified the SAX handler; it contains a bare-bones implementation of the ContentHandler interface that will only show the start and end of each element and the associated character data.
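The resolver idea is easiest to follow when it can be run in one file, so here is a minimal sketch of it. It is not one of this article's listings. The class name InlineResolverSketch, the parse() helper, and the inline DTD (including its element declarations) are assumptions made for illustration, and the DTD is only a guess at what a declaration for MyCustomEntity could look like, not a copy of Listing A. The sketch also obtains its parser from SAXParserFactory instead of XMLReaderFactory. Only the public identifier, the system identifier, and the replacement text are taken from the article.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical single-file sketch; names and the inline DTD are assumptions.
class InlineResolverSketch {
    static final String SYSTEM_ID =
        "http://www.builder.com/xml/entities/MyCustomEntity";

    // Assumed document: an internal DTD subset that declares MyCustomEntity
    // as an external entity, so the parser must ask the resolver for its text.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE Entity [\n"
        + "  <!ELEMENT Entity (CustomEntity)>\n"
        + "  <!ELEMENT CustomEntity (#PCDATA)>\n"
        + "  <!ENTITY MyCustomEntity PUBLIC"
        + " \"-//Builder.com//TEXT MyCustomEntity//EN\" \"" + SYSTEM_ID + "\">\n"
        + "]>\n"
        + "<Entity><CustomEntity>&MyCustomEntity;</CustomEntity></Entity>";

    // Parses XML and returns all character data seen by the handler.
    static String parse() throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Collect character data; every other callback keeps its no-op default.
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });

        // Same decision as CustomResolver: answer for one system identifier,
        // return null for anything else so the parser uses its default.
        reader.setEntityResolver((publicId, systemId) ->
            SYSTEM_ID.equals(systemId)
                ? new InputSource(new StringReader("This is a custom entity"))
                : null);

        reader.parse(new InputSource(new StringReader(XML)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse());
    }
}
```

Because the resolver hands back an in-memory reader, the parser never needs to open a connection to the URL in the system identifier.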

Listing D: MySAXHandler.java

import org.xml.sax.*;

public class MySAXHandler implements ContentHandler {
    // Callbacks we don't need are left empty.
    public void setDocumentLocator(Locator locator) {}
    public void startDocument() throws SAXException {}
    public void endDocument() throws SAXException {}
    public void startPrefixMapping(String prefix, String uri)
        throws SAXException {}
    public void endPrefixMapping(String prefix) throws SAXException {}

    public void startElement(String namespaceURI, String localName,
            String qualifiedName, Attributes atts) throws SAXException {
        System.out.println("Starting element: " + localName);
    }

    public void endElement(String namespaceURI, String localName,
            String qualifiedName) throws SAXException {
        System.out.println("Ending element: " + localName);
    }

    public void characters(char[] text, int start, int length)
            throws SAXException {
        // Copy only the reported slice of the parser's buffer.
        System.out.println(new String(text, start, length));
    }

    public void ignorableWhitespace(char[] text, int start, int length)
        throws SAXException {}
    public void processingInstruction(String target, String data)
        throws SAXException {}
    public void skippedEntity(String name) throws SAXException {}
}

Listing E: EntityResolverExample.java

import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class EntityResolverExample {
    public static void main(String[] args) {
        CustomResolver myResolver = new CustomResolver();
        try {
            XMLReader parser = XMLReaderFactory.createXMLReader();
            MySAXHandler msh = new MySAXHandler();
            parser.setContentHandler(msh);
            parser.setEntityResolver(myResolver);
            parser.parse("entity.xml");
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

The EntityResolverExample class uses the XMLReaderFactory to create a new SAX parser that is accessed through the XMLReader interface. We then set the content handler to our custom content handler and the entity resolver to our custom entity resolver.
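One caveat about the handler in Listing D is worth a short illustration. SAX permits a parser to deliver contiguous character data in more than one characters() call, and entity boundaries are often where the text gets split, so printing inside characters() can break one run of text across several lines. The sketch below shows one common way to cope with that, by buffering text until the element ends. It is an assumption-laden illustration, not part of the article's example: the class name BufferingHandlerSketch, the collect() helper, and the sample document with its internal entity e are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that joins character chunks before reporting them.
class BufferingHandlerSketch extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        // Simplification: text seen before a child element is discarded.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text; just accumulate.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element is complete, so the buffered text can be reported.
        results.add(qName + "=" + buffer);
        buffer.setLength(0);
    }

    // Parses the given XML string and returns one "name=text" entry per element.
    static List<String> collect(String xml) throws Exception {
        BufferingHandlerSketch handler = new BufferingHandlerSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample: an internal entity in the middle of element text.
        String xml = "<!DOCTYPE note [<!ENTITY e \"X\">]>"
            + "<note>one &e; two</note>";
        System.out.println(collect(xml));
    }
}
```

The sketch uses the qualified name rather than the local name because a parser created by SAXParserFactory is not namespace-aware unless asked to be.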
When we run the example, the parser works through the XML document and prints the names of the elements and the value for our externally resolved entity, as shown below:

Starting element: Entity
Starting element: CustomEntity
Resolving entity: -//Builder.com//TEXT MyCustomEntity//EN
This is a custom entity
Ending element: CustomEntity
Ending element: Entity

Brian Schaffner is an associate director for Fujitsu Consulting. He provides architecture, design, and development support for Fujitsu's Technology Consulting practice.

----------------------------------------