XML  
Application Developers will walk away with useful XML information and coding that they can use on the job. Delivered each Wednesday, XML tips will help subscribers stay ahead of the technology curve.
 

Simple API for XML

Overview
There are numerous types of XML documents and almost as many theories and techniques for parsing them. Some developers have adopted the Document Object Model (DOM) to parse these documents because of its ease of implementation. DOM essentially reads an entire XML document into memory and provides the developer with an interface for working with that document. Unfortunately, when working with large XML documents that contain lots of transactions, DOM can be an inefficient method. In this TechMail, we'll discuss an alternative to DOM called Simple API for XML (SAX) and explain how it can help resolve problems presented by large XML documents.

What's dumb about DOM
It's almost impossible to keep up with all the types of XML documents and models out there. As with most technologies, developers and organizations keep inventing new ways to use XML to assist with their information management. Systems that treat each XML document as a single record often use DOM parsing successfully, while other systems use XML to process a batch of transactions. When these batches become large (more than a megabyte of data or so), the DOM parser becomes inefficient: a great deal of memory has to be allocated for each XML document processed, and, depending on how the system uses the XML data, that data may be replicated in memory many times. It doesn't take long for most systems to run low on resources with this type of architecture.
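
To make the memory cost concrete, here is a minimal sketch of DOM-style processing using the standard JAXP classes. The class name DomBatchExample, the helper countTransactions, and the batch/transaction element names are assumptions made up for illustration; they do not come from any particular system.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DomBatchExample {
    // Hypothetical helper: counts <transaction> elements in a batch document.
    static int countTransactions(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // parse() returns only after the complete tree has been built in memory,
        // so memory use grows with the size of the whole batch.
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList transactions = doc.getElementsByTagName("transaction");
        return transactions.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<batch><transaction id=\"1\"/><transaction id=\"2\"/></batch>";
        System.out.println(countTransactions(xml)); // prints 2
    }
}
```

Even for a simple count, every node of the batch is held in memory until the Document is discarded.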

Event-based parsing
The SAX parser is different from DOM in that a SAX parser doesn't read the entire XML document into memory all at once. Instead, SAX parses documents using an event model. As events occur within the XML document, SAX sends them to a handler (which the developer defines) for processing. Events include the start and end of the document, the start and end of each element, character data, and processing instructions.

The handler must implement the org.xml.sax.DocumentHandler interface. (In SAX 2, this interface is deprecated in favor of org.xml.sax.ContentHandler, which plays the same role.) The interface defines the actual methods that are called when the XML events occur. By creating a class that implements it, you can process each event as it occurs.
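
As a rough illustration, the sketch below records each event as the parser reports it. It makes one substitution worth stating plainly: rather than implementing DocumentHandler directly, it extends org.xml.sax.helpers.DefaultHandler, the SAX 2 convenience class that implements ContentHandler (the successor to DocumentHandler). The class name EventLogger, its parse helper, and the sample element names are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the name of every event it receives.
class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void processingInstruction(String target, String data) {
        events.add("pi:" + target);
    }

    // Parses the given XML text and returns the events in the order they occurred.
    static List<String> parse(String xml) throws Exception {
        EventLogger handler = new EventLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse("<batch><?audit on?><transaction/></batch>")) {
            System.out.println(event);
        }
    }
}
```

The handler never sees the document as a tree; it only sees one callback at a time, in document order.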

Processing models
The processing model for SAX is a bit different from that of DOM. With the DOM model, you can load the document in a single step and evaluate it as a single transaction. But since SAX isn't reading the entire document into memory, it loses the ability to examine the document as a whole (unless every event is stored in memory, which is a bad way to implement SAX). That said, it's apparent that SAX may not be appropriate for working with large documents that contain only a single record or transaction.

Depending on the complexity of the XML documents, SAX may be cumbersome to implement. For example, SAX will send a new event for every element in your XML tree, and the handler itself has to keep track of where in the tree each event occurred. As the depth of the document tree increases, so does the complexity of the SAX implementation. SAX generally works best with documents that contain a number of fairly simple transactions in a single document.
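
The sketch below shows one way a handler might process a batch of simple transactions while holding only the current transaction's state. It assumes a made-up document layout (a batch of transaction elements, each with an id attribute and an amount child) and a hypothetical class named TransactionHandler; as above, it extends the SAX 2 DefaultHandler rather than implementing DocumentHandler directly.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: emits "id=amount" for each transaction as soon as its amount ends.
class TransactionHandler extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>(); // open elements, innermost first
    private final StringBuilder text = new StringBuilder(); // character data of current element
    private String currentId;                               // id of the open transaction
    final List<String> processed = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.push(qName);
        text.setLength(0); // discard text left over from earlier elements
        if ("transaction".equals(qName)) {
            currentId = atts.getValue("id");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.pop();
        // Only act on an <amount> that sits directly inside a <transaction>.
        if ("amount".equals(qName) && "transaction".equals(path.peek())) {
            processed.add(currentId + "=" + text.toString().trim());
        }
    }

    // Parses the batch and returns one entry per transaction, in document order.
    static List<String> process(String xml) throws Exception {
        TransactionHandler handler = new TransactionHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.processed;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<batch>"
                + "<transaction id=\"t1\"><amount>10.50</amount></transaction>"
                + "<transaction id=\"t2\"><amount>7.25</amount></transaction>"
                + "</batch>";
        System.out.println(process(xml));
    }
}
```

The stack of open element names is the extra bookkeeping that grows with tree depth. In a real system the processed list would likely be replaced by whatever work each transaction requires, so that nothing accumulates as the batch grows.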

Conclusion
Although DOM is a useful method for parsing XML documents, it has shortcomings when handling large documents that contain many transactions. SAX, by contrast, provides an event-driven interface for processing XML documents and doesn't use up all of your system resources while doing it.
