XML repositories

With the proliferation of XML as a new common data format, the problem of managing XML documents has become more critical. New technologies are now available that allow organizations to better manage their information as XML documents. In this TechMail, we'll examine the technology of XML repositories and how they help drive the future of extensible shared data.

 

Overview

An XML repository is a system for storing and retrieving XML data. This data is usually in the form of XML documents and their associated Document Type Definitions (DTDs) or XML Schemas. Because XML data lends itself to a hierarchical structure rather than a relational one, it can be difficult to store in traditional relational database systems. The repository itself may be built on a relational database system, but it is more likely a custom storage system built exclusively for XML (or hierarchical) data.

 

The method used to store the data varies from system to system, as does the process for storing and retrieving it. Some systems store and retrieve whole documents through a key-based index, some retrieve data through queries against the document content, and some support both.
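The difference between key-based and query-based retrieval can be sketched in a few lines of Python. This is only an illustration under stated assumptions: TinyXmlRepository and its store, get, and query methods are hypothetical names invented here, not the API of any real repository product, and the storage is a plain in-memory dictionary.

```python
import xml.etree.ElementTree as ET


class TinyXmlRepository:
    """Hypothetical in-memory repository; not modeled on any real product."""

    def __init__(self):
        self._docs = {}  # document key -> parsed root element

    def store(self, key, xml_text):
        # Adding and updating are the same call; the key decides which one it is.
        self._docs[key] = ET.fromstring(xml_text)

    def get(self, key):
        # Key-based retrieval: return the whole document as text.
        return ET.tostring(self._docs[key], encoding="unicode")

    def query(self, path):
        # Query-based retrieval: run a path expression against every document
        # and return (key, element) pairs for each match.
        return [
            (key, element)
            for key, root in self._docs.items()
            for element in root.findall(path)
        ]


repo = TinyXmlRepository()
repo.store("order-1", "<order><item sku='A1'>Widget</item></order>")
repo.store("order-2", "<order><item sku='B2'>Gadget</item></order>")

whole_document = repo.get("order-1")
matching_skus = [(key, el.get("sku")) for key, el in repo.query("item")]
```

With a key, the caller gets one complete document back. With a query, the caller gets matching pieces from any document that contains them.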

 

Finally, XML repositories may use a variety of access methods. Some systems expose a proprietary API through a component model such as COM, CORBA, or Enterprise JavaBeans (EJB), while others support the more open ODBC standard. Most repositories can also be accessed over a network.

 

Storing XML data

The process of storing XML data consists of two different tasks. One task is adding a new XML document to the repository. The other task is updating an existing document. Removing a document from the repository is treated as a special case of updating an existing document.

 

Because XML data is not based on a traditional relational model, implementing XML repositories on traditional relational databases can be complex and cumbersome. For example, in a straightforward mapping, each nested or repeating level of the XML hierarchy needs its own relational table, linked to its parent by a key. As your XML documents become more complex, your relational schema does as well.
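One way to picture the table-per-level mapping is the sketch below, which uses Python's built-in sqlite3 module as a stand-in for a relational database. The order document, the orders and items tables, and their column names are all invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

ORDER_XML = """
<order id="1001">
  <customer>Acme</customer>
  <item sku="A1"><qty>2</qty></item>
  <item sku="B2"><qty>5</qty></item>
</order>
"""

conn = sqlite3.connect(":memory:")
# One table per repeating level of the hierarchy, linked by a foreign key.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE items ("
    "order_id INTEGER REFERENCES orders(order_id), sku TEXT, qty INTEGER)"
)

root = ET.fromstring(ORDER_XML)
order_id = int(root.get("id"))
conn.execute(
    "INSERT INTO orders VALUES (?, ?)", (order_id, root.findtext("customer"))
)
for item in root.findall("item"):
    conn.execute(
        "INSERT INTO items VALUES (?, ?, ?)",
        (order_id, item.get("sku"), int(item.findtext("qty"))),
    )

# Reading the document back takes a join, and the result is flat rows, not a tree.
rows = conn.execute(
    "SELECT o.customer, i.sku, i.qty FROM orders o "
    "JOIN items i ON i.order_id = o.order_id ORDER BY i.sku"
).fetchall()
```

Even this two-level document needs two tables and a join. A third level of nesting under item would add another table and another join.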

 

Storage systems that are built around a hierarchical model accept XML data more easily, and they do so as native behavior rather than as an adaptation of a relational model. Hierarchical systems also have the added benefit of supporting XQL and XPath expressions for accessing whole and partial documents.

 

Retrieving XML data

The method used to retrieve XML documents is related to the storage method. For relational systems, this will usually be through SQL or stored procedures. These methods have the disadvantage of accessing and returning data as a relational result set rather than as an XML hierarchical structure, so the hierarchy has to be rebuilt before the data can be used as XML again.

 

Hierarchical systems will usually provide an XQL or XPath method for accessing XML data. These languages express queries in terms of the document hierarchy itself, which matches the kinds of questions typically asked of XML data. They also return the data in a hierarchical format.
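As a rough illustration of path-based access, the sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree module. It is not a repository product, and the catalog document is invented for the example.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """
<catalog>
  <book id="b1"><title>XML Basics</title><price>20</price></book>
  <book id="b2"><title>Query Languages</title><price>35</price></book>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)

# Partial-document access: select one subtree by attribute value.
book = root.find("./book[@id='b2']")
# The result is still a hierarchy and serializes back to an XML fragment.
fragment = ET.tostring(book, encoding="unicode").strip()

# A path expression can also reach straight down to a nested level.
titles = [title.text for title in root.findall("./book/title")]
```

The selected book comes back as an element with its children intact, so it can be handed on as XML without any reassembly step.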

 

Indexing XML data

When storing data in relational systems, an external primary key may be attached to each XML document. The storage and retrieval process uses this key to identify which document is being stored or retrieved. More advanced systems extract the primary key from an XML element or attribute inside the document.

 

Indexes on data stored in relational tables are based on a single table (or single hierarchy level). Hierarchical systems also let you designate an element or attribute as the primary key, and in addition they let you create indexes at different levels based on data within the hierarchy.
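The idea of a primary key taken from the document itself, plus indexes at more than one level of the hierarchy, can be sketched as follows. The documents and the index names (primary, by_customer, by_sku) are hypothetical, and plain dictionaries stand in for whatever index structures a real repository would use.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOCS = [
    "<order id='1001'><customer>Acme</customer><item sku='A1'/><item sku='B2'/></order>",
    "<order id='1002'><customer>Globex</customer><item sku='A1'/></order>",
]

primary = {}                    # document key -> root element
by_customer = defaultdict(set)  # index on a first-level element
by_sku = defaultdict(set)       # index on an attribute one level deeper

for text in DOCS:
    root = ET.fromstring(text)
    key = root.get("id")  # primary key extracted from the document itself
    primary[key] = root
    by_customer[root.findtext("customer")].add(key)
    for item in root.iter("item"):
        by_sku[item.get("sku")].add(key)

# Look up documents by a value that lives below the root element.
orders_with_a1 = sorted(by_sku["A1"])
```

Both secondary indexes map back to the same document keys, so a lookup at any level leads to the whole document.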

 

Validating data

One of the most important aspects of XML documents is the option of data validation. Using technologies such as DTDs and XML Schemas, XML parsers can determine whether an XML document conforms to a declared structure. Because repositories are able to understand a DTD or XML Schema, they can validate data as it is stored and updated.
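Validation at storage time can be sketched as a check that runs before a document is accepted. Python's standard library parser checks well-formedness but does not validate against DTDs or XML Schemas, so the example below substitutes a small hand-written rule table for a real schema. REQUIRED_CHILDREN, validation_errors, and store_if_valid are hypothetical names, and the rules cover only required child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: required child elements per element name.
REQUIRED_CHILDREN = {"order": ["customer", "item"], "item": ["qty"]}


def validation_errors(root):
    """Return a list of rule violations; an empty list means the document passed."""
    errors = []
    for element in root.iter():
        for child_name in REQUIRED_CHILDREN.get(element.tag, []):
            if element.find(child_name) is None:
                errors.append(f"<{element.tag}> is missing <{child_name}>")
    return errors


def store_if_valid(repository, key, xml_text):
    # ET.fromstring raises ParseError if the text is not well-formed XML.
    root = ET.fromstring(xml_text)
    errors = validation_errors(root)
    if errors:
        # Reject the document instead of storing invalid data.
        raise ValueError("; ".join(errors))
    repository[key] = root


repository = {}
store_if_valid(
    repository,
    "1001",
    "<order><customer>Acme</customer><item><qty>2</qty></item></order>",
)
bad_errors = validation_errors(ET.fromstring("<order><item/></order>"))
```

A repository that understands DTDs or XML Schemas would apply the schema's full rule set at the same point, before the add or update takes effect.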

 

Summary

As XML documents continue to become more common, organizations will need a repository for managing hierarchical data. These repositories offer new technology for storing, accessing, and optimizing XML documents. Here we've discussed how this new technology is implemented and how it relates to traditional data management systems.
