Disk DOM

I have written a Java implementation of the DOM specification that stores its parsed-node data in a disk file. Most DOM implementations keep their parsed-node data in RAM, so the available RAM in the computer limits the size of the XML file they can create or parse. This disk DOM stores its parsed-node data on disk, so you can create or parse really large XML files without eating up all your RAM.

"What good is that?" I hear you say. Well, in my case it means I can run XSLT scripts to transform 20MB XML files instead of having to write custom one-off programs that break the source XML file into smaller chunks or use SAX to process the data.
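As a rough illustration of the XSLT use case, the sketch below runs a stylesheet over a DOM tree using only the standard JAXP and TrAX APIs. It uses whichever DOM parser the JDK provides by default, not the disk DOM itself. The class name XsltOverDomSketch and the sample XML and stylesheet are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the disk DOM distribution.
class XsltOverDomSketch {

    // Parses the XML into a DOM tree, then applies the stylesheet to that tree.
    public static String transform(String xml, String xslt) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT processors expect a namespace-aware DOM
        Document source = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(source), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><item>a</item><item>b</item><item>c</item></items>";
        String xslt =
              "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select='count(//item)'/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(transform(xml, xslt)); // counts the item elements
    }
}
```

With an in-memory DOM, the whole source tree has to fit in RAM before the transform starts, which is the limit a disk-backed DOM is meant to avoid.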

This disk DOM implements all the classes and interfaces in the org.w3c.dom package, and the DocumentBuilder and DocumentBuilderFactory classes in javax.xml.parsers. It passes all the tests in the test suite at www.w3c.org, so in theory it should do most things a DOM implementation does.
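Because the implementation supplies its own DocumentBuilderFactory, client code can in principle stay on the standard JAXP interfaces. The sketch below shows that pattern with the JDK's default factory. The class name JaxpParseSketch is invented for the example, and the factory class name in the comment is a placeholder, since the real disk DOM class name is not given on this page (check the manual).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical example class; not part of the disk DOM distribution.
class JaxpParseSketch {

    // Parses XML through whichever DocumentBuilderFactory JAXP selects.
    // JAXP lets you choose an implementation with the system property
    // "javax.xml.parsers.DocumentBuilderFactory", for example:
    //   java -Djavax.xml.parsers.DocumentBuilderFactory=some.pkg.SomeFactory ...
    // (some.pkg.SomeFactory is a placeholder, not the disk DOM's real class name).
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Uses only org.w3c.dom interfaces, so it works with any DOM implementation.
    public static int countElements(Document doc, String tagName) {
        return doc.getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<orders><order id='1'/><order id='2'/></orders>");
        System.out.println(countElements(doc, "order"));
    }
}
```

Code written this way never names a concrete DOM class, so swapping the in-memory parser for a disk-backed one should only require changing how the factory is obtained.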

It is totally free - you can download the executable jar and the source code and do with it what you want. If you have any problems or discover bugs, e-mail me at [email protected].

 

Downloads

Download the jar file only (305k)

Read the manual (20k html)

Download the disk DOM jar file, source code and manual here (1123K)

 

Known problems

The API documentation is not complete. I apologise for this and will correct it.

The DOM2-specific functionality needs to be thoroughly tested. Although I have implemented the DOM Level 2 methods, I know many of the rules in the DOM2 specification are not being enforced. I will get on to this as a matter of urgency.

The naming convention for classes and interfaces is bad. I started by naming my own extensions to the Node, Element, Text etc. interfaces MyNode, MyElement etc. I then named the classes that implement the "My" interfaces DefaultMyNode, DefaultMyElement etc. Later on, though, I created some classes that start with "My" instead of "DefaultMy". I need to give every class an underlying interface "xxx", then have classes called "xxxImpl". Yes, this is obvious, but I have been writing this part time, so I slipped a bit here and there.

 

Changes

27/4/2003 Cleaned up some obvious formatting problems in the manual.
