Wednesday, June 18, 2008

DOM, SAX, StAX, TrAX!!!

The difference between the following guys (we are talking about XML here :D):

1-DOM
2-SAX
3-StAX
4-TrAX

I guess all of us know what DOM is, and some might know what SAX is, but most don't know what StAX and TrAX are (most people have never heard of them).

As expected from the mighty Java, it supports more things than the humble .NET :D. As we can see, .NET 2003 doesn't support SAX: besides DOM (XmlDocument) it only offers the forward-only XmlReader. (I worked with .NET 2003, since I was a .NET developer at first, but I didn't use 2005 extensively, so I am not sure whether Microsoft supports SAX now; either way, SAX was not supported in .NET in the first place.)

So let's start by defining each of them:

DOM:
DOM creates an XML tree that represents the whole document in memory. It provides a very flexible API for creating, reading, updating, and deleting nodes within the tree, but it always requires an in-memory representation of the entire document, which is bad for performance. DOM loads the whole XML document into memory, so a 1 MB XML file needs at least 1 MB of memory (in practice usually several times that, because every node is a separate object). That makes it a poor choice for large documents, but note that it is good for CRUD operations (we will see why later).
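To make the CRUD point concrete, here is a minimal sketch using the standard JAXP DOM classes that ship with the JDK (javax.xml.parsers, org.w3c.dom). The class name DomCrudDemo, the crud method and the sample books document are hypothetical, made up for illustration. The whole document is parsed into a tree first; only then can we read, update, delete and create nodes anywhere in it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomCrudDemo {
    // Parses the whole document into memory, then reads, updates, deletes
    // and creates nodes, and returns the modified document as a string.
    static String crud(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // Read: the first <book> element's text
        Element first = (Element) root.getElementsByTagName("book").item(0);
        String title = first.getTextContent();

        // Update: change that text in place
        first.setTextContent(title + " 6");

        // Delete: remove the second <book>
        Node second = root.getElementsByTagName("book").item(1);
        root.removeChild(second);

        // Create: append a new <book id="3">
        Element added = doc.createElement("book");
        added.setAttribute("id", "3");
        added.setTextContent("XML");
        root.appendChild(added);

        // Serialize the tree with an identity transformer (a TrAX class)
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(crud("<books><book id=\"1\">Java</book><book id=\"2\">.NET</book></books>"));
    }
}
```

Every step here works on nodes that are already sitting in memory, which is exactly why DOM is convenient for CRUD and expensive for big files.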

SAX:
SAX is a "push" type API (keep this in mind, we will come back to it later) that provides an event callback interface. It is read-only. With SAX the entire XML document is read from start to finish, but it never has to be held in memory at once. SAX is a very low-level, efficient API for parsing XML documents, and it is a good choice for large documents, whether or not you care about the events themselves!!! (SAX processes XML through events: as the parser meets a start tag, some text, an end tag and so on, it calls the matching method of your handler code, which then does something with it.)
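A minimal sketch of the push model, assuming a made-up document with title elements (the names SaxDemo, TitleCollector and titles are hypothetical). It uses the JDK's own SAXParserFactory and DefaultHandler: the parser drives the loop and calls our handler methods, and we only keep the bits we care about instead of a whole tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {
    // The parser "pushes" events into this handler; no tree is ever built.
    static class TitleCollector extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder text; // non-null only while inside <title>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("title")) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // characters() may be called several times for one text node, so accumulate.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("title")) {
                titles.add(text.toString());
                text = null;
            }
        }
    }

    static List<String> titles(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleCollector handler = new TitleCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler); // reads the whole input
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>DOM</title></book><book><title>SAX</title></book></books>";
        System.out.println(titles(xml)); // [DOM, SAX]
    }
}
```

Note that our code never asks for anything: parse() only returns after the parser has walked the whole input and fired every event.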

PUSH!!! So there must be a PULL then!!! Yup, that's right: there are PULL and PUSH types, and the difference is coming next time :p

StAX:
StAX is a "pull" type API. It comes in two flavors, a cursor API and an event iterator API, and it has both reading and writing sides. It is more developer-friendly than SAX. Like SAX, StAX doesn't require the entire document to be held in memory, but (the best part) you don't need to read the whole document either: your code asks the parser for the next item only when it wants one, so portions can be skipped and you can stop as soon as you have what you need. This can give StAX a performance edge over SAX.
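Here is a rough sketch of the three parts of StAX mentioned above, using the javax.xml.stream package bundled with the JDK since Java 6. The class StaxDemo, its method names and the sample XML are assumptions for illustration. firstTitle uses the cursor API and simply stops pulling once it finds the first title; countElements uses the event iterator API; write shows the writing side.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.stream.events.XMLEvent;

public class StaxDemo {
    // Cursor API: our code pulls the next token and may stop at any time.
    static String firstTitle(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals("title")) {
                    return r.getElementText(); // stop pulling; remaining tokens are never requested
                }
            }
            return null;
        } finally {
            r.close();
        }
    }

    // Event iterator API: same pull model, but each token is an XMLEvent object.
    static int countElements(String xml) throws Exception {
        XMLEventReader events = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        int count = 0;
        while (events.hasNext()) {
            XMLEvent e = events.nextEvent();
            if (e.isStartElement()) {
                count++;
            }
        }
        events.close();
        return count;
    }

    // Writing side: emit XML token by token without building a tree.
    static String write() throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("books");
        w.writeStartElement("book");
        w.writeAttribute("id", "1");
        w.writeCharacters("StAX");
        w.writeEndElement();
        w.writeEndElement();
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>DOM</title></book><book><title>SAX</title></book></books>";
        System.out.println(firstTitle(xml));
        System.out.println(countElements(xml));
        System.out.println(write());
    }
}
```

Compare firstTitle with the SAX handler: here the loop belongs to us, so "skipping the rest" is just a return statement.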

TrAX:
TrAX (the Transformation API for XML, the javax.xml.transform package) transforms source documents into result documents using XSLT, a rule-based language. A TrAX source document may be created via SAX or DOM (or read directly from a stream). Using TrAX needs both Java and XSLT skills, and optimizing TrAX transformations (turning one XML document into another) takes more time.
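A minimal TrAX sketch, assuming a tiny hypothetical stylesheet that turns each book title into an li element (TraxDemo, toDom and the XSLT itself are made up for illustration). The same compiled stylesheet is applied once to a plain stream and once to a DOM tree, since TrAX accepts any javax.xml.transform.Source.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TraxDemo {
    // Hypothetical stylesheet: each <book><title>..</title></book> becomes an <li>.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/books\">"
        + "<ul><xsl:for-each select=\"book\"><li><xsl:value-of select=\"title\"/></li></xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to any Source (stream, DOM or SAX).
    // In real code the Templates object would be compiled once and cached;
    // it is thread-safe, unlike Transformer.
    static String transform(Source input) throws Exception {
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(input, new StreamResult(out));
        return out.toString();
    }

    // Builds a namespace-aware DOM tree to use as a TrAX source.
    static Document toDom(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>DOM</title></book><book><title>SAX</title></book></books>";
        System.out.println(transform(new StreamSource(new StringReader(xml)))); // from a stream
        System.out.println(transform(new DOMSource(toDom(xml))));               // from a DOM tree
    }
}
```

The Java side stays small; most of the real work (and most of the tuning effort) lives in the XSLT rules.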


From all of this we can say that:

1-
SAX and StAX are better in terms of performance and memory usage.
For DOM and TrAX it depends (on how large the document is and on what you are doing).
2-
DOM is the only one that supports CRUD operations on an existing document.

3-
SAX and StAX are forward-only, but DOM and TrAX can move both ways (forward and backward).
