Java and XML

This article is kind of like "Meta Research". I'm not going to tell you how to process XML in Java; I'm just going to point you at some other cool tutorials that do. These are all by Lars Vogel, and I strongly recommend his training material for its clarity.

Since the dawn of time (well, since I started to process XML) there have been two styles of loading XML and one main style of writing it. You either loaded the whole file into memory as a tree (through "DOM", the Document Object Model), or, if you were fancy or worried about running out of memory, you used SAX, the amusingly named "Simple API for XML", which streams through the file and calls your code once for each start tag, end tag and run of text.
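To make the difference concrete, here is a minimal sketch using the JDK's built-in javax.xml.parsers classes. It reads the same tiny document twice, once through DOM (build the whole tree, then walk it) and once through SAX (react to each start tag as the parser streams past), collecting the element names each time. The class and method names (DomVsSax, domElementNames, saxElementNames) are my own, purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative class: the names here are hypothetical, not from any tutorial.
public class DomVsSax {

    // DOM: the whole document is parsed into an in-memory tree first,
    // and only then do we walk it.
    static List<String> domElementNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        List<String> names = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            names.add(((Element) all.item(i)).getTagName());
        }
        return names;
    }

    // SAX: the parser streams through the input and calls our handler
    // for each event; no tree is ever built.
    static List<String> saxElementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName); // called once per start tag
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<alpha><beta>blah blah blah</beta><gamma>Ga Ga</gamma></alpha>";
        System.out.println(domElementNames(xml)); // [alpha, beta, gamma]
        System.out.println(saxElementNames(xml)); // [alpha, beta, gamma]
    }
}
```

Both give the same answer here, but the DOM version has to hold every node in memory before it can answer anything, while the SAX version keeps nothing except the list it is building. That is why SAX was the usual choice for very large files.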

Well, I was interested to read Lars Vogel say "Both DOM and Sax are older API's and I recommend not to use them anymore." He is, of course, saying that a number of techniques introduced in JDK 1.5 and 1.6 are now better than the old ways.

StAX

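The first of these is StAX, the Streaming API for XML. In my head it is very similar to SAX in that it works through the XML document as a stream of events, but Lars says that it is a "Pull-Parser" model. The difference is who is in control: SAX pushes each event at callback methods you hand it, whereas with StAX your own code asks the parser for the next event when it is ready for it. Read his tutorial to find out more. Lars also wrote another example of StAX reading and writing RSS feeds, which is probably one of the most common XML formats in the world.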
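A minimal sketch of both halves of StAX, using the javax.xml.stream classes that ship with the JDK: an XMLStreamWriter writes the small alpha document used later in this article, and an XMLStreamReader loop pulls events until it finds the element it wants. The class and helper names (StaxSketch, readText, writeAlpha) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Illustrative class: helper names are hypothetical.
public class StaxSketch {

    // Reading: our own loop pulls the next event when it wants it,
    // and can stop as soon as it has what it needs.
    static String readText(String xml, String tag) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // pull the next event
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(tag)) {
                    return reader.getElementText(); // text up to the matching end tag
                }
            }
            return null; // tag not found
        } finally {
            reader.close();
        }
    }

    // Writing: emit the document piece by piece; no tree is held in memory.
    static String writeAlpha(String beta, String gamma) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("alpha");
        writer.writeStartElement("beta");
        writer.writeCharacters(beta);
        writer.writeEndElement();
        writer.writeStartElement("gamma");
        writer.writeCharacters(gamma);
        writer.writeEndElement();
        writer.writeEndElement(); // closes <alpha>
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = writeAlpha("blah blah blah", "Ga Ga");
        System.out.println(xml);
        System.out.println(readText(xml, "gamma")); // Ga Ga
    }
}
```

Notice that the while loop belongs to us rather than to the parser. Stopping early like this is easy with a pull parser and rather awkward from inside a SAX callback.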

XPath

XPath has been around for ages in a number of languages. I used to use it in XSLT, where we transformed XML from one form to another and needed ways of referring to different components of the XML. Unsurprisingly, you cannot just give an array index or an X/Y co-ordinate when talking about a nested tree structure.

Anyway, XPath is basically for finding and selecting nodes within XML. Guess what... I like Lars' article on that too.
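Here is a small sketch of what "finding and selecting nodes" looks like with the JDK's javax.xml.xpath API: parse once into a DOM Document, then ask questions of it with XPath expressions. The feed document and the helper names (XPathSketch, text, texts) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative class: the document and helper names are hypothetical.
public class XPathSketch {

    // A made-up feed, just for illustration.
    static final String XML =
            "<feed>"
          + "<item id='1'><title>First</title></item>"
          + "<item id='2'><title>Second</title></item>"
          + "</feed>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluate an expression and return its string value
    // (for a node-set, the text of the first matching node).
    static String text(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, doc);
    }

    // Evaluate an expression and return the text of every matching node.
    static List<String> texts(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // A location path: walk down the tree, using [1] to pick the first item.
        System.out.println(text(doc, "/feed/item[1]/title"));    // First
        // A predicate: select nodes by attribute value, wherever they sit.
        System.out.println(texts(doc, "//item[@id > 1]/title")); // [Second]
        // XPath also has functions, such as counting nodes.
        System.out.println(text(doc, "count(//item)"));
    }
}
```

The paths read much like file-system paths, which is exactly the "way of referring to components of a tree" that an array index cannot give you.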

JAXB

Now, rather strangely, when I first came across JAXB it was hard to find out what it did and how it did it. Basically you can think of it as a fancier and more modern form of DOM. In essence it "binds" some Java classes that you define to the XML format you want. So if we have XML like this:

<alpha>
  <beta>blah blah blah</beta>
  <gamma>Ga Ga</gamma>
</alpha>

we might create a class "Alpha" with member variables "beta" and "gamma". If we push this XML through a JAXB unmarshaller, the XML is "unmarshalled" into a new object of class Alpha, with "blah blah blah" in beta and "Ga Ga" in gamma. Simples! as Alexander Meerkat might say.

We of course have the opposite as well: "marshalling" an object back into XML, using the same configuration of which classes and member variables map to which XML tags.

You can guess what is coming next: yep, Lars wrote a tutorial.
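For a flavour of the Alpha example in code, here is a minimal sketch using the javax.xml.bind annotations. It assumes JAXB is available: it was bundled with JDK 6 to 8, but on newer JDKs you need to add the JAXB API and an implementation as dependencies yourself (and the newer Jakarta releases rename the package to jakarta.xml.bind). The class and helper names (JaxbSketch, unmarshal, marshal) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

// Assumes javax.xml.bind is on the classpath (bundled in JDK 6-8).
public class JaxbSketch {

    // <alpha> is the root element; each field becomes a child element of the same name.
    @XmlRootElement(name = "alpha")
    @XmlAccessorType(XmlAccessType.FIELD)
    @XmlType(propOrder = {"beta", "gamma"})
    public static class Alpha {
        String beta;
        String gamma;
    }

    // XML -> object ("unmarshalling").
    static Alpha unmarshal(String xml) throws Exception {
        JAXBContext context = JAXBContext.newInstance(Alpha.class);
        return (Alpha) context.createUnmarshaller().unmarshal(new StringReader(xml));
    }

    // Object -> XML ("marshalling"), driven by the same annotations.
    static String marshal(Alpha alpha) throws Exception {
        Marshaller marshaller = JAXBContext.newInstance(Alpha.class).createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        StringWriter out = new StringWriter();
        marshaller.marshal(alpha, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Alpha alpha = unmarshal("<alpha><beta>blah blah blah</beta><gamma>Ga Ga</gamma></alpha>");
        System.out.println(alpha.beta);  // blah blah blah
        System.out.println(alpha.gamma); // Ga Ga

        alpha.gamma = "Simples!";
        System.out.println(marshal(alpha)); // back to XML with the new gamma
    }
}
```

The point to notice is that there is no tree-walking code at all: the annotations on Alpha are the whole "configuration as to which classes and which member variables refer to which XML tags", and they serve both directions.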

The Future

Well, what I have described above doesn't include real-life experience. The problems I see people having most often involve the speed of marshalling and unmarshalling their objects into and out of XML. I don't know for sure whether the standard JDK libraries will be fast enough for you. Maybe it doesn't matter; maybe you just need to test it in your circumstances.

Of course, maybe XML isn't the right tool at all. Particularly if the other end of the communication stream is a web browser, you probably want to be sending data in JSON format instead of XML. That is fine for small and simple data structures but can become a pain for large ones. (Have a look at the Twitter JSON data format for Tweets. It gets far more complicated than you might expect, particularly when including Retweets, references to other users, links and other entities.)