Tuesday, December 27, 2005

Java, XML and pretty-printing

I need to pretty-print some XML fragments. I figured that, since j2se5 already has a fairly thorough XML API implementation, this would be straightforward. Needless to say, I was mistaken.
Document doc = ... // the document containing the fragment
Element element = ... // the particular element that is to be printed

DOMImplementationLS dils = (DOMImplementationLS) doc.getImplementation();

LSSerializer ser = dils.createLSSerializer();
// DOM Level 3 boolean parameters take a Boolean, not the string "true"
ser.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);

LSOutput lso = dils.createLSOutput();
lso.setByteStream(System.out);

ser.write(element, lso);
Setting aside the awful design of the API, this appears to be how it is supposed to be done. Inconveniently, the XML implementation in Sun's j2se5 does not support the format-pretty-print configuration parameter. As pretty printing is the entire point of this exercise, I needed to find another way.
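
Whether a given runtime supports the parameter can be checked up front with DOMConfiguration.canSetParameter, rather than discovered through an exception. The following is only a minimal sketch of that check. The class name PrettyPrintProbe and its helper methods are made up for illustration, and the cast of getImplementation() to DOMImplementationLS assumes a DOM implementation that supports Load and Save, as the bundled one does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class PrettyPrintProbe {

    /** Returns true if this runtime's LSSerializer accepts format-pretty-print. */
    static boolean supportsPrettyPrint(Document doc) {
        // Assumption: the DOM implementation also implements DOMImplementationLS.
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        DOMConfiguration config = ls.createLSSerializer().getDomConfig();
        return config.canSetParameter("format-pretty-print", Boolean.TRUE);
    }

    /** Serializes the document element, asking for indentation only when supported. */
    static String serialize(Document doc) {
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer ser = ls.createLSSerializer();
        DOMConfiguration config = ser.getDomConfig();
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        return ser.writeToString(doc.getDocumentElement());
    }

    /** Parses an XML string into a DOM document, wrapping any checked exception. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>text</b></a>");
        System.out.println("pretty-print supported: " + supportsPrettyPrint(doc));
        System.out.println(serialize(doc));
    }
}
```

On a runtime where the check returns false, the sketch falls back to unformatted output instead of failing, which at least makes the limitation visible.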

There is another API for serializing DOM trees (there appear to be several), implemented in some versions of Xerces, which revolves around a much cleaner Serializer interface (not to be confused with an LSSerializer, of course).
Document doc = ... // the document containing the fragment
Element element = ... // the particular element that is to be printed

OutputFormat of = new OutputFormat();
of.setIndenting(true);
of.setOmitXMLDeclaration(true);

DOMSerializer ser = SerializerFactory.getSerializerFactory(Method.XML)
        .makeSerializer(System.out, of)
        .asDOMSerializer();

ser.serialize(element);
This actually achieves the desired result. Of course, I'd rather have a solution that doesn't require stepping outside the APIs included in j2se5 and, it turns out, I don't have to. The above code is written against Xerces, but not by adding xerces.jar to my classpath. Rather, I merely require the following import:
import com.sun.org.apache.xml.internal.serialize.*;
That's right; Sun has slipped a copy of Xerces into j2se5. This is fine, but why oh why does the standard API not expose the pretty-printing functionality that's already present in the implementation that they're using?

(Obviously there's a portability problem with this approach. A portable implementation can of course be realised by embedding a version of Xerces in the application jar and, fortunately, Sun's renaming of the packages means that this can be done without getting into namespace clashes. However, for situations where portability to other JVM+library implementations is not a concern, this is a handy shortcut.)
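
For comparison, here is a sketch of a different route that stays within the javax.xml packages: an identity Transformer with OutputKeys.INDENT. This is not the approach used above, and how much indentation comes out is implementation-dependent. The indent-amount property is an Apache-specific hint that other implementations are free to ignore. The class name PortablePrettyPrinter and its methods are invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class PortablePrettyPrinter {

    /** Serializes an element through an identity transform with indenting requested. */
    static String prettyPrint(Element element) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Assumption: an Apache-derived transformer honours this
            // namespace-qualified hint; others may silently ignore it.
            t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(element), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(prettyPrint(parseRoot("<a><b>text</b></a>")));
    }
}
```

The trade-off is that the result depends on whichever transformer the runtime provides, so the output should be checked on each target platform rather than assumed.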