XML is typically used in two different kinds of contexts:
- It was originally designed to describe large and complex documents in a structured way. XML is a pragmatic evolution of SGML, which had proven very cumbersome to use in practice.
- It is increasingly used as a neutral language to describe data structures passed among processes in distributed environments. XML then provides a more flexible and neutral communication medium than binary solutions such as RPC (Remote Procedure Call) and CORBA.
The first context requires validators, XML databases, XML query languages, and XML transformers (which may take the form of style sheets). The task there is essentially to store, retrieve, and possibly restructure XML documents.
The second context needs only parsers (to convert an XML stream into some form of usable data structure) and unparsers (to perform the opposite translation), but it places more stress on performance, since it must compete with statically compiled schemes based on RPC.
Generic XML parsers are now available. As depicted in the figure below, applications commonly use them to turn an incoming XML message into a generic tree structure. (Such a tree is typically built according to the DOM standard, defined by the W3C.) Depending on the application, the incoming message may or may not then be validated against a DTD.
The application then uses the generic tree to fetch useful information through runtime table lookups on attributes and tree traversal primitives.
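The generic approach can be illustrated with a minimal sketch that uses Python's standard `xml.dom.minidom` module. The `<order>`/`<item>` message format and the `total_quantity` function are hypothetical, chosen only for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical message format, used only for illustration.
MESSAGE = '<order id="42"><item sku="A-1" qty="3"/><item sku="B-7" qty="1"/></order>'


def total_quantity(xml_text: str) -> int:
    """Build a generic DOM tree, then extract data with runtime lookups."""
    doc = parseString(xml_text)      # generic parser: accepts any well-formed XML
    root = doc.documentElement
    total = 0
    # Tree traversal primitive: find descendants by tag name (a string).
    for item in root.getElementsByTagName("item"):
        # Attribute lookup by name (a string), converted by hand.
        total += int(item.getAttribute("qty"))
    return total


print(total_quantity(MESSAGE))
```

Every tag and attribute name here is a string resolved at run time, and every value must be converted by the application itself.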
XMLBooster takes a radically different approach:
- One first describes the set of acceptable XML messages using an ad hoc formalism. This formalism, hereafter referred to as the meta-definition, can, as a first approximation, be seen as a DTD extended to describe, in addition to the structure of the message itself, a data structure that will receive the various parts of the message.
- Using this meta-definition as input, XMLBooster produces an XML parser as a module in one of the programming languages it supports. This module is generated in source form, and can be used on any platform where a working compiler for the target language is available.
- The application programmer can then call this module, which will return an error message if the input does not comply with the message format described in the meta-definition, or a fully initialized data structure in the host language if the input has been analyzed successfully.
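The interface of such a generated module might look like the hand-written sketch below. It is not XMLBooster output and does not use the meta-definition syntax. The `Order` and `Item` types, the `parse_order` function, and the `MessageError` exception are all hypothetical names, and for brevity the sketch relies on Python's generic `xml.etree.ElementTree` parser internally, whereas a generated parser would recognize the message format directly.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


class MessageError(Exception):
    """Raised when the input does not match the expected message format."""


@dataclass
class Item:
    sku: str
    qty: int


@dataclass
class Order:
    id: int
    items: List[Item]


def parse_order(xml_text: str) -> Order:
    """Return a fully initialized Order, or raise MessageError."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise MessageError(f"not well-formed: {exc}") from exc
    if root.tag != "order":
        raise MessageError(f"expected <order>, found <{root.tag}>")
    try:
        order_id = int(root.attrib["id"])
    except (KeyError, ValueError) as exc:
        raise MessageError(f"bad 'id' attribute on <order>: {exc}") from exc
    items = []
    for child in root:
        if child.tag != "item":
            raise MessageError(f"unexpected <{child.tag}> inside <order>")
        try:
            items.append(Item(sku=child.attrib["sku"], qty=int(child.attrib["qty"])))
        except (KeyError, ValueError) as exc:
            raise MessageError(f"bad attribute on <item>: {exc}") from exc
    return Order(id=order_id, items=items)
```

The caller then works with ordinary fields of the host language:

```python
order = parse_order('<order id="42"><item sku="A-1" qty="3"/></order>')
print(order.id, order.items[0].qty)
```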
This approach delivers better performance because the generated parser does not have to take the full generality of XML into account: it only recognizes the set of constructs required by the application at hand. The performance of the server application (assuming that an application that must receive XML messages and react to them can be seen as a server) is also greatly improved because dynamic lookups are no longer needed to extract information: the parser takes care of that part of the job on the fly while processing the incoming XML message.
The benefit of building a usable data structure in the host language goes far beyond performance: accessing the generic XML tree using dynamic searches based on string comparisons is cumbersome and error-prone. Most errors are not caught at compile time, and systems must be debugged extensively before a decent level of quality can be guaranteed.
Replacing these dynamic lookups with statically defined data structures turns such mistakes into compile-time errors, making the programmer's task much easier and improving the overall quality of his or her work.
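The difference can be sketched as follows, again with a hypothetical `<item>` message and a hypothetical `Item` type. Python is not statically compiled, so the misspelled field is reported by a static type checker or, failing that, by an exception at run time. In a compiled host language the same mistake would be rejected by the compiler.

```python
from dataclasses import dataclass
from xml.dom.minidom import parseString


@dataclass
class Item:
    sku: str
    qty: int


element = parseString('<item sku="A-1" qty="3"/>').documentElement

# Dynamic lookup: a misspelled attribute name is just another string.
# minidom returns "" for a missing attribute, so the mistake goes unnoticed.
misspelled = element.getAttribute("quantity")

# Statically defined structure: the fields are part of the type.
item = Item(sku=element.getAttribute("sku"), qty=int(element.getAttribute("qty")))


def typo_is_detected() -> bool:
    """Accessing a field that does not exist fails loudly instead of silently."""
    try:
        item.quantity  # flagged by a type checker; AttributeError at run time
    except AttributeError:
        return True
    return False
```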
This approach is much faster than the generic XML parser approach (see our benchmarks).