November 2009 – ://ewernli

11 Reasons Why I Hate XML

… at least in Java.

1 – Namespace and import

XML is only apparently simple. As soon as namespaces are used, it immediately gets complicated. What is the difference between targetNamespace=”…”, xmlns=”…” and xmlns:tns=”…”? Can I declare several prefixes for the same namespace? Can I change the default namespace from within a document? What happens if I import a schema and rebind it to another namespace? How do I reference an element unambiguously? Ever wondered how to really create a QName correctly? Ever wondered what happens if you have a cycle in your dependencies?
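To make two of these questions concrete, here is a minimal Java sketch using the standard JAXP DOM parser. It suggests that several prefixes can be bound to the same namespace and that a QName compares only the namespace URI and the local name. The namespace URI and the element name `order` are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NamespaceDemo {
    // Hypothetical namespace URI, used only for this illustration.
    static final String NS = "http://example.org/orders";

    /** Parses the XML with namespace awareness on and returns the root's qualified name. */
    static QName rootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new QName(root.getNamespaceURI(), root.getLocalName());
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        QName viaDefault = rootName("<order xmlns='" + NS + "'/>");
        QName viaPrefixA = rootName("<a:order xmlns:a='" + NS + "'/>");
        QName viaPrefixB = rootName("<b:order xmlns:b='" + NS + "' xmlns:a='" + NS + "'/>");
        // QName equality ignores the prefix: only URI and local name count.
        System.out.println(viaDefault.equals(viaPrefixA) && viaPrefixA.equals(viaPrefixB));
    }
}
```

The prefix is only a local shorthand; what identifies the element is the pair (namespace URI, local name).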

2 – Encoding and CDATA

XML encoding and file encoding are not the same thing, and this is a huge source of trouble. Both encodings must match, and the XML file should be read and parsed according to the encoding specified in the XML header. Depending on the encoding, characters are serialized differently, which is again a huge source of confusion. If the reader or writer of an XML document behaves incorrectly, the document can be silently corrupted and information can be lost. Editors don’t necessarily display the characters correctly, even when the document is right. Ever got a ? or ¿ in your text? Ever made a distinction between & and &amp;? Ever wondered whether a CDATA section was necessary or if using UTF-8 would be ok? Ever realized that > can be used as-is almost everywhere, whereas < must be escaped in element content and in attribute values alike?
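The mismatch between the declared encoding and the actual bytes can be reproduced in a few lines. The sketch below is only an illustration with a made-up document: it writes the same XML text with two different file encodings and lets the JAXP parser decode the raw bytes according to the XML header.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

final class EncodingDemo {
    // The header declares ISO-8859-1; \u00e9 is the letter e with an acute accent.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";

    /** Serializes the XML with the given file encoding, then parses the raw bytes. */
    static String roundTrip(String xml, Charset fileEncoding) {
        try {
            byte[] bytes = xml.getBytes(fileEncoding);
            // Parsing from a byte stream lets the parser apply the encoding
            // declared in the XML header rather than a platform default.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Header and bytes agree: the text survives.
        System.out.println(roundTrip(XML, StandardCharsets.ISO_8859_1));
        // Header says ISO-8859-1 but the bytes are UTF-8: no error, just corrupted text.
        System.out.println(roundTrip(XML, StandardCharsets.UTF_8));
    }
}
```

Note that the second case does not fail; the parser happily returns two wrong characters in place of the accented letter, which is exactly why this kind of corruption goes unnoticed.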

3 – Entities and DOCTYPE

This somehow relates to #2, but not only. XML entities are a generic way to define variables and are declared in the DOCTYPE. You can define custom entities; this is rather unusual but still needs to be supported. Entities can be internal or external to your XML document, in which case entity resolution might differ. Because entities are also used to escape special characters, you cannot consider them an advanced feature that you won’t use. XML entities need to be handled with care and are always a source of trouble. For instance, the tag <my-tag>hello&amp;world</my-tag> can trigger 3 characters(...) events with SAX.
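Since a SAX parser is free to deliver the text of one element in several pieces, a handler should accumulate them instead of assuming a single callback. A minimal sketch (the class name TextCollector is invented for the example; the number of chunks depends on the parser):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int chunks;

    @Override
    public void characters(char[] ch, int start, int length) {
        // One logical text node may arrive in several pieces: append each of them.
        text.append(ch, start, length);
        chunks++;
    }

    /** Parses the XML and returns all character data, whatever the chunking was. */
    static String textOf(String xml) {
        return parse(xml).text.toString();
    }

    private static TextCollector parse(String xml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        TextCollector handler = parse("<my-tag>hello&amp;world</my-tag>");
        System.out.println(handler.text + " delivered in " + handler.chunks + " chunk(s)");
    }
}
```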

4 – Naming convention

Ever wondered whether it was actually better to name your tag <my-tag/>, <myTag/> or <MyTag/>? The same goes for attributes.

5 – Null, empty string and white spaces

Distinguishing between null and an empty string in XML is always painful. Null would be represented by the absence of the tag or attribute, whereas an empty string would be represented by an empty tag or empty attribute. The same problem appears if you want to distinguish an empty list from no list at all. If this is not considered clearly upfront (which is frequently the case), it can be very hard to retrofit the distinction into an application.
Whitespace is another issue of its own. The way tabs, spaces, carriage returns and line feeds are processed is always confusing. There are some options to control that, but they are way too complicated for most uses. As a consequence, these special characters are sometimes encoded as entities, sometimes embedded in CDATA and sometimes stored as-is in the XML.
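One way to keep the null versus empty string distinction explicit in Java is to map "tag absent" to an empty Optional and "tag present but empty" to an empty string. This is only a sketch of that convention, with invented element names (person, nick), not a recommendation for every schema:

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NullVersusEmpty {
    /** Returns an empty Optional if the element is absent, otherwise its (possibly empty) text. */
    static Optional<String> childText(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(tagName);
            if (matches.getLength() == 0) {
                return Optional.empty(); // tag absent: the "null" case
            }
            return Optional.of(matches.item(0).getTextContent()); // "" for an empty tag
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childText("<person/>", "nick").isPresent());             // absent
        System.out.println(childText("<person><nick/></person>", "nick").get().isEmpty()); // empty
        // Whitespace is kept as-is, so this is neither null nor empty:
        System.out.println(childText("<person><nick> </nick></person>", "nick").get().length());
    }
}
```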

6 – Normalization

XML encryption and signature look fine on paper. But as soon as you dig into the spec, you realize that it’s not so easy because of the syntactic and semantic equivalence of XML documents. Is <my-tag></my-tag> the same as <my-tag/>? To solve this issue, XML normalization (canonicalization) was introduced, which defines the canonical representation of a document. Good luck understanding all the subtleties when considering remarks #1, #2, #3 and #5.
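The gap between byte-level and document-level equality is easy to observe with the DOM. The sketch below is not canonicalization itself; it only compares parsed trees with Node.isEqualNode to suggest why signing raw bytes is not enough and a canonical form is needed first.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class EquivalenceDemo {
    private static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Compares two documents as DOM trees instead of as character strings. */
    static boolean domEqual(String a, String b) {
        return root(a).isEqualNode(root(b));
    }

    public static void main(String[] args) {
        // Different strings, same tree: empty-element syntax does not matter.
        System.out.println(domEqual("<my-tag></my-tag>", "<my-tag/>"));
        // Attribute order and quote style do not matter either.
        System.out.println(domEqual("<my-tag a='1' b='2'/>", "<my-tag b=\"2\" a=\"1\"/>"));
        // But a single space inside the element does.
        System.out.println(domEqual("<my-tag> </my-tag>", "<my-tag/>"));
    }
}
```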

7 – Too many APIs and implementations

Even if things have improved in this area, there are too many APIs and implementations available. Sometimes I wish there was one unified API and one single implementation… Ever wondered how to select a specific implementation? Ever got a classloader issue due to an XML library? Ever been confused about whether StAX is really better than SAX for reading XML documents?

8 – Implementation options

Most XML implementations have options or features to deal with the subtleties I just described. This is especially true for namespace handling. As a consequence, your code may work on one implementation but not on another. For instance, startDocument should be used to start an XML document and deal with namespaces correctly. The strictness of the implementations differs, so don’t take for granted that portability is 100%.
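A typical example of such an option is namespace awareness in JAXP, which is switched off by default. The following sketch parses the same document twice and shows how a single flag changes what the DOM reports; the namespace urn:example and the element name are placeholders.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class ParserOptionDemo {
    /** Returns the namespace URI the parser reports for the root element. */
    static String rootNamespace(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware); // JAXP default is false
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<t:item xmlns:t='urn:example'/>";
        System.out.println(rootNamespace(xml, false)); // no namespace information at all
        System.out.println(rootNamespace(xml, true));  // the declared namespace URI
    }
}
```

Code that relies on getNamespaceURI or getLocalName therefore behaves differently depending on how the factory was configured, which may happen far away from the code that consumes the document.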

9 – Pretty printing

There are so many APIs and frameworks that pretty printing is always a mess to deal with, if the framework supports it at all.
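With plain JAXP, one common approach is to run the document through an identity Transformer with indentation enabled. The sketch below assumes the transformer bundled with the JDK; the indent-amount property is implementation-specific and may simply be ignored elsewhere, which rather proves the point.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class PrettyPrinter {
    /** Re-serializes the XML with line breaks and indentation. */
    static String indent(String xml) {
        try {
            // A Transformer created without a stylesheet copies its input to its output.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Implementation-specific property (Xalan-derived transformers);
            // the exact indentation is not guaranteed by the standard API.
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not pretty print XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(indent("<a><b>x</b><c/></a>"));
    }
}
```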

10 – Security

XML was not designed for security. Notorious problems are: dangerous framework extensions, XML bombs, outbound connections to access remote schemas, excessive memory consumption, and many more problems documented in this excellent article from MISC. As a consequence, XML documents can easily be abused to disrupt a system.
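Several of these problems can be mitigated by configuring the parser defensively. The sketch below is one possible hardening of a JAXP DocumentBuilderFactory, not a complete security checklist. The disallow-doctype-decl feature is a Xerces feature that the JDK's built-in parser understands; other parsers may reject it, in which case the factory method fails loudly.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class HardenedParser {
    /** Builds a factory that refuses DOCTYPE declarations and limits resource usage. */
    static DocumentBuilderFactory hardenedFactory() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // No DOCTYPE means no custom entities, hence no entity-expansion bombs
            // and no external entities fetched from the network or the file system.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.setXIncludeAware(false);
            factory.setExpandEntityReferences(false);
            return factory;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Parser does not support the hardening features", e);
        }
    }

    /** Returns true if the hardened parser accepts the document, false if it rejects it. */
    static boolean accepts(String xml) {
        try {
            hardenedFactory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // rejected by the parser
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected parser failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<ok/>"));
        System.out.println(accepts("<!DOCTYPE a [<!ENTITY x 'y'>]><a>&x;</a>"));
    }
}
```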

11 – XPath and XSLT

XPath and XSLT belong to the XML ecosystem and suffer from the same problems as XML itself: apparent simplicity but internal complexity. I won’t speak here about everything else that surrounds XML and forms the big picture of the XML family of specifications. I will just say that I recently got an NPE in NetBeans because “/wsa:MessageID” was not ok but using “/wsa:MessageID/.” was just fine. Got the point?
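Outside of NetBeans, even a one-step expression such as “/wsa:MessageID” needs some ceremony in plain Java, because the prefix must be bound through a NamespaceContext. The sketch below assumes the WS-Addressing 1.0 namespace and a made-up document whose root is the MessageID element; it says nothing about how the BPEL engine evaluates the expression.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathDemo {
    // Assumption: the WS-Addressing 1.0 namespace; the post does not say which version was used.
    static final String WSA = "http://www.w3.org/2005/08/addressing";

    // Binds the prefix used in the XPath expression, independently of the document's prefixes.
    private static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "wsa".equals(prefix) ? WSA : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return WSA.equals(uri) ? "wsa" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return WSA.equals(uri)
                    ? Collections.singletonList("wsa").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    /** Evaluates /wsa:MessageID against the given document and returns its text. */
    static String messageId(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            return xpath.evaluate("/wsa:MessageID", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // The document uses prefix "m", the expression uses "wsa": only the URI has to match.
        System.out.println(messageId("<m:MessageID xmlns:m='" + WSA + "'>uuid:1</m:MessageID>"));
    }
}
```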

OpenESB: Invoke an Asynchronous Web Service

I was contacted last week and asked whether I had actually integrated an asynchronous web service in OpenESB, as promised in a previous post. The NetBeans SOA package is sometimes a bit obscure, though there are some explanations of the examples. I took a bit of time to dig this out, and here is the promised follow-up (except that I won’t use WS-Addressing). I will use

OpenESB bundled with Glassfish
NetBeans to author the BPEL process
SoapUI to test the process

What we want to get

The BPEL process that will be created is a synchronous BPEL process, which calls an asynchronous web service using a correlation set to “resume” the process when the asynchronous response is received. The scenario is not very realistic – a BPEL process that calls an asynchronous WS will itself be asynchronous most of the time. The asynchronous WS may indeed take arbitrarily long to respond; the client of the BPEL process would probably time out in this case. This example nevertheless suffices to show the underlying principles.

The BPEL process is synchronous
But it calls an asynchronous web service
We use a correlation set for request/response matching

The BPEL process that we want to obtain at the end is shown below:

Create the PartnerLinks

One or two PartnerLinks?

Communication to and from the asynchronous web service can be realized using a single partner link with two ports or using two partner links with one port each.
From the point of view of BPEL, an asynchronous request/response is indeed no more than a pair of one-way messages. The request/response matching will be done using a correlation set anyway.

As a consequence, the messages can come from “anywhere” and there is therefore no need to have one single partner link. I found it easier to have 2 partner links so that all ports on the left side are the ones exposed by the process, and all ports on the right side are the ones consumed by the process.

WSDL with one-way PartnerLink

PartnerLinks can be defined in the BPEL process or in the WSDL of the web service. NetBeans has a nice feature to create the PartnerLink in a WSDL, so I chose to define them there.

A one-way web service is a web service whose operations define only <input> or only <output>. I therefore took the WSDL of my previous post and simply removed the <output> tags so that the operations become one-way. (I also removed anything related to WS-Addressing as it’s not used here.)

The PartnerLink can then easily be created with NetBeans using the “Partner” view in the WSDL. The two WSDLs then looked like this:

Create the BPEL process

Add the PartnerLink

Now that the WSDL files of the asynchronous web services are ready, I created a new BPEL process. I then added the following PartnerLinks:

AsynchronousSampleClient from the SOA sample bundled with NetBeans
AsyncTestImplService created previously
AsyncTestResponseImplService created previously

Wire the request/response

Then I wired the request/response as follows. I relied on NetBeans variable creation for each <invoke> or <receive> activity. I therefore got the following variables:

ResponseIn
SayHelloIn
OperationAIn
OperationAOut

Assign the variables

For the purpose of this example, I assigned the variables between the messages as follows. Note that this example makes no sense from a business point of view.

Define the correlation set

A receive activity within the process should be assigned a correlation set. The BPEL engine is otherwise unable to match the request/response and resume the right process instance.

I defined a correlation set “correlation” which uses the property “correlationProp”. A correlation property is a value that exists in different messages and that can be used to match messages together. The property itself is a “symbolic” name for the value, and the corresponding element in each message is defined using so-called property aliases.
I then added two aliases, one in each WSDL file, and defined how “correlationProp” maps to the “sayHello” and “response” messages respectively.
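The principle behind a correlation set can be sketched in a few lines of plain Java. This is only an analogy, not how the OpenESB BPEL engine is implemented: a waiting process instance is modelled as a CompletableFuture stored under its correlation value, and an incoming one-way callback resumes the instance registered under the same value.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class CorrelationDemo {
    // Waiting process instances, keyed by the value of the correlation property.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Called when the one-way request is sent: registers an instance waiting for this value. */
    CompletableFuture<String> invoke(String correlationValue) {
        return pending.computeIfAbsent(correlationValue, key -> new CompletableFuture<>());
    }

    /** Called when a one-way callback arrives: resumes the matching instance, if there is one. */
    boolean onCallback(String correlationValue, String payload) {
        CompletableFuture<String> waiting = pending.remove(correlationValue);
        return waiting != null && waiting.complete(payload);
    }

    public static void main(String[] args) {
        CorrelationDemo engine = new CorrelationDemo();
        CompletableFuture<String> reply = engine.invoke("123");
        System.out.println(engine.onCallback("999", "ignored")); // no instance waits for 999
        System.out.println(engine.onCallback("123", "hello"));   // resumes the instance for 123
        System.out.println(reply.join());
    }
}
```

The property aliases play the role of the code that extracts the correlation value from each message type before the lookup.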

The process can then be built without warnings.

Deployment

The endpoint ports

The process defines 3 ports that can be changed according to your needs. In this example the expected endpoints are:

The corresponding WSDL can be obtained by appending “?wsdl” to the URL.

Note that the address for the asynchronous callback is not passed as a parameter from the BPEL process, but should be hard-coded in the web service implementation. It would however be very easy to pass the callback address as an extra parameter so that the asynchronous web service is entirely decoupled from the BPEL process.

Build

Rebuild the process to pick up the latest change in the port URL.

Create the composite application (CA)

The BPEL process cannot be deployed as-is. You will need to embed the BPEL process into a composite application, which is a deployable unit. This is very easy to do:

Create a new project of type composite application.
Drag and drop the BPEL project onto the Service Assembly
Rebuild the composite application

All that is necessary will be created automatically during the build. After the build is complete, NetBeans will refresh the Service Assembly, which then looks like this:

Deploy

Go to the Glassfish console and deploy the service assembly produced in the previous step.

Import WSDL in SoapUI

Start SoapUI and import the 3 WSDLs.

Mock the asynchronous web service

Now that the 3 WSDLs have been imported, we will create a mock for the asynchronous web service. This way we can verify whether the BPEL process calls the asynchronous web service correctly, and we can send the callback response manually.

Select the WSDL “AsyncTestImplPortBinding”, and right-click “Generate Mock Service”. Make sure to use

path = /AsyncTestImplService/AsyncTestImpl?*
port = 8888

so that the mock matches the port that the BPEL process will use.

Make sure to start the Mock, in which case SoapUI displays “running on port 8888” at the top-right of the Mock window. The project looks like this:

Test

1 – Invoke BPEL process

Send the following SOAP message to the BPEL process (located at http://localhost:18182/AsynchronousSampleClient):
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:asy="http://enterprise.netbeans.org/bpel/AsynchronousSampleSchemaNamespace">
  <soapenv:Header/>
  <soapenv:Body>
    <asy:typeA>
      <paramA>dummy</paramA>
      <id>123</id>
    </asy:typeA>
  </soapenv:Body>
</soapenv:Envelope>

2 – Receive the asynchronous invocation

When the Mock service receives the asynchronous message, it displays something like “[sayHello] 4ms”. The message can be opened and should look like:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <sayHello xmlns:msgns="http://ewe.org/" xmlns="http://ewe.org/">
      <arg0 xmlns="">123</arg0>
    </sayHello>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

3 – Send the callback manually

We manually simulate the behavior of the mock service and send the following message to the callback endpoint (http://localhost:18182/AsynchronousSampleClient/response):

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:ewe="http://ewe.org/">
  <soapenv:Header/>
  <soapenv:Body>
    <ewe:response>
      <arg0>123</arg0>
    </ewe:response>
  </soapenv:Body>
</soapenv:Envelope>
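The callback does not have to be sent from SoapUI. As a rough alternative, the sketch below posts the same envelope with the java.net.http client (Java 11 or later). The endpoint URL and the message come from this example; the Content-Type and the empty SOAPAction header are assumptions based on SOAP 1.1 over HTTP, and the class and method names are invented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class SoapCallback {
    // Callback endpoint of the BPEL process used in this example.
    static final String ENDPOINT = "http://localhost:18182/AsynchronousSampleClient/response";

    /** Builds the callback envelope for the given correlation value. */
    static String envelope(String correlationValue) {
        return "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:ewe=\"http://ewe.org/\">"
                + "<soapenv:Header/><soapenv:Body><ewe:response>"
                + "<arg0>" + correlationValue + "</arg0>"
                + "</ewe:response></soapenv:Body></soapenv:Envelope>";
    }

    /** Builds the HTTP POST carrying the envelope; nothing is sent yet. */
    static HttpRequest callbackRequest(String correlationValue) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "text/xml; charset=utf-8")
                .header("SOAPAction", "\"\"") // assumption: no specific action is required
                .POST(HttpRequest.BodyPublishers.ofString(
                        envelope(correlationValue), StandardCharsets.UTF_8))
                .build();
    }

    /** Sends the request and returns the HTTP status code. */
    static int send(HttpRequest request) {
        try {
            HttpResponse<String> response =
                    HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while sending the callback", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling SoapCallback.send(SoapCallback.callbackRequest("123")) while the request of step 1 is still pending should have the same effect as sending the message from SoapUI.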

4 – Get the synchronous reply

So far, the SOAP request of step #1 was still waiting to receive the synchronous response. After the callback has been sent, the BPEL engine should resume the right instance of the BPEL process (using the correlation value “123”), which should then terminate.

SoapUI will display the time taken for the request/response, which will be something like “response time: 5734 ms”. The time will of course depend on how long you took to perform steps 2 and 3. (Note that after some time, the request will time out if you really take too long to do these steps.)
The SOAP response message should look like:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <typeA xmlns:msgns="http://enterprise.netbeans.org/bpel/AsynchronousSampleClient"
           xmlns="http://enterprise.netbeans.org/bpel/AsynchronousSampleSchemaNamespace">
      <id xmlns="">123</id>
    </typeA>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Conclusion

This example as-is makes little sense from a technical and business point of view; I wish I had also used more meaningful names for the various elements. It nevertheless shows the principle of asynchronous web service invocation using OpenESB. Adapting this example to meaningful use cases should be relatively simple: it’s a matter of changing the message types and assignment rules.

Taming Size and Complexity

The only real problem with modern software is size and complexity. If we had a bigger brain able to apprehend and reason about software as a whole without omitting details, we wouldn’t have that many issues. Unfortunately, our mental abilities are limited, and as a consequence, we need to have ways to build software whose complexity is beyond our own analytical power. The same is true for any large scale engineering initiative. Apart from discipline, which is a prerequisite to manage size and complexity, traditional ways to address size & complexity are: abstraction, automation and intuition.

Abstraction

The traditional way to address complexity is to raise the abstraction level. Get rid of details and stay focused on the essential – complexity goes away. You can then reason on different parts at various abstraction levels independently of each other. This is the ground-breaking argument behind any modeling effort or methodology. The problem is that the whole is not equal to the sum of its parts. Unforeseen interactions will emerge, resulting in a myriad of potential problems. Another major problem is the traceability of the different parts.

Automation

The traditional way to address size is through automation. A lot of the tasks that we perform are tedious not due to their very nature, but due to the effort they demand. Our concentration is also limited, which implies we will make mistakes. Automation then leads not only to higher productivity but also to higher quality. There are too many examples of automated tasks to list, but code formatting and refactoring, for instance, fall into this category. Even though automation is extremely effective for specific tasks, it is also impacted by the complexity of the software to produce. State explosion is for instance one of the main problems of a technique such as symbolic execution.

Intuition

Actually, problem solving implies not only strong analytical skills but also some form of intuition. The same goes for software and program understanding. Software exploration and visualization are powerful techniques to reason about abstract information in an intuitive way. Software is intangible and consequently has no natural representation – this leaves the door open for new visualization metaphors. Examples of interactive visual development technologies are BPEL workflow, DSM, or polymetric views.