10 Tips to Fail with Enterprise Integration

If you want to make enterprise integration needlessly complicated, follow these tips.

1. Model poorly

A poor model is always a nice way to make things more complicated than they should be.

Examples: You can name things badly. You can model everything as strings (keys, lists, etc.). You can reuse overly generic abstractions in multiple contexts instead of defining one abstraction per context. Or you can expose a relational model instead of an entity model.
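
A minimal sketch of the contrast, with hypothetical names (Order and its fields are purely illustrative):

import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.Currency;

// The "fail" way: everything is a string, and the reader must guess the semantics.
//   Map<String, String> order = new HashMap<>();
//   order.put("d1", "2016-03-01");   // which date is this? in which time zone?
//   order.put("amt", "42.50");       // which currency? which scale?

// A clearer model: one named type per concept.
public class Order {
    private final String orderId;          // better still: a dedicated OrderId type
    private final LocalDate deliveryDate;  // the name says which date this is
    private final BigDecimal amount;       // exact decimal arithmetic, not a string
    private final Currency currency;

    public Order(String orderId, LocalDate deliveryDate, BigDecimal amount, Currency currency) {
        this.orderId = orderId;
        this.deliveryDate = deliveryDate;
        this.amount = amount;
        this.currency = currency;
    }
}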

2. Use immature technologies

Whenever possible, use immature, non-standard, or inappropriate technologies to make the integration complicated.

Example: Don’t use XML, use JSON. Its IDE support is still weak, its semantics for the various numeric types are poor, it prevents proper code generation (for class-based languages), and JSON-Schema is still a draft.
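
To illustrate the numeric-type problem, here is a small sketch of what happens when a 64-bit integer travels through a consumer that parses JSON numbers as IEEE-754 doubles (as JavaScript engines and many parsers do):

public class JsonNumberDemo {
    public static void main(String[] args) {
        // 2^53 + 1: the first integer a double can no longer store exactly.
        long id = 9007199254740993L;
        double asJsonNumber = (double) id;        // what a double-based JSON parser keeps
        System.out.println(id);                   // 9007199254740993
        System.out.println((long) asJsonNumber);  // 9007199254740992, silently off by one
    }
}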

3. Assume the network is perfect

Assume the network is perfect: it has infinite bandwidth and zero latency. This is a classic recipe for disaster. Ignore the reality of networking completely. If your interface is sound at the logical level, it will be fine in production.

Examples: Don’t distinguish between the time of the event you model and the technical time when the message was sent or received; it doesn’t matter, since latency is zero. Or send replies to individual requests on a topic and leave the burden of filtering out the irrelevant replies to the subscriber at the application level; it doesn’t matter, since bandwidth is infinite.
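
A sketch of the time distinction, with hypothetical field names: a message should carry the business time of the event separately from the technical send time, so the consumer can reason about latency and ordering:

import java.math.BigDecimal;
import java.time.Instant;

// Hypothetical message payload: the two timestamps serve different purposes.
public class PriceUpdate {
    private final Instant eventTime;  // when the price actually changed (business time)
    private final Instant sentTime;   // when the message left the producer (technical time)
    private final String instrument;
    private final BigDecimal price;

    public PriceUpdate(Instant eventTime, Instant sentTime, String instrument, BigDecimal price) {
        this.eventTime = eventTime;
        this.sentTime = sentTime;
        this.instrument = instrument;
        this.price = price;
    }
    // On the consumer side, (receiveTime - sentTime) approximates transport latency,
    // while eventTime orders the updates independently of the network.
}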

4. Make loads and updates asymmetric

It is common for an interface to publish updates on topics but also to provide a means for the consumer to load data upon startup. In such a case, the system should work so that the same data are delivered to the consumer for loads and updates. To introduce subtle data inconsistencies, make it so that loads and updates don’t deliver the same data.

Example: If an entity has multiple statuses, do not publish all status changes as updates. This way, there is a discrepancy between the data obtained via load requests and via updates.

5. Make the system as stateful as possible

If you find a way to complicate state management, go for it.

Examples: Instead of publishing entities that are consistent, publish only the delta of what has changed. The consumer must then carefully ensure that all deltas are applied in order (see the sketch below). Or define requests that reference other requests, e.g. to implement paging. The provider will then need to do some bookkeeping of previous requests.
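
A minimal sketch of the bookkeeping burden that deltas push onto the consumer (the sequence numbering and field names are illustrative):

import java.util.HashMap;
import java.util.Map;

// The consumer must track a sequence number per entity and reject gaps;
// otherwise a lost or reordered delta silently corrupts its state.
public class DeltaApplier {
    private final Map<String, Long> lastSeqByEntity = new HashMap<>();
    private final Map<String, Map<String, Object>> stateByEntity = new HashMap<>();

    public void apply(String entityId, long seq, Map<String, Object> changedFields) {
        long lastSeq = lastSeqByEntity.getOrDefault(entityId, 0L);
        if (seq != lastSeq + 1) {
            // Gap detected: the only safe recovery is a full reload of the entity.
            throw new IllegalStateException("Missed delta for " + entityId
                    + ": expected " + (lastSeq + 1) + ", got " + seq);
        }
        stateByEntity.computeIfAbsent(entityId, k -> new HashMap<>()).putAll(changedFields);
        lastSeqByEntity.put(entityId, seq);
    }
}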

6. Leave the protocol vague

By defining the transport technology, the encoding, and the various messages that can go through your interface, most readers of the specification will have a good understanding of the purpose of the interface. So stop there. Don’t bother explaining the exact protocol, with its assumptions about the order of messages or about when a given message may or may not be sent. This way, you leave the door open to non-obvious misunderstandings.

Example: Don’t specify which requests can be used at any time and which should be used only occasionally, after a restart or recovery.

7. Don’t properly version your interface

Your interface will need to change. Don’t provide proper versioning. This way, supporting multiple versions will be a pain.

Example: Use XML Namespaces, but don’t use them for versioning.

8. Redefine the semantics of data between versions

Make subtle changes to the meaning of the data, so that the semantics change in a non-obvious way.

Example: Redefine what “null” means for a certain attribute.

9. Don’t distinguish between endpoint and tenant

Your interface will be accessible through an endpoint that will probably be used by multiple consumer systems (“tenants”). Define SLAs per endpoint, but not per tenant. This way, you will need to deploy multiple endpoints to really guarantee the SLA for specific consumers.

Example: Provide a limit for the frequency of load requests at the endpoint level, independent of the consumer systems. If one consumer misbehaves, it will prevent all other consumers from loading data.
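
As a sketch of the alternative, here is a hypothetical rate limiter keyed per tenant instead of per endpoint, so that a misbehaving consumer exhausts only its own quota (the quota value and the fixed-window strategy are arbitrary choices for brevity):

import java.util.HashMap;
import java.util.Map;

// One counter per tenant; a timer is assumed to call resetWindow() every minute.
public class PerTenantRateLimiter {
    private static final int MAX_LOADS_PER_WINDOW = 10;
    private final Map<String, Integer> countsByTenant = new HashMap<>();

    // Called for each load request; the tenant id must be part of the request.
    public synchronized boolean tryAcquire(String tenantId) {
        int used = countsByTenant.getOrDefault(tenantId, 0);
        if (used >= MAX_LOADS_PER_WINDOW) {
            return false;  // only this tenant is throttled; others are unaffected
        }
        countsByTenant.put(tenantId, used + 1);
        return true;
    }

    public synchronized void resetWindow() {
        countsByTenant.clear();
    }
}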

10. Ignore monitoring needs

Do not provide any meaningful way for the consumer to check whether the provider is healthy. Either the consumer will have to guess, or it will have to use features not designed for monitoring to assess the system’s health.

Example: Aggregate data from multiple decentralized subsystems and publish them via a centralized interface, but don’t provide any way for the consumer to figure out which subsystems are healthy and which are not.

Living in the Future

The world is constantly changing. From electricity to cars to television to the internet, most generations have seen at least one breakthrough.

This will continue, and it’s certain that my generation will witness another technological shift.

Interestingly, how we react to new technologies itself changes with time. For a lot of new technologies, my first reaction was indifference, entirely missing the new possibilities the technology offered.

The iPhone? I thought it would be a flop. Facebook? I thought it would be a fad. Bitcoin? I thought it would crash.

It seems like I belong to the late majority rather than the early adopters. Maybe Douglas Adams also has a point:

I’ve come up with a set of rules that describe our reactions to technologies:

1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works.

2. Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.

3. Anything invented after you’re thirty-five is against the natural order of things.

Since I’m certain to witness another change, I will have to adapt, whether I like it or not.

For instance, virtual reality might be a thing, after all. It seems very much against the natural order of things to me right now, but actually it’s not much crazier than television was back then.

The first versions of new technologies always sucked. They were bulky, limited, slow, made just usable enough for a specific niche market. For virtual reality headsets, the gamers.

With widespread adoption, the usage can completely change, though. I’m writing this post on an iPhone using a third-party app, after all. Maybe virtual reality is the future of shopping, who knows.

The talent is to foresee the potential of a mass market, which isn’t always obvious.

I think there is a world market for maybe five computers — Thomas Watson, 1943

Realizing that my ability to predict successful technology changes is about as good as Thomas Watson’s, it’s interesting to try to see how innovators see the world.

According to Paul Graham, innovators “live in the future.” They are natural early adopters, and their use of technology is such that they simply build whatever they find missing.

An alternative formulation that I like comes from Tim Urban: innovators have an accurate “reality box.” That is, unlike most people, whose understanding of the world and of what technology enables reflects the common wisdom established 10 years ago, the innovator has an accurate and up-to-date understanding of the possibilities offered by technology. This makes it obvious to create new products around these capabilities.

Will virtual reality turn out to be the future of shopping, or self-driving cars become mainstream, or Bitcoin establish itself as the first digital currency? Whatever the next breakthrough will be, there are exciting times ahead.

So I’ve decided to be more open to new ideas and to keep my reality box accurate enough to assess them. But changing one’s way of reacting to new ideas is hard, just as hard as predicting the future.

Wearing a smart watch is still something that doesn’t appeal to me. And apparently it doesn’t appeal to many other people either.

The New Digital Age

The New Digital Age explores the impact of internet connectivity and digital media on society. The book describes changes that have already occurred, reviews current trends, and tries to predict some future moves.

Written by Eric Schmidt, a tech executive, and Jared Cohen, a former foreign policy advisor, the book focuses on the impact of technology at the political and societal level, not so much at the individual level (only the first chapter “Our future selves” is about it). I applaud this ambitious agenda.

People interested in technology and cyber criminality (e.g. TED talks, Wired) might be familiar with some of the observations and speculations in the book. How much novelty it carries will depend on the background of the reader. Some of the predictions are however unique to the authors, and they do not hesitate to give their personal opinions. This gives the material a special edge.

The trends and predictions are usually backed up with short anecdotal evidence that is interesting in itself. The overall discussion, however, usually remains quite abstract, which at times gives the impression that it lacks substance. This is to be expected from such a book, though. Prediction and precision don’t match up very well.

My main criticism of the book is that while the chapters tell a consistent story of how society evolves through periods of peacetime, revolution, conflict, and reconstruction, the chapter internals do not enjoy such a coherent treatment. The predictions they discuss appear to exist more by accident than as the outcome of a thorough analysis. For instance, I do not recall reading anything about electronic voting, which seems to me an unavoidable topic for such a book.

The book also gives a slight feeling of redundancy. Certain topics are discussed from a different point of view from chapter to chapter. For instance, the tension between privacy and security is discussed from the perspectives of state organization, activism, counter-terrorism, etc. An improvement for a second edition would be to provide a roadmap of recurring topics and their treatment in each chapter. That would give a high-level view of the content and would avoid this unpleasant feeling of redundancy.

While the positions in the book are relatively balanced, the overall tone is inevitably biased towards US policy, which is no surprise given Jared Cohen’s background. Also, the book puts a heavy emphasis on tracking and surveillance and will make proponents of an anonymous internet uneasy.

Overall, I liked the book. The themes addressed are very relevant and it sharpened my understanding of the role of technology in modern society. What the future will really bring, nobody knows.

Using Multiple Google Calendars with iOS 5

With Google calendars, you can create additional calendars linked to your account. This is convenient, say, to separate your own events from the events of people who don’t use any online calendar but whom you want to track.

With iOS 4, adding a Google account would display only the primary calendar, not the auxiliary ones. The solution then was to add them individually as WebCal calendars. The WebCal URL is somewhat cryptic but could be obtained from the ID of the calendar found in the settings.

After upgrading to iOS 5, all events are duplicated n times, where n is the number of auxiliary calendars. Sounds like an awful bug, doesn’t it? Actually not: things only got better. The auxiliary calendars are now correctly supported.

Go to m.google.com/sync and select the auxiliary calendars you want to sync. The Google calendars you selected will all appear under your Google account on iOS 5 (maybe you need to recreate the account, though). You can then safely remove the now-redundant individual WebCal calendars.

The Social Network

I wasn’t much involved or interested in social media (Twitter and the like) until I joined SCG a few months ago. I had a rather defensive attitude and wanted to leave the smallest possible footprint on the web. For several reasons, I nevertheless started using Google Shared Link, Twitter, CiteULike, and Stack Overflow to see how they worked.

I must admit that I kind of like them all, now that I have overcome my initial resistance. But what I like most are the surrounding questions about the evolution of society. Here are a bunch of points I’ve been asking myself about these days.

Ranking, reputation, and suggestion systems

At the heart of these systems is the identification of the value that the community assigns to a certain person or item (value is vague; relevance or credibility might be better terms). This value can be mined from information about the network, the number of visits, etc., or by asking users to vote. The purpose of these systems is to be fair, objective, and democratic. Such systems are however complex to create. You need to design a set of rules that fits the purpose, as well as a set of counter-mechanisms to eliminate the abnormal behavior that still slips in (e.g. robot visits, abnormal patterns in user votes, etc.). Ultimately, all such systems have their own weaknesses. This wasn’t too much of a problem when we didn’t depend critically on them, but now it is.

The value of our second life

How much value should we give to the web presence of an individual? For instance, recruitment has already changed with the appearance first of job sites, and then of online CVs. This tendency will continue and expand to all areas of our lives. We can expect consolidated profiles to be used more and more before meeting people for real. You can’t just erase all that and start from scratch. This may seriously bias our opinion of people. Prejudices related to one’s web presence may be hard to overcome. Our presence on the web will be a direct measure of our skills, as is already the case, for instance, with Stack Overflow Q&A and CVs. Will this expand to other areas? Will we soon see sentences such as “10+ memes on Twitter is a plus” for people working in PR?
•    How much should we trust this information?
•    What is the “critical mass” that these systems must reach to really work?
•    Does it represent the real soft and social skills of a person?
•    Can we really sum people up with numbers?
•    When will the first “single consolidated metric” appear that grades an individual according to their complete web presence?

Community vs. individual

The web was first driven by communities. People who contributed to the web adhered to the values of these communities. However, if the tendency to expose single individuals continues, there will be more and more tension between the community aspects and the individual, selfish aspects. This tension isn’t new and has probably been studied for decades in sociology and psychology, but its expansion to the web is new. And the effect is unknown. Everybody will be an active player on the Internet, not just a passive user as during the past decade. We can then expect much more friction and instability on these social web sites. Or maybe not.

References

Nowhere to Hide: Assessing Your Work Reputation Online

S*** My Domain Name Has Expired

Last week, I had the bad surprise of noticing that my domain name had expired. Like many others before me, I then realized that the domain name business is aggressively money-driven and that many companies try to make a profit out of domain name registration.

In my case, the domain name had first expired, but because I was on vacation and couldn’t do much from there, it had then moved after 30 days into the status “redemption period”. I knew about a few statuses, but not about the complete list. You still have a chance to renew the domain name while it is in “redemption period”, but it costs you more! Normally, the domain name should become public again after the redemption period is over. Unfortunately, there are many companies watching soon-to-expire domain names, and they systematically buy them. There are also affiliations between registrars and such companies, which means there is apparently little chance that you can buy your domain again after it has expired completely. You can place a “back-order” on these companies’ websites, but again, it will cost you more.

I was a bit disgusted by the whole process and had no other option than to renew the domain at a much higher price than normal. Lesson learned: make sure you enable “automatic renewal” on your registrar’s website.

Here is a portion of the chat I had with the guy at register.com:

Chat log
me: Hi, I have a question about DNS renewal.
support: Ok I can help you with that, what is the domain name please ?
me: My DNS recently expired while I was on vacation. When I came back, I tried to renew it, but unfortunately my credit card had expired as well (bad luck). Now that I have updated the information for my credit card, the DNS move into “redemption-period” and I can’t renew it. The DNS was “XXXXXXX”
support: Ok thank you. Just a heads up.. DNS is Domain Name Server. What you have is a Domain Name. Thanks. I will just be a second to bring up that account.
me: Yes, sorry, I mean DN.
support: Not a problem.
support: Your domain name’s status is currently: Redemption Period
me: Yes. That’s what I obtained with WhoIs. Is there a way to renew it?
support: This means that the domain name has gone back to the registry. I can still however purchase the name back for you, however the rates are registry rates and higher then the normal renewal cost and the rates are non negotiable due to the domain not being with our company.

1 Year is $120.00
2 Years is $145.00
3 Years is $170.00
5 Years is $179.00
6 Years is $205.00
7 Years is $240.00
10 Years is $250.00
me: Will the domain goes back to “public” if I wait longer. I could then register it again with normal price?
support: Eventually, but it may go to an auction or someone may have back-ordered the domain name. This is a very risky and touchy time with a domain name. Redeeming the domain at the registry price is your last and only real chance of getting this domain name.
support: There are hundreds of “just dropped” sites that email all there clients all the domains that expired and was released that day, so your domain name will be view by hundreds of people as soon as it drops publicly.
me: What you mean is that there are some companies that systematically buy expired domain in the hope to sell it back for a higher price?
support: Correct.
support: This is a very common practice.
me: What I don’t get is who own the “registry” and the domain name right now.
support: That would be ICANN
support: They are the owner of all domain names.
me: But they are not doing any business on their own…
support: I’m not sure what you mean.. they hold the accreditation for all registrar’s. Without an ICANN accreditation you cannot legally sell domains as a company.
me: I mean, who fixed the prices you sent me? ICANN?
support: The expiration date for the domain name “XXXXXX” has past, as well as our 30 day administrative grace period during which a renewal of a domain name may be permitted. Accordingly, we have submitted the domain name to the Registry for deletion. The Registry has placed this domain name in a ‘Redemption Grace Period’, which provides the Registrant one last opportunity to ‘redeem’ or reclaim the domain name before it is made available for public registration on a first come, first served basis.

Once the domain name is placed on redemption status by the Registry we incur additional expenses in reinstating the domain name, which are in reflected in the redemption fee.

me: Ok, so the “recovery” price is registrar-specific but higher than normal renewal because of the “extra work”. And yours starts at 120 USD.
support: We charge more because we have to pay more to the registry. We do not own domain names but purchase from the registry when you purchase from us. This is not $120.00 profitable dollars for our company.
me: If I reinstall the DN for $ 120, am I able to renew it then on a yearly-based at the regular price of $ 35?
support: Correct.
support: Did you want to reinstate the domain ?
me: Yes, I will reinstall the domain. Just leave one minute to decide whether I take 1 year or more. I don’t want to be in the same situation next year. And it will cost me more money to renew it on a yearly basis.
support: You are far better off going longer then sooner, the farther out you go the less of a chance it will happen and the better savings you will get.
me: Ok, so I would like reinstall the domain for a period of 5 years at the price of $ 179.
support: Sounds great, that’s the best choice.
support: Perfect.
support: This is going to take up to 72 hours to redeem the domain back into your account. This requires a special department to redeem. You will get an email notification once the domain has been redeemed.
support: Was there anything else I can help you with today ?
me: No that was all. Thanks a lot.
support: Did you want me to put the privacy protection back on the domain once in the account ?
me: Is it still $ 11 per year?
support: It would be $45.00 for a 5 year term.
me: In this case, please include “privacy protection” as well.
support: Ok I can do that for you. Did you want me to re-create the email account as well ?
me: No. Not for the time being. So far I remember it was relatively expensive and and I can’t afford a 5 year period for the mail.
support: Ok I will get this redeemed for you as soon as possible. Was there anything else I can do for you today ?
me: I see no other points.
support: Have a great day.

Glassfish mysteries #5: transaction recovery

Here are all the posts of this series on Glassfish.

There is little information available on the web about Glassfish transaction recovery. Transaction recovery is indeed something that should be very rare.

Some background

Such a recovery is necessary only if a problem (typically a crash) occurs while the transaction manager is performing the 2-phase commit (2PC) protocol. If a problem happens before the 2PC protocol starts, the transaction will time out and be rolled back automatically. If the problem appears during the 2PC protocol, the situation is a bit more complicated: one branch may be prepared and the other not, or even worse, one branch may be committed and the other not. A distributed transaction in such a state is frequently called “in-doubt” in the literature.

The 2PC is supposed to be a fast operation, so the probability of an in-doubt transaction is supposed to be very low. It can nevertheless happen, and in this case, the distributed transaction must be recovered. This means that the transaction manager will attempt to complete the 2PC protocol based on its own transaction log. In some cases, the transaction manager doesn’t know exactly what was done or not, and it must then “heuristically” roll back or commit the pending branches. This is generally really bad, as it may leave the system in an inconsistent state, with some operations having been committed in one branch (e.g. the database) and rolled back in another one (e.g. the JMS broker).
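
For the curious, here is a rough sketch of what a transaction manager conceptually does during recovery, expressed against the standard javax.transaction.xa API (the class wrapper and the log-based decision logic are simplified and illustrative):

import java.util.Set;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class RecoveryPass {
    // Ask an enlisted resource (database, JMS broker, ...) for its in-doubt
    // branches, then complete them according to the manager's own transaction log.
    public void recover(XAResource resource, Set<Xid> committedInLog) throws XAException {
        Xid[] inDoubt = resource.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
        for (Xid xid : inDoubt) {
            if (committedInLog.contains(xid)) {
                resource.commit(xid, false);  // global decision was commit: finish it
            } else {
                resource.rollback(xid);       // no commit record: roll the branch back
            }
        }
    }
}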

Glassfish admin console

First of all, we’ve never been able to recover any in-doubt transaction from the Admin > Transaction page. The “recover transaction” button didn’t produce any visible effect. We were, however, able to force the recovery at startup by enabling the appropriate option on the transaction service configuration page.

Oracle transaction recovery

If you are using Oracle, your database connection will need some advanced privileges for the recovery to work. Glassfish will indeed execute either a “commit force” or a “rollback force” on the database, which is usually performed by a DBA with system rights. The privileges we found to be necessary are the following:

GRANT FORCE ANY TRANSACTION TO <db_conn>;
GRANT SELECT ON dba_2pc_pending TO <db_conn>;
GRANT SELECT ON DBA_PENDING_TRANSACTIONS TO <db_conn>;
GRANT SELECT ON SYS.PENDING_TRANS$ TO <db_conn>;
GRANT SELECT ON SYS.DBA_2PC_NEIGHBORS TO <db_conn>;
GRANT EXECUTE ON sys.dbms_system TO <db_conn>;
CREATE PUBLIC SYNONYM dbms_system FOR dbms_system;

Before the recovery, the view dba_2pc_pending shows one pending transaction, whereas after the recovery the view is empty.

There is also little information about the property oracle-xa-recovery-workaround of the transaction service. It seems there is a bug with Oracle and the view dba_2pc_pending: this view is sometimes not correctly refreshed by Oracle. The workaround’s purpose is apparently to force the view to be updated so that Glassfish can use it to identify the in-doubt transactions. This is unfortunately only a supposition, as we never found a clear explanation of the exact impact of this property.

Glassfish mysteries #4: IIOP

Here are all the posts of this series on Glassfish.

This last post will be about considerations on the usage of IIOP with Glassfish. IIOP is a standard, interoperable protocol that every J2EE-compliant application server must support. For java-to-java communication, IIOP is sometimes a bit of an overhead, and some application servers support alternative protocols for this special case. Glassfish, however, supports only IIOP, so all remote communication will go through this protocol. Compared to plain RMI, this protocol adds transaction context propagation.

Communication timeout, distributed transactions & tuning

Heavy usage of IIOP is hard to tune. There also seem to be some bugs in Glassfish related to IIOP. We noticed as well that memory consumption was significant when remote calls are frequent. We needed to adapt the ORB TCP settings to avoid communication timeouts. The best practices that we’ve identified can be summarized as:

  • Use the -server JVM profile for better memory management
  • Tune -Dcom.sun.corba.ee.transport.ORBTCPTimeouts=500:30000:30:999999
  • Check “Allow Non Component Caller” in the data sources
  • Beware of RedHat Linux; there seems to be some issue with it.

There are also a few other annoyances:

  • If a local Glassfish is running, it will always be taken as the default, even if JNDI specifies a remote instance
  • ProgrammaticLogin doesn’t work from Tomcat to Glassfish

http://forums.java.net/jive/thread.jspa?threadID=42017
http://forums.java.net/jive/message.jspa?messageID=352907

Lookup, load balancing, fail over and host names

EJB lookup with the InitialContext supports load balancing and failover. Nodes can be added to or removed from the cluster dynamically; you only need to specify a subset of endpoints in the jndi.properties file. The lookup mechanism works conceptually like this: (1) one of the “bootstrapping” endpoints specified in jndi.properties is accessed; (2) this endpoint knows about the other nodes in the cluster, and one of the nodes is assigned to the particular InitialContext instance.
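
As a sketch (the host names and the JNDI name are hypothetical), the bootstrapping endpoints can also be passed programmatically instead of via jndi.properties:

import java.util.Properties;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class ClusterLookup {
    // Subset of cluster endpoints used for bootstrapping only; the ORB learns
    // about the remaining nodes from whichever endpoint answers first.
    public Object lookupService() throws NamingException {
        Properties env = new Properties();
        env.put("com.sun.appserv.iiop.endpoints", "node1:3700,node2:3700");
        InitialContext ctx = new InitialContext(env);
        return ctx.lookup("ejb/MyService");  // cast to the remote interface as usual
    }
}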

At some point in time, the server will answer back to the client, providing the address of the endpoint to use for further communication. This address depends on the configuration of the server: if the ORB was configured to listen on 0.0.0.0, the address is the hostname as resolved on the server side. The client must be able to contact the server based on the returned address. Depending on the network configuration, this may be problematic. For instance, the hostname resolution on the server side may return a hostname that is not visible to the client if they are on different subnets. This address resolution happens even if no cluster is used and the endpoint specified in jndi.properties is the one to use for remote communication.

http://docs.sun.com/app/docs/doc/820-4341/fxxqs?a=view
https://glassfish.dev.java.net/issues/show_bug.cgi?id=4051

@Resource for injection

As per the EJB spec, EJBs should be injected with @EJB or looked up in JNDI. Remote EJBs are bound in the global JNDI and can consequently also be injected with @Resource. This is however a bad practice, and I suspect some stability issues with it. Beware of this little mistake and make sure you always inject them with @EJB.
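
To make the distinction concrete, here is a minimal sketch (the bean names are hypothetical):

import javax.annotation.Resource;
import javax.ejb.EJB;

public class OrderProcessor {

    @EJB                                       // correct: the container injects the remote EJB
    private BillingServiceRemote billing;

    // @Resource                               // appears to work, because remote EJBs happen
    // private BillingServiceRemote billing;   // to be bound in global JNDI, but avoid it
}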

Local vs. remote calls

We’ve conducted some micro-benchmarks to measure the difference between local and remote calls. We have tested the following scenarios:

  • POJO
  • Local EJB
  • Remote EJB in same EAR
  • Remote EJB in separate EAR

Server-side loop

We call one EJB that performs a loop with 10’000 calls to a helper object.

POJO: 0 ms
Local: 63 ms
Remote internal: 172 ms
Remote external: 1735 ms

Client-side loop

We call the EJB 10’000 times from the client, and the EJB performs one call to a helper object.

POJO: 8140 ms
Local: 6688 ms
Remote internal: 6750 ms
Remote external: 9062 ms

We conclude that the time taken to perform the call on the server side is negligible compared to the cost of the client-server remote call.

Mixed loop

We call the EJB 10’000 times from the client, and the EJB performs 100 calls to a helper object.

                  Measured      Expected
POJO:             8640 ms       8140 + 0 = 8140
Local:            14031 ms      6688 + 100 × 63 = 12988
Remote internal:  23219 ms      6750 + 100 × 172 = 23950
Remote external:  170641 ms     9062 + 100 × 1735 = 182562

The Expected column shows the time that can be estimated from the previous results. The cost of remote intra-JVM calls (in the same EAR or in a different EAR, but in the same JVM) is thus relatively negligible.

Glassfish mysteries #3: JMS

Here are all the posts of this series on Glassfish.

This post is about Glassfish and JMS-related problems. Message passing is a great architectural style whose main strengths are (1) scalability and (2) loose coupling. The J2EE stack is a great platform to build message-based applications, notably because of message-driven beans. These are extremely easy to use and relieve the developer from consuming messages from the JMS queue directly, which most of the time involves some form of thread management. The Glassfish implementation unfortunately contains several bugs.
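
As a reminder of how little code a message-driven bean requires (the class name is illustrative), the container handles connection, threading, and transaction management:

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue")
})
public class OrderListener implements MessageListener {
    public void onMessage(Message message) {
        // Called by the container for each message, in its own transaction;
        // no session, consumer, or thread management needed.
    }
}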

Embedded broker is buggy

The JMS broker can be configured as embedded, local, or remote. By default, it is embedded. Unfortunately, there are many issues with the embedded mode, which is basically not robust enough to be used in practice. Always set the broker to local.

Max delivery attempt is not considered

The property EndpointExceptionRedeliveryAttempts specifies how many times the application server will attempt to deliver the JMS message if exceptions occur. The property was not correctly considered in the early releases of Glassfish v2. Fortunately, the bug was fixed in v2ur2.

Consumption from queue hangs if selectors are used

There seems to be a bug when we consume messages from a queue and use selectors at the same time. After a while, the system hangs and the call to receive() blocks. I unfortunately don’t remember whether the broker was configured as embedded or local.
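
For reference, this is the kind of selector-based consumption that triggered the hang (the selector expression and property name are illustrative; the API is plain JMS):

import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;

public class SelectiveConsumer {
    // Consume only the messages whose 'type' property matches the selector;
    // in our setup, receive() would eventually block forever.
    public Message consumeNext(Session session, Queue queue) throws JMSException {
        MessageConsumer consumer = session.createConsumer(queue, "type = 'ORDER'");
        return consumer.receive(5000);  // wait up to 5 s; returns null on timeout
    }
}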

Non-unique delivery of messages

We also experienced a strange case where some messages were delivered twice. The problem seemed more frequent when load increased. Again, I don’t remember whether the broker was configured as embedded or local.

Non-atomic delivery of JMS messages

When a JMS message is sent in a transaction that also performed some database changes, the message may be delivered before the database changes have been committed for real. Considering that this is a typical usage scenario, I know it will sound very weird. It is however a case that we’ve experienced several times, and we needed to manually add some locks to ensure the message would be processed after the database changes. I’ve posted a long message on java.net concerning this problem, and apparently this should not happen… But I’m positive that there is a problem somewhere and that the transaction manager sometimes commits the JMS participant before the database participant in the 2-phase commit protocol.
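
The scenario in question looks roughly like this (the entity and the notifier helper are hypothetical); both operations are supposed to commit atomically through the distributed transaction:

import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class OrderService {

    @PersistenceContext
    private EntityManager em;

    // Runs in a container-managed transaction: the database write and the
    // JMS send are two XA branches of the same distributed transaction.
    public void placeOrder(Order order, OrderNotifier notifier) {
        em.persist(order);             // branch 1: the database
        notifier.notifyPlaced(order);  // branch 2: the JMS broker
        // Expected: the MDB sees the message only after the DB commit.
        // Observed: the message was sometimes delivered before the
        // database changes were visible.
    }
}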

References

http://forums.java.net/jive/thread.jspa?messageID=351867&#351867
http://forums.java.net/jive/thread.jspa?messageID=259642&#259642
http://forums.java.net/jive/message.jspa?messageID=232351#232351
http://forums.java.net/jive/thread.jspa?messageID=252493&#252493

Glassfish mysteries #2: distributed transactions

Here are all the posts of this series on Glassfish.

This second post about Glassfish mysteries is about transaction management. There is indeed some strange behaviour when usage scenarios differ from the traditional Web-EJB-JPA examples.

Transaction is not rolled back

Depending on the way you package your enterprise application, the annotation @ApplicationException(rollback=true) will not be considered. This can be a very serious bug. A detailed explanation of the packaging scenario that fails can be found in the references at the end of this post. As a workaround, the exception can be declared in ejb-jar.xml, in which case it will be processed correctly. Lesson learned: always double-check the XML generated by Glassfish during deployment (in domains/domain/generated) to verify that it matches the intended behaviour.
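
For context, this is the standard EJB 3 annotation in question (the exception class is illustrative):

import javax.ejb.ApplicationException;

// Marks a checked exception so that the container rolls back the current
// transaction when the exception crosses the bean boundary.
@ApplicationException(rollback = true)
public class InsufficientFundsException extends Exception {

    public InsufficientFundsException(String message) {
        super(message);
    }
}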

UserTransaction must be a singleton

Glassfish supports client-side transaction demarcation. This is part of the “gray” zone of the J2EE specification, in the sense that it is not mandatory, but most containers support it. The object that is used by the client to control the transaction boundaries is the UserTransaction. The UserTransaction exposes the methods begin(), commit(), and rollback(). A transaction is implicitly bound to the current thread. The client can neither perform multi-threaded transactions nor suspend/resume the current one.
The JTA specification is not particularly clear regarding the thread-safety of the UserTransaction object: can the same UserTransaction be used by several threads, or should each thread possess its own UserTransaction? In the case of Glassfish, the answer is even more radical: there should be one and only one UserTransaction object per client JVM. In other words, the UserTransaction must be managed like a singleton. If you have several instances of UserTransaction, your application will apparently work, but the ACID properties of the transactions are not enforced. This means (1) concurrent clients may read uncommitted data, and (2) rollback will not work properly. You will find at the end of this post a reference to the bug I reported on java.net, with a test case attached.
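
A minimal sketch of the workaround, assuming the usual JNDI name for the client-side UserTransaction:

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.transaction.UserTransaction;

// One UserTransaction per client JVM: look it up once and share the instance.
public final class ClientTx {

    private static UserTransaction instance;

    private ClientTx() { }

    public static synchronized UserTransaction get() throws NamingException {
        if (instance == null) {
            instance = (UserTransaction) new InitialContext()
                    .lookup("java:comp/UserTransaction");
        }
        return instance;
    }
}

All client threads then demarcate their transactions through ClientTx.get().begin() and ClientTx.get().commit(); the transaction itself remains bound to the calling thread.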

TopLink hangs with client-side transactions demarcations

As I wrote in the previous section, client-side distributed transactions are part of the “gray” zone of the J2EE specification. Glassfish’s transaction manager does support client-side transaction demarcation, but unfortunately TopLink doesn’t. As a consequence, when the client attempts to commit the transaction, the system hangs. This can probably be explained by the fact that TopLink was developed by Oracle, and OC4J doesn’t support client-side transaction demarcation at all. Switching to Hibernate 3 (which is very easy) solves the problem.

Allow non-component callers

We had a very complex scenario in our system: the distributed transaction contained several XA participants, including a database, JMS, and a custom JCA connector. The transaction was started from the client side. We were experiencing lots of stability issues, with some transactions failing randomly with low-level error messages such as “can not delist participant”, “got -1 from a read call”, etc. We then noticed that enabling the option “allow non-component callers” in the datasource configuration had a significant positive impact. Given that the definition of this option is extremely obscure (see the reference at the end), I don’t know when this option should be enabled or not. Maybe it is also related to the usage of Hibernate 3. However, it seems that in complex transaction scenarios, it definitely helps.

References
http://forums.java.net/jive/thread.jspa?messageID=319223&#319223
http://forums.java.net/jive/thread.jspa?messageID=252496&#252496
http://forums.java.net/jive/message.jspa?messageID=246736
http://docs.sun.com/app/docs/doc/820-4496/gavro?a=view