Glassfish mysteries #5: transaction recovery

Here are all posts of this serie on Glassfish.

There is little information available on the web about Glassfish transaction recovery. Transaction recovery is indeed something that should be very rare.

Some background

Such a recovery is necessary only if a problem (typically a crash) occurs while the transaction manager is performing the 2-phase commit protocol. If a problem happens before the 2PC protocol starts, the transaction will timeout and be rolled back automatically. If the problem appears during the 2PC protocol, the situation is a bit more complicated: one branch may be prepared the other not, or even worse, one branch may be comitted and the other not. A distributed transaction in such a state is frequently called “in-doubt” in the litterature.

The 2PC is supposed to be a fast operation, so the probability of in-doubt transaction is supposed to be also very low. It nevertheless can happen, and in this case, the distributed transaction must be recovered. This means that the transaction manager will attempt to complete the 2PC protocol based on his own transaction log. In some case, the transaction manager doesn’t know exactly what was done or not, and it must then “heuristically” rollback or commit the pending branches. This is generally really bad as it may leave the system in an inconsistent state, with some operations having been committed in one branch (e.g: the database) and rolled back
in another one (e.g: the JMS broker).

Glassfish admin console

First of all, we’ve never been able to recover any in-doubt transaction from the Admin>Transaction page. The “recover transaction” button didn’t produce any visible effect. We were however able to force the recovery at startup by enabling the appropriate option in the transaction service configuration page.

Oracle transaction recovery

If you are using Oracle, youre database connection will need some advanced privileges to have the recovery working. Glassfish will indeed execute either a “commit force” or “rollback force” on the database, which
is usually performed by a DBA with system rights. The privileges we found were necessary are the following:

GRANT FORCE ANY TRANSACTION TO <db_conn>;
GRANT SELECT ON dba_2pc_pending TO <db_conn>;
GRANT SELECT ON DBA_PENDING_TRANSACTIONS TO <db_conn>;
GRANT SELECT ON SYS.PENDING_TRANS$ TO <db_conn>;
GRANT SELECT ON SYS.DBA_2PC_NEIGHBORS TO <db_conn>;
GRANT EXECUTE ON sys.dbms_system TO <db_conn>;
CREATE PUBLIC SYNONYM dbms_system FOR dbms_system;

Before the recovery,  the view dba_2pc_pending shows one pending transaction, whereas after the recovery the view is empty.

The is also little information about the property oracle-xa-recovery-workaround of the transaction service. It seems like there is a bug with Oracle and the view dba_2pc_pending. This view is sometimes not correctly refreshed by Oracle. The workaround’s purpose is apparently to force the view to be updated so that Glassfish can use it to identify the in-doubt transactions. This is unfortunately only a suppositon as we never found a clear explanation of the exact impact of this property.

Glassfish mysteries #4: IIOP

Here are all posts of this serie on Glassfish.

This last post will be about considerations on usage of IIOP and Glassfish. IIOP is a standard, inter-operable protocol that every J2EE-compliant application server must support. In case of java-to-java communication, IIOP is sometimes a bit overhead and some application server supports alternative protocols in this special case. However, Glassfish does support only IIOP so all remote communication with go through this protocol. Compared to plain RMI, this protocol adds transaction context preparation.

Communication timeout, distributed transactions & tuning

Heavy usage of IIOP is hard to tune. There seem also that there are some bugs in Glassfish with IIOP. We also noticed that the memory consumption was significant when remote calls are frequent. We needed to adapt the TCP ORB settings in a way to avoid communication time out. The best-practice that we’ve identified can be summarized with:

  • Use -server profile for better memory management
  • Tune -Dcom.sun.corba.ee.transport.ORBTCPTimeouts=500:30000:30:999999
  • Check « Allow Non Component Caller » in the data sources
  • Beware RedHat Linux, there seems to be some issue with it.

There are also a few other annoyances:

  • If local glassfish is running, it will always be taken as default even if JNDI specifies a remote instance
  • ProgrammaticLogin doesn’t work from Tomcat to Glassfish

http://forums.java.net/jive/thread.jspa?threadID=42017
http://forums.java.net/jive/message.jspa?messageID=352907

Lookup, load balancing, fail over and host names

EJB lookup with the InitialContext support load-balancing and fail-over.  Nodes can be added/removed to the cluster dynamically; you only need to specify a subset of endpoints in  jndi.properties file.  The lookup mechanism works conceptually like this: (1)  one of the “bootstrapping” endpoint specified in jndi.properties is accessed (2) The endpoint knows about the other nodes in the cluster and one of the node is assigned to the particular InitialContext instance.

At a point in time, the server will answer back to the client providing the address of the endpoint to use for further communication. This address will depend on the configuration of the server. If the ORB was configured to listen on 0.0.0.0 the address is the hostname as resolved on the server-side. The client must be able to contact the server based on the returned address. Depending on the network configuration this may be problematic. For instance, the hostname resolution on server-side may return a hostname that is not visible to the client if they are on different subnets.
The address resolution exist even if no cluster is used and the endpoint specified in jndi.properties is the one to use for remote communication.

http://docs.sun.com/app/docs/doc/820-4341/fxxqs?a=view
https://glassfish.dev.java.net/issues/show_bug.cgi?id=4051

@Resource for injection

As per the EJB spec, the EJB should be injected with @EJB or looked up in JNDI. Remote EJB are bound in the global JNDI and by consequence can also be injected with @Resource. This is however a bad practice and I suspect some stability issue with it. Beware this little mistake and make sure you always inject them with @EJB.

Local vs. remote calls

We’ve conducted some micro-benchmark to measure the difference between local and remote calls. We have tested the following scenario:

  • POJO
  • Local EJB
  • Remote EJB in same EAR
  • Remote EJB in separate EAR

Server-side loop

We call one EJB that performs a loop with 10’000 call to a helper object.

Pojo:0 ms
Local:63 ms
Remote internal:172 ms
Remote external:1735 ms

Client-side loop

We call 10’000x the EJB from on client, and the EJB performs one call to a helper object.

Pojo:8140 ms
Local:6688 ms
Remote internal:6750 ms
Remote external:9062 ms

We conclude that the time taken to perform the call on the server side is neglectable compared to the cost of the client-server remote call.

Mixed loop

We call one 10’000 the EJB from the client, and the EJB performs 100 calls to a helper object.

  expected
Pojo:8640 ms 8140 + 0 = 8140
Local:14031 ms 6688 + 100×63 = 12988
Remote internal:23219 ms 6750 + 100×172 = 23950
Remote external:170641 ms 9062 + 100×1735 = 182562

On the right column is the expected time that can be estimated based on the previous results.
The cost of remote intra-JVM calls (in same EAR or different EAR, but in same JVM), is then relatively neglectable.

Glassfish mysteries #3: JMS

Here are all posts of this serie on Glassfish.

This post is about Glassfish and JMS-related problems. Message-passing is a great architectural style whose main strength are (1) scalability (2) loose coupling. The J2EE stack is a great platform to build message-based applications, notably because of the message-driven beans. These are extremely easy to use and relief the developer form consuming message from the JMS queue directly, which most of the time involves some form of thread management. The Glassfish implementation contains unfortunately several bugs.

Embedded broker is buggy

The JMS broker can be configured to be embedded, local or remote. By default it is embedded. Unfortunately, there are many issues with the embedded mode which basically is not robust enough to be used in practice. Always set the broker as local.

Max delivery attempt is not considered

The property EndpointExceptionRedeliveryAttempts specifies how many times the application server will attempt to deliver the JMS message if exceptions occur. The property was not correctly considered in the early releases of Glassfish v2. Fortunately, the bug was fixed in v2ur2.

Consumption from queue hangs if selectors are used

There seem to be a bug when we consume message from the queue and use selectors at the same time. After a while the system hangs and the call to receive() blocks. I unfortunately don’t remember if the broker was configured as embedded or local.

Non-unique delivery of message

We also experienced a strange case where some messages were delivered twice. It seems like the problem was more frequent when load increased. Again, I don’t remember if the broker was configured as embedded or local.

Non-atomic delivery of JMS messages

When a JMS message is sent in a transaction that also performed some database changes, the message may be delivered before the database changes have been committed for real. Considering that this is a typically usage scenario, I know it will sound very weird. It is however a case that we’ve experience several times, and we need to manually add some locks to ensure the message would be processed after the database changes. I’ve posted a long message on java.net concerning this problem, and apparently this should not happen…But I’m positive about the fact that there is a problem somewhere and that the transaction manager sometimes commits the JMS participant before the database participant in the 2 phase commit protocol.

Reference

http://forums.java.net/jive/thread.jspa?messageID=351867&#351867
http://forums.java.net/jive/thread.jspa?messageID=259642&#259642
http://forums.java.net/jive/message.jspa?messageID=232351#232351
http://forums.java.net/jive/thread.jspa?messageID=252493&#252493

Glassfish mysteries #2: distributed transactions

Here are all posts of this serie on Glassfish.

This second post about Glassfish mysteries will be about transaction management. There is indeed some strange behaviour when usage scenarios differ from traditional Web-EJB-JPA examples.

Transaction is not rolled back

Depending on the way you package your enterprise application, the annotation @ApplicationException(rollback=true) will not be considered. This can be a very serious bug. A detailed explanation about the packaging scenario that fails can be found in the reference at the end of this post. As a workaround, the transaction can be declared in the ejb-jar.xml in which case it will be processed correctly. Lesson learned: always double check the xml generated by Glassfish during deployment (in domains/domain/generated) to verify if is matches the intended behaviour.

UserTransaction must be a singleton

Glassfish supports client-side transaction demarcation. This is part of the “gray” zone of the J2EE specification in the sense that it is not mandatory but most containers support it. The object that is used by the client to control the transaction boundaries is the UserTransaction. The UserTransaction exposes the method begin(), commit() and rollback(). A transaction is implicitly bound to the current thread. The client can not perform multi-threaded transactions, neither suspend/resume the current one.
The JTA specifications are not particularly clear regarding the thread-safety of the UserTransaction object: can the same UserTransaction be used by several threads, or should each to possess its own UserTransaction? In the case of Glassfish, the answer is even more radical: there should be one and only one UserTransaction object used per client JVM. In other words, the UserTransaction must be managed like a singleton. If you have several instances of UserTransaction then you application will apparently work, but the ACID properties of the transactions are not enforced. This means (1) concurrent clients may read uncommitted read (2) rollback will not work properly. You find at the end of this post a reference to this bug I reported on java.net. There is test case attached to the post.

TopLink hangs with client-side transactions demarcations

As I wrote in the previous section, client-side distributed transactions are part of the “gray” zone of the J2EE specification. Glassfish’s transaction manager does support client-side transaction demarcation, but unfortunately TopLink doesn’t. As a consequence, when the client attempts to commit the transaction, the system hangs. This can probably be explained by the fact that TopLink has been developed by Oracle, and the OC4J doesn’t support client-side transaction demarcation at all.  Switching to Hibernate 3 (which is very easy) solves the problem.

Allow non-component callers

We had a very complex scenario in our system and the distributed transaction contained several XA participants including database, JMS, and custom JCA connector. The transaction was started from the client-side. We were experiencing lots of stability issue, with some transaction failing randomly with low-level error messages such as “can not delist participant”, “got -1 from a read call”, etc. We noticed then that enabling the option “allow non-component callers” in the datasource configuration has a significant positive impact. Given that the definition of this option is extremely obscure (see reference at the end), I don’t know when this options should be enabled or not. Maybe it is related to the usage of Hibernate 3 also. However, it seems like that in complex transaction scenario, it definitively helps.

References
http://forums.java.net/jive/thread.jspa?messageID=319223&#319223
http://forums.java.net/jive/thread.jspa?messageID=252496&#252496
http://forums.java.net/jive/message.jspa?messageID=246736
http://docs.sun.com/app/docs/doc/820-4496/gavro?a=view

Glassfish mysteries #1: JavaMail

Here are all posts of this serie on Glassfish.

This serie of post will cover some problems we experienced with Glassfish. Let’s start with a few easy ones related to JavaMail. These one are not blocking but rather annoying.

Lookup from JNDI

There’s a bug in Glassfish v2ur2 that prevent you from getting the JavaMail session directly. You will need to use the following code

Context ic = new InitialContext();
MailConfiguration conf = (MailConfiguration)
ic.lookup("mail/notification");
Session session =
javax.mail.Session.getInstance( conf.getMailProperties(),null );

Custom properties

There are some obscure rules to follow if you plan to add custom properties in your JNDI mail entry. We can read in the Glassfish documentation: “”Every property name must start with a mail- prefix. The Application Server changes the dash (-) character to a period (.) in the name of the property, then saves the property to the MailConfiguration and JavaMail Session objects. If the name of the property doesn’t start with mail-, the property is ignored.”

SMTP + authentication

There are no standard properties to deal with SMTP authentication. If you need to support authentication you will need to rely on custom properties. Here is the code that we’ve been using:


String auth = session.getProperty("mail.smtp.auth"); //
String pwd = session.getProperty("mail.smtp.password");

if ((auth != null) && new Boolean(auth))
{
Transport transport = session.getTransport("smtp");
transport.connect(conf.getMailHost(), conf.getUsername(), pwd);
msg.saveChanges();
transport.sendMessage(msg, msg.getAllRecipients());
transport.close();
}
else
{
Transport.send(msg);
}

References

https://glassfish.dev.java.net/javaee5/docs/AREF/abhaq.html
http://forums.java.net/jive/thread.jspa?messageID=264233