ewernli

Object-Oriented Equality

I discussed in my previous post the fact that we need a better support of immutability in object-oriented language. The problem is that it’s no so easy to add because the object paradigm is rooted in the concept of an object being an entity whose state is normally mutable. I discuss here about one of its cousin: object equality.

Some objects are more equal than others

First a quick recap. Object equality is an equivalence relation traditionally implemented through equals and hashCode. Here is the corresponding Javadoc:

The equals method implements an equivalence relation on non-null object references:
•    It is reflexive: for any non-null reference value x, x.equals(x) should return true.
•    It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
•    It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
•    It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
•    For any non-null reference value x, x.equals(null) should return false.

The definition implicitly allows mutable state to be used for equality. This is however dangerous, and equals should be defined only based on immutable state when possible.

This is notably a requirement for Collections, which require a stronger equality contract. If the object equality changes while it is in the collection, the behavior of the collection may not be consistent. See pitfall #3 in How to write equality in Java.

The simplest solution to this problem is to make object equality immutable, that is, the fields participating in the equivalence relation are final – the equality can never change post-construction.

This is however not always possible, especially if the object requires an initialization in a post-construction phase. If the object equality depends on related objects, a strictly immutable object might still see its hashCode change if one of the depending object is mutated. This leads to the concept of ownership, where object that are owned should also be immutable. Pure ownership is however an object-level/run-time concern which is not easy to ensure (class-based ownership is possible but more limited).

As presented in “Understanding the Impact of Collection Contracts on Design”, we should then consider (i) the construction of object (ii) the mutability of the object state and (iii) the “deepness” of the equivalence relationship to reason about object equality.

We have then actually three types of fields:

–    final. Never change post-construction and referenced object is immutable
–    eventually final. Can change post-construction but will be frozen in a point in time
–    mutable. Can be mutated anytime.

Object could be frozen post-creation as is proposed in “Flexible immutability with frozen objects”. The equivalence relation could use only final and eventually final fields. Owned object could be modified only through the parent object, as to ensure that the equality contract is never broken.

There is no one notion of equivalence

The problem remains that there is not “correct” implementation of object equality. It mostly depends on the usage. You may want to compare list based on reference identity, but also based on their content sometimes. Should we then have a notion of “deepness” right into the equals operator, in a way similar to the variants of shallow and deep cloning?

aList.deepEquals( anotherList )

Well, that’s actually what already exists with OrderedCollection, where you specify the Comparator to be used. Is it the solution? Should we remove the object equality form the object and move it the client of the object?

Set s = new HashSet( new ReferenceEqualityContract() );
Or should we be able to instantiate an object and specify which equality contract (or more generally a comparison contract) it should fulfill?

In this case, an object is limited to have only one equivalence relation at a time. (See this post for a Haskell sample of how to do it)

One could specify the behavior of the object at creation time and see that as an object-level behavioral variation. The types of two objects with different equivalence relation should however be different to ensure the relation is always symmetric (objects with different equivalence relation are always different). This means that the each specific variation would correspond to a type, which inherits from a common type.

– Is object equality as subset of the problem of object comparison? Or are they fundamentally different? State used for comparison shouldn’t change while object is in collection, but it’s no necessary part of the “primary key”…

– Should an object have only one equivalence or comparison definition at a time? Or could an object have several ones at the same time? Or one equivalence relation but several comparison definitions at a time? (We could easily imagine two lists with the same objects, but one list storing the object in ascending order and the other one in descending order)

The fact is that in a pure object world we should uniquely compare object by reference. There should be no two objects with the same “primary key”. In “Declarative object identity using relation type“, the authors introduced the notion of scope as to create and obtain the same reference for a given key. The risk then is to alter objects in an unexpected way and break other object invariants (dangling alias, etc.). It is simply impractical. The world is constantly evolving, and we are forces sometimes to have two objects for the same entity: one for the old and one for the new entity (See this other post on persistent data structure). Such objects somehow relates together and are not completely “different”, hence the notion of object equality.

But this little digression still doesn’t solve the issue that whatever the equality contract that the object must fulfill, we cannot ensure that it will be the case.

And it gets worse with inheritance

“There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you are willing to forgo the benefits of object-oriented abstraction.” — Effective Java

Indeed if we have a Point class and a ColorPoint class, the equivalence relation may be broken if ColorPoint doesn’t redefine equals and hashCode. See pitfall #4 in How to write equality in Java. To enforce the equality contract, two objects may be equal only if they have the same type.

The problem is that it is too restrictive. A context that uses color points, such as a graphic editor, may want to share the points with a context that doesn’t have a notion of color, such as 2D algorithm. The graphic editor would test the equivalence of two figures according to the color as well. And the 2D algorithm would test the equivalence of two points according to their position solely, in a way to prevent division by zero.

Also, once an object has overridden the default equals and hashCode which implements reference equality, it is also impossible for an object to fall back to this mode. As a consequence we may end up in a situation with subclasses whose equivalence relation can’t be expressed. In such case, it should be forbidden to compare objects.

Should we then re-think object equality with ternary logic, so as to be able to return true, false and N/A?

EDIT

Here I will gather other links related to the subjects

API Design: indentity and equality
Whose object is it, anyway?
Redesigning System.Object/java.lang.Object
You are what you is: defining object identity
Equals implementation in NHibernate, proxy question
== vs === and type coercion
What does Barbara Liskov have to say about Equality in Java?
Grant Matcher Puzzle
The left hand of equals
Keith Cirkel and Allen Wirfs-Brock on deepCloning

Object-Oriented Immutability

There seem to be a growing awareness about the benefit of immutability. Particular cases where immutable objects are much welcome are for instance in the case of value objects or for defensive copying. The problem is that immutable object don’t really go well with the object paradigm.

Let’s consider an application, which need to deal with the concept of revision. A revision is composed of a major and a minor revision number, e.g. “1.4”. This is typical candidate for a value object. The only sensible methods that deal with revision are inc which increases it, and promote which jumps to the next major version (“1.3” -> “2.0”). I will consider only inc in this post.

Here would be the (trivial) code in Java:

public class Revision
{
   private final int major;
   private final int minor;
   public Revision( int major, int minor ) { ... }

   public Revision inc() {
       return new Revision( major, minor+1 );
   }
}

This is however only a ad-hoc solution. This does not work well with polymorphism for instance: if a subclass of Revision is created, say NameRevision, calling inc on an instance would result in a Revision not NameRevision. This problem can be solved in other ways, and would for instance not appear in Smalltalk where instanciation would be done with super.basicNew on the class-side. But this just shows how mainstream OO language are not designed for immutability.

How would that be in a functional language such as Clojure? A map would probably be used to hold the major and minor number and the inc function would be as simple as setting a new value to the minor key in the map. Because any data modification results in a new structure, we don’t have the explicity make a difference between creating and modifying a structure. See my discussion on persitente data structure

(defn create-revision [] (hash-map :minor 1 :major 0))
(defn inc-revision [r] (assoc r :minor ( + (get r :minor) 1 ))

Note that the inc function is “polymorphic” in the sense that if I pass a map with :minor, :major and :name, I obtain a copy which has the three keys values.

The fundamental mismatch between OO and functional programming is that assignment has different semantics. In OO, assignment change the data in-place. In functional programming, assignments result in a copy.

And what if assignment would also result in a copy in OO? That sounds a bit contradictory to the OO spirit, but well, let’s give it a try…

If assignments result in a copy, this means that something like this.field = “newvalue” actually returns a new instance of the object, right? We could then imagine a class like this:

public class immutable Revision
{
   int major;
   int minor;

   public int getMajor {
      return this.major
   }

   public Revision setMajor( int value ) {
      // assignment result in a copy!
      return ( this.major = value );
   }

   // same for minor

   public Revision inc()  {
      return this.setMajor( this.getMinor() + 1 );
   }
}

We can somehow see that a generalized form of the builder pattern, and it fit well with fluent interface:

Revision r = new Revision( 1, 0 );
r.inc(); // return 1,1
r.inc(); // return 1,1

But

Revision r =
new Revision( 1, 0 )
.inc()
.inc(); // return 1,2

Note the keyword immutable that is necessary to specify the semantics of assignment for the class (there will indeed always be some mutable classes).

We now have two semantics for the assignment, one producing a copy and one changing the data in-place.

That sounds logical, that all classes in class hierarchy share the same mutability. Assigning a field from a subclass or a superclass should indeed produce the same result. But at the same time, this also means that we have two Object types? One mutable and one immutable? Hmm…

Another sound restriction for such a language would be to have assignment restricted to this. Something like obj.field = "newvalue" would be forbidden. This is anyway a good practice.

Also assuming that the language supports closures, do they fit in the picture? I would say yes, but I’m not sure. Given the signature of the closure is determined (it’s a function after all), and it cannot access any field of the object directly, so the semantics should be consistent. This would however require a bit more formalization to ensure there is no hole…

Like in some other post, I don’t know if the idea presented here hold for real. But I do like immutable object even in OO, and think that OO language lacks supports for such design. Current solution for immutable object is ad-hoc, but we could imagine blending functional programming and OO a bit more, resulting in some object having a hybrid behavior. Such object would essentially be immutable data structure embedding at the same time the function that corresponds to them, which still fit the equation Object = Data + Logic.

More About Immutability

To mutate or not to mutate?
Frozen object in ruby
Java.next : immutability “But the benefit of using immutable data increases when you compose objects together. If you wanted to compose the synchronized version with another object, you would have have to dig back into the internals of both objects, study their locks, and pick a new lock strategy to cover the two objects together. Composing operations with immutable objects is much easier.”

11 Reasons Why I Hate XML

… at least in Java.

1 – Namespace and import

XML is only apparently simple. As soon as namespace are used, it immediately gets complicated. What is the difference between targetNamespace=”…”, xmlns=”…” and xmlns:tns=”…” ? Can I declare several prefixes for the same namespace? Can I change the default namespace from within a document? What happens if I import a schema and rebind it to another namespace? How do I reference an element unambiguously? Ever wondered how to really create a QName correctly? Ever wondered what happens if you have a cycle in your dependencies?

2 – Encoding and CDATA

XML encoding and file encoding are not the same. This is a huge source of troubles. Both encoding must match, and the XML file should be read and parsed according to the encoding specified in the XML header. Depending on the encoding, characters will be serialized in a different way, again a huge source of confusion. If the reader or writer of an XML document behave incorrectly, the document can be dangerously corrupted and information can be lost. Editors don’t necessary display the characters correctly, while the document may be right. Ever got a ? or ¿ in your text? Ever made a distinction between & and & ? Ever wondered whether a CDATA section was necessary or if using UTF-8 would be ok? Ever realized that < and > can be used as-is in attributes but need an encoding within a tag?

3 – Entities and DOCTYPE

Somehow relates to #2, but not only. XML entities are a generic way to define variables and are declared in the DOCTYPE. You can define custom entities; this is rather unusual but still need to be supported. Entites can be internal or external to your XML document, in which case the entity resolving might differ. Because entities are also used to escape special character, you can not consider this as an advanced feature that you won’t use. XML entities needs to be handled with care and is always a source of trouble. For instance, the tag <my-tag>hello&world</my-tag> will trigger 3 characters(...) events with SAX.

4 – Naming convention

Ever wondered whether it was actually better to name your tag <my-tag/>, <myTag/> or <MyTag/>? The same goes for attributes….

5 – Null, empty string and white spaces

Making the difference between null and empty string with XML is always painful. Null would be represented by the absence of the tag or attribute, whereas empty string would be represented with an empty tag or empty attribute. The same problem appears if you want to distinguish empty list and no list at all. If not considered clearly upfront (which is frequently the case), it can be very hard to retrofit clearly this distinction in an application.
Whitespace is another issue on its own. The way tabs, spaces, carriage return, line feeds are processed is always confusing. There are some options to control that, but it’s way too complicated for most of the usage. As a consequence, sometimes these special characters will be encoding in entities, sometimes embedded in CDATA and sometimes stores as-is in the XML.

6 – Normalization

XML encryption and signature look fine on paper. But as soon as you dig in the spec, you realize that it’s not so easy because of the syntactic and semantic equivalence of XML document. Is <my-tag></my-tag> the same as <my-tag/>? To solve this issue, XML normalization was introduced which define the canonical representation of a document. Good luck to understand all the subtleties when considering remarks #1, #2, #3 and #5.

7 – Too many API and implementations

Even if stuffs improved in this area, there are too many API and implementation available. I wish there was one unified API and one single implementation sometimes…Ever wondered how to select a specific implementation? Ever got a classloader issue due to an XML library? Ever got confused whether StAX was actually really better than SAX to read XML documents?

8 – Implementation options

Most XML implementations have options or features to deal with the subtleties I just describe. This is especially true for namespace handling. As a consequence, you code may work on one implementation but not on another. For instance, startDocument should be used to start an XML document and deal with namespace correctly. The strictness of the implementations differs, so don’t take for granted that portability is 100%.

9 – Pretty printing

There are so many API and frameworks that it’s always a mess to deal with pretty printing, if supported by the framework.

10 – Security

XML was not designed for security. Notorious problems are: dangerous framework extension, XML bomb, outbound connection to access remote schema, extensive memory consumption, and many more problems documented in this excellent article from MISC. As a consequence, XML document can be easily abused to disrupt the system.

11 – XPath and XSLT

XPath and XSLT belong to the XML ecosystem and suffer the same problems as XML itself: apparent simplicity but internal complexity. I won’t speak here about everything else that surrounds XML and that forms the big picture of the XML family specifications. I will just say that I recently got a NPE in NetBeans because “/wsa:MessageID” was not ok but using “/wsa:MessageID/.” was just fine. Got the point?

OpenESB: Invoke an Asynchronous Web Service

I was contacted last week to know if I had actually integrated an asynchronous web service in OpenESB, as promised in a previous post. The NetBeans SOA package is sometimes a bit obscure, though there are some explanation about the examples. I took a bit of time to dig this out, and here is then the promised follow-up (except that I won’t use WS-Addressing). I will use

OpenESB bundled with Glassfish
NetBeans to author the BPEL process
SoapUI to test the process

What we want to get

The BPEL process that will be created is a synchronous BPEL process, which calls an asynchronous web service using a correlation set to “resume” the process when the asynchronous response is received. The scenario is not very realistic – a BPEL process that calls an asynchronous WS will itself be asynchronous most of the time. The asynchronous WS may indeed take arbitrary long to respond; the client of the BPEL process would probably time out in this case. This example suffices however to show the underlying principles.

The BPEL process is synchronous
But it calls an asynchronous WS service
We use correlation-set for request/response matching

The BPEL process that we want to obtain at the end is shown below:

Create the PartnerLinks

One or two PartnerLinks?

Communication to and from the asynchronous web service can be realized using a single partner link with two ports or using two partner links with one port each.
From point of view of BPEL an asynchronous request/response is indeed no more than a pair of one-way messages. The request/response matching will anyway be done using correlation set.

As a consequence, the messages can come from “anywhere” and there is therefore not need to have one single partner link. I found it easier to have 2 partner links so that all ports on the left side are the one exposed by the process, and all ports on the right side are the one consumed by the process.

WSDL with one-way PartnerLink

PartnerLinks can be defined in the BPEL process or in the WSDL of the web service. NetBeans has a nice feature to create the PartnerLink in a WSDL therefore I chose to define them there.

A one-way web service is a web service which defines only <input> or <output>. I therefore took the WSDL of my previous post and simply removed the <output> tags so that they become one-way web service. (I also removed anything related to WS-Addressing as it’s not used here).

The PartnerLink can then easily be created with NetBeans using the “Partner” view in the WSDL. The two WSDLs then looked like this:

Create the BPEL process

Add the PartnerLink

Now that the WSDL files of the asynchronous web services are ready, I create a new BPEL process. I then added the following PartnerLinks:

AsynchronousSampleClient from the SOA sample bundled with NetBeans
AsyncTestImplService created previously
AsyncTestResponseImplService create previously

Wire the request/response

Then I wired the request/response as follows. I relied on NetBeans variable creation for each <invoke> or <receive> activity. I therefore got the following variables:

ResponseIn
SayHelloIn
OperationAIn
OperationAOut

Assign the variables

For the purpose of this example I assign the variable between the message like follows. Note that this example make no sense from a business point of view.

Define the correlation set

A receive activity within the process should be assigned a correlation set. The BPEL engine is otherwise unable to match the request/response and resume the right process instance.

I defined a correlation set “correlation” which would use the property “correlationProp”. A correlation property is a value that existing in different message and that can be used to match messages together. The property itself is a “symbolic” name for the value, and the corresponding element in each message is defined using so-called property aliases.
I then added two aliases, one in each WSDL file, and defined how “correlationProp” would map in the “sayHello” and “response” message respectively.

The process can then be built without warnings.

Deployment

The endpoint ports

The process defines 3 ports that can be changed according to your need. In this example the expected endpoints are:

The corresponding WSDL can be obtain by appending “?wsdl” in the URL.

Note that the address for the asynchronous callback is not passed as parameter from the BPEL process, but should be hard-coded in the web service implementation. It would however be very easy to pass the callback address as an extra parameter so that the asynchronous web service is entirely decoupled of the BPEL process.

Build

Rebuild the process to take the latest change in the port URL.

Create the composite application (CA)

The BPEL process cannot be deployed as-is. You will need to embed the BPEL process into a composite application, which is a deployable unit. This is very easy to do:

Create a new project of type composite application.
Drag and drop the BPEL project onto the Service Assembly
Rebuild the composite application

All that is necessary will be created automatically during the build. After the build is complete, NetBeans will refresh the Service Assembly and it looks then like this:

Deploy

Go in the Glassfish console and deploy the service assembly produced in the previous step.

Import WSDL in SoapUI

Start SoapUI and import the 3 WSDL.

Mock the asynchronous web service

Now that the 3 WSDL have been imported, we will create a mock for the asynchronous web service. This way we can verify if the BPEL process call the asynchronous web service correctly and we can send the callback response manually.

Select the WSDL “AsyncTestImplPortBinding”, and right-click “Generate Mock Service”. Make sure to use

path = /AsyncTestImplService/AsyncTestImpl?*
port = 8888

So that it matches the port that the BPEL process will use.

Make sure to start the Mock, in which case SaopUI displays “running on port 8888” at the top-right of the Mock window. The project looks like this:

Test

1 – Invoke BPEL process

Send the following SOAP message to the BPEL process (located at http://localhost:18182/AsynchronousSampleClient):
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:asy="http://enterprise.netbeans.org/bpel/AsynchronousSampleSchemaNamespace"> <soapenv:Header/> <soapenv:Body> <asy:typeA> <paramA>dummy</paramA> <id>123</id> </asy:typeA> </soapenv:Body> </soapenv:Envelope>

2 – Receive the asynchronous invocation

When the Mock service the asynchronous message it displays something like “[sayHello] 4ms”. The message can be opened and should look like:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"> <SOAP-ENV:Body> <sayHello xmlns:msgns="http://ewe.org/" xmlns="http://ewe.org/"> <arg0 xmlns="">123</arg0> </sayHello> </SOAP-ENV:Body> </SOAP-ENV:Envelope>

3 – Send the callback manually

We simulate manually the behavior of the mock service and send the following message to the callback endpoint (http://localhost:18182/AsynchronousSampleClient/response):

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ewe="http://ewe.org/"> <soapenv:Header/> <soapenv:Body> <ewe:response>  <arg0>123</arg0> </ewe:response> </soapenv:Body> </soapenv:Envelope>

4 – Get the synchronous reply

So far, the SOAP request of step #1 was still waiting to receive the synchronous response. After the callback has been sent, the BPEL engine should resume the right instance of the BPEL process (using the correlation value “123”), which should then terminate.

SoapUI will display the time taken for the request/response which will then be something like “response time: 5734 ms”. The time will of course depend on how long you took to perform step 2 and 3. (Note that after some time, the request will timeout if you really take too long to do these steps.)
The SOAP response message should look like:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"> <SOAP-ENV:Body> <typeA xmlns:msgns="http://enterprise.netbeans.org/bpel/AsynchronousSampleClient" xmlns="http://enterprise.netbeans.org/bpel/AsynchronousSampleSchemaNamespace"> <id xmlns="">123</id> </typeA> </SOAP-ENV:Body> </SOAP-ENV:Envelope>

Conclusion

This example as-is make little sense from a technical and business point of view; I wish I had also used more meaningul names for the various elements. It however shows the principle of asynchronous web service invocation using OpenESB. The adaption of this example for meaningful use cases should be relatively simple. It’s a matter of changing the message types and assignment rules.

Taming Size and Complexity

The only real problem with modern software is size and complexity. If we had a bigger brain able to apprehend and reason about software as a whole without omitting details, we wouldn’t have that many issues. Unfortunately, our mental abilities are limited, and as a consequence, we need to have ways to build software whose complexity is beyond our own analytical power. The same is true for any large scale engineering initiative. Apart from discipline, which is a prerequisite to manage size and complexity, traditional ways to address size & complexity are: abstraction, automation and intuition.

Abstraction

The traditional way to address complexity is to raise the abstraction level. Get rid of details and stay focused on the essential – complexity goes away. You can then reason on different parts at various abstraction levels independently of each other. This is the ground-breaking argument about any modeling effort or methodology. The problem is that the whole is not equal to the sum of its parts. Unforeseen interactions will emerge resulting in a myriad of potential problems. An other major problem is the traceability of the different parts.

Automation

The traditional way to address size is through automation. A lot of task that we perform are not tedious due to their very nature, but due to the effort they demand. Our concentration is also limited which implies we will make mistakes. Automation leads then not only to higher productivity but higher quality. There are too many examples of automated task, but code formatting and refactoring fall for instance into this category. Even though automation is extremely effective for specific task, automation is also impacted by the complexity of the software to produce. State explosion is for instance one of the main problems of a technique such as symbolic execution.

Intuition

Actually, problem solving implies not only strong analytical skills but also some form of intuition. The same goes with software and program understanding. Software exploration and visualization are powerful techniques to reason about abstract information in an intuitive way. Software is intangible and has by consequence no natural representation – this leaves the door open for new visualization metaphors. Examples of interactive visual development technologies are BPEL workflow, DSM, or polymetric views.

A Simple Categorization of Unit Tests

Unit testing has become incredibly popular during the past years. As a consequence I sometimes feel like we’ve lost the focus on why we write unit test and what we expect as a return on investment.

The success of unit testing probably comes from the simplicity of the appraoch. However, we’ve learned since then that unit testing is not so easy neither. A high percentage of code coverage does not necessary mean quality software, doeasn’t mean that border cases have been covered, and can even impede software maintenance if the test suite is poorly organized.

Don’t get me wrong, I do value the benefit of unit testing. But unit testing for the sake of unit testing has no value to me. If you think an other form of automated testing will be a most rewarding strategy, then do it. If you go for unit tests, the important questions to ask are:

Did the unit tests revealed bug in the production code?
Did you organise the unit test in a meaningful way?
Did you spend time to identify and test border cases?

A clear “yes” to these three questions indicates an intelligent and probably rewarding testing effort. A pertinent test suite should reveal the bugs that you introduce in the system — and don’t pretend you don’t introduce any. A well-organized test suite should give you a high-level view of the features in the module. A well-crafted test suite should cover the nasty use cases of the system before they hurt you when the system is in production.

There is an abundant literature about unit testing, but nothing that I read seemed to cover the reality of the unit tests that I write. I therefore analysed the nature of my own unit tests and came with a personal categorization which differs from existing one. Here is the 4 categories I’ve identified as well as a short description of the pro/cons of each kind.

Basic unit tests

A basic unit test is a test suite which covers one unique class and test each method in a more-or-less individual fashion. This is the core idea of unit test where each tiny functionality is tested in full isolation, possible with the help of mock objects to break the dependencies. As a result, the test should be repeatable and also independent (of the environment and of other modules).

Problem with real unit tests are:

Testing micro-functionality is perceived as boring, exhaustive or laborious, especially if mock objects are over-used
Tests are tightly coupled with the implementation details and need frequent refactoring
Fixture overlap and dependency between tests leads to code duplication
Relevance of such test can be discussed as the coupling between low-level and high-level fault is only partially understood.

Unit test with indirect assertions

For basic unit tests, the subject under test (SUT) is the very one for which we assert the behavior. We perform an action on the SUT and ensures it behaves correctly. This is however sometimes not possible as soon as the system become a bit more complicated. As a consequence, the actions are performed on the SUT, but we rely on a level of indirection for the assertions; we then assert the behavior of another object than the SUT. This is for instance the case when we mock the database and want to ensure that the rows are correctly altered.

Coupling with the implementation is still high
Fake objects migth be used instead of Mocks — they contain state and aren’t purely hard-code anymore
Behaviour of the SUT is harder to understand due to the level of indirection

Inflection point unit test

Inflection points — a term coined by Michael Feather if I’m right — are somehow the entry points to a given software module. For utility libraries or services, the inflection points correspond to the public API. Testing these specific points is the most rewarding strategy to me, and other people think the same.

Inflection points are less subject to changes, and are closer to a black-box form of testing
Tests become the first client of the interface and give you a change to check if the API is practical
After having covered all the common use cases of the inflection point, the test coverage of the underlying classes should be close to 100%. If not, this indicates a potential design weakness or useless code.

Dynamic unit tests

I qualify tests as “dynamic” when their execution change from one run to the other. Primary goal of such test is to simulate the dynamicity and variability of the productive system. Such test are however quite far away from the concept of basic unit tests and could be considered to some extend as integration tests; they are however still repeatable and independent of other modules. Execution in the real system may indeed be aftected by either

Threading issues
Contextual data, e.g. cache
Nature of the input

Most of these tests rely on randomization, for instance to generate input data or disrupt the scheduling of threads. Though it’s more complicated, randomization and fuzzing have been proved as effective techniques to detect issues which would never arise with fixed execution condition. Think for instance about phase of the moon bugs and date problems.

Abstraction-Level vs. Meta-Level

As per wikipedia, a model is a pattern, plan, representation (especially in miniature), or description designed to show the main object or workings of an object, system, or concept. In the case of software engineering, models are used to represent specific aspects of the software system, such as static structures, communication paths, algorithms, etc.

Models can be created along two axis:

Abstraction level

An abstraction level is a way of hiding the implementation of a particular set of functionality. Both representations are however two descriptions of the same reality. The representation can vary in their level of detail or in their expressiveness. Example

C code – assembler code.
Both representation describe a sequence of operation in an imperative way, but C is much more expressive than assembler.

Java code – sequence diagram.
Both representations describe the run-time behavior of the system. The sequence diagram does however frequently omit details of the call flow.

Meta-level

A meta-level (and meta-model) is way to highlight the properties – the nature – of a particular set of elements. A meta-model describes then the constraints or the semantics to any set of such elements.

Object – class.
In OO technologies, object and classes belong to two different meta-level. The class is a representation of the structure (constraints) that any of the instance will fulfill.

XML – DTD.
An XML file can be validated against a DTD, which is not a representation of the particular file, but of the structure (constraints) that the file must fulfill.

Tower of models

Models themselves can be refined towards either higher abstraction level or meta-levels.

Component diagram – class diagram – Java code.
All three artifacts are representations of the same reality at various level of refinement and reasoning.

XML – Schema – Meta schema.
In this case, each model is meta description of the underlying one. XML Schema themselves must conform to the schema http://www.w3.org/2001/XMLSchema.

As with the XML example, the tower of meta-model generally ends up in model specified in itself. That is, a recursion is introduced to stop the tower of meta-model from growing infinitely.

The term “modeling” refers commonly to “raising the abstraction level”; the term “meta-modelling” is used otherwise. Different tools/technologies don’t necessary represent different meta-levels. Inversly, one technology may be used to address different meta-level.

With respect to OO technologies, several UML diagrams and notations work at different meta-level. Object diagram and sequence diagrams describe the interaction between specific instances of objects. Class diagram specifies constraints (such as cardinality) or semantical relationship between objects (such as aggregation). Stereotypes can then be used to specifies constraints (such as “singleton”) or semantics (such as “entity”) to the class itself. For instance, an “entity” stereotype could be defined to flag all classes with an attribute “primary key” of type integer.

Level \ Technology	Language	UML
L0	Object	Object diagram, Sequence diagram
L1	Class	Class diagram
L2	–	Stereotype

Examples where meta-models are useful

Having three meta-level provides one extra level of indirection that can be very practical in some cases:

Level	XML transformation	OO refactoring	Annotation processing
L0 – information	Data	Object	Class annotation (e.g. @Entity)
L1 – model	Schema	Class	Annotation processor (e.g. JPA)
L2 – meta-model	Meta schema	Metaclass	Classloader

Here are three common examples which actually leverage the strength of meta-model.

XML transformation

The success of XML is related to the existence of a meta-model which makes it a semi-structured format. Structured data are interpretable only by their host application which is the only one to know the model. Unstructured data (plain text, image, etc.) have not model and are not interpretable at all. Semi-structured data have a model which is itself understandable by any other application which knows the meta-model. Due to this extra-level, reusable parser, validator, encryptor or even arbitrary transformer (think XSLT) could be created. The schema could also be extended without necessary breaking backward compatibility (think XPath).

OO refactoring

Refactoring rules is a typical example of the usage of the OO meta-model: refactoring rules are proved to preserve the validity of the model against the meta-model, and are reusable for any model.

Annotation processing

Annotation processing is yet another example where a meta-model is used under the hood. Just like the class-loader knows how to load a class because of the meta-model of any class, it is also able to load the corresponding annotations. This reliefs the application from storing application-specific semantics in separate files (think deployment descriptor). The application (annotation processor) can then query the annotation at run-time to refine the behavior of the system. The annotations enrich classes with additional constraints and behaviour in a resuable way: the same annotation processor can be used for arbitrary class sets.

Pharo By Example

This is an introduction book to Smalltalk and the Pharo development environment. The book is split in two parts: the first one covers the Smalltalk language, the Pharo IDE, the Morphic package for GUI application and the Seaside framework for web application. The second part is relatively thin and covers more advanced topics such as reflection.

The book is easy to read, with many examples to follow, and the reader will quickly acquire a good comprehension of Smalltalk. The object-oriented concepts specific to Smalltalk are however rigorously discussed. A good example would be the paragraph about class variables in case of a class inheritance: it’s easy to follow in the book but still an OO concept that could be otherwise misunderstood. These paragraphs are the most important one for people (like me) coming from other OO languages.

The GUI and web frameworks are briefly covered but enough to get a idea of how to they work. Both are component-based, and a concrete component is built from A to Z in both cases to get insight about the frameworks.

The second part about reflection is where the beauty of Smalltalk shines. First the concept behing meta-object and reflection are presented and then a few useful patterns which use reflection are described, for instance how to build a dynamic proxy in Smalltalk. This is clearly the most interesting part, where the dynamic nature of Smalltalk can be exploited in a constructive way.

Software Evolution

Software evolution studies software engineering under the perspective of software maintenance and change management. The term was coined in the 70 after the seminal work of Lehman and his laws of software evolution.

As such, software evolution is a topic as broad as software engineering itself – any aspect of software engineering (being a process, methodology, tool, or technique) can indeed be studied under the perspective of software evolution.

The book is a comprehensive overview of the current body of knowledge in software evolution. It contains 10 chapters addressing each one a specific field of software engineering under the perspective of software evolution. The articles present recent works and open challenges; the book can then be seen as comprehensive survey as well as a research agenda. The book bibliography is impressive and more than 500 articles are referenced throughout the 10 chapters.

Apart from this selection of articles, the book has an excellent introduction the field of software evolution, and a set of useful appendices pointing to additional resources in this field. The 10 chapters are organized into 3 parts:

Part I: Program understanding and analysis

Identifying and Removing Software Clones
Analysing Software Repositories to Understand Software Evolution
Predicting Bugs from History

Part II: Re-engineering

Object-Oriented Reengineering
Migration of Legacy Information Systems
Architectural Transformations: From Legacy to Three-Tier and Services

Part III: Novel trends in software evolution

On the Interplay Between Software Testing and Evolution and its Effect on Program Comprehension
Evolution Issues in Aspect-Oriented Programming
Software Architecture Evolution
Empirical Studies of Open Source Evolution

Chapters I enjoyed the most were “Object-oriented reengineering”, “On the interplay between…” and “Evolution Issues in AOP”.

I wish there was a chapter about evolutionary issues in model-driven engineering, which is an important area of research. I therefore would recommend “Model-Driven Software Evolution: A Research Agenda” as a complement to this book.

The book is accessible to non-expert and I learned a lot while reading it. The book is definitively worth looking at for anyone interested to understand what “software evolution” is all about.

Quotes

… mostly on software engineering, but not only.

“How does a project get to be a year late?…One day at a time.” — The Mytical Man Month

“Complexity kills. It sucks the life out of developers, it makes products difficult to plan, build and test. […] Each of us should […] explore and embrace techniques to reduce complexity.” — Ray Ozzie, CTO, Microsoft

“Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.” — Saint-Exupéry

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” — D. Knuth

“Beware of bugs in the above code; I have only proved it correct, not tried it” — D. Knuth

“Research is less about discovering the fantastic, as it is about revealing the obvious. The obvious is always there. It needs no discovery. It only needs us to take a look from a different perspective to see it.” — On T. Girba’s blog

“It is impossible for any number which is a power greater than the second to be written as a sum of two like powers. I have a truly marvelous demonstration of this proposition which this margin is too narrow to contain.” — P. Fermat

“These are my principles. If you don’t like them, I have others.” — Groucho Marx

“if all you have is a hammer, everything looks like a nail…” — B. Baruch

“Discipline is the bridge between goals and accomplishment.” — Jim Rohn.

“If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if half an hour, two days; if an hour, I am ready now.” — Woodrow Wilson

“The devil is in the details” — German proverb quote

“The first idea is always wrong” — Popular wisdom

“What can go wrong will go wrong” — Murphy’s law

“The nice thing about standards is that there are so many of them to choose from.” — Andrew S. Tanenbaum

“The manager asks how and when; the leader asks what and why.” — Warren Bennis

“Experience is what you get when you don’t get what you want.” — Dan Stanford

“A mission is something that you do in a bounded amount of time with bounded resources. A purpose is something that you never achieve.” — William Cook

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” — Martin Fowler

“The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.” — E.W. Dijkstra, The Humble Programmer

“There are wavelengths that people cannot see, there are sounds that people cannot hear, and maybe computers have thoughts that people cannot think” — Richard Hamming

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” — Brian Kernighan

“If others would think as hard as I did, then they would get similar results.” — Newton

“Rules of thumb are sometimes what their acronym infers…ROT” — unknown

“Any intelligent fool can make things bigger and more complex. It takes a touch of genius – and a lot of courage – to move in the opposite direction.” — Albert Einstein

“When I use a word,” Humpty Dumpty said, in a rather scornful tone, “it means just what I choose it to mean — neither more nor less.” — Lewis Caroll, Through the Looking Glass

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” — Tony Hoare

“He who hasn’t hacked assembly language as a youth has no heart. He who does as an adult has no brain.” — John Moore

“If you can’t explain it simply, you don’t understand it well enough” — A. Einstein

“There are two groups of languages: those which everyone hates and those which nobody uses.” — Unknown

“Any problem in computer science can be solved with another layer of indirection” — David Wheeler

“Idiomatic usage often matters more than what is strictly possible.” — A comment from Stuart Halloway

“You learn lisp for the same reason you learn latin. It won’t get you a job but it will improve your mind” — Paul Graham

“The Internet is the world’s largest library. It’s just that all the books are on the floor.” —John Allen Paulos

“There are only two hard things in Computer Science: cache invalidation and naming things” — Phil Karlton

“Some abstraction is great, but eventually you need to write code that does something useful.” — Bill Karwin

“Tell me, I forget. Show me, I remember. Involve me, I understand.” — Chinese proverb.

“Don’t reinvent the wheel unless you want to learn about wheels.” — Popular wisdom

“I’m sorry this letter is so long, I didn’t have time to write a shorter one” — Mark Twain

“To be trusted is a greater compliement than to be loved” — George Mcdonald

“Most people work just hard enough not to get fired and get paid just enough money not to quit” — George Carlin

“Find your obsession, make it your profession, and you’ll never work a day in your life” — Unknown

“Plans are nothing; planning is everything.” – Dwight D. Eisenhower

“If you set your goals ridiculously high and it’s a failure, you will fail above everyone else’s success.” — James Cameron

“A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.” — Robert A. Heinlein

“Tools can enable change in behavior and eventually change culture” — Patrick Debois

“First you learn the value of abstraction, then you learn the cost of abstraction, then you’re ready to engineer.” — @KentBeck

“Simplicity does not precede complexity, but follows it.” — Alan Perlis

“The most important thing in communication is hearing what isn’t said.” – Peter F. Drucker

“Computer science is now about systems. It hasn’t been about algorithms since the 1960’s.” – Alan Kay

“[Some people] just can’t wrap their head around that someone might build something because they like building things.” -– Mark Zuckerberg in an inteview

“So much complexity in software comes from trying to make one thing do two things.” – Ryan Singer