In my previous post I discussed the need for better support of immutability in object-oriented languages. The problem is that it’s not so easy to add, because the object paradigm is rooted in the concept of an object being an entity whose state is normally mutable. Here I discuss one of its cousins: object equality.
Some objects are more equal than others
First a quick recap. Object equality is an equivalence relation traditionally implemented through equals and hashCode. Here is the corresponding Javadoc:
The equals method implements an equivalence relation on non-null object references:
• It is reflexive: for any non-null reference value x, x.equals(x) should return true.
• It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
• It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
• It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
• For any non-null reference value x, x.equals(null) should return false.
The definition implicitly allows mutable state to be used for equality. This is however dangerous, and equals should be defined only based on immutable state when possible.
This is notably a requirement for Collections, which require a stronger equality contract. If the equality of an object changes while it is in a collection, the behavior of the collection may become inconsistent. See pitfall #3 in How to write equality in Java.
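To make the pitfall concrete, here is a minimal sketch. The MutablePoint class is hypothetical and exists only for illustration; it bases equals and hashCode on mutable fields and is then modified while stored in a HashSet.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutableKeyDemo {

    // Hypothetical point whose equality depends on mutable fields.
    static final class MutablePoint {
        int x;
        int y;

        MutablePoint(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof MutablePoint)) return false;
            MutablePoint that = (MutablePoint) other;
            return x == that.x && y == that.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<MutablePoint> points = new HashSet<>();
        MutablePoint p = new MutablePoint(1, 2);
        points.add(p);

        p.x = 5; // equality (and hash code) changes while p is in the set

        System.out.println(points.contains(p)); // false: the lookup uses the new hash code
        System.out.println(points.size());      // 1: the element is still stored
    }
}
```

The set still holds the object, but it can no longer be found through the normal lookup path.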
The simplest solution to this problem is to make object equality immutable, that is, the fields participating in the equivalence relation are final, so the equality can never change post-construction.
This is however not always possible, especially if the object requires initialization in a post-construction phase. If the object equality depends on related objects, a strictly immutable object might still see its hashCode change if one of the objects it depends on is mutated. This leads to the concept of ownership, where objects that are owned should also be immutable. Pure ownership is however an object-level/run-time concern which is not easy to ensure (class-based ownership is possible but more limited).
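The following sketch illustrates the point about related objects. The Polygon class is a hypothetical example: its only field is final, yet its hash code changes because equality is delegated to a list that someone else can still mutate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DeepEqualityDemo {

    // Hypothetical class: the field is final, but equality is "deep"
    // because it delegates to the referenced (mutable) list.
    static final class Polygon {
        private final List<String> vertices;

        Polygon(List<String> vertices) {
            this.vertices = vertices; // no defensive copy: the list is not owned
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Polygon
                    && vertices.equals(((Polygon) other).vertices);
        }

        @Override
        public int hashCode() {
            return vertices.hashCode();
        }
    }

    public static void main(String[] args) {
        List<String> shared = new ArrayList<>(Arrays.asList("a", "b"));
        Polygon polygon = new Polygon(shared);

        int before = polygon.hashCode();
        shared.add("c"); // mutate the referenced object, not the polygon itself
        int after = polygon.hashCode();

        System.out.println(before == after); // false
    }
}
```

Copying the list in the constructor would be one class-based way to approximate ownership, at the cost of flexibility.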
As presented in “Understanding the Impact of Collection Contracts on Design”, we should then consider (i) the construction of the object, (ii) the mutability of the object state, and (iii) the “deepness” of the equivalence relation to reason about object equality.
We then actually have three types of fields:
– final. Never changes post-construction, and the referenced object is immutable.
– eventually final. Can change post-construction, but will be frozen at some point in time.
– mutable. Can be mutated anytime.
Objects could be frozen post-creation, as proposed in “Flexible immutability with frozen objects”. The equivalence relation could use only final and eventually final fields. Owned objects could be modified only through the parent object, so as to ensure that the equality contract is never broken.
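As a rough sketch of what an eventually final field could look like in plain Java, consider the hypothetical Account class below. The freeze() method and the run-time check are assumptions made for illustration, not the mechanism of the cited paper.

```java
import java.util.Objects;

class FreezableDemo {

    // Hypothetical "eventually final" object: mutable until freeze() is called.
    static final class Account {
        private String id;      // eventually final: participates in equality
        private long balance;   // mutable: excluded from equality
        private boolean frozen;

        void setId(String id) {
            if (frozen) throw new IllegalStateException("id is frozen");
            this.id = id;
        }

        void freeze() {
            frozen = true;
        }

        void deposit(long amount) {
            balance += amount;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Account
                    && Objects.equals(id, ((Account) other).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.setId("A-1"); // post-construction initialization
        account.freeze();     // from now on, equality can no longer change

        int hash = account.hashCode();
        account.deposit(100); // mutable state does not affect equality
        System.out.println(hash == account.hashCode()); // true

        try {
            account.setId("A-2");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: id is frozen
        }
    }
}
```

Nothing here forces a client to freeze the object before putting it in a collection; the check only catches late mutations at run time.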
There is no one notion of equivalence
The problem remains that there is no “correct” implementation of object equality. It mostly depends on the usage. You may want to compare lists based on reference identity, but sometimes also based on their content. Should we then have a notion of “deepness” built right into the equals operator, in a way similar to the variants of shallow and deep cloning?
aList.deepEquals( anotherList )
Well, that’s actually what already exists with OrderedCollection, where you specify the Comparator to be used. Is it the solution? Should we remove the object equality from the object and move it to the client of the object?
Set s = new HashSet( new ReferenceEqualityContract() );
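The constructor above is hypothetical, but the JDK already lets a client choose the equivalence in two limited ways, sketched below: a TreeSet treats elements that its Comparator ranks as equal as duplicates, and a set backed by an IdentityHashMap compares by reference. The helper method names are made up for this example.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.TreeSet;

class ClientSideEqualityDemo {

    // Equivalence supplied through a Comparator:
    // elements that compare as 0 are treated as duplicates.
    static Set<String> caseInsensitiveSet() {
        return new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    }

    // Reference equality supplied by the collection; equals/hashCode are ignored.
    static <T> Set<T> identitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }

    public static void main(String[] args) {
        Set<String> caseInsensitive = caseInsensitiveSet();
        caseInsensitive.add("point");
        caseInsensitive.add("POINT");
        System.out.println(caseInsensitive.size()); // 1

        Set<String> byReference = identitySet();
        byReference.add(new String("point"));
        byReference.add(new String("point")); // equal content, distinct object
        System.out.println(byReference.size()); // 2
    }
}
```

In both cases the elements themselves are unchanged; only the collection decides what "the same" means.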
Or should we be able to instantiate an object and specify which equality contract (or more generally a comparison contract) it should fulfill?
In this case, an object is limited to having only one equivalence relation at a time. (See this post for a Haskell sample of how to do it.)
One could specify the behavior of the object at creation time and see that as an object-level behavioral variation. The types of two objects with different equivalence relations should however be different to ensure the relation is always symmetric (objects with different equivalence relations are always different). This means that each specific variation would correspond to a type, which inherits from a common type.
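A minimal sketch of this idea follows. The Point class and the PointEquality enum are hypothetical, and a field stands in for the distinct types mentioned above: two points chosen with different contracts are never equal, which keeps the relation symmetric.

```java
import java.util.Objects;

class EqualityContractDemo {

    // Hypothetical equivalence relations that a point can be created with.
    enum PointEquality {
        BY_POSITION {
            @Override
            boolean same(Point a, Point b) { return a.x == b.x && a.y == b.y; }

            @Override
            int hash(Point p) { return Objects.hash(p.x, p.y); }
        },
        BY_REFERENCE {
            @Override
            boolean same(Point a, Point b) { return a == b; }

            @Override
            int hash(Point p) { return System.identityHashCode(p); }
        };

        abstract boolean same(Point a, Point b);

        abstract int hash(Point p);
    }

    static final class Point {
        final int x;
        final int y;
        final PointEquality contract; // fixed at creation time

        Point(int x, int y, PointEquality contract) {
            this.x = x;
            this.y = y;
            this.contract = contract;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Point)) return false;
            Point that = (Point) other;
            // Different contracts are never equal, so the relation stays symmetric.
            return contract == that.contract && contract.same(this, that);
        }

        @Override
        public int hashCode() {
            return contract.hash(this);
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2, PointEquality.BY_POSITION);
        Point b = new Point(1, 2, PointEquality.BY_POSITION);
        Point c = new Point(1, 2, PointEquality.BY_REFERENCE);

        System.out.println(a.equals(b)); // true
        System.out.println(a.equals(c)); // false
        System.out.println(c.equals(a)); // false
        System.out.println(c.equals(c)); // true
    }
}
```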
– Is object equality a subset of the problem of object comparison? Or are they fundamentally different? State used for comparison shouldn’t change while the object is in a collection, but it’s not necessarily part of the “primary key”…
– Should an object have only one equivalence or comparison definition at a time? Or could an object have several at the same time? Or one equivalence relation but several comparison definitions at a time? (We could easily imagine two lists with the same objects, one storing them in ascending order and the other in descending order.)
The fact is that in a pure object world we should compare objects uniquely by reference. There should be no two objects with the same “primary key”. In “Declarative object identity using relation types”, the authors introduced the notion of scope so as to create and obtain the same reference for a given key. The risk then is to alter objects in an unexpected way and break other object invariants (dangling alias, etc.). Having a single object per key is simply impractical. The world is constantly evolving, and we are sometimes forced to have two objects for the same entity: one for the old and one for the new entity (see this other post on persistent data structures). Such objects are somehow related and not completely “different”, hence the notion of object equality.
But this little digression still doesn’t solve the issue that, whatever equality contract the object must fulfill, we cannot ensure that it is actually respected.
And it gets worse with inheritance
“There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you are willing to forgo the benefits of object-oriented abstraction.” — Effective Java
Indeed, if we have a Point class and a ColorPoint class, things go wrong either way: if ColorPoint doesn’t redefine equals and hashCode, the color is ignored, and if it redefines them naively, the equivalence relation may be broken. See pitfall #4 in How to write equality in Java. To enforce the equality contract, two objects may be equal only if they have the same type.
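Here is a minimal sketch of how a naive override in a subclass breaks symmetry, and of the "same type" rule that restores it. The classes and the sameTypeEquals helper are hypothetical and written only for this illustration.

```java
import java.util.Objects;

class SymmetryDemo {

    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Point)) return false;
            Point that = (Point) other;
            return x == that.x && y == that.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    static class ColorPoint extends Point {
        final String color;

        ColorPoint(int x, int y, String color) {
            super(x, y);
            this.color = color;
        }

        // Naive override: only another ColorPoint with the same color is equal.
        @Override
        public boolean equals(Object other) {
            if (!(other instanceof ColorPoint)) return false;
            return super.equals(other) && color.equals(((ColorPoint) other).color);
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y, color);
        }
    }

    // "Same type" rule: equal only if the runtime classes match.
    static boolean sameTypeEquals(Point a, Point b) {
        return a.getClass() == b.getClass() && a.equals(b);
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point cp = new ColorPoint(1, 2, "red");

        System.out.println(p.equals(cp)); // true: Point ignores the color
        System.out.println(cp.equals(p)); // false: symmetry is violated

        System.out.println(sameTypeEquals(p, cp)); // false
        System.out.println(sameTypeEquals(cp, p)); // false: symmetric again
    }
}
```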
The problem is that it is too restrictive. A context that uses color points, such as a graphic editor, may want to share the points with a context that doesn’t have a notion of color, such as a 2D algorithm. The graphic editor would test the equivalence of two figures according to the color as well, while the 2D algorithm would test the equivalence of two points according to their position solely, for instance to prevent division by zero.
Also, once an object has overridden the default equals and hashCode, which implement reference equality, it is impossible for an object to fall back to this mode. As a consequence we may end up in a situation with subclasses whose equivalence relation can’t be expressed. In such a case, it should be forbidden to compare objects.
Should we then re-think object equality with ternary logic, so as to be able to return true, false and N/A?
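One way such a ternary result could look is sketched below. The Equality enum and the sameAs method are hypothetical names invented for this example; objects of different runtime types are declared incomparable rather than unequal.

```java
class TernaryEqualityDemo {

    // Hypothetical three-valued result of a comparison.
    enum Equality { TRUE, FALSE, NOT_APPLICABLE }

    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        Equality sameAs(Point other) {
            // Different runtime types: the comparison has no meaningful answer.
            if (other == null || getClass() != other.getClass()) {
                return Equality.NOT_APPLICABLE;
            }
            return (x == other.x && y == other.y) ? Equality.TRUE : Equality.FALSE;
        }
    }

    static class ColorPoint extends Point {
        final String color;

        ColorPoint(int x, int y, String color) {
            super(x, y);
            this.color = color;
        }

        @Override
        Equality sameAs(Point other) {
            Equality positions = super.sameAs(other);
            if (positions != Equality.TRUE) return positions;
            // super.sameAs returned TRUE, so other is a ColorPoint here.
            return color.equals(((ColorPoint) other).color) ? Equality.TRUE : Equality.FALSE;
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point red = new ColorPoint(1, 2, "red");
        Point blue = new ColorPoint(1, 2, "blue");

        System.out.println(p.sameAs(new Point(1, 2))); // TRUE
        System.out.println(red.sameAs(blue));          // FALSE
        System.out.println(p.sameAs(red));             // NOT_APPLICABLE
        System.out.println(red.sameAs(p));             // NOT_APPLICABLE
    }
}
```

Existing collections only understand the boolean equals, so a result like this would have to be mapped back to true or false (or an exception) at their boundary.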
EDIT
Here I will gather other links related to the subject:
- API Design: indentity and equality
- Whose object is it, anyway?
- Redesigning System.Object/java.lang.Object
- You are what you is: defining object identity
- Equals implementation in NHibernate, proxy question
- == vs === and type coercion
- What does Barbara Liskov have to say about Equality in Java?
- Grant Matcher Puzzle
- The left hand of equals
- Keith Cirkel and Allen Wirfs-Brock on deepCloning