Writing Immutable Objects with Elegance

Edit: This blog post has been turned into a draft paper.

Today, I got burned by design issues in Smalltalk’s dictionaries. They implement object comparison (=) using value equality, but are at the same time mutable. The internal state of a dictionary consists of an array of associations. In addition to the problems with equality, the internal state can be accessed and aliased with Dictionary>>associationsDo: and Dictionary>>add:.

We should instead have an immutable dictionary using value comparison, and a mutable dictionary using identity comparison. Mixing styles is too dangerous. For the mutable dictionary, the internal state should never be accessible and data should be copied to ensure no dependencies on mutable state are established.
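The hazard of mixing value equality with mutability is not specific to Smalltalk. As a minimal sketch in Python (the class MutableKey is hypothetical and only stands in for a mutable object that compares by value), mutating an object after it has been stored in a hashed collection makes it impossible to find again:

```python
class MutableKey:
    """Hypothetical mutable object with value-based equality and hash."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, MutableKey) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash derived from mutable state: this is the dangerous part.
        return self.x * 31 + self.y


key = MutableKey(1, 2)
seen = {key}
found_before = key in seen   # True: hash matches the one recorded at insertion

key.x = 10                   # mutate the object while it sits inside the set
found_after = key in seen    # False: the lookup now uses a different hash
still_stored = any(k is key for k in seen)  # True: the object is still in there
```

The object is still stored, yet a lookup by value no longer reaches it. Either freezing the state or switching to identity comparison would avoid the problem.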

In an old post I ranted about the conflict between the imperative object paradigm, which promotes mutable state, and the functional paradigm, which promotes immutable data structures. It is mostly a problem of readability, though.

Assignment Syntax

To make the use of immutable objects more readable, we could actually generalize operator assignment syntax like +=, -=, <<= to any message. Postfixing a message with = would imply that the reference pointing to the receiver of the message is updated with the value of the message.

aReference message=: 5 <--> aReference := aReference message: 5
aDictionary at: key put=: value <--> aDictionary := aDictionary at: key put: value

When the selector has multiple arguments, the = postfix comes at the very end, after the last keyword. This is simple, efficient syntactic sugar.
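The idea mirrors what augmented assignment already does for immutable values in other languages. As a rough illustration in Python (not the proposed Smalltalk syntax, which exists only in this post), += on a tuple rebinds the variable to a new value and leaves every alias of the old value untouched:

```python
point = (1, 2)
alias = point      # second reference to the same immutable tuple

point += (3,)      # shorthand for: point = point + (3,)

# point now refers to a new tuple (1, 2, 3);
# alias still refers to the original (1, 2).
```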

What about self-sends? According to the previous examples, self would be updated with the value of the message. While it might sound counter-intuitive or plain absurd, this is what we need:

Point>>moveX: offsetX y: offsetY
self x=: self x + offsetX. 
self y=: self y + offsetY.
^ self

Assuming #x: and #y: return copies of the objects, with this semantics, the reference to self on the last line corresponds to the last copy created.

The only difference between self-sends and regular sends is the implicit contract that is assumed for the method. In the regular case, the message send can return any object. Any invocation of the returned object will happen with a regular send that will in the worst case raise a message not understood exception. For self-sends to work as in the example above, messages #x: and #y: must return instances of Point, so that the activation frame can be updated correctly. Updating the activation frame rebinds self to the new object, but preserves the temporary variables.
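Python happens to allow rebinding self as an ordinary local variable, which gives a rough approximation of the activation-frame update described above. The following is a sketch only: the helper names with_x, with_y and move are made up for illustration and stand in for #x:, #y: and #moveX:y:.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int

    def with_x(self, new_x):
        # Stand-in for #x: answers a modified copy, never mutates.
        return replace(self, x=new_x)

    def with_y(self, new_y):
        # Stand-in for #y: answers a modified copy, never mutates.
        return replace(self, y=new_y)

    def move(self, dx, dy):
        # Each line rebinds the local name `self` to the latest copy;
        # the arguments dx and dy stay available, like temporaries.
        self = self.with_x(self.x + dx)   # self x=: self x + offsetX
        self = self.with_y(self.y + dy)   # self y=: self y + offsetY
        return self


start = Point(1, 2)
moved = start.move(3, 4)
```

Here start is left untouched and moved is the last copy created, which is the behavior the self-send rule asks for. The contract also shows up: with_x and with_y have to return a Point for the second line of move to work.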

(I believe this would have an impact on closures. More investigation is needed. The precise semantics could maybe be simulated with continuations.)

Copy syntax

The previous proposal still leaves the burden of copying the objects to the developers. In the previous examples, #x: , #y: and #at:put: would need to first clone the receiver (self), then update it.

Point>>x: newX
^ ( self clone ) basicX: newX ; yourself.

Ugly, right? Following a similar syntactic approach, message sends could be prefixed with % to indicate that the message must be delivered to a clone of the receiver:

aReference %message: 5 <--> aReference clone message: 5

We know that cloning is broken. However, it is not the subject of this post, so we will assume that we have a reasonable implementation of #clone. With % and = we have all the ingredients to implement immutable structures easily.

Point>>x: newX
  self basicX: newX

Point>>moveX: offsetX y: offsetY
self %x=: self x + offsetX. 
self %y=: self y + offsetY.
^ self

(The accessor Point>>x: is actually superfluous, since it merely delegates to basicX:. It serves as an example only.)
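To make the % prefix concrete, here is a small Python approximation. It is only a sketch: the function on_clone and the setter names set_x and set_y are hypothetical, and copy.copy plays the role of a reasonable #clone.

```python
import copy


class Point:
    """Hypothetical point with plain mutating setters, like #basicX:."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def set_x(self, new_x):
        self.x = new_x

    def set_y(self, new_y):
        self.y = new_y


def on_clone(receiver, selector, *args):
    """Approximate `receiver %selector: args`: send to a shallow copy.

    Assumption: the clone itself is returned, mirroring a Smalltalk
    method that answers self when it has no explicit return.
    """
    clone = copy.copy(receiver)
    getattr(clone, selector)(*args)
    return clone


p = Point(1, 2)
q = on_clone(p, "set_x", 10)   # p %x: 10
```

The setter itself stays a plain mutator. It is the call site that decides to work on a copy, which is the division of labor the % prefix aims for.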

For an even more concise syntax, a third prefix/postfix could be introduced.

aReference ~message: 5 <--> aReference %message=: 5

Nested Objects

The proposed syntactic sugar has limited benefits for more complex mutations involving nested objects. Let’s consider an immutable circle with an immutable point as center.

Circle>>moveX: offsetX y: offsetY
   self ~center: (self center %moveX: offsetX y: offsetY).
  ^ self

But what we would really like to write is

Circle>>moveX: offsetX y: offsetY
   self center ~moveX: offsetX y: offsetY.
  ^ self

Handling this situation so that the receiver “self center” is replaced with the new point first requires replacing “self” with a new circle. The replacement of the receiver “self center” (which is not an L-value) could be achieved if, by convention, the corresponding setter is used. The above code would then implicitly execute “self ~center: xxx” to replace “self center”. This corresponds to the intended behavior. In other words,

self a ~m: args <--> self ~a: (self a %m: args)
self a b ~m: args <--> self ~a: (self a %b: (self a b %m: args))
etc.

The ~ can appear only before the last message send. The statement “self a ~b m: args” would be ill-defined.
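The expansion rule can be approximated in Python with frozen dataclasses. This is a sketch under assumptions: update_path, Drawing and moved are invented names, and dataclasses.replace stands in for the “clone, then call the setter” convention.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int

    def moved(self, dx, dy):
        return replace(self, x=self.x + dx, y=self.y + dy)


@dataclass(frozen=True)
class Circle:
    center: Point
    radius: int


@dataclass(frozen=True)
class Drawing:
    circle: Circle


def update_path(obj, path, fn):
    """Return a copy of obj in which the value reached through the
    attribute names in `path` is replaced by fn(value). Every object
    along the path is copied; nothing is mutated."""
    if not path:
        return fn(obj)
    head, rest = path[0], path[1:]
    return replace(obj, **{head: update_path(getattr(obj, head), rest, fn)})


circle = Circle(Point(0, 0), 5)
# self center ~moveX: 1 y: 2
moved_circle = update_path(circle, ["center"], lambda p: p.moved(1, 2))

drawing = Drawing(circle)
# self circle center ~moveX: 1 y: 2
moved_drawing = update_path(drawing, ["circle", "center"], lambda p: p.moved(1, 2))
```

Each recursion step corresponds to one level of the expansion: the getter reads the nested object, and replace acts as the matching setter applied to a copy.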

More Links

Transformation for Class Immutability

Assignment Syntax
