Implementation of Semaphores

For the needs of the experimental Pinocchio research VM, we had to add support for threading and concurrency. We implemented green threads, à la Squeak, so there is no “real” multi-core concurrency going on. The VM relies on AST interpretation instead of bytecode. With green threads, the interpretation of an AST node can always be considered atomic: no two AST nodes can be interpreted concurrently. This is unlike Java and its memory model, where individual bytecodes can be interpreted concurrently, possibly with nasty side-effects (e.g. manipulation of a long is not atomic). Thread preemption can happen anytime between the evaluation of AST nodes.

How can we add support for semaphores?

The Pharo design

The Pharo design can be informally summarized like this: when a semaphore is instantiated, its counter is set to one. Whenever a block needs to be evaluated in an exclusive way, the counter is checked. If the counter > 0, it is decreased and the block is evaluated. If the counter = 0, the thread is suspended and added to the list of threads currently waiting on this semaphore. When the critical block has been evaluated, the list of suspended threads is checked. If there are no suspended threads, the counter is incremented back to 1. Otherwise, one of the suspended threads is picked and resumed.

This implementation explains why Semaphore extends LinkedList. I’m not sure it’s the best design decision: a semaphore is not conceptually a list, and the list protocol should not be exposed by a semaphore. It uses inheritance for implementation reuse, but composition would have been just fine here if the semaphore internally held a linked list (and maybe used a pluggable ownership type system to check that the list does not leak out of the semaphore…).

Also, semaphores must be created using the forMutualExclusion factory method. This method instantiates and initializes the semaphore to allow exactly one execution at a time (hence the term mutual exclusion), but nothing would prevent you from initializing the semaphore so that up to N blocks can be executed concurrently.
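For readers more fluent in Java than Smalltalk, here is a rough sketch of the same design (the names are ours, not Pharo’s; Java’s built-in monitor stands in for the atomicity the VM primitives provide, and it also keeps the queue of waiting threads for us, so the explicit linked list disappears):

```java
// Rough Java analogue of the Pharo design: a counter plus suspended threads.
class CountingSemaphore {
    private int excessSignals;

    CountingSemaphore(int permits) { this.excessSignals = permits; }

    static CountingSemaphore forMutualExclusion() {
        return new CountingSemaphore(1); // exactly one execution at a time
    }

    synchronized void acquire() throws InterruptedException { // "wait" in Pharo
        while (excessSignals == 0) {
            wait(); // suspend the current thread on this semaphore
        }
        excessSignals--;
    }

    synchronized void release() { // "signal" in Pharo
        excessSignals++;
        notify(); // resume one suspended thread, if any
    }

    void critical(Runnable block) throws InterruptedException {
        acquire();
        try { block.run(); } finally { release(); }
    }
}
```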

The code for wait and signal (which respectively decrement and increment the counter) is:

wait
	excessSignals > 0
		ifTrue: [excessSignals := excessSignals - 1]
		ifFalse: [self addLastLink: Processor activeProcess suspend]

signal
	self isEmpty
		ifTrue: [excessSignals := excessSignals + 1]
		ifFalse: [Processor resume: self removeFirstLink]

They are however implemented as primitives. I suspect this is not for performance reasons, but for the sake of concurrency correctness: these operations themselves need to be atomic. Implemented in Smalltalk, a thread could be preempted in the middle of one of them, breaking the semaphore’s design.

The test-and-set design

These two methods and the internal counter suggest that an implementation relying on more generic concurrency primitives is possible. Typical concurrency primitives for this are test-and-set or compare-and-swap.
 We’ve added a testAndSet primitive to Boolean, and implemented the Semaphore with busy waiting (also sometimes called a spin lock):
 
critical: aBlock
	| v |
	"we spin on the lock until we can enter the semaphore"
	[ lock testAndSet ] whileTrue: [ PThread current yield ].
	"we evaluate the block and make sure we reset the flag when we leave it"
	[ v := aBlock value ] ensure: [ lock value: false ].
	^ v

The design could be improved to avoid busy waiting. Instead of yielding, the thread would be suspended and added to a list. In the ensure block, the flag would be reset and one of the suspended threads resumed. The resumed thread would however still need to testAndSet the lock, to ensure that no other thread has entered the semaphore in the meantime, which could delay the thread forever. So if fairness is required, this algorithm is not optimal.
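For comparison, the same spin-lock idea can be sketched in mainstream Java, where AtomicBoolean.getAndSet plays the role of our testAndSet primitive and Thread.yield() the role of PThread current yield:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Sketch of the spin lock above in Java.
class SpinLock {
    private final AtomicBoolean lock = new AtomicBoolean(false);

    <T> T critical(Supplier<T> block) {
        // spin until we can enter the critical section
        while (lock.getAndSet(true)) {
            Thread.yield();
        }
        try {
            return block.get();
        } finally {
            lock.set(false); // reset the flag when we leave
        }
    }
}
```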

The bakery design

You can also implement critical sections without the support of other concurrency primitives. The most famous such algorithm is probably Lamport’s bakery algorithm:

What is significant about the bakery algorithm is that it implements mutual exclusion without relying on any lower-level mutual exclusion.  Assuming that reads and writes of a memory location are atomic actions, as previous mutual exclusion algorithms had done, is tantamount to assuming mutually exclusive access to the location.  So a mutual exclusion algorithm that assumes atomic reads and writes is assuming lower-level mutual exclusion.  Such an algorithm cannot really be said to solve the mutual exclusion problem.  Before the bakery algorithm, people believed that the mutual exclusion problem was unsolvable–that you could implement mutual exclusion only by using lower-level mutual exclusion.

In our case with green threads, reads and writes are atomic because they are single AST nodes, but this isn’t necessarily the case elsewhere.
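For the curious, here is a sketch of the bakery algorithm transcribed to Java. Since the Java memory model does not give plain array accesses the atomic-read/write semantics the algorithm assumes, an AtomicIntegerArray is used purely for its volatile-style get and set; the algorithm itself uses no lower-level mutual exclusion:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Lamport's bakery algorithm for n threads, each identified by 0..n-1.
class Bakery {
    private final int n;
    private final AtomicIntegerArray choosing; // 1 while a thread picks a number
    private final AtomicIntegerArray number;   // ticket number, 0 = not interested

    Bakery(int n) {
        this.n = n;
        this.choosing = new AtomicIntegerArray(n);
        this.number = new AtomicIntegerArray(n);
    }

    void lock(int id) {
        // take a ticket one higher than any ticket currently held
        choosing.set(id, 1);
        int max = 0;
        for (int j = 0; j < n; j++) max = Math.max(max, number.get(j));
        number.set(id, max + 1);
        choosing.set(id, 0);
        // wait until every thread with a smaller (ticket, id) pair is done
        for (int j = 0; j < n; j++) {
            if (j == id) continue;
            while (choosing.get(j) == 1) Thread.yield();
            while (number.get(j) != 0
                    && (number.get(j) < number.get(id)
                        || (number.get(j) == number.get(id) && j < id))) {
                Thread.yield();
            }
        }
    }

    void unlock(int id) {
        number.set(id, 0); // give the ticket back
    }
}
```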

Here ends this little trip into basic concurrency. There is a rich literature on the topic — which is truly fantastic — and we might explore and implement more sophisticated abstractions later on.

References

The Java Memory Model
Thread Synchronization and Critical Section Problem
A New Solution of Dijkstra’s Concurrent Programming Problem

Anti-if Programming

Ifs are bad, ifs are evil. Ifs are against object-oriented programming. Ifs defeat polymorphism — these popular remarks are easier said than enforced.

Ifs can pop up for various reasons, and building a decent taxonomy of them would deserve a full study. Here is however a quick one:

  • Algorithmic if. An algorithmic if participates in an algorithm that is inherently procedural, where branching is required. Not much can be done for these ones. Though they tend to increase the cyclomatic complexity, they are not the most evil. If the algorithm is inherently complex, so be it.
  • Polymorphic if. Each class of objects deserves a slightly different treatment. This is the polymorphic if. Following object-oriented principles, the treatment should be pushed into the corresponding class, and voilà! There are however millions of reasons why we might not want the corresponding logic to live in the class of the object to treat, defeating object-oriented basics. In such cases, the problem can be alleviated with visitors, decorators, or other composition techniques.
  • Strategic if. A single class deals with N different situations. The logic is still essentially the same, but there are slight differences in each situation. This can be refactored with an interface and several implementations. The implementations can inherit from each other, or use delegation to maximize reuse.
  • Dynamic if. The strategic if assumes that the situation doesn’t change for the lifetime of the object. If the behavior of the object needs to change dynamically, the situation becomes even more complicated. Chances are that attributes will be used to enable/disable certain behavior at run-time. Such ifs can be refactored with patterns such as decorators.
  • Null if. Testing for nullity is so common that it deserves a special category, even though it could be seen as a special case of another category. Null ifs can appear to test the termination of an algorithm, the non-existence of data, sanity checks, etc. Various techniques exist to eradicate such ifs depending on the situation: the Null Object pattern, adding as many method signatures as required, introducing polymorphism, using assertions, etc.
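As an illustration of the last category, here is a minimal Null Object sketch in Java (Logger, NullLogger and Service are made-up names for the example): instead of testing the collaborator for null at every call site, a do-nothing implementation is substituted once.

```java
interface Logger {
    void log(String message);
}

class ConsoleLogger implements Logger {
    public void log(String message) { System.out.println(message); }
}

// the Null Object: same interface, deliberately does nothing
class NullLogger implements Logger {
    public void log(String message) { }
}

class Service {
    private final Logger logger;

    // never null: callers that don't care pass a NullLogger
    Service(Logger logger) { this.logger = logger; }

    int compute(int x) {
        logger.log("computing for " + x); // no "if (logger != null)" needed
        return x * 2;
    }
}
```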

A step-by-step Anti-If refactoring

Here is a step-by-step refactoring of a strategic if I came across. I plan to send it to the Anti-IF Campaign and so took the time to document it. The code comes from the txfs project.

Let’s start with the original code:

public void writeFile (String destFileName, InputStream data, boolean overwrite)
    throws TxfsException
{
    FileOutputStream fos = null;
    BufferedOutputStream bos = null;
    boolean isNew = false;
    File f = new File (infos.getPath (), destFileName);
    if ( !overwrite && f.exists () )
    {
        throw new TxfsException ("Error writing in file (file already exists): " +
            destFileName);
    }

    try
    {
        if ( !f.exists () )
        {
            isNew = true;
        }
        DirectoryUtil.mkDirs (f.getParentFile ());
        try
        {
            Copier.copy (data, f);
        }
        finally
        {
            if ( isNew && isInTransaction () )
            {
                addCreatedFile (destFileName);
            }
            IOUtils.closeInputStream (data);
        }
    }
    catch ( IOException e )
    {
        throw new TxfsException ("Error writing in file: " + destFileName, e);
    }
}

Not very straightforward, is it? The logic is however quite simple: if the overwrite flag is set, the file can be written even if it already exists, otherwise an exception must be thrown. In addition, if a transaction is active, the file must be added to the list of created files, so that they can be removed if the transaction is rolled back later. The file must be added even if an exception occurs, for instance if the file was only partially copied.

What happens is that we have two concerns: (1) the overwrite rule and (2) the transaction rule.

Let’s try to refactor that with inheritance. A base class implements the logic when there is no transaction. And a subclass refines it to support transactions.

public void writeFile (String destFileName, InputStream data, boolean overwrite)
    throws TxfsException
{
    FileOutputStream fos = null;
    BufferedOutputStream bos = null;
    boolean isNew = false;
    File f = new File (infos.getPath (), destFileName);
    if ( !overwrite && f.exists () )
    {
        throw new TxfsException ("Error writing in file (file already exists): " +
            destFileName);
    }

    try
    {
        // if ( !f.exists () )
        // {
        //    isNew = true;
        // }
        DirectoryUtil.mkDirs (f.getParentFile ());
        try
        {
            Copier.copy (data, f);
        }
        finally
        {
            // if ( isNew && isInTransaction () )
            // {
            //     addCreatedFile (destFileName);
            // }
            IOUtils.closeInputStream (data);
        }
    }
    catch ( IOException e )
    {
        throw new TxfsException ("Error writing in file: " + destFileName, e);
    }
}

The transaction concern is removed from the base method. The overridden method then looks like:

public void writeFile (String destFileName, InputStream data, boolean overwrite)
    throws TxfsException
{
    try
    {
        super.writeFile( destFileName, data, overwrite );
    }
    finally
    {
        addCreatedFile (destFileName);
    }
}

But we then have two problems: (1) we don’t know if the file is new, and it’s always added to the list of created files; (2) if the base method throws an exception because the file already exists and the flag is false, we still add it to the list of created files when we shouldn’t.

We could change the base method to have a return code (e.g. FileCreated and NoFileCreated). But return codes are generally not a good solution and are quite ugly.

No, what we must do is remove some responsibility from the method. We split it into two methods, createFile and writeFile. One expects the file not to exist, the other expects the file to exist.

void writeFile (String dst, InputStream data ) throws TxfsException
void createFile (String dst, InputStream data ) throws TxfsException

(The method writeFile which takes the additional overwrite flag can be composed out of the two previous ones.)
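To make that composition concrete, here is a hypothetical sketch with an in-memory stand-in for the file system (String data instead of InputStream, a single TxfsException class; none of this is the real txfs API):

```java
import java.util.HashMap;
import java.util.Map;

class TxfsException extends Exception {
    TxfsException(String msg) { super(msg); }
}

// In-memory stand-in for the txfs file store, for illustration only.
class FileStore {
    private final Map<String, String> files = new HashMap<>();

    void createFile(String dst, String data) throws TxfsException {
        if (files.containsKey(dst))
            throw new TxfsException("file already exists: " + dst);
        files.put(dst, data);
    }

    void writeFile(String dst, String data) throws TxfsException {
        if (!files.containsKey(dst))
            throw new TxfsException("file does not exist: " + dst);
        files.put(dst, data);
    }

    // the flag variant is composed out of the two narrower methods
    void writeFile(String dst, String data, boolean overwrite) throws TxfsException {
        if (files.containsKey(dst)) {
            if (!overwrite)
                throw new TxfsException("file already exists: " + dst);
            writeFile(dst, data);
        } else {
            createFile(dst, data);
        }
    }

    String read(String dst) { return files.get(dst); }
}
```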

So our simplified createFile method looks like:

public void createFile(String destFileName, InputStream data)
    throws TxfsException
{
    try
    {
        super.createFile(destFileName, data);
    }
    finally
    {
        // we add the file in any case
        addCreatedFile(destFileName);
    }
}

Alas, there is still the problem that if super.createFile fails because the file already exists, we add it to the list.

And here we start to realize that our exception handling scheme was a bit weak. The general TxfsException is rather useless: we need to refine the exception hierarchy to convey enough information to support meaningful treatment.

void writeFile (String dst, InputStream data )
throws TxfsException, TxfsFileDoesNotExistException

void createFile (String dst, InputStream data )
throws TxfsException, TxfsFileAlreadyExistsException

Here is then the final code:

public void createFile(String destFileName, InputStream data)
    throws TxfsFileAlreadyExistsException, TxfsException
{
    // We cannot use a finally block: if the failure happens because
    // the file already existed, we must not add it to the list.
    try
    {
        super.createFile(destFileName, data);
        // we add the file if creation succeeds
        addCreatedFile(destFileName);
    }
    catch (TxfsFileAlreadyExistsException e)
    {
        // we don't add the file if it already exists
        throw e;
    }
    catch (TxfsException e)
    {
        // we add the file in any other failure case, then propagate
        addCreatedFile(destFileName);
        throw e;
    }
}

Conclusion

Anti-if refactoring is not so easy. There are various kinds of ifs, some of which are OK, and some of which are bad. Removing ifs can imply changing the design significantly, including the inheritance hierarchy and the exception hierarchy.

A tool to analyze the code and propose refactoring suggestions would be cool, but for the time being, anti-if refactoring is probably best left in the hands of developers, who can ensure the semantic equivalence of the code before and after the refactoring.

Links

http://www.technology-ebay.de/the-teams/mobile-de/blog/an-exercise-in-unconditional-programming.html

Object-Oriented Equality

I discussed in my previous post the fact that we need better support for immutability in object-oriented languages. The problem is that it’s not so easy to add, because the object paradigm is rooted in the concept of an object being an entity whose state is normally mutable. Here I discuss one of its cousins: object equality.

Some objects are more equal than others

First a quick recap. Object equality is an equivalence relation traditionally implemented through equals and hashCode. Here is the corresponding Javadoc:

The equals method implements an equivalence relation on non-null object references:
•    It is reflexive: for any non-null reference value x, x.equals(x) should return true.
•    It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
•    It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
•    It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
•    For any non-null reference value x, x.equals(null) should return false.

The definition implicitly allows mutable state to be used for equality. This is however dangerous, and equals should be defined only based on immutable state when possible.

This is notably a requirement for collections, which require a stronger equality contract: if an object’s equality changes while it is in a collection, the behavior of the collection may become inconsistent. See pitfall #3 in How to write equality in Java.

The simplest solution to this problem is to make object equality immutable, that is, the fields participating in the equivalence relation are final – the equality can never change post-construction.
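A minimal Java sketch of this rule (Point is just an example class): only final fields participate in equals and hashCode, so the hash can never change while the object sits in a HashSet.

```java
import java.util.Objects;

// Equality defined exclusively on final fields: it is fixed at construction.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override public int hashCode() { return Objects.hash(x, y); }
}
```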

This is however not always possible, especially if the object requires initialization in a post-construction phase. If object equality depends on related objects, a strictly immutable object might still see its hashCode change if one of the objects it depends on is mutated. This leads to the concept of ownership, where objects that are owned should also be immutable. Pure ownership is however an object-level/run-time concern which is not easy to ensure (class-based ownership is possible but more limited).

As presented in “Understanding the Impact of Collection Contracts on Design”, we should then consider (i) the construction of the object, (ii) the mutability of the object state and (iii) the “deepness” of the equivalence relationship to reason about object equality.

We then actually have three types of fields:

–    final. Never changes post-construction and the referenced object is immutable.
–    eventually final. Can change post-construction but will be frozen at a point in time.
–    mutable. Can be mutated anytime.

Objects could be frozen post-creation as proposed in “Flexible immutability with frozen objects”. The equivalence relation could use only final and eventually-final fields. Owned objects could be modified only through the parent object, so as to ensure that the equality contract is never broken.

There is no one notion of equivalence

The problem remains that there is no single “correct” implementation of object equality. It mostly depends on the usage. You may want to compare lists based on reference identity, but sometimes also based on their content. Should we then have a notion of “deepness” right in the equals operator, in a way similar to the variants of shallow and deep cloning?

aList.deepEquals( anotherList )

Well, that’s actually what already exists with OrderedCollection, where you specify the Comparator to be used. Is that the solution? Should we remove object equality from the object and move it to the client of the object?

Set s = new HashSet( new ReferenceEqualityContract() );

Or should we be able to instantiate an object and specify which equality contract (or more generally a comparison contract) it should fulfill?

In this case, an object is limited to one equivalence relation at a time. (See this post for a Haskell sample of how to do it.)

One could specify the behavior of the object at creation time and see that as an object-level behavioral variation. The types of two objects with different equivalence relations should however be different, to ensure the relation is always symmetric (objects with different equivalence relations are always different). This means that each specific variation would correspond to a type, which inherits from a common type.

– Is object equality a subset of the problem of object comparison? Or are they fundamentally different? State used for comparison shouldn’t change while the object is in a collection, but it’s not necessarily part of the “primary key”…

– Should an object have only one equivalence or comparison definition at a time? Or could it have several at the same time? Or one equivalence relation but several comparison definitions? (We could easily imagine two lists with the same objects, but one list storing the objects in ascending order and the other in descending order.)

The fact is that in a pure object world we should compare objects only by reference. There should be no two objects with the same “primary key”. In “Declarative object identity using relation types”, the authors introduce the notion of scope so as to create and obtain the same reference for a given key. The risk then is to alter objects in an unexpected way and break other objects’ invariants (dangling aliases, etc.). It is simply impractical. The world is constantly evolving, and we are sometimes forced to have two objects for the same entity: one for the old and one for the new entity (see this other post on persistent data structures). Such objects somehow relate to each other and are not completely “different”, hence the notion of object equality.

But this little digression still doesn’t solve the issue that whatever the equality contract that the object must fulfill, we cannot ensure that it will be the case.

And it gets worse with inheritance

“There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you are willing to forgo the benefits of object-oriented abstraction.”  — Effective Java

Indeed, if we have a Point class and a ColorPoint class, the equivalence relation may be broken if ColorPoint doesn’t redefine equals and hashCode. See pitfall #4 in How to write equality in Java. To enforce the equality contract, two objects may be equal only if they have the same type.

The problem is that this is too restrictive. A context that uses color points, such as a graphic editor, may want to share the points with a context that has no notion of color, such as a 2D algorithm. The graphic editor would test the equivalence of two figures according to the color as well, while the 2D algorithm would test the equivalence of two points according to their position solely, for instance to prevent a division by zero.

Also, once an object has overridden the default equals and hashCode (which implement reference equality), it is impossible for a subclass to fall back to that mode. As a consequence we may end up with subclasses whose equivalence relation can’t be expressed. In such cases, it should be forbidden to compare the objects.

Should we then re-think object equality with ternary logic, so as to be able to return true, false and N/A?

EDIT

Here I will gather other links related to the subject.

Object-Oriented Immutability

There seems to be a growing awareness of the benefits of immutability. Particular cases where immutable objects are much welcome are for instance value objects or defensive copying. The problem is that immutable objects don’t really go well with the object paradigm.

Let’s consider an application which needs to deal with the concept of a revision. A revision is composed of a major and a minor revision number, e.g. “1.4”. This is a typical candidate for a value object. The only sensible methods that deal with a revision are inc, which increases it, and promote, which jumps to the next major version (“1.3” -> “2.0”). I will consider only inc in this post.

Here would be the (trivial) code in Java:

public class Revision
{
   private final int major;
   private final int minor;
   public Revision( int major, int minor ) { ... }

   public Revision inc() {
       return new Revision( major, minor+1 );
   }
}

This is however only an ad-hoc solution. It does not work well with polymorphism, for instance: if a subclass of Revision is created, say NameRevision, calling inc on an instance would result in a Revision, not a NameRevision. This problem can be solved in other ways, and would for instance not appear in Smalltalk, where instantiation would be done with super basicNew on the class side. But this just shows how mainstream OO languages are not designed for immutability.
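For the record, here is one such ad-hoc workaround in plain Java (a sketch of one option, not a standard recipe): subclasses override a protected copy hook, so inc() preserves the dynamic type of the receiver.

```java
class Revision {
    protected final int major;
    protected final int minor;

    Revision(int major, int minor) { this.major = major; this.minor = minor; }

    // subclasses override this to produce an instance of their own class
    protected Revision copy(int major, int minor) {
        return new Revision(major, minor);
    }

    public Revision inc() { return copy(major, minor + 1); }

    public int getMajor() { return major; }
    public int getMinor() { return minor; }
}

class NameRevision extends Revision {
    final String name;

    NameRevision(String name, int major, int minor) {
        super(major, minor);
        this.name = name;
    }

    @Override
    protected Revision copy(int major, int minor) {
        return new NameRevision(name, major, minor); // keeps the subtype
    }
}
```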

How would that look in a functional language such as Clojure? A map would probably be used to hold the major and minor numbers, and the inc function would be as simple as setting a new value for the minor key in the map. Because any data modification results in a new structure, we don’t have to explicitly make a difference between creating and modifying a structure. See my discussion on persistent data structures.

(defn create-revision [] (hash-map :minor 1 :major 0))
(defn inc-revision [r] (assoc r :minor (+ (get r :minor) 1)))

Note that the inc function is “polymorphic” in the sense that if I pass a map with :minor, :major and :name, I obtain a copy which has all three keys.

The fundamental mismatch between OO and functional programming is that assignment has different semantics. In OO, assignment changes the data in place. In functional programming, assignment results in a copy.

And what if assignment would also result in a copy in OO? That sounds a bit contradictory to the OO spirit, but well, let’s give it a try…

If assignments result in a copy, this means that something like this.field = “newvalue” actually returns a new instance of the object, right? We could then imagine a class like this:

public class immutable Revision
{
   int major;
   int minor;

   public int getMajor() {
      return this.major;
   }

   public Revision setMajor( int value ) {
      // assignment results in a copy!
      return ( this.major = value );
   }

   // same for minor

   public Revision inc()  {
      return this.setMinor( this.getMinor() + 1 );
   }
}

We can somehow see this as a generalized form of the builder pattern, and it fits well with fluent interfaces:

Revision r = new Revision( 1, 0 );
r.inc(); // returns 1,1
r.inc(); // returns 1,1 again: r itself is never modified

But

Revision r =
    new Revision( 1, 0 )
        .inc()
        .inc(); // returns 1,2

Note the keyword immutable that is necessary to specify the semantics of assignment for the class (there will indeed always be some mutable classes).

We now have two semantics for the assignment, one producing a copy and one changing the data in-place.

It sounds logical that all classes in a class hierarchy share the same mutability: assigning a field from a subclass or a superclass should indeed produce the same result. But at the same time, this also means that we would have two Object types, one mutable and one immutable? Hmm…

Another sound restriction for such a language would be to have assignment restricted to this. Something like obj.field = "newvalue" would be forbidden. This is anyway a good practice.

Also, assuming that the language supports closures, do they fit in the picture? I would say yes, but I’m not sure. The signature of the closure is determined (it’s a function after all), and it cannot access any field of the object directly, so the semantics should be consistent. This would however require a bit more formalization to ensure there is no hole…

Like in some other posts, I don’t know if the idea presented here holds for real. But I do like immutable objects even in OO, and think that OO languages lack support for such designs. Current solutions for immutable objects are ad-hoc, but we could imagine blending functional programming and OO a bit more, resulting in some objects having a hybrid behavior. Such objects would essentially be immutable data structures embedding at the same time the functions that correspond to them, which still fits the equation Object = Data + Logic.

More About Immutability

  • To mutate or not to mutate?
  • Frozen object in ruby
  • Java.next : immutability “But the benefit of using immutable data increases when you compose objects together. If you wanted to compose the synchronized version with another object, you would have to dig back into the internals of both objects, study their locks, and pick a new lock strategy to cover the two objects together. Composing operations with immutable objects is much easier.”

Taming Size and Complexity

The only real problem with modern software is size and complexity. If we had a bigger brain, able to apprehend and reason about software as a whole without omitting details, we wouldn’t have that many issues. Unfortunately, our mental abilities are limited, and as a consequence, we need ways to build software whose complexity is beyond our own analytical power. The same is true for any large-scale engineering initiative. Apart from discipline, which is a prerequisite to manage size and complexity, the traditional ways to address them are abstraction, automation and intuition.

Abstraction

The traditional way to address complexity is to raise the abstraction level. Get rid of details and stay focused on the essential, and complexity goes away. You can then reason about different parts at various abstraction levels independently of each other. This is the ground-breaking argument of any modeling effort or methodology. The problem is that the whole is not equal to the sum of its parts: unforeseen interactions will emerge, resulting in a myriad of potential problems. Another major problem is the traceability between the different parts.

Automation

The traditional way to address size is through automation. Many of the tasks we perform are tedious not by their very nature, but because of the effort they demand. Our concentration is also limited, which implies we will make mistakes. Automation thus leads not only to higher productivity but to higher quality. There are too many examples of automated tasks to list, but code formatting and refactoring fall for instance into this category. Even though automation is extremely effective for specific tasks, it is also impacted by the complexity of the software to produce. State explosion is for instance one of the main problems of a technique such as symbolic execution.

Intuition

Actually, problem solving implies not only strong analytical skills but also some form of intuition. The same goes for software and program understanding. Software exploration and visualization are powerful techniques to reason about abstract information in an intuitive way. Software is intangible and consequently has no natural representation, which leaves the door open for new visualization metaphors. Examples of interactive visual development technologies are BPEL workflows, DSM, or polymetric views.

A Simple Categorization of Unit Tests

Unit testing has become incredibly popular over the past years. As a consequence, I sometimes feel like we’ve lost focus on why we write unit tests and what we expect as a return on investment.

The success of unit testing probably comes from the simplicity of the approach. However, we’ve learned since then that unit testing is not so easy either. A high percentage of code coverage does not necessarily mean quality software, doesn’t mean that border cases have been covered, and can even impede software maintenance if the test suite is poorly organized.

Don’t get me wrong, I do value the benefits of unit testing. But unit testing for the sake of unit testing has no value to me. If you think another form of automated testing will be a more rewarding strategy, then do it. If you go for unit tests, the important questions to ask are:

  • Did the unit tests reveal bugs in the production code?
  • Did you organize the unit tests in a meaningful way?
  • Did you spend time to identify and test border cases?

A clear “yes” to these three questions indicates an intelligent and probably rewarding testing effort. A pertinent test suite should reveal the bugs that you introduce in the system — and don’t pretend you don’t introduce any. A well-organized test suite should give you a high-level view of the features in the module. A well-crafted test suite should cover the nasty use cases of the system before they hurt you when the system is in production.

There is an abundant literature about unit testing, but nothing I read seemed to cover the reality of the unit tests that I write. I therefore analyzed the nature of my own unit tests and came up with a personal categorization which differs from existing ones. Here are the four categories I’ve identified, as well as a short description of the pros/cons of each kind.

Basic unit tests

A basic unit test is a test suite which covers one unique class and tests each method in a more-or-less individual fashion. This is the core idea of unit testing, where each tiny functionality is tested in full isolation, possibly with the help of mock objects to break the dependencies. As a result, the tests should be repeatable and also independent (of the environment and of other modules).

Problems with basic unit tests are:

Unit test with indirect assertions

For basic unit tests, the subject under test (SUT) is the very one for which we assert the behavior: we perform an action on the SUT and ensure it behaves correctly. This is however sometimes not possible as soon as the system becomes a bit more complicated. As a consequence, the actions are performed on the SUT, but we rely on a level of indirection for the assertions; we then assert the behavior of another object than the SUT. This is for instance the case when we mock the database and want to ensure that the rows are correctly altered.

  • Coupling with the implementation is still high
  • Fake objects might be used instead of mocks — they contain state and aren’t purely hard-coded anymore
  • The behaviour of the SUT is harder to understand due to the level of indirection

Inflection point unit test

Inflection points — a term coined by Michael Feathers, if I’m right — are somehow the entry points to a given software module. For utility libraries or services, the inflection points correspond to the public API. Testing these specific points is the most rewarding strategy to me, and other people think the same.

  • Inflection points are less subject to changes, and are closer to a black-box form of testing
  • Tests become the first client of the interface and give you a chance to check if the API is practical
  • After having covered all the common use cases of the inflection point, the test coverage of the underlying classes should be close to 100%. If not, this indicates a potential design weakness or useless code.
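A minimal sketch in Java: the hypothetical Cache class is the inflection point, and only its public API is exercised; the internal storage is covered through that entry point rather than tested directly.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheApiTest {
    static class Cache<K, V> {
        // Internal detail: never touched directly by the test
        private final Map<K, V> store = new HashMap<>();
        public void put(K key, V value) { store.put(key, value); }
        public V get(K key) { return store.get(key); }
    }

    public static void main(String[] args) {
        // Black-box style: only the public API is used
        Cache<String, Integer> cache = new Cache<>();
        cache.put("answer", 42);
        if (cache.get("answer") != 42) throw new AssertionError();
        if (cache.get("missing") != null) throw new AssertionError();
        System.out.println("ok");
    }
}
```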

Dynamic unit tests

I qualify tests as “dynamic” when their execution changes from one run to the other. The primary goal of such tests is to simulate the dynamism and variability of the production system. Such tests are however quite far away from the concept of basic unit tests and could be considered to some extent as integration tests; they are however still repeatable and independent of other modules. Execution in the real system may indeed be affected by either

  • Threading issues
  • Contextual data, e.g. cache
  • Nature of the input

Most of these tests rely on randomization, for instance to generate input data or disrupt the scheduling of threads. Though it’s more complicated, randomization and fuzzing have proven to be effective techniques to detect issues which would never arise under fixed execution conditions. Think for instance about phase-of-the-moon bugs and date problems.
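A minimal Java sketch of such a dynamic test: each run generates different input, so the test checks a property (here, that sorting yields an ordered array of the same length) rather than a fixed expected value. Logging the seed keeps a failing run reproducible.

```java
import java.util.Arrays;
import java.util.Random;

public class RandomizedSortTest {
    public static void main(String[] args) {
        long seed = System.nanoTime();
        Random random = new Random(seed);
        for (int run = 0; run < 100; run++) {
            int[] input = random.ints(20, -1000, 1000).toArray();
            int[] sorted = input.clone();
            Arrays.sort(sorted);
            // Property checks instead of fixed expected values
            if (sorted.length != input.length)
                throw new AssertionError("length changed, seed=" + seed);
            for (int i = 1; i < sorted.length; i++) {
                if (sorted[i - 1] > sorted[i])
                    throw new AssertionError("not ordered, seed=" + seed);
            }
        }
        System.out.println("ok");
    }
}
```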

Abstraction-Level vs. Meta-Level

According to Wikipedia, a model is a pattern, plan, representation (especially in miniature), or description designed to show the main object or workings of an object, system, or concept. In the case of software engineering, models are used to represent specific aspects of the software system, such as static structures, communication paths, algorithms, etc.

Models can be created along two axes:

Abstraction level

An abstraction level is a way of hiding the implementation of a particular set of functionality. The higher-level and the lower-level representations are two descriptions of the same reality; they vary in their level of detail and in their expressiveness. Examples:

C code – assembler code.
Both representations describe a sequence of operations in an imperative way, but C is much more expressive than assembler.

Java code – sequence diagram.
Both representations describe the run-time behavior of the system. The sequence diagram however frequently omits details of the call flow.

Meta-level

A meta-level (and meta-model) is a way to highlight the properties – the nature – of a particular set of elements. A meta-model then describes the constraints or the semantics that apply to any set of such elements.

Object – class.
In OO technologies, objects and classes belong to two different meta-levels. The class is a representation of the structure (constraints) that any of its instances will fulfill.

XML – DTD.
An XML file can be validated against a DTD, which is not a representation of the particular file, but of the structure (constraints) that the file must fulfill.
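In Java, these meta-levels can be observed directly through reflection: an object sits at the base level, its class one meta-level up, and the metaclass java.lang.Class one level above that. A minimal sketch:

```java
public class MetaLevelDemo {
    public static void main(String[] args) {
        String s = "hello";              // L0: an object
        Class<?> c = s.getClass();       // L1: its class (the meta-level)
        Class<?> m = c.getClass();       // L2: the metaclass, java.lang.Class
        if (!c.getName().equals("java.lang.String")) throw new AssertionError();
        if (!m.getName().equals("java.lang.Class")) throw new AssertionError();
        System.out.println(c.getName() + " / " + m.getName());
    }
}
```

Note that the tower stops here by recursion: the class of java.lang.Class is java.lang.Class itself.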

Tower of models

Models themselves can be refined towards either higher abstraction level or meta-levels.

Component diagram – class diagram – Java code.
All three artifacts are representations of the same reality at various levels of refinement and reasoning.

XML – Schema – Meta schema.
In this case, each model is a meta-description of the underlying one. XML Schemas themselves must conform to the schema http://www.w3.org/2001/XMLSchema.

As with the XML example, the tower of meta-models generally ends with a model specified in itself. That is, a recursion is introduced to stop the tower of meta-models from growing infinitely.
The term “modeling” commonly refers to “raising the abstraction level”; the term “meta-modeling” is used otherwise. Different tools/technologies don’t necessarily represent different meta-levels. Conversely, one technology may be used to address different meta-levels.
With respect to OO technologies, several UML diagrams and notations work at different meta-levels. Object diagrams and sequence diagrams describe the interaction between specific instances of objects. Class diagrams specify constraints (such as cardinality) or semantic relationships between objects (such as aggregation). Stereotypes can then be used to attach constraints (such as “singleton”) or semantics (such as “entity”) to the class itself. For instance, an “entity” stereotype could be defined to flag all classes with an attribute “primary key” of type integer.
Level \ Technology    Language    UML
L0                    Object      Object diagram, Sequence diagram
L1                    Class       Class diagram
L2                                Stereotype

Examples where meta-models are useful

Having three meta-levels provides one extra level of indirection that can be very practical in some cases:

Level               XML transformation    OO refactoring    Annotation processing
L0 – information    Data                  Object            Class annotation (e.g. @Entity)
L1 – model          Schema                Class             Annotation processor (e.g. JPA)
L2 – meta-model     Meta schema           Metaclass         Classloader

Here are three common examples which actually leverage the strength of meta-models.

XML transformation

The success of XML is related to the existence of a meta-model which makes it a semi-structured format. Structured data are interpretable only by their host application, which is the only one to know the model. Unstructured data (plain text, images, etc.) have no model and are not interpretable at all. Semi-structured data have a model which is itself understandable by any other application that knows the meta-model. Thanks to this extra level, reusable parsers, validators, encryptors, or even arbitrary transformers (think XSLT) can be created. The schema can also be extended without necessarily breaking backward compatibility (think XPath).
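The reusable-parser point can be illustrated with Java’s standard DOM API: the parser below interprets a document it has never seen, relying only on the XML meta-model (elements, attributes), not on the “order” model of this particular document, which is invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class GenericXmlReader {
    public static void main(String[] args) throws Exception {
        String xml = "<order><item sku=\"A1\" qty=\"2\"/></order>";
        // The parser needs no application-specific knowledge:
        // it only relies on the XML meta-model
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String root = doc.getDocumentElement().getTagName();
        if (!root.equals("order")) throw new AssertionError();
        System.out.println(root);
    }
}
```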

OO refactoring

Refactoring rules are a typical example of the use of the OO meta-model: refactoring rules are proven to preserve the validity of the model against the meta-model, and are reusable for any model.

Annotation processing

Annotation processing is yet another example where a meta-model is used under the hood. Just as the class loader knows how to load a class thanks to the meta-model shared by all classes, it is also able to load the corresponding annotations. This relieves the application from storing application-specific semantics in separate files (think deployment descriptors). The application (annotation processor) can then query the annotations at run-time to refine the behavior of the system. The annotations enrich classes with additional constraints and behaviour in a reusable way: the same annotation processor can be used for arbitrary sets of classes.
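A minimal Java sketch of this mechanism, with a hypothetical @Entity-like annotation (not JPA’s) that a processor reads back via reflection at run-time:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationDemo {
    // Hypothetical annotation, loosely analogous to JPA's @Entity
    @Retention(RetentionPolicy.RUNTIME)
    @interface Entity { String table(); }

    @Entity(table = "customers")
    static class Customer {}

    public static void main(String[] args) {
        // The "processor" queries the annotation through the meta-model,
        // with no separate descriptor file involved
        Entity e = Customer.class.getAnnotation(Entity.class);
        if (!e.table().equals("customers")) throw new AssertionError();
        System.out.println(e.table());
    }
}
```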

Pharo By Example

This is an introductory book to Smalltalk and the Pharo development environment. The book is split into two parts: the first one covers the Smalltalk language, the Pharo IDE, the Morphic package for GUI applications, and the Seaside framework for web applications. The second part is relatively thin and covers more advanced topics such as reflection.

The book is easy to read, with many examples to follow, and the reader quickly acquires a good comprehension of Smalltalk. The object-oriented concepts specific to Smalltalk are nevertheless rigorously discussed. A good example is the paragraph about class variables in the case of class inheritance: it’s easy to follow in the book but still an OO concept that could otherwise be misunderstood. These paragraphs are the most important ones for people (like me) coming from other OO languages.

The GUI and web frameworks are briefly covered, but enough to get an idea of how they work. Both are component-based, and in both cases a concrete component is built from A to Z to give insight into the frameworks.

The second part, about reflection, is where the beauty of Smalltalk shines. First the concepts behind meta-objects and reflection are presented, and then a few useful patterns which use reflection are described, for instance how to build a dynamic proxy in Smalltalk. This is clearly the most interesting part, where the dynamic nature of Smalltalk can be exploited in a constructive way.

Software Evolution

Software evolution studies software engineering from the perspective of software maintenance and change management. The term was coined in the 70s after the seminal work of Lehman and his laws of software evolution.

As such, software evolution is a topic as broad as software engineering itself – any aspect of software engineering (be it a process, methodology, tool, or technique) can indeed be studied from the perspective of software evolution.

The book is a comprehensive overview of the current body of knowledge in software evolution. It contains 10 chapters, each addressing a specific field of software engineering from the perspective of software evolution. The articles present recent works and open challenges; the book can thus be seen as a comprehensive survey as well as a research agenda. The book’s bibliography is impressive: more than 500 articles are referenced throughout the 10 chapters.

Apart from this selection of articles, the book has an excellent introduction to the field of software evolution, and a set of useful appendices pointing to additional resources in this field. The 10 chapters are organized into 3 parts:

Part I: Program understanding and analysis

Identifying and Removing Software Clones
Analysing Software Repositories to Understand Software Evolution
Predicting Bugs from History

Part II: Re-engineering

Object-Oriented Reengineering
Migration of Legacy Information Systems
Architectural Transformations: From Legacy to Three-Tier and Services

Part III: Novel trends in software evolution

On the Interplay Between Software Testing and Evolution and its Effect on Program Comprehension
Evolution Issues in Aspect-Oriented Programming
Software Architecture Evolution
Empirical Studies of Open Source Evolution

The chapters I enjoyed the most were “Object-Oriented Reengineering”, “On the Interplay Between…” and “Evolution Issues in AOP”.

I wish there was a chapter about evolutionary issues in model-driven engineering, which is an important area of research. I therefore would recommend “Model-Driven Software Evolution: A Research Agenda” as a complement to this book.

The book is accessible to non-experts and I learned a lot while reading it. The book is definitely worth looking at for anyone interested in understanding what “software evolution” is all about.

Quotes

… mostly on software engineering, but not only.

“How does a project get to be a year late?…One day at a time.” — The Mythical Man-Month

“Complexity kills. It sucks the life out of developers, it makes products difficult to plan, build and test. […] Each of us should […] explore and embrace techniques to reduce complexity.”  — Ray Ozzie, CTO, Microsoft

“Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.” — Saint-Exupéry

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” — D. Knuth

“Beware of bugs in the above code; I have only proved it correct, not tried it” — D. Knuth

“Research is less about discovering the fantastic, as it is about revealing the obvious. The obvious is always there. It needs no discovery. It only needs us to take a look from a different perspective to see it.” — On T. Girba’s blog

“It is impossible for any number which is a power greater than the second to be written as a sum of two like powers. I have a truly marvelous demonstration of this proposition which this margin is too narrow to contain.” — P. Fermat

“These are my principles. If you don’t like them, I have others.” — Groucho Marx

“if all you have is a hammer, everything looks like a nail…”  — B. Baruch

“Discipline is the bridge between goals and accomplishment.” — Jim Rohn.

“If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if half an hour, two days; if an hour, I am ready now.”  —  Woodrow Wilson

“The devil is in the details” — German proverb

“The first idea is always wrong” — Popular wisdom

“What can go wrong will go wrong” — Murphy’s law

“The nice thing about standards is that there are so many of them to choose from.” — Andrew S. Tanenbaum

“The manager asks how and when; the leader asks what and why.” — Warren Bennis

“Experience is what you get when you don’t get what you want.” — Dan Stanford

“A mission is something that you do in a bounded amount of time with bounded resources. A purpose is something that you never achieve.” — William Cook

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” — Martin Fowler

“The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.” — E.W. Dijkstra, The Humble Programmer

“There are wavelengths that people cannot see, there are sounds that people cannot hear, and maybe computers have thoughts that people cannot think” — Richard Hamming

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” — Brian Kernighan

“If others would think as hard as I did, then they would get similar results.” — Newton

“Rules of thumb are sometimes what their acronym infers…ROT” — unknown

“Any intelligent fool can make things bigger and more complex. It takes a touch of genius – and a lot of courage – to move in the opposite direction.” — Albert Einstein

“When I use a word,” Humpty Dumpty said, in a rather scornful tone, “it means just what I choose it to mean — neither more nor less.” — Lewis Carroll, Through the Looking Glass

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” — Tony Hoare

“He who hasn’t hacked assembly language as a youth has no heart. He who does as an adult has no brain.” — John Moore

“If you can’t explain it simply, you don’t understand it well enough” — A. Einstein

“There are two groups of languages: those which everyone hates and those which nobody uses.” — Unknown

“Any problem in computer science can be solved with another layer of indirection” — David Wheeler

“Idiomatic usage often matters more than what is strictly possible.” — A comment from Stuart Halloway

“You learn lisp for the same reason you learn latin. It won’t get you a job but it will improve your mind” — Paul Graham

“The Internet is the world’s largest library. It’s just that all the books are on the floor.” — John Allen Paulos

“There are only two hard things in Computer Science: cache invalidation and naming things” — Phil Karlton

“Some abstraction is great, but eventually you need to write code that does something useful.” — Bill Karwin

“Tell me, I forget. Show me, I remember. Involve me, I understand.” — Chinese proverb.

“Don’t reinvent the wheel unless you want to learn about wheels.” — Popular wisdom

“I’m sorry this letter is so long, I didn’t have time to write a shorter one”  — Mark Twain

“To be trusted is a greater compliment than to be loved” — George MacDonald

“Most people work just hard enough not to get fired and get paid just enough money not to quit” — George Carlin

“Find your obsession, make it your profession, and you’ll never work a day in your life” — Unknown

“Plans are nothing; planning is everything.” – Dwight D. Eisenhower

“If you set your goals ridiculously high and it’s a failure, you will fail above everyone else’s success.” — James Cameron

“A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.” — Robert A. Heinlein

“Tools can enable change in behavior and eventually change culture” — Patrick Debois

“First you learn the value of abstraction, then you learn the cost of abstraction, then you’re ready to engineer.” — @KentBeck

“Simplicity does not precede complexity, but follows it.” — Alan Perlis

“The most important thing in communication is hearing what isn’t said.” – Peter F. Drucker

“Computer science is now about systems. It hasn’t been about algorithms since the 1960’s.” – Alan Kay

“[Some people] just can’t wrap their head around that someone might build something because they like building things.” — Mark Zuckerberg in an interview

“So much complexity in software comes from trying to make one thing do two things.” – Ryan Singer