For the needs of the experimental Pinocchio research VM, we had to add support for threading and concurrency. We implemented green threads, à la Squeak, so there is no “real” multi-core concurrency going on. The VM relies on AST interpretation instead of bytecode. With green threads, the interpretation of an AST node can always be considered atomic: no two AST nodes are ever interpreted concurrently. This is unlike Java and its memory model, where individual bytecodes can be interpreted concurrently, possibly with nasty side effects (e.g. manipulation of a long is not atomic). Thread preemption can happen anytime between the evaluation of two AST nodes.
How can we add support for semaphores?
The Pharo design
The Pharo design can be informally summarized like this: when a semaphore is instantiated, its counter is set to one. Whenever a block needs to be evaluated in an exclusive way, the counter is checked. If the counter > 0, it is decreased and the block is evaluated. If the counter = 0, the thread is suspended and added to the list of threads currently waiting on this semaphore. When the critical block has been evaluated, the list of suspended threads is checked. If there are no suspended threads, the counter is incremented back to 1. Otherwise, one of the suspended threads is picked and resumed.
This implementation explains why Semaphore is a subclass of LinkedList. I’m not sure it’s the best design decision, because a semaphore is not conceptually a list, and the list protocol should not be exposed by a semaphore. It uses inheritance for implementation reuse, but composition would have been just fine here if the semaphore internally held a linked list (and maybe used a pluggable ownership type system to check that the list does not leak out of the semaphore…).
Also, semaphores must be created using the forMutualExclusion factory method. This method instantiates and initializes the semaphore to allow exactly one execution at a time (hence the term mutual exclusion), but nothing would prevent you from initializing the semaphore so that up to N blocks can be executed concurrently.
The code for wait and signal, which respectively decrement and increment the counter, is:
wait
    excessSignals > 0
        ifTrue: [excessSignals := excessSignals - 1]
        ifFalse: [self addLastLink: Processor activeProcess suspend]

signal
    self isEmpty
        ifTrue: [excessSignals := excessSignals + 1]
        ifFalse: [Processor resume: self removeFirstLink]
They are however implemented as primitives. I suspect this is not for performance reasons, but for the sake of concurrency correctness. These operations themselves need to be atomic. If they were implemented in Smalltalk, a thread could be preempted in the middle of one of them, breaking the semaphore’s design.
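To make the bookkeeping concrete, here is a minimal Python sketch of this counter-and-queue logic. The class and method names are ours, not Pharo’s, and this sketch only models the counter and waiting list; it does not model the atomicity that the VM primitives provide, nor actual thread suspension.

```python
from collections import deque

class GreenSemaphore:
    """Sketch of the Pharo-style semaphore bookkeeping (hypothetical class).
    In the real VM, wait and signal are atomic primitives; here we only
    model the counter/queue logic, with threads represented by plain values."""

    def __init__(self, excess_signals=1):
        # forMutualExclusion corresponds to excess_signals = 1;
        # a larger value would let up to N threads in concurrently.
        self.excess_signals = excess_signals
        self.waiting = deque()  # threads suspended on this semaphore

    def wait(self, thread):
        if self.excess_signals > 0:
            self.excess_signals -= 1
            return True          # the thread may proceed
        self.waiting.append(thread)
        return False             # the thread is suspended

    def signal(self):
        if not self.waiting:
            self.excess_signals += 1
            return None          # nobody to wake up
        return self.waiting.popleft()  # the thread to resume
```

For example, with the default counter of 1, a first call to `wait` succeeds, a second one parks the caller, and the matching `signal` hands the semaphore to the parked thread rather than incrementing the counter.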
The test-and-set design
These two methods and the internal counter suggest that an implementation relying on more generic concurrency primitives is possible. Typical concurrency primitives for this are test-and-set or compare-and-swap. We’ve added a testAndSet primitive to Boolean, and implemented the Semaphore with busy waiting (also sometimes called a spin lock):
critical: aBlock
    | v |
    "we spin on the lock until we can enter the semaphore"
    [ lock testAndSet ] whileTrue: [ PThread current yield ].
    "we evaluate the block and make sure we reset the flag when we leave it"
    [ v := aBlock value ] ensure: [ lock value: false ].
    ^ v
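The same spin-lock logic can be sketched in Python, with a hypothetical TASLock class standing in for the testAndSet primitive. In a real VM, test-and-set happens atomically; this sketch only models its semantics.

```python
class TASLock:
    """Hypothetical stand-in for the VM's testAndSet primitive.
    A real implementation would perform test_and_set atomically."""

    def __init__(self):
        self.flag = False

    def test_and_set(self):
        # return the old value and raise the flag in one (notionally atomic) step
        old = self.flag
        self.flag = True
        return old

    def clear(self):
        self.flag = False

def critical(lock, block, yield_fn=lambda: None):
    """Sketch of the critical: method above, under the same assumptions."""
    # spin on the lock until we can enter the critical section,
    # yielding to other green threads while we wait
    while lock.test_and_set():
        yield_fn()
    try:
        # evaluate the block, resetting the flag when we leave it
        return block()
    finally:
        lock.clear()
```

The try/finally plays the role of the ensure: block, guaranteeing the flag is reset even if the block raises.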
The design could be improved to avoid busy waiting. Instead of yielding, the thread would be suspended and added to a list. In the ensure block, the flag would be reset and one of the suspended threads would be resumed. The resumed thread would however still need to testAndSet the lock, because another thread might have entered the semaphore in the meantime, possibly delaying the resumed thread forever. So if fairness is required, this algorithm is not optimal.
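That suspend-and-resume variant can be sketched as follows. Again, the names are ours and the sketch does not model the atomicity the VM primitives would provide; it only shows why the woken thread must retry, and thus why fairness is not guaranteed.

```python
from collections import deque

class QueueingLock:
    """Sketch of the non-busy-waiting variant: a losing thread is parked
    on a queue instead of spinning, and one parked thread is woken on
    release. The woken thread must still retry acquire, so a thread that
    slipped in meanwhile can win the race (no fairness guarantee)."""

    def __init__(self):
        self.flag = False
        self.waiting = deque()

    def acquire(self, thread):
        # test-and-set: returns True if this thread got the lock
        if not self.flag:
            self.flag = True
            return True
        self.waiting.append(thread)   # park instead of spinning
        return False

    def release(self):
        self.flag = False
        # wake one parked thread; it must call acquire again and may lose
        return self.waiting.popleft() if self.waiting else None
```

The returned value of `release` is the thread the scheduler should resume, mirroring how signal hands a suspended process back to the Processor.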
The bakery design
You can also implement critical sections without the support of other concurrency primitives. The most famous such algorithm is probably Lamport’s bakery algorithm:
What is significant about the bakery algorithm is that it implements mutual exclusion without relying on any lower-level mutual exclusion. Assuming that reads and writes of a memory location are atomic actions, as previous mutual exclusion algorithms had done, is tantamount to assuming mutually exclusive access to the location. So a mutual exclusion algorithm that assumes atomic reads and writes is assuming lower-level mutual exclusion. Such an algorithm cannot really be said to solve the mutual exclusion problem. Before the bakery algorithm, people believed that the mutual exclusion problem was unsolvable–that you could implement mutual exclusion only by using lower-level mutual exclusion.
In our case with green threads, reads and writes are atomic because each is a single AST node, but this isn’t necessarily the case in general.
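As an illustration, here is a Python sketch of the bakery algorithm for n threads, each identified by an index 0..n-1 (an assumption of this sketch). Note that CPython’s interpreter lock gives far stronger guarantees than the minimal assumptions Lamport works under, so this only demonstrates the structure of the algorithm, not its full generality.

```python
import threading

class BakeryLock:
    """Sketch of Lamport's bakery algorithm. Each thread draws a ticket
    one higher than any currently drawn, then waits until every thread
    with a smaller ticket (ties broken by index) has left the bakery."""

    def __init__(self, n):
        self.choosing = [False] * n
        self.number = [0] * n

    def lock(self, i):
        # draw a ticket
        self.choosing[i] = True
        self.number[i] = 1 + max(self.number)
        self.choosing[i] = False
        for j in range(len(self.number)):
            if j == i:
                continue
            # wait while thread j is in the middle of drawing its ticket
            while self.choosing[j]:
                pass
            # wait while j holds a smaller ticket (ties broken by index)
            while self.number[j] != 0 and (self.number[j], j) < (self.number[i], i):
                pass

    def unlock(self, i):
        self.number[i] = 0  # leave the bakery

# usage: two threads incrementing a shared counter under the lock
lock = BakeryLock(2)
counter = [0]

def worker(i):
    for _ in range(1000):
        lock.lock(i)
        counter[0] += 1   # critical section
        lock.unlock(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter[0] is now 2000: every increment happened under mutual exclusion
```

The two busy-wait loops are exactly where the algorithm avoids needing any lower-level mutual exclusion: a thread only ever writes its own slots of choosing and number, and merely reads everyone else’s.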
Here ends this little trip into basic concurrency. There is a rich literature on the topic — which is truly fantastic — and we might explore and implement more sophisticated abstractions later on.
The Java Memory Model
Thread Synchronization and Critical Section Problem
A New Solution of Dijkstra’s Concurrent Programming Problem