Organisation

SAFe: What’s a Release Train Engineer?

SAFe introduces a new role in the industry: the release train engineer (RTE). The framework defines the role as follows:

The Release Train Engineer (RTE) is a servant leader and coach for the Agile Release Train (ART). The RTE’s major responsibilities are to facilitate the ART events and processes and assist the teams in delivering value. RTEs communicate with stakeholders, escalate impediments, help manage risk, and drive relentless improvement.

The role is designed like a scrum master at the ART level. At a minimum, an RTE ensures that the process is followed. But a good RTE helps teams improve their performance – that’s the essence of the job. An RTE doesn’t have any authority over the content in the backlog. The focus is only on improvement at the organisational level. As such, the wording “assist the teams in delivering value” leaves quite some latitude in how impactful an RTE can be.

What do you expect from an RTE? I am wondering how this role will establish itself in the industry. Here are my personal expectations.

Level I – The Organizer. The RTE ensures that the process is followed. He/she ensures that information flows between the teams using the elements of the framework. The RTE helps resolve problems related to the work environment as they appear. Examples of such problems could be: tools to communicate, organisation of the program backlog, running the ART events. He/she makes sure people can work.

Level II – The Moderator. The RTE is able to create platforms or use existing platforms to encourage discussions in the ART / Solution. With some moderation talent, he/she can help instill change, support improvements, or create alignment. The RTE helps resolve problems about team performance as they appear. Examples of such problems could be: interpersonal issues, improving the collaboration with a specific provider, managing morale in challenging times, ensuring transparency, suggesting a feature stop to address the existing bugs first.

Level III – The Influencer. The RTE identifies systemic performance issues in the organisation and works towards resolving them by instilling change at the organisational, technical, or product management levels. Examples of such issues could be: addressing systemic quality issues due to the work culture, working with the system architects/teams/system team to make the continuous delivery pipeline faster, encouraging decentralized decision-making (while managing risks), improving feedback loops.

The higher the level, the more interdisciplinary the RTE’s work becomes. While little knowledge of product management or architecture is needed to be proficient at level I, problems at levels II and III require a good understanding of how engineering works and how product management, technology, and processes influence each other. On the technology front, the RTE is also a key stakeholder in supporting mindsets like DevOps, which means he/she must also have a good understanding of how technology supports delivery and operations.

The RTE role resembles that of the more established delivery manager. Both focus on similar sets of issues.

The big difference between the two roles lies, I think, in the mindset. An RTE is a coach and as such has little formal authority. He/she leads by helping others make the right call. A delivery manager will typically have more formal authority. For instance, an RTE has no authority over the prioritisation of the backlog. The PM and PO formally have this responsibility. The RTE coaches the PM/PO in prioritizing work.

The higher the level, the more the RTE works at the level of the engineering culture. It’s easy to define values and visions that nobody follows. Culture is defined by how people actually behave. It’s hard to be a good RTE. Just like it’s hard to be a good scrum master. Changing how people work isn’t easy.

Thinking

Systems Thinking: SAFe vs LeSS

I was pleasantly surprised to see Systems Thinking as principle #2 in SAFe. I recently came in contact with systems thinking when reading Limits to Growth, which explores the feedback loops in the global economy. Donella Meadows is also the author of Thinking in Systems, which addresses more generally how to understand the dynamics of complex systems with such feedback loops (the book is on my to-read list).

This is the definition of systems thinking according to SAFe:

Systems thinking takes a holistic approach to solution development, incorporating all aspects of a system and its environment into the design, development, deployment, and maintenance of the system itself.

It’s quite general. But arguably, there isn’t one definition of systems thinking. If you read Tools for Systems Thinkers, the study of feedback loops is only one aspect of systems thinking. The more general theme is to understand the “interconnectedness” of the elements in the system.

A system is a set of related components that work together in a particular environment to perform whatever functions are required to achieve the system’s objective. – Donella Meadows

Principle #2 in SAFe is about realizing that the solution, but also the organisation, are complex systems that benefit from systems thinking.

Interestingly, Large Scale Scrum (LeSS) also has systems thinking as a principle. It’s more concrete than the equivalent principle in SAFe. The emphasis is on seeing system dynamics, especially with causal loop diagrams. The article is a very good introduction to such diagrams. Here’s an example of a very simple causal loop diagram:

[Figure: a simple causal loop diagram]

I like the emphasis on actively visualizing system dynamics:

The practical aspect of this tip (NB: visualizing) is more important than may first be appreciated. It is vague and low-impact to suggest “be a systems thinker.” But if you and four colleagues get into the habit of standing together at a large whiteboard, sketching causal loop diagrams together, then there is a concrete and potentially high-impact practice that connects “be a systems thinker” with “do systems thinking.”

The idea is that it’s only when you start visualizing the system dynamics that you also start understanding the mental models that people have, and only then can you start discussing improvements.

I prefer the more concrete way LeSS addresses systems thinking over the way SAFe does. Recently, I discussed a cultural issue related to know-how sharing with our RTE. A causal loop diagram would have been a very good vehicle to brainstorm about the problem. I think I will borrow the tip from LeSS and start sketching such diagrams during conversations.

Organisation

SAFe: The Good Parts

The Scaled Agile Framework (SAFe) is a complex framework. I mean, just look at this picture:

Long gone is the simplicity of Scrum. The SAFe glossary contains 102 items (I counted them!), ranging from obvious concepts like a “story” to esoteric notions like “set-based design”, with “customer centricity” in between. The framework is meant to impose some structure, but at the same time, it has so many elements that with some creativity, you can probably retrofit any organisation into it without changing anything (for instance by abusing the concept of shared services). If agile was meant to be about simplicity, then SAFe is far from it.

SAFe comes in various “configurations”. The picture above is “portfolio” SAFe. And mind you, there is a “full” SAFe configuration which is even more complicated. But the core of SAFe – the “essential” configuration – actually has good parts:

  • An agile release train (ART) is a collection of teams. They synchronize through the program backlog and the PI Planning (PI stands for “program increment”).
  • ARTs should align with value streams. You organise your company in ARTs based on how you generate value for your customers, so that each ART focuses on one part of the value stream. The definition of value streams is of course complicated in the glossary, with development and operational value streams distinct from each other, but the idea is actually good. You align IT and Business this way.
  • At the ART level, the leadership is split across three roles: Product Management, System Architecture, Release Train Engineer. I think that this split is a nice point in SAFe. It creates some balance in responsibility and makes it clear that to be efficient, you need to address product features, architecture, and work culture together, since they all impact each other.
  • SAFe also introduces a special terminology for things that aren’t features on their own: enablers. Chances are, you had this kind of work item already, just with a different name. But naming matters, and SAFe makes good use of the concept of “enabling” at various levels. I like it.
  • Community of Practice is the name for working groups around specific issues.
  • The System Team helps with toolchains, infrastructure, build pipelines, integration testing.

Most companies develop their own organisation as they grow, and it will have some similar elements. Maybe you have different roles (e.g. “engineering managers”), or different ways to synchronize, or some other way to manage architectural work. Some things are surely different, but others are probably similar, just named or implemented differently. If you want to move to SAFe, how much you need to adapt will vary. But for most enterprises, the change isn’t radical.

In this sense, SAFe is a collection of patterns. What SAFe gives you is a standard frame of reference to discuss these aspects. SAFe establishes a common vocabulary to talk about the organisation and how to improve it. Where this analogy with patterns fails, though, is that you can usually decide to implement a pattern individually. SAFe comes as a framework of patterns, where all of them must be implemented.

The “large solution” configuration adds an additional level of scale with product management, train engineer, and architecture at the solution level. Solution and ARTs have the same cadence and synchronize through the same PI plannings. They have the same program backlog. This makes sense. (Historical note: “Program Level” was replaced with “Essential” in SAFe 5, but the rest of the “program” terminology survived.)

With the “portfolio” configuration, you have an additional level of “lean portfolio management” (LPM) whose goal is to “align strategy, funding and execution”. This adds epic owners, enterprise architects, lean business cases, KPIs and the like to the framework. According to the framework, only with this configuration can you achieve business agility. Something I like about SAFe is the idea of funding value streams rather than projects.

I understand that this level may match well with existing organizations, with funding systems and steering boards. But the portfolio level still gives a bit of the feeling that ARTs and Solution Trains are “factories”, divorced from real business accountability. If the goal is to bring IT and business closer to each other, why not push these elements to the ARTs and Solutions? Make them accountable for the value their products generate. In a way, I wish that this level didn’t exist, or existed in another form – for instance not being an additional level but rather a vertical that complements the existing levels. I understand that some initiatives will impact several steps in the value stream, and thus possibly several ARTs or Solutions. But I hope it’s the exception, not the norm. On the other hand, maybe that’s also precisely the point of the portfolio level being above the Solution / ARTs. If your business (and thus your value streams) isn’t yet clearly established, you need another level to be able to shape the value streams based on feedback from the market. I think that the portfolio level will be used very differently from enterprise to enterprise.

In its core values, SAFe recognizes its influences: Agile development, Lean product development, systems thinking, and DevOps. The framework actively tries to combine these influences into a consistent whole. The problem is that it sometimes feels like a bit too much: The SAFe core values page lists 4 values. The lean-agile mindset page lists 4 pillars. The SAFe lean-agile principles page lists 10 principles. The lean-agile leadership page lists 3 dimensions. Business agility lists 7 competencies that are required (on the left in the picture, but “competency” isn’t in the glossary). I like conceptual frameworks, really. But it’s hard for me not to get lost here.

I guess that companies moving to SAFe will still need to tailor it to their needs anyway. Where I’m working, they added a “subject-matter expert” role, for instance. That’s fully in the spirit of agility: tailor processes when you need to. But with this idea in mind, SAFe could have been kept smaller rather than trying to be all-encompassing.

Organisation

SAFe: Evolution Over the Years

It’s very interesting to see how SAFe evolved over the years. The version 2, circa 2013, looked like this:

Some things are worth noting:

  • There is no large solution. Only Team/Program/Portfolio
  • At the program level, we find release management.
  • The symmetry between PO/Team/ScrumMaster and PM/Arch/RTE isn’t yet established
  • Spikes and Refactors, a terminology coming from eXtreme Programming
  • Epics are primarily characterized as something that spans releases, to be broken down into features that fit in releases

Interestingly, this setup is very much like the structure I know from my work.

This is version 3, circa 2014:

There aren’t that many changes compared to v2. The biggest change seems to be the introduction of value streams at the portfolio level. With it comes the idea that we fund value streams. We also see some “principles” appear, like the House of Lean, Lean-Agile Leadership, and Built-In Quality at the Team Level.

Here is version 4, circa 2016:

Major changes include:

  • An additional level between program and portfolio: the value stream. The “Solution Train Engineer” of version 5 is called a “Value Stream Engineer” here. The value stream is very present in this configuration.
  • The symmetry between PO/Team/SM – PM/Arch/RTE – SolMgmt/Arch/VSE is established
  • Release management is subsumed under shared services
  • Communities of Practice appear
  • Some additional “principles”: Economic Framework, MBSE, Set-based, Agile Architecture, Core Values, Lean-Agile Mindset, SAFe Principles.

Here’s version 4.5, circa 2018:

  • Value Stream Level is replaced with Large Solution Level. With it the Value Stream Engineer becomes a Solution Train Engineer.
  • Supporting artefacts and teams regrouped in a sidebar.

Here is the current version 5.1:

We have several major changes (here’s a detailed analysis of them):

  • The introduction of “Business Agility” as the overarching goal, to be achieved with the portfolio level.
  • Introduction of the 7 core competencies (Organizational Agility, etc.)
  • The levels Program and Team merge into “Essential”.
  • Some more “principles”: customer-centricity, design thinking.

By studying the evolution of the framework I understand some things better now.

  • The core of the framework with agile release train and portfolio levels remained quite stable over the years
  • The large-solution level appeared over time, morphing from the value-stream level. The symmetry between the ART and solution level with the 3 roles PM/Arch/RTE took some time to evolve to how it is now.
  • The term epic became more complicated to understand. It started as “something bigger than a release” and existed only at the portfolio level. In SAFe 5, epics can occur at all levels.
  • Supporting artefacts and teams evolved over time, but these were much more minor changes. The biggest change was probably the “subsumption” of release management under the shared services.
  • Principles have been generously and continuously added to the framework. There are now a lot of them.

Just as with anything that evolves, some inconsistencies accumulate. I find it interesting to observe this in domains other than code and architecture. For instance, in SAFe 5 the term “program” is still in use (Program Increment, Program Backlog), but the program level disappeared. This is due to historical reasons. Starting directly with version 5, you would probably name things differently (e.g. “Solution Increment”). Also, just like code and architecture, the framework suffers from feature creep.

Somehow I’m a bit sad that they decided to do away with the “value stream level”. The idea of the value stream is very powerful and putting it at center stage was nice. From an engineering standpoint, version 4 has a different spin than version 4.5. With the “value stream level”, various programs deliver independent products that together realize a value stream. With the “large solution” terminology of version 4.5, you get the impression that you have one “large solution” broken down into several components delivered by various ARTs, which need to be integrated together. The difference can seem subtle, but I prefer the spin of version 4. The “large solution” terminology will tend towards centralization more than the “value stream” terminology.

As for the principles, there are simply too many of them. I believe that the signal-to-noise ratio here is too low.

Introducing business agility is an interesting move from SAFe. I expect the discussion “development agility” vs “business agility” to be all the rage in the coming years. We know how to do agile development. But we still don’t get the expected outcome at the business level. The link is somehow not as trivial as in theory. Version 5 recognizes this and makes it clear that agile development is a means to an end, not the end itself. It reminds us why we’re doing all this. Here there’s a clear signal without noise, and it’s valuable.

Software Architecture

Architectures for Mobile Messaging

A project I’m working on involves changing the messaging technology for the delivery of realtime information to train drivers using iPads. This project made me interested in the various ways to design realtime messaging platforms for mobile clients.

Unlike web or desktop applications, mobile applications that use realtime messaging have to deal with the additional concern of unreliable connectivity. This unreliable connectivity is more subtle than I thought. Here are a few points to consider:

  • no connectivity or poor connectivity (tunnel, etc.)
  • the device may switch from 5G to WLAN
  • connection breaks when app goes in the background
  • Different WLAN hotspots (Android, iOS) result in different behavior

You need to design your application to support these use cases correctly.

Here are some aspects of the communication that you need to consider:

  • Does the client need to load some state upon connection?
  • Do updates have a TTL?
  • Are messages broadcast to several clients or specific to one client?
  • Is message loss important or not?
  • Does the server need to know which clients are connected?
  • Do you have a firewall between client and server?

Depending on the answers to these questions, you might decide to establish a point-to-point connection from the device to the backend. In this case, if you want to broadcast information to several clients, you need to do this yourself. You will also need to manage the state in the backend yourself. Tracking the presence of the client is trivial, since there is one connection per client. Several technologies exist for this use case:

  • Server-Sent Events (SSE)
  • HTTP Long Polling
  • gRPC
  • WebSocket
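To illustrate what "doing it yourself" means with point-to-point connections, here is a minimal Python sketch of the bookkeeping the backend ends up owning. It is not tied to any of the technologies listed above: `ConnectionHub` and the `Send` callback are hypothetical names, and the callback simply stands in for "write to this client's connection".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical stand-in for "write a message to this client's connection".
Send = Callable[[str], None]


@dataclass
class ConnectionHub:
    """Backend-side bookkeeping when every device holds its own connection."""

    clients: Dict[str, Send] = field(default_factory=dict)
    last_state: Optional[str] = None  # state the backend has to keep itself

    def connect(self, client_id: str, send: Send) -> None:
        self.clients[client_id] = send
        if self.last_state is not None:
            send(self.last_state)  # initial state load upon (re)connection

    def disconnect(self, client_id: str) -> None:
        self.clients.pop(client_id, None)

    def broadcast(self, message: str) -> None:
        self.last_state = message
        for send in list(self.clients.values()):  # the fan-out is our job
            send(message)

    def is_present(self, client_id: str) -> bool:
        # Presence is trivial: one connection per client.
        return client_id in self.clients


hub = ConnectionHub()
inbox_a, inbox_b = [], []
hub.connect("ipad-a", inbox_a.append)
hub.broadcast("track 4 closed")
hub.connect("ipad-b", inbox_b.append)  # late joiner still gets the current state
```

A real backend would also have to deal with failed sends and reconnecting clients, which this sketch leaves out.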

You might otherwise decide to rely on a messaging system with publish-subscribe. The most common protocol for mobile messaging in this case is MQTT, but there are others. With a message broker, the broker takes care of broadcasting messages and persisting the state according to the TTL. Tracking the presence of the client can be achieved with MQTT by sending a message upon connection and using MQTT’s “Last Will and Testament” upon connection loss.
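The presence mechanism can be sketched with a toy in-memory broker in Python. This is not an MQTT client or a real broker: `ToyBroker` and the topic name `presence/ipad-a` are made up for illustration, and publishing the will as a retained message is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, str], None]  # called with (topic, payload)


class ToyBroker:
    """Toy stand-in for a broker with retained messages and last wills."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self.retained: Dict[str, str] = {}
        self.wills: Dict[str, Tuple[str, str]] = {}

    def connect(self, client_id: str, will_topic: str, will_payload: str) -> None:
        # The client registers its will when it connects.
        self.wills[client_id] = (will_topic, will_payload)

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if retain:
            self.retained[topic] = payload  # kept for future subscribers
        for handler in list(self.subscribers[topic]):
            handler(topic, payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)
        if topic in self.retained:
            handler(topic, self.retained[topic])  # replay the retained message

    def connection_lost(self, client_id: str) -> None:
        # On an unexpected disconnect the broker publishes the will itself.
        topic, payload = self.wills.pop(client_id)
        self.publish(topic, payload, retain=True)


broker = ToyBroker()
seen = []
broker.subscribe("presence/ipad-a", lambda topic, payload: seen.append(payload))
broker.connect("ipad-a", will_topic="presence/ipad-a", will_payload="offline")
broker.publish("presence/ipad-a", "online", retain=True)
broker.connection_lost("ipad-a")  # e.g. the train enters a tunnel
```

The backend never talks to the device directly here: it only observes the presence topic, first "online", then "offline".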

There are of course more details to take care of when comparing both approaches, especially around state management. For instance, how to make sure that outdated messages are ignored.
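One possible way to ignore outdated messages on the client is sketched below in Python. It assumes that every update carries a per-key sequence number, a send timestamp, and a TTL, and that sender and receiver clocks are roughly in sync. None of these fields come from MQTT itself; `Update` and `LatestState` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Update:
    key: str        # what the update is about, e.g. a train number
    sequence: int   # increases with every update for the same key
    sent_at: float  # seconds, sender clock (assumed roughly in sync)
    ttl: float      # seconds the update stays valid
    payload: str


class LatestState:
    """Keeps only the newest, still valid update per key."""

    def __init__(self) -> None:
        self._latest: Dict[str, Update] = {}

    def accept(self, update: Update, now: float) -> bool:
        if now - update.sent_at > update.ttl:
            return False  # expired, e.g. delivered after a long tunnel
        current = self._latest.get(update.key)
        if current is not None and update.sequence <= current.sequence:
            return False  # duplicate or older than what we already have
        self._latest[update.key] = update
        return True

    def get(self, key: str) -> Optional[str]:
        update = self._latest.get(key)
        return None if update is None else update.payload


state = LatestState()
older = Update("train-12", 1, sent_at=100.0, ttl=30.0, payload="delay 2 min")
newer = Update("train-12", 2, sent_at=110.0, ttl=30.0, payload="delay 5 min")
expired = Update("train-12", 3, sent_at=110.0, ttl=30.0, payload="delay 9 min")

accepted_newer = state.accept(newer, now=111.0)       # fresh and new: kept
accepted_older = state.accept(older, now=112.0)       # arrives late: dropped
accepted_expired = state.accept(expired, now=200.0)   # past its TTL: dropped
```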

We chose the latter option (MQTT) for our project, but I’m sure we could have achieved our goal with another architecture, too.

MORE

Apparently Uber and LinkedIn rely on SSE.

Software Architecture

The Two Faces of Streaming

In an event-driven system, the unit of abstraction is the event. An event is a notification about something happening in the system. Events can be processed and lead to other events.

In systems relying on streams, the unit of abstraction is the stream. A stream is a sequence of events, finite but also possibly infinite. Rather than processing individual events, you manipulate streams.

If you simply want to react to each event, the difference is insignificant. For more complex situations, using streams makes it easier to express the processing. Streams are manipulated with various operators, like map, flatMap, join, etc. Implementing windowing, for instance, is a trivial task with streams – just use the right operator – whereas it would be a complicated task using only events.
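As a rough illustration of what such operators look like, here is a sketch with Python generators. The function names `map_stream`, `flat_map` and `tumbling_window` are my own and do not come from any streaming library, and the window is count-based for simplicity (real systems usually offer time-based windows too).

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_stream(fn: Callable[[T], U], stream: Iterable[T]) -> Iterator[U]:
    """Transform each event into exactly one new event."""
    for event in stream:
        yield fn(event)


def flat_map(fn: Callable[[T], Iterable[U]], stream: Iterable[T]) -> Iterator[U]:
    """Transform each event into zero or more new events."""
    for event in stream:
        yield from fn(event)


def tumbling_window(size: int, stream: Iterable[T]) -> Iterator[List[T]]:
    """Group events into consecutive, non-overlapping windows of `size`."""
    iterator = iter(stream)
    while True:
        window = list(islice(iterator, size))
        if not window:
            return
        yield window


readings = [3, 5, 4, 8, 6, 7, 9]
averages = list(
    map_stream(lambda w: sum(w) / len(w), tumbling_window(3, readings))
)
words = list(flat_map(str.split, ["a b", "c"]))
```

The pipeline is lazy: nothing is computed until the result is consumed, so the same operators would also work on an unbounded source.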

[Figure: the merge operator]

One main use case for streams is to implement data pipelines. In this case we speak of stream processing. This is what Apache Flink and Kafka Streams are for. Stream processing is typically distributed across several machines to process large quantities of data. The processing must be fault-tolerant and accommodate the failures of individual processors. This means that such technologies have sophisticated approaches to state management. In the case of Kafka Streams, part of the heavy lifting is delegated to Kafka itself. Durable logs enable the system to resume after a failure, reprocessing some data a second time if needed.

Streams can also be used within applications to process data locally. This is what RxJava and Akka Streams are for. This tends to be referred to as reactive programming and reactive streams. You use reactive programming to process asynchronous data, for instance video frames that need to be buffered. Rather than using promises or async/await to handle concurrency, you use streams.

There are many similarities between stream processing and reactive programming, but also differences. In both cases, we find sources, streams, and sinks for events. In both cases, you have issues with flow control, that is, making sure the producers and consumers can work at different paces. Since the use cases differ, the abstractions might differ, though. Streams in reactive programming support, for instance, some form of exception handling, similar to regular Java exceptions. Exception handling in stream processing is different. With reactive programming, buffering will be in-memory. With stream processing, buffering can be on disk (e.g. using a distributed log).

The stream, as an abstraction, is a relatively young one. It isn’t as well established as, say, relational databases. The terminology and the concepts vary across products. The difference between stream processing and reactive programming is also not fully understood. For some scenarios, the differences are irrelevant. As evidence that the field is maturing, some efforts to standardize the concepts have already started. The java.util.concurrent.Flow class, added in Java 9, is a standard API for sources (called publishers), sinks (called subscribers), and the link between them (called a subscription) in reactive programming. On its own, it doesn’t come with any standardized operators, though. This makes its usefulness limited at the moment, in my view. Project Reactor’s aim is similar: it’s an implementation of the reactive streams specification that can be embedded in various frameworks, e.g. Spring. Its integration in Spring Cloud Stream effectively bridges the gap between reactive programming and stream processing.
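The publisher/subscriber/subscription trio and its demand-driven flow control can be sketched in a few lines. The following is a simplified Python transliteration of the idea, not the Java API: the class names `ListPublisher` and `CollectingSubscriber` are hypothetical, and error handling, cancellation and threading are left out.

```python
from typing import Iterable, Iterator, List, Optional


class Subscription:
    """Link between one publisher and one subscriber; carries the demand."""

    def __init__(self, items: Iterator[int], subscriber: "CollectingSubscriber") -> None:
        self._items = items
        self._subscriber = subscriber
        self._done = False

    def request(self, n: int) -> None:
        # The subscriber asks for at most n more items: this is the flow control.
        for _ in range(n):
            if self._done:
                return
            try:
                item = next(self._items)
            except StopIteration:
                self._done = True
                self._subscriber.on_complete()
                return
            self._subscriber.on_next(item)


class ListPublisher:
    """Source that only emits what has been requested."""

    def __init__(self, items: Iterable[int]) -> None:
        self._items = list(items)

    def subscribe(self, subscriber: "CollectingSubscriber") -> None:
        subscriber.on_subscribe(Subscription(iter(self._items), subscriber))


class CollectingSubscriber:
    """Sink that signals an initial demand and then collects items."""

    def __init__(self, initial_demand: int) -> None:
        self.received: List[int] = []
        self.completed = False
        self.subscription: Optional[Subscription] = None
        self._initial_demand = initial_demand

    def on_subscribe(self, subscription: Subscription) -> None:
        self.subscription = subscription
        subscription.request(self._initial_demand)

    def on_next(self, item: int) -> None:
        self.received.append(item)

    def on_complete(self) -> None:
        self.completed = True


publisher = ListPublisher(range(5))
sink = CollectingSubscriber(initial_demand=2)
publisher.subscribe(sink)
after_subscribe = list(sink.received)   # only the two requested items
sink.subscription.request(2)            # the consumer is ready for more
after_second_request = list(sink.received)
sink.subscription.request(2)            # one item left, then completion
```

The point of the design is that the producer never runs ahead of the consumer: nothing is emitted unless it has been requested.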

The stream, as an abstraction, is very simple but very powerful. Precisely because of this, I believe it has the potential to become a core abstraction in computer science. It takes a bit of practice to think in terms of streams, and once you get it, you see possible applications of streams everywhere. So let’s stream everything!


Software Architecture

When You Should Rewrite

When the architecture of a system starts to show its limits, it’s tempting to throw everything away and start from scratch. But a rewrite has challenges too. The existing software is a value-generating asset and must be maintained. The new architecture is unproven and comes with risks. Reaching feature parity can take years, and the rewrite also turns into an integration challenge to interface the old and the new system. If a big bang approach is chosen, planning the switchover without data loss becomes a project of its own. These are just a few of the considerations, far from an exhaustive list.

Joel Spolsky wrote an influential article in 2000 discouraging rewrites, calling a rewrite the “single worst strategic mistake” you can make. Many developers know this article and it is often cited. Developers are generally wary of rewrites. I love this description from Tyler Treat:

“Rewrite” is a Siren calling developers to shipwreck. It’s tempting but almost never a good idea

Yet, many software systems are regularly rewritten, as seen by the numerous articles listed below. And many rewrites are successful.

Whether you should rewrite your project or not can only be answered by you (or your team). Too many factors impact such a decision for it to be turned into a decision algorithm. Often, to rewrite or not is not a binary decision anyway. There are nuances, for instance, which components of the system to rewrite. How much of the old system do you need to replace to call it a rewrite?

Having worked with Smalltalk for some years, I can confirm that you can go a long way without a rewrite. Indeed, the Smalltalk images that we use today are in fact “descendants” of the very images of the 80s. All changes have been made within the environment itself, without a rewrite, even without a restart (because the concept doesn’t exist in Smalltalk).

I expect to hear about a few more software rewrites in my career, because rewriting is inherently tied to software evolution. A software rewrite sometimes has a negative connotation, for instance when it’s driven by massive technical debt. But most software rewrites are driven by increasing requirements. You rewrite your system because you are asked to make it work beyond what it was initially intended for. Actually, it’s the price you pay if your system is too successful.


MORE

Some stories about rewrites or significant rearchitecturing work that I liked:

Organisation

Aligning Incentives

People working on a product or system might have different interests and motivations. For someone, it might be shipping as quickly as possible. For another, it might be keeping operations stable. For a third, it might be ensuring adherence to some standard or policy. There are many forces at play, potentially in conflict.

These different incentives inevitably lead to friction at work, since the value of some work is assessed differently.

Most IT projects have traditionally been organized in a way that makes such friction arise, because activities and responsibilities are siloed.

The goal of most agile development practices is actually to reduce such friction by aligning incentives. That is, make teams of people aim at the same goal.

  • If you want to reduce bugs and improve quality, make testing part of development. That’s the “definition of done” in agile methods.
  • If you want to improve stability while shipping new features, involve developers in operations. That’s DevOps.
  • If you want to make sure developers care about long-term maintenance, make teams responsible for components indefinitely. That’s product over project.

The idea is always the same: make the team responsible end to end. The team as a whole shares a set of incentives.

Internally, team members might still value some work differently, based on preference or subjective factors. But chances are that the differences are smaller than with silos and consensus is also achieved more easily. The context for all team members is the same and discussions aren’t biased by individual incentives. The big “wall of confusion” between teams siloed by activities is replaced by a more balanced approach of simply weighing the pros and cons and committing to one decision. A lot of the friction goes away.


Software Architecture

OOP: past, present, future

Object-Oriented Programming (OOP) has been a mainstream programming paradigm for about 40 years now. That’s quite a bit of time. So it’s worth asking: how did the paradigm evolve over time? I would say, looking back, that there have been three eras.

Era 1: Modelling the world with objects (1980-1995)

The idea with the object-oriented paradigm is to model the world in objects. The poster child of the object paradigm from this era is Smalltalk, where everything is an object. Objects send messages to each other and live in a permanent, persisted state. This approach is great to model stand-alone applications with GUI components. The problem with this approach is that it doesn’t work very well for business entities. In Smalltalk everything is an object and everything is persistent. In other languages, regular programs are started and stopped. You only want to persist the business entities. Persisting a subset of the objects is possible with object databases. This is a challenging problem though, similar to the serialization of object graphs. Searching and navigating heaps of objects is also not so easy. There’s also no easy way to share the business entities across instances of the application. For these reasons, business applications have frequently relied on relational databases to persist their state.

Era 2: Objects and enterprise applications (1995-2005)

Using a relational database to persist an object graph leads to the so-called “object-relational impedance mismatch”. It manifests itself in the difficulty of having a rich domain model that is persistent at the same time. The simplest approach to reduce the mismatch is to have a simpler domain model – just data structures – that can be persisted easily. But this in turn means that you move towards a more procedural style of programming again. This style of programming is well captured in the “anemic domain model” pattern described by Martin Fowler.

The Java Enterprise Platform is a major technology of this era. It embraced object-orientation with the concept of Enterprise JavaBeans (EJB). Prior to EJB3, entity beans and session beans were objects that were persisted by the application server itself, with a vague resemblance to an object database. Every operation would be carried out as a synchronous operation over the network. The technology proved, however, to be hard to use because of the associated network costs. Another mismatch. Starting with EJB3, entity beans were turned into regular objects persisted with an object-relational mapper.

Other approaches to object-orientation exist. Domain-driven design promotes a rich domain model, in accordance with the concept of “modelling the world as objects”. To solve the object-relational mismatch, the domain model is kept separate from the model used for persistence. So-called repositories take care of dealing with the mismatch.
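A minimal Python sketch of this separation could look as follows. The `Account` example and all names are hypothetical, and an in-memory dictionary stands in for the relational database.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class Account:
    """Rich domain object: state together with the business rules."""

    def __init__(self, account_id: str, balance_cents: int = 0) -> None:
        self.account_id = account_id
        self._balance_cents = balance_cents

    @property
    def balance_cents(self) -> int:
        return self._balance_cents

    def withdraw(self, amount_cents: int) -> None:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if amount_cents > self._balance_cents:
            raise ValueError("insufficient funds")
        self._balance_cents -= amount_cents


@dataclass(frozen=True)
class AccountRow:
    """Persistence model: plain data, shaped like a table row."""

    id: str
    balance_cents: int


class InMemoryAccountRepository:
    """Translates between the domain model and the persistence model."""

    def __init__(self) -> None:
        self._rows: Dict[str, AccountRow] = {}  # stand-in for a table

    def save(self, account: Account) -> None:
        self._rows[account.account_id] = AccountRow(
            id=account.account_id, balance_cents=account.balance_cents
        )

    def find(self, account_id: str) -> Optional[Account]:
        row = self._rows.get(account_id)
        if row is None:
            return None
        return Account(row.id, row.balance_cents)


repository = InMemoryAccountRepository()
account = Account("acc-1", balance_cents=1000)
account.withdraw(300)  # business rule lives in the domain object
repository.save(account)
reloaded = repository.find("acc-1")
```

The domain object knows nothing about rows or tables, and the row knows nothing about business rules. Only the repository sees both sides.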

The actor paradigm can be seen as a special form of object-orientation where objects communicate asynchronously. Actors are stateful domain entities. This avoids the problem of networking but doesn’t provide an out-of-the-box solution for persistence. One way to solve it is through event sourcing or object serialisation.
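The combination of actors and event sourcing can be illustrated with a small, single-threaded Python sketch. It is a deliberate simplification: `CounterActor` is a made-up example, messages and events are the same tuples, and a plain list plays the role of the durable journal.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Event = Tuple[str, int]  # (kind, amount)


class CounterActor:
    """Stateful entity: messages queue up in a mailbox, state comes from events."""

    def __init__(self, journal: Iterable[Event] = ()) -> None:
        self.journal: List[Event] = []
        self.mailbox: Deque[Event] = deque()
        self.value = 0
        for event in journal:  # recovery: replay the persisted events
            self.journal.append(event)
            self._apply(event)

    def tell(self, message: Event) -> None:
        # Asynchronous send: the caller does not wait for the processing.
        self.mailbox.append(message)

    def process_mailbox(self) -> None:
        while self.mailbox:
            event = self.mailbox.popleft()
            self.journal.append(event)  # persist the event first
            self._apply(event)          # then update the in-memory state

    def _apply(self, event: Event) -> None:
        kind, amount = event
        if kind == "added":
            self.value += amount
        elif kind == "subtracted":
            self.value -= amount


actor = CounterActor()
actor.tell(("added", 5))
actor.tell(("subtracted", 2))
value_before_processing = actor.value  # still 0, nothing processed yet
actor.process_mailbox()

recovered = CounterActor(journal=actor.journal)  # rebuilt purely from events
```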

The heavy use of inheritance is also a characteristic of this era. Object-orientation promised reuse, which we thought meant inheritance. This missed the point. Reuse is promoted with good interfaces, which don’t strictly need inheritance. With time, we learned to use interfaces, inheritance, composition, and parametric polymorphism (aka generics) in a sane way.

With these lessons learned, the use of object orientation stabilized into a form mixing object-oriented data structures (think lists, data transfer objects, etc.) and object-oriented components (think of a business service, an HTTP server, or a framework). This isn’t the revolution promised in the first era, but it makes good use of objects and encapsulation to improve upon procedural programming.

Era 3: Objects and functional programming (2005-now)

Scala started exploring the combination of object-orientation and functional programming in more detail around 2005. The two paradigms might look contradictory at first (object-oriented programming is about mutability, functional programming about immutability). But they actually blend quite well, at least for the basics like list transformations (think map and flatMap). This isn’t really a big surprise, given that lambdas have been there from the beginning in Smalltalk; it’s just that Java didn’t have them initially.

This exploration continued with Kotlin and with Java itself. Java finally added lambdas to the language, and many more explorations are going on in incubating projects. For instance, pattern matching and OO play along quite well too. Developers found an appropriate balance between immutable constructs and mutable ones.

What we have now is what we can call “FP in the small, OO in the large”. Objects shine at encapsulating whole components or services. The object-oriented data structures that are used internally don’t necessarily need to be mutable, though. They can be transformed and manipulated using idioms from functional programming.
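A short sketch of what this can look like, with hypothetical names (`Invoice`, `BillingService`) and Python standing in for Java, Kotlin or Scala. The service is an ordinary object that encapsulates its state. The values it handles are immutable, and the queries are written as filter and map style transformations.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Invoice:
    """FP in the small: an immutable value."""
    customer: str
    amounts: Tuple[int, ...]
    paid: bool


class BillingService:
    """OO in the large: a component hiding its state behind a small interface."""

    def __init__(self) -> None:
        self._invoices: List[Invoice] = []

    def register(self, invoice: Invoice) -> None:
        self._invoices.append(invoice)

    def outstanding_amounts(self) -> List[int]:
        # Filter, then flatten: what flatMap would express in Scala or Java streams.
        return [
            amount
            for invoice in self._invoices
            if not invoice.paid
            for amount in invoice.amounts
        ]

    def outstanding_by_customer(self) -> Dict[str, int]:
        unpaid = (inv for inv in self._invoices if not inv.paid)      # filter
        pairs = ((inv.customer, sum(inv.amounts)) for inv in unpaid)  # map
        totals: Dict[str, int] = {}
        for customer, amount in pairs:                                # fold
            totals[customer] = totals.get(customer, 0) + amount
        return totals
```

Callers only see `register` and the two queries. Whether the internals are mutable or not is an implementation detail of the component.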

This is, I think, where we stand now, and where we will stay for a few more years until we’ve fully explored this space.

More

Software Architecture

Metaphors in Software Engineering

One metaphor frequently used in the field of software development is the metaphor of software architecture. The architecture of a software system consists, like the architecture of a building, of the main structures of the system.

For a software system, the term “structure” could mean structures that are logical or conceptual. They don’t necessarily match tangible system boundaries. But the term “structure” does mean that the metaphor is biased towards expressing the static aspects of the software system.

Unlike buildings, software systems also have dynamic aspects. Information flows within a software system, and systems communicate with each other. Therefore, other metaphors can be useful to explain the nature of software systems.

Here are a few that I find interesting.

A city

As said, the architecture metaphor is limited in that it focuses too much on the statics. The city metaphor is better in this regard, since it evokes both static structures (roads, bridges, buildings) and dynamic aspects (traffic flow, people living in the city). Good city planning deals with both. The metaphor can be used for a single software system, but also for collections of software systems.

Enterprise architecture is the field of IT that addresses IT strategy at the enterprise level. The city metaphor is a good one for enterprise architecture. Changes of IT strategy (for instance, moving to the cloud) impact many systems and take years to achieve. They significantly and durably change the way software systems are built for the enterprise. If Haussmann’s renovations gave a new face to Paris, moving to the cloud will give a new face to the IT of your enterprise.

A garden

The architecture metaphor is also limited in that it conveys the impression that software is built once, and then never changes. It may be true for a building, but it isn’t for software systems. According to the laws of software evolution, a software system must constantly be maintained and adapted to the needs of its users, or it will become useless. As software systems are developed and grow, they tend to accumulate inconsistencies that must be actively removed. This is much like a garden, which must be constantly maintained and whose weeds must be removed.

It’s possible to convey something similar with the architecture metaphor too, since buildings suffer wear and tear. We sometimes speak of architecture erosion to denote the degrading quality of the architecture. By the way, buildings do change over time, sometimes quite significantly.

A book

Software is expressed using programming languages and its source code consists of text. A software system can thus be compared to a book, albeit a very special one. You can’t read it linearly and everything is interlinked. But there is a sense of style in a given code base, and code can be more or less elegant. There is something artful to programming. Given that developers spend a lot more time reading code than writing code, taking care of software as text makes sense. With development approaches like literate programming, developers are supposed to write the source code like a story to explain their thoughts. It didn’t catch on, but it is still worth a look.

A living organism

A running software system can also be compared to a living organism: it needs energy to run and do something useful. In some way, functions of the runtime, like memory management or thread scheduling, can be seen as some form of metabolism. Interestingly, some software systems like blockchains are explicitly designed to have an inefficient metabolism and consume large amounts of energy. A running software system has a health too, which indicates how well the system works. Millions of things can go wrong at run time, degrading its health and behavior. For instance, a memory leak will over time degrade the performance of the system until it simply dies. Some components of a software system have multiple instances at run time. A failure of one instance doesn’t break the whole system, just like we can live with one kidney. A running software system can be compromised by hostile inputs, the equivalent of a pathogen. The immune system of running software consists of mechanisms like SQL sanitization, managed memory, safe pointers, etc., which aim at making software more robust. Usually software systems do not reproduce, though. Except for software viruses.

An asset

IT has long been seen as a cost center, detached from the business units that are profit centers. With digitalization, the perception is changing. Software is the enabler for the business, and goes hand in hand with it. It is an asset and generates value. But with software, more code doesn’t mean more returns. More code means more maintenance, and only some features of the system might actually deliver value.

There are of course more metaphors. Just have a look at the links below. The city, the garden, and the book metaphors are somewhat popular. The metaphor with living organisms is surprisingly uncommon. The asset metaphor isn’t really a metaphor, more a mindset. The architecture metaphor is sometimes critiqued, but if we assume that software development is an engineering discipline, it’s the only metaphor that resonates with engineers. So it’s unlikely to change.

More