TOGAF: The Good Parts

TOGAF is a framework for enterprise architecture management. Enterprise architecture aims to align business and IT to achieve the strategic goals of the enterprise, and it supports digital transformation in large enterprises.

The core of the TOGAF framework is the architecture development method (ADM).

In a nutshell, the method works as follows:

  • In the preliminary phase, you build up the enterprise architecture capability itself (that is, you establish and tailor the TOGAF framework).
  • The architecture work is triggered by architecture changes that go through the whole cycle.
  • Phases A to D work out the candidate architecture, which is decomposed into four architecture domains: business, application, data, and technology. Application and data architecture are grouped together as “information systems architecture” in the cycle.
  • The candidate architecture is solution-neutral. Up to this point, you identify changes to business processes, applications, interfaces, data models, and platforms without defining a concrete implementation.
  • A more concrete solution is worked out in phases E and F. The selection of specific technologies, implementation architecture styles (microservices, streaming, etc.), and vendors happens mainly in these phases.
  • Phases E and F also cover planning with key stakeholders using an architecture roadmap and a migration plan that define the work packages.
  • In phase G, the project is handed over to the implementation organization (e.g. an agile release train). Expectations about the outcome and quality to deliver are agreed in an “architecture contract” between the architecture and the implementation organizations.
  • Phase H is a retrospective where improvements are formulated and kick-start a new cycle.

The cardinal sin to avoid is to jump from A to G, namely, from the business need to the implementation plan. We’ve all been there: the business has an idea and mandates a team to realize it, without looking left or right to see how it would fit into the bigger context. This usually leads to specific solutions for specific problems. Over time, the architecture landscape becomes fragmented and inconsistent. The goal of enterprise architecture is precisely to avoid this. Business needs should be supported by a consistent architecture strategy.

This goal of consistency is supported in TOGAF with the concept of building blocks. The architecture consists of architecture building blocks (business, application, data, or technology) that can be used or reused. In phases B/C/D, architects identify which building blocks have to be modified, added, or removed for a given change. This is the gap analysis between the baseline and target architecture. In phases E and F, when the concrete solution architecture is worked out, solution architects identify solution building blocks to fulfill the requirements.

Part of TOGAF is also a content metamodel that defines key entities to model the four architecture domains (business, application, data, and technology). It’s pretty generic but can be a good starting point. You will probably have to refine it, though, so that it becomes really useful (e.g. refine the technology metamodel to distinguish between platforms and frameworks).

These core concepts of TOGAF define a useful methodology to tackle complex architecture changes. These are the good parts.

From the perspective of agility, the framework is neither good nor bad. It will all depend on your implementation.

The framework is iterative in nature. Each architecture change goes through the cycle and is an iteration. How big the changes are, how long the cycles last, and how many iterations can run in parallel will depend on your implementation. Tailoring the framework to your needs is part of the preliminary phase. You can implement the core ideas in a bureaucratic manner with many deliverables and approvals. But you can also implement them in a lightweight manner with a few well-chosen checkpoints. Similarly, you can partition the enterprise architecture work into “segments”. What they are and how big they are will depend on your implementation.

The bad parts of the framework are the useless ornaments around the core concepts. I can make a list:

  • The many deliverables expected to be produced along the cycle. I like the core concepts as long as they remain concepts. But TOGAF also defines a set of deliverables that are probably never implemented as such.
  • The template library documents predefined viewpoints that can be used to document the architecture. This level of detail is mostly useless.
  • The content metamodel comes with two additional reference architectures to model the technology and application domains. This complicates the discussion about modelling without bringing much benefit.
  • The framework comes with a set of techniques that can be employed to carry out the phases. What has been defined as a technique or not seems rather arbitrary. The techniques can serve as inspiration for real work, but I doubt they will be followed as such.
  • The framework defines a classification scheme for building blocks, ranging from generic to enterprise-specific, called the enterprise continuum. The value of the continuum as a concept and its applicability are rather unclear.

These ornaments inflate the framework with details without being practical enough to be really useful. They mostly distract rather than help.

Architectures for Mobile Messaging

A project I’m working on involves changing the messaging technology for the delivery of realtime information to train drivers using iPads. This project got me interested in the various ways to design realtime messaging platforms for mobile clients.

Unlike realtime messaging systems for web or desktop applications, mobile applications have to deal with the additional concern of unreliable connectivity. This unreliable connectivity is more subtle than I thought. Here are, for instance, a few points to consider:

  • no connectivity or poor connectivity (tunnel, etc.)
  • the device may switch from 5G to WLAN
  • the connection breaks when the app goes into the background
  • different WLAN hotspot implementations (Android, iOS) result in different behavior

You need to design your application to support these use cases correctly.

Here are some aspects of the communication that you need to consider:

  • Does the client need to load some state upon connection?
  • Do updates have a TTL (time to live)?
  • Are messages broadcast to several clients or unique to each client?
  • Does message loss matter?
  • Does the server need to know which clients are connected?
  • Is there a firewall between client and server?

Depending on the answers to these questions, you might decide to establish a point-to-point connection from the device to the backend. In this case, if you want to broadcast information to several clients, you need to do it yourself. You will also need to manage the state in the backend yourself. Tracking the presence of the client is trivial, since there is one connection per client. Several technologies exist for this use case:

  • Server-Sent Events (SSE)
  • HTTP Long Polling
  • gRPC
  • WebSocket
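To make the point-to-point style concrete, here is a minimal sketch of an HTTP long-polling loop in Java. It is a hypothetical illustration: the `Fetcher` interface stands in for a real blocking HTTP call (e.g. via `java.net.http.HttpClient` with a long read timeout) and is made up for this example.

```java
import java.util.function.Consumer;

// Minimal long-polling loop (hypothetical sketch): the client repeatedly
// issues a request that the server holds open until an update is available.
public class LongPollingClient {

    // Abstraction over the blocking HTTP call; a real client would wrap
    // an HTTP library here. Made up for illustration.
    public interface Fetcher {
        String poll() throws Exception; // blocks until an update, null on timeout
    }

    public static void run(Fetcher fetcher, Consumer<String> onUpdate, int maxPolls) {
        int backoffMillis = 1_000;
        for (int i = 0; i < maxPolls; i++) {
            try {
                String update = fetcher.poll();
                if (update != null) {
                    onUpdate.accept(update);
                }
                backoffMillis = 1_000; // reset backoff after a successful poll
            } catch (Exception e) {
                // connectivity lost (tunnel, network switch, ...): back off, retry
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
                backoffMillis = Math.min(backoffMillis * 2, 30_000);
            }
        }
    }
}
```

Note how the unreliable-connectivity concerns from above surface directly in the loop: the retry-with-backoff branch is exactly where tunnels and network switches are handled.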

You might otherwise decide to rely on a messaging system with publish-subscribe. The most common protocol for mobile messaging in this case is MQTT, but there are others. With a message broker, the broker takes care of broadcasting messages and persisting state according to the TTL. Tracking the presence of the client can be achieved with MQTT by sending a message upon connection and using MQTT’s “Last Will and Testament” message upon connection loss.

There are of course more details to take care of when comparing both approaches, especially around state management: for instance, how to make sure that outdated messages are ignored.
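One possible way to ignore outdated messages, sketched here in plain Java and not tied to MQTT or any particular broker (the class and topic names are made up): track the highest sequence number seen per topic and drop anything that isn’t strictly newer. This guards against redeliveries and out-of-order arrival after a reconnect.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Drops duplicate or outdated messages based on a per-topic sequence number.
// A timestamp assigned by the producer would work the same way.
public class StalenessFilter {
    private final Map<String, Long> latestSeen = new ConcurrentHashMap<>();

    /** Returns true if the message is strictly newer and should be processed. */
    public boolean accept(String topic, long seq) {
        boolean[] fresh = new boolean[1];
        // compute() makes the check-and-update atomic per topic
        latestSeen.compute(topic, (t, prev) -> {
            if (prev == null || seq > prev) {
                fresh[0] = true;
                return seq;
            }
            return prev; // keep the newer value, reject the message
        });
        return fresh[0];
    }
}
```

With a point-to-point connection you would run such a filter yourself; with a broker, retained messages and QoS levels cover part of the problem, but an application-level check like this can still be needed.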

We chose the latter option (MQTT) for our project, but I’m sure we could have achieved our goal with another architecture, too.

More

Apparently, Uber and LinkedIn rely on SSE.

Beyond Events: the Stream Abstraction

In an event-driven system, the unit of abstraction is the event. An event is a notification about something happening in the system. Events can be processed and lead to other events.

In systems relying on streams, the unit of abstraction is the stream. A stream is a sequence of events, finite or possibly infinite. Rather than processing individual events, you manipulate streams.

If you simply want to react to each event, the difference is insignificant. For more complex situations, using streams makes it easier to express the processing. Streams are manipulated with various operators, like map, flatMap, join, etc. Implementing windowing, for instance, is a trivial task with streams – just use the right operator – whereas it would be complicated using only events.
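To make the operator idea concrete, here is a sketch in plain Java. map and flatMap exist in java.util.stream, but a windowing operator does not, so the sketch hand-rolls a tumbling-window helper (the names tumblingWindows and windowAverages are made up for illustration; in Flink or Rx this would be a built-in one-liner):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class WindowingSketch {

    /** Tumbling windows: consecutive, non-overlapping groups of the given size. */
    public static <T> List<List<T>> tumblingWindows(List<T> events, int size) {
        List<List<T>> windows = new ArrayList<>();
        for (int i = 0; i < events.size(); i += size) {
            // copy each window so it doesn't alias the source list
            windows.add(new ArrayList<>(events.subList(i, Math.min(i + size, events.size()))));
        }
        return windows;
    }

    /** One aggregated value per window, expressed with the usual stream operators. */
    public static List<Double> windowAverages(List<Integer> events, int size) {
        return tumblingWindows(events, size).stream()
                .map(w -> w.stream().mapToInt(Integer::intValue).average().orElse(0))
                .collect(Collectors.toList());
    }
}
```

The point is the shape of the code: once windowing is an operator, aggregating per window is just another map over a stream of windows, rather than bookkeeping scattered across individual event handlers.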


One main use case for streams is to implement data pipelines. In this case, we speak of stream processing. This is what Apache Flink and Kafka Streams are for. Stream processing is typically distributed over several machines to process large quantities of data. The processing must be fault-tolerant and accommodate the failure of individual processors. This means that such technologies have sophisticated approaches to state management. In the case of Kafka Streams, part of the heavy lifting is delegated to Kafka itself. Durable logs enable the system to resume after a failure, reprocessing some data a second time if needed.

Streams can also be used within applications to process data locally. This is what RxJava and Akka Streams are for. This tends to be referred to as reactive programming and reactive streams. You use reactive programming to process asynchronous data, for instance video frames that need to be buffered. Rather than using promises or async/await to handle concurrency, you use streams.

There are many similarities between stream processing and reactive programming, but also differences. In both cases, we find sources, streams, and sinks for events. In both cases, you have to deal with flow control, that is, making sure producers and consumers can work at different paces. Since the use cases differ, the abstractions may differ, though. Streams in reactive programming support, for instance, some form of exception handling, similar to regular Java exceptions. Exception handling in stream processing is different. With reactive programming, buffering is in-memory. With stream processing, buffering can be on disk (e.g. using a distributed log).

The stream, as an abstraction, is a relatively young one. It isn’t as well established as, say, the relational database. Both the terminology and the concepts vary across products. The difference between stream processing and reactive programming is also not fully understood. For some scenarios, the differences are irrelevant. As evidence that the field is maturing, some efforts to standardize the concepts have already started. The new java.util.concurrent.Flow API standardizes sources (called publishers), streams (called subscriptions), and sinks (called subscribers) in reactive programming. Alone, it doesn’t come with any standardized operators, though, which limits its usefulness for me at the moment. Project Reactor’s aim is similar: it’s an implementation of the reactive streams specification that is embeddable in various frameworks, e.g. Spring. Its integration in Spring Cloud Stream effectively bridges the gap between reactive programming and stream processing.
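As a minimal illustration of the standard Flow API, here is a sketch using SubmissionPublisher, the JDK’s ready-made Flow.Publisher (available since Java 9). It also shows what is missing: because the API ships without operators, even a simple map (here, squaring the values) has to be written by hand in the subscriber.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.SubmissionPublisher;

public class FlowExample {
    public static List<Integer> squares(List<Integer> input) {
        List<Integer> results = new ArrayList<>();
        CompletableFuture<Void> done;
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            // consume() attaches a subscriber with unbounded demand;
            // the "operator" (squaring) is hand-written in the consumer
            done = publisher.consume(i -> results.add(i * i));
            // submit() applies backpressure by blocking when the
            // subscriber's buffer is saturated
            input.forEach(publisher::submit);
        } // closing the publisher completes the subscription
        done.join(); // wait until all items have been delivered
        return results;
    }
}
```

Compare this with Project Reactor or RxJava, where the same pipeline would be a one-line map over a Flux or Observable; the standard API only fixes the publisher/subscriber contract, not the operator vocabulary.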

The stream, as an abstraction, is very simple but very powerful. Precisely because of this, I believe it has the potential to become a core abstraction in computer science. It takes a bit of practice to think in terms of streams, but once you get it, you see possible applications of streams everywhere. So let’s stream everything!


When You Should Rewrite

When the architecture of a system starts to show its limits, it’s tempting to throw everything away and start from scratch. But a rewrite has challenges too. The existing software is a value-generating asset and must be maintained. The new architecture is unproven and comes with risks. Reaching feature parity can take years, and the rewrite also turns into an integration challenge to interface the old and the new system. If a big bang approach is chosen, planning the switchover without data loss becomes a project of its own. These are just a few of the considerations, far from an exhaustive list.

Joel Spolsky wrote an influential article in 2000 discouraging rewrites, calling a rewrite the “single worst strategic mistake” you can make. Many developers know this article, and it is often cited. Developers are generally wary of rewrites. I love this description from Tyler Treat:

“Rewrite” is a Siren calling developers to shipwreck. It’s tempting but almost never a good idea

Yet, many software systems are regularly rewritten, as shown by the numerous articles listed below. And many rewrites are successful.

Whether you should rewrite your project can only be answered by yourself (or your team). Too many factors impact such a decision for it to be turned into a decision algorithm. Often, to rewrite or not is not a binary decision anyway. There are nuances, for instance, which components of the system to rewrite. How much of the old system do you need to replace to call it a rewrite?

Having worked with Smalltalk for some years, I can confirm that you can go a long way without a rewrite. Indeed, the Smalltalk images that we use today are in fact “ancestors” of the very images of the 80s. All changes have been pushed within the environment itself, without a rewrite, even without a restart (because the concept doesn’t exist in Smalltalk).

I expect to hear about a few more software rewrites in my career, because rewriting is inherently tied to software evolution. A software rewrite sometimes has a negative connotation, for instance when it’s driven by massive technical debt. But most software rewrites are driven by increasing requirements. You rewrite your system because you are asked to make it work beyond what it was initially intended for. Actually, it’s the price you pay if your system is too successful.


More

Some stories about rewrites or significant rearchitecting work that I liked:

OOP: past, present, future

Object-Oriented Programming (OOP) has been a mainstream programming paradigm for about 40 years now. That’s quite a bit of time. So it’s worth asking: how did the paradigm evolve over time? Looking back, I would say there have been three eras.

Era 1: Modelling the world with objects (1980-1995)

The idea of the object-oriented paradigm is to model the world as objects. The poster child of the object paradigm from this era is Smalltalk, where everything is an object. Objects send messages to each other and live in a permanent, persisted state. This approach is great for modelling stand-alone applications with GUI components. The problem is that it doesn’t work very well for business entities. In Smalltalk, everything is an object and everything is persistent. In other languages, regular programs are started and stopped, and you only want to persist the business entities. Persisting a subset of the objects is possible with object databases. This is a challenging problem though, similar to the serialization of object graphs. Searching and navigating heaps of objects is also not so easy. There’s also no easy way to share the business entities across instances of the application. For these reasons, business applications have frequently relied on relational databases to persist their state.

Era 2: Objects and enterprise applications (1995-2005)

Using relational databases to persist object graphs leads to the so-called “object-relational impedance mismatch”. It manifests itself in the difficulty of having a rich domain model that is persistent at the same time. The simplest approach to reduce the mismatch is to have a simpler domain model – just data structures – that can be persisted easily. But this in turn means that you move towards a more procedural style of programming again. This style of programming is well captured by Martin Fowler’s “anemic domain model” pattern.

The Java Enterprise Platform is a major technology of this era. It embraced object-orientation with the concept of Enterprise JavaBeans (EJB). Prior to EJB 3, entity beans and session beans were objects persisted by the application server itself, with a vague resemblance to an object database. Every operation was carried out synchronously over the network. The technology proved hard to use, however, because of the associated network costs. Another mismatch. Starting with EJB 3, entity beans were turned into regular objects persisted with an object-relational mapper.

Other approaches to object-orientation exist. Domain-driven design promotes rich domain models in accordance with the idea of “modelling the world as objects”. To solve the object-relational mismatch, the domain model is kept separate from the model used for persistence. So-called repositories take care of dealing with the mismatch.

The actor paradigm can be seen as a special form of object-orientation where objects communicate asynchronously. Actors are stateful domain entities. This avoids the problem of networking but doesn’t provide a solution for persistence out of the box. One way to solve it is through event sourcing or object serialization.

The heavy use of inheritance is also a characteristic of this era. Object-orientation promised reuse, which we thought meant inheritance. This missed the point. Reuse is promoted with good interfaces, which don’t strictly need inheritance. With time, we learned to use interfaces, inheritance, composition, and parametric polymorphism (aka generics) in a sane way.
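A small Java sketch of reuse through interfaces and composition rather than inheritance (the names Notifier, EmailNotifier, and OrderService are made up for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;

public class CompositionExample {
    // Callers depend on this small interface, not on a base class.
    public interface Notifier {
        String notifyUser(String user, String message);
    }

    public static class EmailNotifier implements Notifier {
        public String notifyUser(String user, String message) {
            return "email to " + user + ": " + message;
        }
    }

    // OrderService reuses the notifier by composition: any implementation
    // (email, SMS, a test stub) can be plugged in without a class hierarchy.
    public static class OrderService {
        private final Notifier notifier;

        public OrderService(Notifier notifier) {
            this.notifier = notifier;
        }

        public List<String> confirm(List<String> users) {
            return users.stream()
                    .map(u -> notifier.notifyUser(u, "order confirmed"))
                    .collect(Collectors.toList());
        }
    }
}
```

Nothing here extends anything: the reusable part is the interface contract, and composition wires the pieces together.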

With these learnings, the use of object orientation stabilized into a form mixing object-oriented data structures (think lists, data transfer objects, etc.) and object-oriented components (think of a business service, an HTTP server, or a framework). This isn’t the revolution promised in the first era, but it makes good use of objects and encapsulation to improve upon procedural programming.

Era 3: Objects and functional programming (2005-now)

Scala started exploring object-orientation and functional programming in more detail around 2005. Both paradigms might look contradictory at first (object-oriented programming is about mutability, functional programming about immutability). But they actually blend quite well, at least for the basics like list transformations (think map and flatMap). This isn’t actually a big surprise, given that lambdas have been there from the beginning in Smalltalk; it’s just that Java didn’t have them initially.

This exploration continued with Kotlin and with Java itself. Java finally added lambdas to the language, and there are many more explorations going on in incubating projects. For instance, pattern matching and OO play along quite well too. Developers found an appropriate balance between immutable constructs and mutable ones.

What we have now is what we can call “FP in the small, OO in the large”. Objects shine at encapsulating whole components or services. The object-oriented data structures that are used internally don’t necessarily need to be mutable, though. They can be transformed and manipulated using idioms from functional programming.
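A tiny Java sketch of “FP in the small, OO in the large” (the TopScores class is made up for illustration): the object encapsulates state behind a small interface, while the method body transforms data with a pure functional pipeline.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// OO in the large: an object encapsulating mutable state behind a small API.
public class TopScores {
    private final List<Integer> scores = new ArrayList<>();

    public void record(int score) {
        scores.add(score);
    }

    // FP in the small: a side-effect-free pipeline over the current data.
    public List<Integer> top(int n) {
        return scores.stream()
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

Callers never see the internal list; the functional style stays an implementation detail of the object.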

That’s, I think, where we stand now, and where we will stay for a few more years until we’ve fully explored this space.


Metaphors in Software Engineering

One metaphor frequently used in the field of software development is the metaphor of software architecture. The architecture of a software system consists, like the architecture of a building, of the main structures of the system.

For a software system, the term “structure” could mean structures that are logical or conceptual. They don’t necessarily match tangible system boundaries. But the term “structure” does mean that the metaphor is biased towards expressing the static aspects of the software system.

Unlike buildings, software systems also have dynamic aspects. Information flows in a software system, and systems communicate with each other. Therefore, other metaphors can be useful to explain the nature of software systems.

Here are a few that I find interesting.

A city

As said, the architecture metaphor is limited in that it focuses too much on statics. The city metaphor is better in this regard, since it simultaneously evokes static structures (roads, bridges, buildings) and dynamic aspects (traffic flow, people living in the city). Good city planning deals with both. The metaphor can be used for a software system, but also for collections of software systems.

Enterprise architecture is the field of IT that addresses IT strategy at the enterprise level. The city metaphor is a good one for enterprise architecture. Changes of IT strategy (for instance, moving to the cloud) impact many systems and take years to achieve. They significantly and durably change the way software systems are built for the enterprise. If Haussmann’s renovations gave a new face to Paris, moving to the cloud will give a new face to the IT of your enterprise.

A garden

The architecture metaphor is also limited in that it conveys the impression that software is built once and then never changes. That may be true for a building, but it isn’t for software systems. According to the laws of software evolution, a software system must constantly be maintained and adapted to the needs of its users, or it will become useless. As software systems are developed and grow, they tend to accumulate inconsistencies that must be actively removed. This is much like a garden, which must be constantly maintained, and weeds, which must be removed.

It’s possible to convey something similar with the architecture metaphor too, since buildings suffer wear and tear. We sometimes speak of architecture erosion to denote the degrading quality of the architecture. By the way, buildings do change over time, sometimes quite significantly.

A book

Software is expressed using programming languages, and its source code consists of text. A software system can thus be compared to a book, albeit a very special one: you can’t read it linearly, and everything is interlinked. But there is a sense of style in a given code base, and code can be more or less elegant. There is something artful to programming. Given that developers spend a lot more time reading code than writing it, taking care of software as text makes sense. With development approaches like literate programming, developers are supposed to write the source code like a story that explains their thoughts. It didn’t catch on, but it’s still worth a look.

A living organism

A running software system can also be compared to a living organism: it needs energy to run and do something useful. In some way, functions of the runtime, like memory management or thread scheduling, can be seen as a form of metabolism. Interestingly, some software systems like blockchains are explicitly designed to have an inefficient metabolism and consume large amounts of energy. A running software system has a health too, which indicates how well the system works. Millions of things can go wrong at run time, degrading its health and behavior. For instance, a memory leak will over time degrade the performance of the system until it simply dies. Some components of a software system have multiple instances at run time; the failure of one instance doesn’t break the whole system, just like we can live with one kidney. A running software system can be compromised by hostile inputs, the equivalent of a pathogen. The immune system of running software consists of mechanisms like SQL sanitization, managed memory, safe pointers, etc., which aim to make software more robust. Usually, software systems do not reproduce, though. Except for software viruses.

An asset

IT has long been seen as a cost center, detached from the business units that are profit centers. With digitalization, this perception is changing. Software is an enabler for the business and goes hand in hand with it. It is an asset and generates value. But with software, more code doesn’t mean more returns. More code means more maintenance, and only some features of the system might actually deliver value.

There are of course more metaphors; just have a look at the links below. The city, the garden, and the book metaphors are somewhat popular. The metaphor of the living organism is surprisingly uncommon. The asset metaphor isn’t really a metaphor, more like a mindset. The architecture metaphor is sometimes critiqued, but if we assume that software development is an engineering discipline, it’s the only metaphor that resonates with engineers. So it’s unlikely to change.


Chasing the Perfect Technology

The goal of pretty much any framework or platform that you use — from a PaaS offering to an application server and everything in between — is to make you more productive by taking care of some technical complexity for you: “Focus on the business logic, not the technology”.

Frameworks and platforms speed up development so that you can ship faster. And it’s true: with current technologies, you can now build an internet-scale service, highly available, able to handle millions of transactions per second, in a few months. That would have been unimaginable a decade ago.

The peak of productivity is achieved when you master your stack completely. You can then spend significant time working on business features with little friction around the technology itself.

Sadly, the peak of productivity is rarely reached.

One of the reasons is that developers get bored too early. Once the technical groundwork is in place and you just have to use it, it becomes boring. It’s fun to set up a whole analytics pipeline to solve the first analytics problem. Using the exact same pipeline to solve another problem? Boring.

Go ask a developer to use your existing infrastructure and stack as is and simply implement new features. I bet they will be lukewarm if they don’t see any technological problem to solve, or at least some technology to learn.

I speak from experience. The project I’m working on is an application for which a dedicated platform was built. This platform provides all sorts of things to write applications, ranging from messaging, message processing, and fault tolerance to configuration management and job scheduling. You can reuse these building blocks to design new features. As long as a feature requires a new combination of building blocks, it’s interesting. But once it feels like using the same pattern every time, it becomes boring, even if that’s actually the moment you’re the most productive.

What motivates developers is leveraging technologies to solve a problem. They are interested in figuring out how to use technology for your problem, not in having more time to write business logic. Engineers studied computer science because they like technology more than other business domains.

Technology platforms and frameworks – app servers, cloud, data pipelines, web frameworks, etc. – are so amazingly complex that you will need to solve several problems with them before you feel like you master them. And even if you master the technologies individually, the combination of technologies might pose new challenges. At the same time, technology changes very fast. This is another reason why we rarely reach peak productivity: technologies change before we truly master them. Technology evolves fast, and we’re always playing catch-up.

A VM, for instance, is way easier to deal with than physical hardware. Using VMs definitely improves productivity. But as soon as you have VMs, you want to become elastic, and for this you need a whole new set of tools to learn and master. Progress in technology takes the form of layers that pile up. When you’ve barely mastered the first layer, the second one already arrives. These new layers, once mastered, enable new jumps in productivity, though.

Not reaching peak productivity isn’t in itself a problem, since productivity grows nevertheless. Curiosity is what makes us push technology. What’s interesting is to realize that productivity and curiosity are actually at odds. It’s because we are curious that we never truly master our technologies and don’t reach peak productivity. But it’s also because we are curious that productivity always increases in the long term.

More

In fact, we anticipate that there will soon be a whole generation of developers who have never touched a server and only write business logic.

Talk: High-available applications for rail control (@BATBern43)

Marc Hoffmann and I were invited to the 43rd Berner Architekten Treffen (BATBern) on “Event-Driven Architectures”. We presented the architecture of the Rail Control System, especially its mechanisms for high availability.

Here are the slides:

Great Articles on Software Engineering

Sometimes I read an article and some idea deeply resonates with me and makes a long-lasting impression. It changes the way I approach a topic.

Fred Brooks’ essay “No Silver Bullet” was one of the very first articles I read that had this effect. The concepts of essential and accidental complexity are very powerful, deeply resonate with me, and shaped the way I see software engineering. This essay is a classic because it had the same effect on many people.

But there are many other great articles that influenced me. Let’s recap some of them:

No Silver Bullet, Fred Brooks

A software system consists of essential and accidental (or implementation) complexity. We should reduce accidental complexity as much as possible, but essential complexity will still be the dominating factor.

The Law of Leaky Abstractions, Joel Spolsky

It’s very hard to devise abstractions that completely hide the underlying complexity. Often, you will need to understand some internal details no matter what.

Simple Made Easy, Rich Hickey

This is a great talk about complexity. The key takeaway is that simplicity comes from not mixing together things that shouldn’t be mixed; it’s independent of your prior knowledge. Easiness comes from habits and conventions; it depends on prior knowledge.

Choose Boring Technology, Dan McKinley

A reminder that using shiny new tools isn’t always the best option and that established, mature tools have their place if they suffice to get the job done.

There Is No Now, Justin Sheehy

An exploration of ways to handle time in distributed systems, where there is no global notion of time or consistency.

Beating the Averages, Paul Graham

A classic from Paul Graham where he describes how using Lisp and macros gave his company an advantage over its competitors.

Life Beyond Distributed Transactions, Pat Helland

An article about giving up distributed transactions and designing internet-scale systems using simpler data models (e.g. key-value stores).

Everything You Know About Latency Is Wrong, Tyler Treat

In short: using averages or percentiles hides your outliers, which are an important signal for understanding the real behavior of the system.

A Note on Distributed Computing, S. Kendall et al.

An article explaining that trying to abstract remote boundaries is bound to fail.

Smalltalk: A Reflective Language, F. Rivard

A very nice explanation of Smalltalk and its reflective capabilities, showing how to adapt the language to add pre/postconditions. It also explains the reification of the stack and the fact that the debugger is just a normal tool.

Reuse: is the dream dead?, Kirk Knoernschild

An exploration of the use/reuse paradox: “Maximizing reuse complicates use”.

Reflections on Trusting Trust, Ken Thompson

A wicked experiment on bootstrapping.

The Log: What Every Software Engineer Should Know About Real-time Data’s Unifying Abstraction, Jay Kreps

A fantastic analysis of the distributed log as the basic building block for integrating real-time systems. So good that it was later turned into a book: I Love Logs.

Most of these authors have written several other articles that are great as well.