TOGAF: The Good Parts

TOGAF is a framework for enterprise architecture management. Enterprise architecture aims at aligning the business and IT to achieve the strategic goals of the enterprise. Enterprise architecture supports digital transformation in large enterprises.

The core of the TOGAF framework is the architecture development method (ADM).

In a nutshell, the method works as follows:

  • in the preliminary phase, you build up the enterprise architecture capability itself (that is, you establish and tailor the TOGAF framework)
  • the architecture work is triggered by architecture changes that go through the whole cycle.
  • Phase A-D work out the candidate architecture, which is decomposed in four architecture domains: business, application, data, and technology. Application and data architecture are grouped together into “information systems architecture” in the cycle.
  • The candidate architecture is solution-neutral. Up to this point, you identify changes to business processes, applications, interfaces, data models, platforms without defining a concrete implementation.
  • A more concrete solution is worked out in phases E and F. The selection of precise technologies, implementation architecture styles (microservices, streaming, etc.), or vendors come especially in these phases.
  • Phases E and F also cover planning with key stakeholders using an architecture roadmap and migration plan that define the work packages.
  • In phase G, the project is handed over to the implementation organization (e.g. an agile release train). Expectations about the outcome to deliver and quality are agreed in an “architecture contract” between the architecture and the implementation organization.
  • Phase H is a retrospective where improvements are formulated and kickstart a new cycle.

The cardinal sin to avoid is to jump from A to G, namely, from the business need to the implementation plan. We’ve all been there: the business has an idea and mandates a team to realize it, without looking left or right and how it would fit in the bigger context. This usually leads to specific solutions for specific problems. Over time, the architecture landscape becomes fragmented and inconsistent. The goal of enterprise architecture is precisely to avoid this. Business needs should be supported by a consistent architecture strategy.

This goal of consistency is supported in TOGAF with the concept of building blocks. The architecture consists of architecture building blocks (business, application, data, or technology) that can be used or reused. In phase B/C/D architects identify which building blocks have to modified, added, or removed for a given change. This is the gap analysis between the baseline and target architecture. In phases E and F, when the concrete solution architecture is worked out, solution architects identify solution building blocks to fulfill the requirements.

Part of TOGAF is also a content metamodel that define key entites to model the four architecture domains (business, application, data, or technology). It’s pretty generic but can be good starting point. You will probably have to refine it though, so that it becomes really useful (e.g. refine the technology metamodel to distinguish between plattform and frameworks).

These core concepts of TOGAF define a useful methodology to tackled complex architecture change. It’s the good parts.

From the perspective of agility, the framework is neither good nor bad. It will all depend on your implementation.

The framework is iterative in nature. Each architecture change goes through the cycle and is an iteration. How big the changes are, how long the cycles last, and how many iterations can run in parallel will depend on your implementation. Part of the preliminary phase is the idea to tailor the framework to your needs. You can implement the core idea in a bureaucratic manner with many deliverables and approvals. But you can also implement the core ideas in a lightweight manner with a few well chosen checkpoints. Similarly, you can partition the enterprise architecture work in “segments”. What they are and how big they are will depend on your implementation.

The bad parts of the framework are the useless ornaments around the core concepts. I can make a list:

  • The many deliverables expected to be produced along the cycle. I like the core concepts as long as they remain concepts. But TOGAF also defines a set of deliverables that probably are never implemented as such.
  • The template library documents predefined viewpoints that can be use to document the architecture. This level of details is mostly useless.
  • The content metamodel comes with two additional references architectures to model the technology and application domains. This complexifies the discussion about modelling without bring much benefits.
  • The framework comes with a set of techniques that can be employed to carry out the phases. What has been defined as technique or not seem rather arbitrary. The techniques can serve as inspiration to conduct real work, but I doubt they will be followed as such.
  • The framework defines a classification scheme for building blocks, ranging from generic to enterprise-specific, called the enterprise continuum. The value of the continuum as concept and its applicability are rather unclear.

These ornaments make the framework bigger with details, without being practical enough to be really useful. They mostly distract rather than help.

Talk: Centralized/Decentralized (@SBB DevDay’22)

Each year, I try to give a talk at the internal SBB Developer Conference, called “SBB DevDay”). The previous years, I gave talks about concrete technology challenges. This year I tried something new and gave a talk about development methods and practices.

I consolidated my experience and insights of the past 6 year as lead architect into a useful framework to think about software architecture governance. This perspective mixes traditional agile concepts with concepts of choice architecture.

Use and Misuse of Technologies

Technology people come in two flavours: generalist and experts. I’m definitively a generalist. I’ve a good grasp of the core concepts being the technologies we use. I’m however lacking the expertise about the details of using them. For this, I rely on experts.

This lack of expertise of the details may even turn out to be an advantage in my position. Technologies have core concepts that support some primary technology use case. With enough knowledge about technical details, it’s possible to use the technology to support other use cases. But it’s almost never a good idea in the long term. The implementation will be fragile to subtle changes in details of the technology, and few people in the organization have enough expertise to maintain such an implementation. If your use case is not supported easily, rethink your problem at some other level. Not knowing too much about the technology means I’m limiting its use to what’s maintainable.

The last example of this kind that I encountered was about container dependencies on openshift. In our current system (not openshift-based), we start the containers using “run levels”. Such a concept doesn’t exist in openshift. You could recreate something similar using init containers and some deep technical knowledge, but it wouldn’t be trivial. Rather than misusing the technology, we will have to solve our problem at another level. The containers should be resilient to random startup order at the application-level.

Other examples from the past include details of object-relational mapping or distributed caching. Here too I believe that beeing “too smart” about using the technology isn’t good in the the long term. It’s better to stick to the common use cases and change the design at the application-level if necessary.

Sometimes some part of the design may leverage deep technological details. I’m not completely forbidding it. Used in strategic places and encapsulated in proper technology component, this is a viable strategy. We have for instance engineered a library for leader election based on the solace messaging middleware, that relies on a deep expertise of the semantics of queues in solace. This was a strategic decision and the library is maintained by a team of technology experts with proper knowhow and the support from solace itself. But such situations should be exceptions rather than the norm.

It’s hard to resist the appeal of a clever technology-based solution, but as engineers we absolutely should. When we misuse technologies, we paint ourselves in a corner.

What Is It Like to Be a Robot?

In “Metazoa“, Peter Godfrey-Smith explores the rise of consciousness in animals – from simple multicellular organisms to invertebrates like us.

Consciousness is concept that’s not so easy to capture. It’s about a sense of self, about a perception of the environment and oneself, about a subjective experience of the world. When does an animal qualify as conscious? Godfrey-Smith postulates that consciousness is a spectrum, not something one has or doesn’t. The analogy he uses for this is sleeping, or the state right after waking up. We are conscious, but with a different level of consciousness as when fully awake.

The nature of consciousness can be explored by taking extreme positions:

  • can you be conscious without any perception of the environment (a “pure mind”)?
  • does reacting to what happens around you without any emotion qualify a conscious?
  • do you need to have a nervous system and feel pain to be conscious, or is having a mood enough?
  • could you be conscious, but act indistinguishably from as an unconscious animal?

I would have described consciousness as being aware of one’s own existence, something related to mortality, and rather binary. Godfrey-Smith equates consciousness more to having a sense of self and feelings, which makes it something less demarcated. He’s using consciousness more like “awareness“, whereas I would use it more like “self-awareness“. (That said, even self-awareness isn’t maybe so binary. Between being aware of deadly dangers and being aware of your own existence, it’s hard to say when we transition from instinct to consciousness.)

The book focuses on the relationship between senses and consciousness. Godfrey explains in the book how various animals sense the world and which kind of consciousness they might have. Some animals have antennas (Shrimps), some have tentacles (Octopus), some feel water pressure (fish). Many animals have vision, but the eye structure can differ. Some animals feel pain (mammals, fishes, molluscs) , but some don’t (insects) – it’s however not so clear to define when pain is felt or not. Not feeling pain doesn’t mean the animal is unaware of body damage, just like you don’t feel pain for you car but notice very well when something is broken when driving.

The book reminded me of “What it’s like to be robot?” from Rodney Brooks. This article, unsurprisingly, references the previous book from Godfrey-Smith “Other Minds”. The article from Rodney makes parallels between the perception of octopus and artificial intelligence systems. Many of the questions raised by Godfrey-Smith about the animal world can indeed be translated directly to the digital world. Computer systems have sensors, too. The have rules to react to inputs and produce outputs. They can learn and remember things, and develop an individual “subjective” perception of the world. They don’t “feel” pain, but can be aware of malfunctions in their own system. Does this qualify as a very limited form of consciousness?

The book touches at the end on the question of artificial intelligence, but very superficially. Rather than wondering whether an artificial intelligence could be conscious, he focuses on refuting the possibility of human-like artificial intelligence. His argument is basically that neural networks do model only a subset of the brain’s physical and chemical processes and can’t thus match human intelligence (there are other physical and chemical processes at play in the brain besides synapse firing). He also argues that an emulation of these processes still wouldn’t cut it, since it wouldn’t be the real stuff.

Artificial intelligence will not have a human-like intelligence, though. Each system (biological or digital) has its own form of intelligence. Because of his anthropomorphism of artificial intelligence, Godfrey-Smith doesn’t explore the alley of consciousness in AI systems much deeper. This is unfortunate, because with his consciousness-as-spectrum approach, it would have been an interesting discussion.

More

Practices vs Principles

It struck me when reading Scaling the Practice of Architecture that people often use the term “principle” in a sloppy way:

There is a great deal I could write here about bad architectural principles but I’ll stick to the key aspects. Firstly, they are not practices. Practices are how you go about something, such as following TDD, or Trunk Based Delivery, or Pair Programming. This is not to say that practices are bad […] they’re just not architectural principles.

I’ve probable been using the term in a wrong way more than once. Principles don’t tell you exactly how to do something. They are just criterions to evaluate decisions. All things being equal, take the decision that fulfills the principle the most. Examples of well-known design principles are for instance

  • Single-responsibility principle
  • Keep it simple, stupid
  • Composition over inheritance

A practice, on the other hand, is a way of doing something. Examples of practices are:

  • Pair Programming
  • ​​​​​​​Shift left with CI/CD
  • Limit Work in Progress (WIP)

A lot of documents confuse the two. For instance, the SAFe Lean-Agile principle are actually mostly practices.

It could look like principles are for software design and practices are for software delivery. But you can have principles for software delivery, too. For instance, “maximize autonomy” could be a delivery principle. It doesn’t tell you how. It just tell you that if you have two options to design the organization, you should go the the one that maximizes autonomy. On the other hand, a software design practice could be to “model visually”.

Another confusion in this area come with another term similar to principles and practices: values. A value is a judgment of what we consider important. Usually they define behaviors and are then adjective (but “profit” could be a value yet isn’t an adjective). “Autonomy”, could be for instance a value. A value embodies implicitly the principle of favoring this value over others. For instance, if you value “autonomy”, you will automatically follow the principle “maximize autonomy”. If you adhere to a value, the corresponding principle comes for free.

Finally, there are “conventions” and “guideline”. Conventions tell you how to do things exactly and are mandatory. You can check if you adhere to a convention or not. This is unlike principles or practices, which have room for interpretation. A guideline is like a convention, but optional. Examples of convention or guidelines are:

  • Interfaces are versioned
  • Sanitize all inputs
  • Limit WIP to 3

Using a full example of value/principle/practice/guideline with in one area, we could have

  • value: resilience
  • principle: tolerate failures
  • practice: chaos testing
  • guideline: use tolerant reader

Granted, no matter how we try to distinguish the terms from one other, there will be some overlap in some cases. Natural language is messy. But I think it’s worth using the terms in the most appropriate ways if possible. It helps create a mental model that works. If you mix practices, principles, value and guidelines together, people might not notice immediately, but it creates a cognitive friction that makes it harder to actually apply underlying ideas.

SAFe: The Lean Mindset

An interesting aspect of the SAFe framework is that it tries to combine two agile mindsets. The first mindset is the iterative mindset of methods like Scrum. It’s a cornerstone of agile development and SAFe “scales” it from the team-level to the program-level, for instance with the PI Planning.

Another mindset in SAFe is the lean mindset. The lean mindset is not about iteration, but about optimising the flow of value.

Lean came initially from manufacturing where the goal is to (1) reduce the time to produce physical good, and (2) reduce the “inventory” needed in the process, and (3) reduce the “waste” produced during manufacturing. In manufacturing, managing inventory requires warehousing and logistics, this costs money. Materials that end up as waste cost money too but do not produce value. To reduce delivery time, each step in the delivery process must be optimised and wait time be reduced to the minimum.

These ideas can be translated to the software world if we consider that features under development are “inventory” and the development process is a pipeline that can be optimised. Features under development are “inventory” since they don’t produce value but must be managed. Waste is a bit harder to map but it represents all the unnecessary work that end up not being used (think of unused design document, analysis, etc.). The development pipeline can take many forms but is always a variation of define, build, verify, and release. The quicker a feature can transition in the pipeline the faster you produce value.

Lean in itself doesn’t require iteration. Iterations are needed to manage uncertainty and course-correct the product development in the face of new information. Lean is about optimising a delivery process. But the delivery process could be about the delivery of a similar item every time, like cars in the manufacturing world.

But Lean is also a great complement to iterative approaches like Scrum. In this case, the goal of the lean mindset is in a way to optimise the iteration speed. Rather than having several features with long delivery time, focus on few features and short delivery time.

SAFe emphasises the lean mindset with concepts like the continuous delivery pipeline and value stream mapping. Besides presiding over the process, the RTE are also charged to improve the flow of value in the organisation.

The lean mindset isn’t as established as the iterative mindset. I find it interesting that SAFe integrates it and promotes it. We conducted a value stream mapping session at work, and it was very enlightening. Thinking in waiting time, inventory, waste does indeed work in the software world, too.

It’s a simple way to highlight process and organisational issues. It gives clarity to what should be optimised and not get lost in organisation design. Chances are, if you want to reduce waiting time, you will have to solve a bunch of other problems first. The lean mindset positions these problems not as end in themselves, but as bottlenecks to short delivery time. It helps you prioritize these problems. It’s a bit like Test-driven Development (TDD). Making things testable requires that you figure out a good design first. But assessing testability is easier than assessing “good design”. In the case of Lean, minimising “waiting time” requires that you figure out a good organisation first, but measuring “waiting time” is easier than measuring “good organisation”.

Silly Product Ideas that Win

When Twitter appeared more than a decade ago, I though it was silly. I saw little value in a service that only allowed sharing 140-character long text messages. I registered on a bunch of social media platforms and created nevertheless a twitter account. Some years later, the only social media platform I’m actively using is… twitter.

There’s a lesson here for me and it’s that it’s hard to predict what will succeed. A lot of products can appear silly or superficial at first. They may appear so in the current time frame, but this can change in the future. Initially, twitter was full of people microblogging their life. It was boring. But it morphed in a platform that is useful to follow the news.

A startup like mighty can look silly now – why would you stream your browser from a powerful computer in the cloud? But as applications are ported to the web, maybe the boundary between thin client and server will move again.

We prefer to endorse project that appear profound and ethical, like supporting green energy, or reducing poverty. Product ideas that are silly or superficial don’t match these criterion and it’s easy to dismiss them. But innovation happens often because of such products. No matter how silly or superficial you think they are, if they gain traction, they need to solve their problem well at scale. These products are incubators for other technologies that can be used in other contexts. Twitter, for instance, open sourced several components. If Mighty gains traction, it might lead to new protocols for low-latency interactive streaming interfaces. An obvious candidate for such a technology could be set-top TV boxes.

These products might appears superficial at first and might lack the “credibility” of other domains, but here too, the first impression might be misguiding. A platform like twitter can support free speech and democracy (sure, there are problems with the platform, but it at least showed there are other ways to have public discourse). A product like Mighty might in turn make it more affordable to own computers for poor people, since it minimizes hardware requirements. Because these product don’t have an “noble” goal initially attached to them, doesn’t mean they don’t serve noble cause in the long term.

There are of course silly ideas that are simply silly and will fail. But the difference between products that are superficially silly and truly silly is not obvious. I took in this text the example of twitter and mighty. In retrospect, the case for twitter is clear. For mighty, I still don’t know. The idea puzzles me because it’s at the boundary.

More

Update 13.11.2022

How much Alignment Do you Really Need?

Few years ago, I would have described a good organization as one where everyone is on the same page. By it, I would have meant exactly on the same page. I realize now that I was wrong. You don’t need to be perfectly on the same page. Being mostly on the same page is enough, and a little bit a chaos is ok.

Engineers are very well positioned to understand why: to be on the same page you need to coordinate, and coordination is expensive. This holds for actors in a software system (threads, processes) but also actors in an organization (person, teams, units). Coordinating between actors takes time, and as such slows the system. You should first try to design your system so that the need for coordination is reduced, and then if necessary, balance coordination with consistency (being on the same page).

The analogy works surprisingly well (maybe it’s not an analogy but a property of system in general?). Take optimistic locking in software systems: it’s a tradeoff between consistency and performance. Rather than lock the resource on each change, you only check when you do the final write if you’ve been working on the most up to date information. If not, you do a retry. In this case, there’s a performance hit, but overall the system is faster this way. The equivalent in an organization would be to accept that some people somewhere have outdated information. They will work based on this outdated information until a synchronization point happens and they realized the information is outdated. Some work will have to be corrected or redone. It may be upsetting, but should happen rarely.

The art of organization design is to reduce coordination and when needed use the right synchronization points. The goal is to prevent catastrophic mistakes. Some inconsistencies here and there, if timely resolved and with small consequences, are fine. Do not synchronize on everything (it’s way too expensive) but synchronize often enough to keep the risks small. Prefer many small risks than looming, large big risks.

There are lots of patterns in software system to synchronize and coordinate actors in the system. There are also a lot of patterns to synchronize and coordinate actors in an organization: all-hand sessions, company memo, internal trainings, review boards, formal processes, team meetings, etc.

Interestingly, software systems and organizations have different profiles when it comes to the tradeoffs between consistency and speed. For software systems, relaxing consistency beyond simple techniques like optimistic locking is usually hard. Transactional systems are still a lot easier to build than systems with relaxed consistency. On the other hand, an organization will always work with relaxed consistency somehow: it’s impossible for an organization to update the “collective brain” in a transaction. It’s the nature of people to misunderstand information, forget things, or simply take vacations or be sick.

Speaking of coordination and alignment, Elon Musk put it like this:

“Every person in your company is a vector. Your progress is determined by the sum of all vectors.” – Elon Musk.

What this analogy does not consider is the time needed to align. If lots of time is lost on coordination, the vectors are smaller. You then have to choose between an expensive perfect alignment, or some inexpensive imperfect alignment. Given that organizations constantly course-correct, vectors accumulate projects after projects (or task after task) and there are plenty of opportunities to adjust the alignment, even each time in an imperfect manner. This is why in a good organization, a little bit of chaos is ok.

What’s My Exposure to Data Lock-out?

My computer died a few days ago. Fortunately, I had a backup and could restore my data without problem on another laptop. Still, I’ve been wondering in the meantime: what if the restore hadn’t worked? How easily could I be locked out of my data ?

I have data online and data offline. My online data are mostly stored by google. If say, my account is compromised and due to a misbehavior from the hacker, my account is disabled. Would I ever be able to recover my online data? Not sure.

My data offline are stored on the harddrive, which I regularly backup with time machine. If a ransomware encrypts all my data, the backup shouldn’t be affected. Unless the ransomware encrypts slowly over months, without me noticing, and suddenly activates the lock out. Am I sure ransomeware don’t work like this? Not sure.

My laptop suffered a hardware failure. It hanged during booting, and no safe booting mode made it through. The “target disk” mode seemed still to work, though. It would have been a very bad luck, to not be able to access either the data on the harddisk or the backup. Both should fail simultaneously. But can we rule out this possibility? Not sure.

Harddisks and backup can be encrypted with passwords. I don’t make use of this option because I believe it could make things harder in case I have to recover the data. I could for instance have simply forgotten my password. Or some part could be corrupted. Without encryption I guess the bad segment can be skipped; with encryption I don’t know. Granted, these are speculative considerations. But are they completely irrational? Not sure.

Connecting my old backup to the new computer turned out to be more complicated than I thought. It involved two adapters: one for firewire to thunderbolt 2 adapter and one thunderbolt 2 to thunderbolt 4 adapter. Protocol and hardware evolve. With some more older technology, could it have turned out to be impossible to connect it to the new world? Not sure.

The probability of any of these scenario happening is small. It would be very bad luck and in some case would require multiple things to go wrong at once. But the impact would be very big—20 years of memory not lost, but inaccessible. There’s no need to be paranoid, but it’s worth reflecting on the risks and reduce the exposure.

More: