ewernli

Wording Matters: Principles vs Practices

It struck me when reading Scaling the Practice of Architecture that people often use the term “principle” in a sloppy way:

There is a great deal I could write here about bad architectural principles but I’ll stick to the key aspects. Firstly, they are not practices. Practices are how you go about something, such as following TDD, or Trunk Based Delivery, or Pair Programming. This is not to say that practices are bad […] they’re just not architectural principles.

I’ve probably used the term incorrectly more than once. Principles don’t tell you exactly how to do something. They are just criteria to evaluate decisions. All things being equal, take the decision that best fulfills the principle. Examples of well-known design principles are:

Single-responsibility principle
Keep it simple, stupid
Composition over inheritance

A practice, on the other hand, is a way of doing something. Examples of practices are:

Pair Programming
Shift left with CI/CD
Limit Work in Progress (WIP)

A lot of documents confuse the two. For instance, the SAFe Lean-Agile principles are actually mostly practices.

It could look like principles are for software design and practices are for software delivery. But you can have principles for software delivery, too. For instance, “maximize autonomy” could be a delivery principle. It doesn’t tell you how. It just tells you that if you have two options to design the organization, you should go for the one that maximizes autonomy. On the other hand, a software design practice could be to “model visually”.

Another confusion in this area comes with a term similar to principles and practices: values. A value is a judgment of what we consider important. Values usually describe behaviors and are then adjectives (but “profit” could be a value yet isn’t an adjective). “Autonomy”, for instance, could be a value. A value implicitly embodies the principle of favoring this value over others. For instance, if you value “autonomy”, you will automatically follow the principle “maximize autonomy”. If you adhere to a value, the corresponding principle comes for free.

Finally, there are “conventions” and “guidelines”. Conventions tell you exactly how to do things and are mandatory. You can check whether you adhere to a convention or not. This is unlike principles or practices, which leave room for interpretation. A guideline is like a convention, but optional. Examples of conventions or guidelines are:

Interfaces are versioned
Sanitize all inputs
Limit WIP to 3

Using a full example of value/principle/practice/guideline within one area, we could have:

value: resilience
principle: tolerate failures
practice: chaos testing
guideline: use tolerant reader

Granted, no matter how we try to distinguish the terms from one another, there will be some overlap in some cases. Natural language is messy. But I think it’s worth using the terms in the most appropriate way if possible. It helps create a mental model that works. If you mix practices, principles, values and guidelines together, people might not notice immediately, but it creates a cognitive friction that makes it harder to actually apply the underlying ideas.

Talk: Software Architecture in Practice (@Unibe)

I was invited by Prof. Oscar Nierstrasz (my PhD advisor) to give a guest lecture at the University of Bern on the topic “Software Architecture in Practice”.

Software architecture in practice from ErwannWernli

What Makes a Good Microservice?

The microservice architectural style is almost a decade old!

The term was widely popularized by the article “Microservices” by Fowler in 2014. It emerged as a consequence of cloud platforms and of using the cloud as a distributed operating system.

Many companies abandoned their monoliths around this time and paved the way towards microservices: Amazon, Netflix, Spotify, etc. Here’s a series from Airbnb: Building Services at Airbnb

Over the following years (or decade), the industry learned the pitfalls of microservices, when the style makes sense and when it’s overengineered. Ultimately, it’s about balancing complexity.

There are cases when you don’t need microservices. DHH’s “majestic monolith” is a valid counter-argument.

Ultimately, the question shouldn’t be “What makes a good microservice?”, but “What makes a good microservices architecture?” Microservices, by definition, don’t come alone.

SAFe: The Lean Mindset

An interesting aspect of the SAFe framework is that it tries to combine two agile mindsets. The first mindset is the iterative mindset of methods like Scrum. It’s a cornerstone of agile development and SAFe “scales” it from the team-level to the program-level, for instance with the PI Planning.

Another mindset in SAFe is the lean mindset. The lean mindset is not about iteration, but about optimising the flow of value.

Lean came initially from manufacturing, where the goal is to (1) reduce the time to produce physical goods, (2) reduce the “inventory” needed in the process, and (3) reduce the “waste” produced during manufacturing. In manufacturing, managing inventory requires warehousing and logistics, which cost money. Materials that end up as waste cost money too but do not produce value. To reduce delivery time, each step in the delivery process must be optimised and wait time reduced to the minimum.

These ideas can be translated to the software world if we consider that features under development are “inventory” and the development process is a pipeline that can be optimised. Features under development are “inventory” since they don’t produce value but must be managed. Waste is a bit harder to map, but it represents all the unnecessary work that ends up not being used (think of unused design documents, analyses, etc.). The development pipeline can take many forms but is always a variation of define, build, verify, and release. The quicker a feature moves through the pipeline, the faster you produce value.

Lean in itself doesn’t require iteration. Iterations are needed to manage uncertainty and course-correct the product development in the face of new information. Lean is about optimising a delivery process. But the delivery process could be about the delivery of a similar item every time, like cars in the manufacturing world.

But Lean is also a great complement to iterative approaches like Scrum. In this case, the goal of the lean mindset is in a way to optimise the iteration speed. Rather than having several features with long delivery time, focus on few features and short delivery time.

SAFe emphasises the lean mindset with concepts like the continuous delivery pipeline and value stream mapping. Besides presiding over the process, the RTE (Release Train Engineer) is also charged with improving the flow of value in the organisation.

The lean mindset isn’t as established as the iterative mindset. I find it interesting that SAFe integrates it and promotes it. We conducted a value stream mapping session at work, and it was very enlightening. Thinking in terms of waiting time, inventory, and waste does indeed work in the software world, too.

It’s a simple way to highlight process and organisational issues. It gives clarity about what should be optimised and keeps you from getting lost in organisation design. Chances are, if you want to reduce waiting time, you will have to solve a bunch of other problems first. The lean mindset positions these problems not as ends in themselves, but as bottlenecks to short delivery time. It helps you prioritize these problems. It’s a bit like Test-driven Development (TDD). Making things testable requires that you figure out a good design first. But assessing testability is easier than assessing “good design”. In the case of Lean, minimising “waiting time” requires that you figure out a good organisation first, but measuring “waiting time” is easier than measuring “good organisation”.
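To make the idea of measuring waiting time concrete, here is a minimal sketch of the arithmetic behind a value stream map. The step names follow the define, build, verify, release pipeline mentioned earlier; the hours are hypothetical numbers chosen for illustration, and the helper names (Step, lead_time, flow_efficiency, biggest_wait) are assumptions of this sketch, not part of SAFe.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_hours: float  # time someone actively works on the feature
    wait_hours: float  # time the feature sits idle before moving on


def lead_time(steps):
    """Total elapsed time from start to release, work and waiting combined."""
    return sum(s.work_hours + s.wait_hours for s in steps)


def flow_efficiency(steps):
    """Share of the lead time during which work actually happens."""
    total = lead_time(steps)
    return sum(s.work_hours for s in steps) / total if total else 0.0


def biggest_wait(steps):
    """The step with the longest waiting time, i.e. the first bottleneck to look at."""
    return max(steps, key=lambda s: s.wait_hours)


# Hypothetical measurements for a single feature.
pipeline = [
    Step("define", work_hours=8, wait_hours=40),
    Step("build", work_hours=24, wait_hours=16),
    Step("verify", work_hours=8, wait_hours=72),
    Step("release", work_hours=2, wait_hours=30),
]

print(lead_time(pipeline))
print(flow_efficiency(pipeline))
print(biggest_wait(pipeline).name)
```

With these made-up numbers, most of the lead time is waiting rather than work, and the longest wait sits in front of verification. The numbers say nothing about why the wait exists; they only point at where to start asking.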

Superficially Silly Ideas Can be Game-Changers

When Twitter appeared more than a decade ago, I thought it was silly. I saw little value in a service that only allowed sharing 140-character text messages. I registered on a bunch of social media platforms and nevertheless created a Twitter account. Some years later, the only social media platform I’m actively using is… Twitter.

There’s a lesson here for me: it’s hard to predict what will succeed. A lot of products can appear silly or superficial at first. They may appear so in the current time frame, but this can change in the future. Initially, Twitter was full of people microblogging their life. It was boring. But it morphed into a platform that is useful to follow the news.

A startup like Mighty can look silly now – why would you stream your browser from a powerful computer in the cloud? But as applications are ported to the web, maybe the boundary between thin client and server will move again.

We prefer to endorse projects that appear profound and ethical, like supporting green energy or reducing poverty. Product ideas that are silly or superficial don’t match these criteria and it’s easy to dismiss them. But innovation often happens because of such products. No matter how silly or superficial you think they are, if they gain traction, they need to solve their problem well at scale. These products are incubators for other technologies that can be used in other contexts. Twitter, for instance, open sourced several components. If Mighty gains traction, it might lead to new protocols for low-latency interactive streaming interfaces. An obvious candidate for such a technology could be set-top TV boxes.

These products might appear superficial at first and might lack the “credibility” of other domains, but here too, the first impression might be misleading. A platform like Twitter can support free speech and democracy (sure, there are problems with the platform, but it at least showed there are other ways to have public discourse). A product like Mighty might in turn make it more affordable for poor people to own computers, since it minimizes hardware requirements. Just because these products don’t have a “noble” goal initially attached to them doesn’t mean they don’t serve a noble cause in the long term.

There are of course silly ideas that are simply silly and will fail. But the difference between products that are superficially silly and truly silly is not obvious. I took in this text the examples of Twitter and Mighty. In retrospect, the case for Twitter is clear. For Mighty, I still don’t know. The idea puzzles me because it’s at the boundary.

More

https://slatestarcodex.com/2017/05/11/silicon-valley-a-reality-check/

Update 13.11.2022

Hacker News thread “I decided to stop working on Mighty”. So I guess that settles the case.

Beyond Lifestyle

Climate change is the crisis of the century. It’s the result of our capitalistic economy, which runs on fossil fuel. This message is mostly accepted by everybody. What is interesting however, is how it is interpreted.

The media and most people frame the problem of excessive consumption as a lifestyle issue. Switching to EVs, traveling less, eating less meat, and buying less should address the problem. It’s true that our lifestyle and consumption habits are part of the problem. But the problem is also a lot more fundamental than this.

The infrastructure that we use to live, transit, or work has needed gigawatt-hours of energy to be built. All this isn’t “lifestyle”. It’s mostly what we call “progress”. If we want to address climate change, we will need to reduce fossil fuel consumption everywhere, which goes deep into the fabric of modern society.

Focusing only on lifestyle misses a large part of the challenge of climate change. The whole society runs on fossil fuel. Significantly reducing our footprint cannot be achieved by changing our lifestyle within the current society. It requires changing the society itself.

The chart that we should learn and discuss is this one (from ourworldindata.org):

Every good or service that we use in our everyday life embodies gigawatt-hours of energy to exist.

For centuries life was organized locally. People built houses with local materials and obtained food from local farming or husbandry. We now have a global economy with goods shipped around the globe. These goods are produced using many intermediaries, each transforming simple products into complex products. Most of us support this global chain of production in some direct or indirect way with our work (I for sure, working in the transportation industry).

There are fundamentally only two ways to reduce our footprint: degrowth or decoupling (or both). With “degrowth”, we reduce consumption and reduce intermediaries. With “decoupling”, we decouple consumption from (dirty) energy usage by electrifying everything and using clean energy sources.

Both degrowth and decoupling represent radical changes to society. With degrowth, we obviously need to reinvent a society based on locality and less consumption. With decoupling, we need to rebuild our industries (house heating, factories, transportation, etc.) to embrace clean electricity.

A sound narrative about climate change should go beyond lifestyle issues. The awareness is still not there at the moment, but it will come.

Antifragile

The core thesis of “Antifragile”, a book by Nassim Nicholas Taleb, is that some things suffer from volatility – but some other things gain from volatility. We should avoid making things fragile, and instead make them antifragile.

Antifragile goes beyond robustness or resilience in the sense that things that are antifragile not only tolerate volatility, but benefit from it. Taleb claims there is no word for this, hence the neologism “antifragile”.

One recurring and intuitive example of an antifragile system is nature itself. Through volatility (mutation of species, random natural events, etc.) the system becomes better. Other examples in the book include: entrepreneurship, health, education, city-state, artisanship.

Behind the idea of antifragility is the idea of optionality. To be antifragile, you should have many options that you can use (but are not obliged to), depending on how the situation changes. If options are cheap, you can have many of them and reduce your risk exposure, or better, reverse it to benefit from volatility. With the right options, you can be robust against small volatility, but also rare events (“fat tails” / “Black Swan”).

That’s how Taleb made his fortune: he was an options trader. While many operations don’t turn a profit, some occasionally have large payoffs. The book is a generalisation of the concept of optionality to other areas of life.

Nature is antifragile, because it has many options due to the diversity of species, and the diversity of random mutations. Many natural events are inconsequential, but once in a while, a natural event arises that leads to change in the ecosystem. Entrepreneurship is antifragile, because of the diversity of companies. Many fail, but once in a while a new idea works out (as venture capitalists surely know). Your body is antifragile, because a small occasional stress on some function (muscle, nutrition) makes you stronger.

Antifragility and optionality both relate to diversity. Nature is diverse, hence antifragile. Entrepreneurship is diverse, hence antifragile. More generally, trial-and-error is antifragile, since it increases diversity. Being an artisan as opposed to having a very specialised role in BigCorp makes you more diversely employable, hence antifragile. A small stress on a system with diversity will force new pathways that exercise and reinforce diversity even more.

My main critique of the book is that this link is obvious but not really explored. Diversity is a simple and well established concept. It’s risk management 101. I understand that Taleb brings another perspective to the topic with optionality and antifragility (and rare events), but it’s also not completely different. Just another angle. The same can be said about the study of complex adaptive systems. These systems are the class of systems of interest to Taleb. Yet he completely ignores this field of research.

Taleb warns in the book about interventionism. Many interventions fragilise systems by removing diversity. The education system tends for instance to format the way we think, but we need people that think out-of-the-box to progress. Big companies, of course, are all about standardisation and economies of scale. They are fragile to disruption.

Interventionism also focuses on what we know, but ignores what we don’t. This unknown hides events deemed improbable but with disastrous consequences. Only time will reveal them. Vivid examples are instant catastrophes like Chernobyl, but also catastrophes that build up over time, like climate change.

Interventions with good intentions but negative effects are easy to find (medicine, foreign policy). There’s an existing word for it: iatrogenics. If the unknown is big and the reward small, sometimes doing nothing is better.

The subtitle of the German version of the book is « a guide for a world we don’t understand ». In a way, I prefer it to the English subtitle « things that gain from disorder ». The German subtitle emphasises this view of the world, where we don’t know things, or can’t understand them. We believe everything is quantifiable (especially risk) and explainable, but it’s not. It’s also not needed to live well in this world. You don’t need to know how your body works to realize that exercising is good (also referred to as the green lumber fallacy). You don’t need to know what bad things could happen, just err on the side of precaution with optionality/diversity.

Taleb warns of modernism, too. Stick to the classics, since they have a better chance of passing the test of time, rather than jumping on the latest bandwagon (also referred to as the Lindy effect). Interestingly, this goes against the earlier praise for optionality and trial-and-error. Unfortunately, this tension is not really addressed in the book.

There’s a tension, because the line of reasoning that Taleb proposes works at two levels: the level of the system and the level of the individual actor in the system. For instance, your cells are fragile, but your body as a whole is antifragile. Companies are fragile, but a diverse economic system as a whole is antifragile. While he briefly mentions both levels, most of the book does not really distinguish between the two.

In the later chapters, Taleb introduces ethics around antifragility. Simply put: you shouldn’t exploit the fragility of others to make yourself more antifragile. Risks should align with rewards (referred to as the agency problem). It is ethical for an entrepreneur with “skin in the game”, who takes risks himself, to be rewarded with large payoffs if he succeeds. A middle manager taking no risk personally, but risking the pension funds of other citizens, doesn’t act ethically. The opposite of this behavior is a hero. A hero is someone taking risks not only in his own interest, but in the interest of others.

Taleb sees himself as a philosopher, whose special brand of heroism is enlightening people about fragility, and calling out people who fragilize the system at the expense of others (if you see fraud, you should call fraud). The book thus mixes references to philosophy, discussions about fragility, and stories of Fat Tony and Nero Tulip. Both are alter egos of Taleb of a sort. Fat Tony makes money without “understanding too much about the world” while Nero Tulip is erudite. At the end of the book, Fat Tony dies and leaves Nero with 20 million. I suspect that Fat Tony represents here the first career of Taleb as a trader, and Nero Tulip his second career as a book writer / philosopher.

Taleb is arrogant as expected, but the book is enjoyable to read and I liked many of the ideas presented.

Alignment: Valuable Yet Costly

A few years ago, I would have described a good organization as one where everyone is on the same page. By that, I would have meant exactly on the same page. I realize now that I was wrong. You don’t need to be perfectly on the same page. Being mostly on the same page is enough, and a little bit of chaos is ok.

Engineers are very well positioned to understand why: to be on the same page you need to coordinate, and coordination is expensive. This holds for actors in a software system (threads, processes) but also actors in an organization (people, teams, units). Coordinating between actors takes time, and as such slows the system. You should first try to design your system so that the need for coordination is reduced, and then if necessary, balance coordination with consistency (being on the same page).

The analogy works surprisingly well (maybe it’s not an analogy but a property of systems in general?). Take optimistic locking in software systems: it’s a tradeoff between consistency and performance. Rather than lock the resource on each change, you only check when you do the final write whether you’ve been working on the most up-to-date information. If not, you retry. In this case, there’s a performance hit, but overall the system is faster this way. The equivalent in an organization would be to accept that some people somewhere have outdated information. They will work based on this outdated information until a synchronization point happens and they realize the information is outdated. Some work will have to be corrected or redone. It may be upsetting, but should happen rarely.

The art of organization design is to reduce coordination and, when needed, use the right synchronization points. The goal is to prevent catastrophic mistakes. Some inconsistencies here and there, if resolved in a timely manner and with small consequences, are fine. Do not synchronize on everything (it’s way too expensive) but synchronize often enough to keep the risks small. Prefer many small risks to a few large, looming ones.

There are lots of patterns in software systems to synchronize and coordinate actors in the system. There are also a lot of patterns to synchronize and coordinate actors in an organization: all-hands sessions, company memos, internal trainings, review boards, formal processes, team meetings, etc.
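For readers less familiar with optimistic locking, here is a minimal sketch of the mechanism, assuming a toy in-memory store with a version counter. The names (Store, Record, StaleWriteError, update_with_retry) are made up for this illustration and do not come from any particular library. The sketch is single-threaded; in a real system the version check and the write would have to happen atomically.

```python
from dataclasses import dataclass


class StaleWriteError(Exception):
    """Raised when a write is based on an outdated version of the record."""


@dataclass
class Record:
    value: int
    version: int = 0


class Store:
    """Toy in-memory store holding a single versioned record."""

    def __init__(self, value):
        self._record = Record(value)

    def read(self):
        # Hand out a copy so each caller works on its own snapshot.
        return Record(self._record.value, self._record.version)

    def write(self, new_value, expected_version):
        # No lock is held while the caller works; the only check is here,
        # at the final write.
        if self._record.version != expected_version:
            raise StaleWriteError("record changed since it was read")
        self._record = Record(new_value, expected_version + 1)


def update_with_retry(store, change, max_attempts=3):
    """Apply change to the stored value, retrying when the snapshot was stale.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        snapshot = store.read()
        try:
            store.write(change(snapshot.value), snapshot.version)
            return attempt
        except StaleWriteError:
            continue  # someone else wrote first: re-read and redo the work
    raise StaleWriteError("gave up after repeated conflicts")


store = Store(10)
attempts = update_with_retry(store, lambda value: value + 1)
print(attempts, store.read())
```

The retry loop is the software counterpart of redoing some work after a synchronization point: it costs something when it happens, but nobody had to wait for a lock in the common case where there is no conflict.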

Interestingly, software systems and organizations have different profiles when it comes to the tradeoffs between consistency and speed. For software systems, relaxing consistency beyond simple techniques like optimistic locking is usually hard. Transactional systems are still a lot easier to build than systems with relaxed consistency. On the other hand, an organization will always work with relaxed consistency somehow: it’s impossible for an organization to update the “collective brain” in a transaction. It’s the nature of people to misunderstand information, forget things, or simply take vacations or be sick.

Speaking of coordination and alignment, Elon Musk put it like this:

“Every person in your company is a vector. Your progress is determined by the sum of all vectors.” – Elon Musk.

What this analogy does not consider is the time needed to align. If lots of time is lost on coordination, the vectors are smaller. You then have to choose between an expensive perfect alignment, or some inexpensive imperfect alignment. Given that organizations constantly course-correct, vectors accumulate project after project (or task after task) and there are plenty of opportunities to adjust the alignment, even if each time in an imperfect manner. This is why in a good organization, a little bit of chaos is ok.
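The tradeoff can be illustrated with a small calculation. This is only a sketch under made-up assumptions: each person is a unit vector pointing at some angle, time spent coordinating shortens every vector by the same share, and the angles and shares below are hypothetical numbers, not measurements.

```python
import math


def progress(angles_deg, coordination_share):
    """Length of the summed vectors, each shortened by the time spent coordinating."""
    effective = 1.0 - coordination_share
    x = sum(effective * math.cos(math.radians(a)) for a in angles_deg)
    y = sum(effective * math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(x, y)


# Perfect alignment, bought with 30% of everyone's time.
perfect = progress([0, 0, 0, 0], coordination_share=0.30)

# Rough alignment (people are off by up to 15 degrees), costing only 5%.
rough = progress([-15, -5, 5, 15], coordination_share=0.05)

print(round(perfect, 2), round(rough, 2))
```

With these particular numbers, the roughly aligned team makes more progress than the perfectly aligned one, because small angular errors cost little while coordination time is subtracted from everyone. Other numbers would of course tip the balance the other way.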

Data Lock-Out: Understanding the Risks

My computer died a few days ago. Fortunately, I had a backup and could restore my data without problem on another laptop. Still, I’ve been wondering in the meantime: what if the restore hadn’t worked? How easily could I be locked out of my data?

I have data online and data offline. My online data are mostly stored by Google. Say my account is compromised and, due to some misbehavior by the hacker, my account is disabled. Would I ever be able to recover my online data? Not sure.

My offline data are stored on the hard drive, which I regularly back up with Time Machine. If ransomware encrypts all my data, the backup shouldn’t be affected. Unless the ransomware encrypts slowly over months, without me noticing, and suddenly activates the lock-out. Am I sure ransomware doesn’t work like this? Not sure.

My laptop suffered a hardware failure. It hung during booting, and no safe booting mode made it through. The “target disk” mode still seemed to work, though. It would have been very bad luck to not be able to access either the data on the hard disk or the backup. Both would have to fail simultaneously. But can we rule out this possibility? Not sure.

Hard disks and backups can be encrypted with passwords. I don’t make use of this option because I believe it could make things harder in case I have to recover the data. I could for instance have simply forgotten my password. Or some part could be corrupted. Without encryption I guess the bad segment can be skipped; with encryption I don’t know. Granted, these are speculative considerations. But are they completely irrational? Not sure.

Connecting my old backup to the new computer turned out to be more complicated than I thought. It involved two adapters: one from FireWire to Thunderbolt 2 and one from Thunderbolt 2 to Thunderbolt 4. Protocols and hardware evolve. With some even older technology, could it have turned out to be impossible to connect it to the new world? Not sure.

The probability of any of these scenarios happening is small. It would be very bad luck and in some cases would require multiple things to go wrong at once. But the impact would be very big: 20 years of memories not lost, but inaccessible. There’s no need to be paranoid, but it’s worth reflecting on the risks and reducing the exposure.

More:

The Algorithm flags your Gmail account for suspicious activity and locks you out. But you did nothing suspicious and this is a *very* important account.

Even though Gmail is free, does it offer some sort of paid support to recover your account? Or is this it: lost forever?
— Gergely Orosz (@GergelyOrosz) March 18, 2022

The Superpower of Framing Problems

Some problems we work on are concrete. They have a clear scope and you know exactly what has to be solved. Sometimes, however, the problems we need to address are muddy or unclear.

When something used to work, but doesn’t work any more, the problem is clearly framed: the thing is broken and must be repaired. However, if you have something like a “software quality problem”, the problem isn’t clearly framed. Quality takes many forms. It’s unclear what you have to solve.

To explore solutions you need first to frame the problem in a meaningful way. With this frame in place, you can explore the solution space and check how well the various solutions solve the problem. Without a proper frame, you might not even be able to identify when you have solved your problem, because the problem is defined in such a muddy way.

The “quality problem” mentioned previously could be reframed more precisely, for instance as a problem of reliability, usability, or performance. It could be framed in terms of the number of tickets opened per release, or the time it takes to resolve tickets.

Depending on how you frame your problem, you will find different solutions. Using the wrong frame limits the solution space, or in the worst case, means you will solve the wrong problem. It’s worth investing the time to understand the problem and frame it correctly.

If I had an hour to solve a problem I’d spend 55 minutes thinking about the problem and five minutes thinking about solutions. – attributed to Albert Einstein

I’ve talked up to now about framing problems. Framing does however work in a broader sense and can be used each time there is a challenge or an open question. Each time you have to come up with a solution, there is some framing going on.

Something interesting about framing is that, in itself, it isn’t about proposing a solution. It’s about framing the solution space. As such, people are usually quite open to reframing problems or exploring new frames. Whereas if you propose solutions you can expect heated discussions, when it’s only about framing, the friction with other people is usually pretty low. While framing in itself is not a solution, it does however impact the solution that you will find. When people don’t agree on some solution, they usually have different implicit frames for the problem. Working on understanding the frames is sometimes more productive than debating the solutions themselves.

A second interesting thing about framing is that you don’t need to be an expert in the solution to help frame problems. You need to be an expert in the solution space, but not in the actual solution. Going back to the example of the “software quality problem”, you can help with framing if you know about software delivery in general. You don’t need to be a cloud expert or a process expert. This means that good framing skills are more transferable than skills about specific solutions.

I wrote a long time ago about using breadth & depth to assess whether a thesis was good. In essence, this is a specific frame for the problem of thesis quality. Finding good frames for problems helps in many other cases. Framing problems is a great skill to learn.