Software Architecture

Conceptual Integrity at Scale

The central argument of The Mythical Man-Month by Fred Brooks is that conceptual integrity is the most important consideration in system design, and that conceptual integrity can only be achieved if the design comes from one, or a few, resonant minds.

I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.

[…]

Conceptual integrity in turn dictates that the design must proceed from one mind, or from a very small number of agreeing resonant minds.

If you’ve ever been the creative force in a piece of group work, you will have experienced these challenges. Core ideas are misunderstood, inconsistencies start to pop up, and the result is a patchwork.

For my part, I can confirm that consistency erodes quickly if you don’t pay close attention. Maintaining conceptual integrity is hard work.

This doesn’t happen because people are dumb, negligent, or malevolent. It happens because as soon as you specialize, you lose sight of the whole. Someone makes a change here, someone else a change there, and the two changes end up not being fully consistent with each other.

Unfortunately, contrary to what Brooks suggests, doing all the design work alone is usually not realistic.

With a good review culture you can scale your design team from one head to a few: let people design parts of the system even if their understanding of the whole system is incomplete, and have one central person review how well the contributions fit in.

It’s like having multiple authors for an article, but with one person in charge of doing a complete pass over the article at the end to ensure consistency.

But if you want to tackle bigger challenges, you will have to scale your design team even more.

Ensuring conceptual integrity at scale is hard because it requires not only scaling knowledge but also standardizing the decision making process.

This is what guidelines try to achieve. Guidelines encode the principles, maxims, constraints, and goals of the system so that different people reach similar decisions. It’s evidently impossible to encode the complete decision-making process in guidelines, given that so much of it is subjective, but they help achieve a basic overall consistency.

As for the subjectivity: just take one of your colleagues and ask yourself, “What would he decide?” You might have a hunch about his decision, but chances are you don’t know enough about all the thinking that went into his previous decisions to predict this one accurately. If you do, well, you’re two “resonant” minds, as Brooks would say.

If you know lots of people will be involved in the design process, you will need more than guidelines and reviews. You will have to decompose the problem into parts that can be solved individually. Each part can be assigned one “mind”. The whole might not be fully consistent, but the solution at each level of abstraction will at least be consistent.

To extend the newspaper analogy: a newspaper has an editor-in-chief who sets the tone of the writing and the overall orientation (the guidelines). He or she will review the topics of the individual articles to make sure they fit into the issue, but won’t edit every article personally (the parts).

No large system will be fully consistent (think of Microsoft Office, which our dear journalists might well be using), but it doesn’t hurt too much, because no user will ever use all of the system.

Evolution will also bring some inconsistencies into the system. Moving from one system paradigm to the next is like moving from one local maximum to the next. In between, things will be worse, that is, less consistent. But if you think there’s a superior design paradigm for the whole system, it’s worth challenging the current one and seeing if there’s a path.

Fred Brooks is right that conceptual integrity is the most important aspect of system design. He’s also right that the more designers there are, the harder it is to ensure consistency. But for large systems that evolve, some inconsistencies are inevitable. Address them like other risks in your project.

Software Architecture

In Defense of Design Before Coding

Software design as a separate activity from implementation — “up front” design — got a bad press with agile methods.

Agile advocates say the design should be emergent. They say, design without coding is waterfall. It’s a waste of time.

I understand that you don’t want to design the whole system up front. But at the feature level, a bit of thinking before coding works miracles, I say.

My first argument is visible design. Looking at the code doesn’t reveal the whole design, because code only shows the static structure. The design is more than that. To understand how the system works you must run it, but even then the sequencing of events is still invisible. If you want effective feedback on the design, you must make it visible. People who jump directly to code still end up sketching or drawing things for their colleagues to explain their design and get feedback. Designing up front makes the design visible up front.

My second argument is speed of iteration. Even with higher-level programming languages, there is a gap between the concepts and the implementation. There is some work needed to implement the thing for real and take care of all the details. Running the system in your head, or on paper, to challenge the design enables faster iterations on the design. CRC sessions, for instance, are a nice way to explore the design space effectively, without coding yet.

My third argument is better reasoning. The code level is just one of the many abstraction levels you can use to reason about the system. When you’re trying to identify the main abstractions, what their responsibilities are, and how they play together, this abstraction level is often too low. One abstraction with a clear responsibility might map to several classes. There might be, for instance, a “scheduler” that will be implemented using several concurrency primitives. These are implementation details (although interesting ones!) that are irrelevant for now. Working at the code level forces you to think at one specific abstraction level. Working on paper enables you to choose the optimal abstraction level to work out the design.
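To make the gap concrete, here is a minimal sketch (with hypothetical names, not taken from any real design) of how a single design-level abstraction can map to several implementation-level pieces:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // The design-level abstraction: one clear responsibility.
    interface Scheduler {
        void schedule(Runnable task, long delayMillis);
    }

    // The implementation: thread pools, time units, shutdown handling...
    // details that are irrelevant while working out the overall design.
    class ThreadPoolScheduler implements Scheduler {
        private final ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);

        public void schedule(Runnable task, long delayMillis) {
            executor.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

On paper, “a scheduler” is one box with one responsibility; in code it is already several types and a handful of concurrency decisions.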

And finally, my fourth argument is tracking rationale. The code defines how the system works but gives, in itself, little clue as to why it was designed this way. Design is all about trade-offs: what were they? If you never learn to design up front on paper, you will never learn to document software understandably either. And without documentation, the rationale will later be lost.

You should design up front as far as you can. Then switch to a computer.

How far you can design up front depends on your intellect and your knowledge of the problem domain. Learn to assess the confidence in your up-front design correctly, and identify when to stop, for therein lies the danger: too much time spent designing on paper something that doesn’t work. But some design up front has its place.


Technology

How Technology Evolves

We often take for granted the technology we have and forget that it’s the result of a tedious evolutionary process.

A Railroad Track is the Width of Two Horses is one of the first stories about the evolution of technology that I remember reading, maybe ten years ago. It rings more like a colorful story than a true historical account, but it nevertheless left an impression on me.

Later, doing research gave me a better appreciation of how ideas evolve, cross-pollinate, and morph over time. True insights are rare. It’s a lot about tweaking existing ideas until the right form that works is found.

Here are some of the most engaging stories about technology history that I’ve read:

Oh boy, innovation is such a messy process.

Technology

Platforms and Innovation

I started my career writing Flash applications. Then I moved to Java. Both are middleware technologies that abstract the underlying operating system and enable cross-platform interoperability. I’ve actually never written a professional application that relied directly on a specific operating system.

This was fine by me. “Write once, run everywhere” was great for productivity.

For the kind of applications I was developing, what these middleware stacks provided was enough. Maybe I occasionally wished that drag and drop between the application and its host system were better supported, but that was more or less it. I didn’t really miss a deeper integration with the rest of the system.

These technologies were also innovative on their own. Flash enabled developers to create rich web applications back in a time when web sites were mostly static. The same was true of Java and its applets, even if the technology never really took off.

But middleware technologies also slow down innovation.

An operating system provider wants developers to adopt its new functionalities as quickly as possible, to innovate and make the platform attractive. Middleware technologies make such adoption harder and slower.

The official Apple memo “Thoughts on Flash” about not supporting Flash on iOS makes it very clear:

We know from painful experience that letting a third party layer of software come between the platform and the developer ultimately results in sub-standard apps and hinders the enhancement and progress of the platform.

The informal post “What really happened with Vista” gives similar arguments against middleware stacks:

Applications built on [cross-platform] middleware tend to target “lowest common denominator” functionality and are slower to take advantage of new OS capabilities.

For desktop applications, a good integration with the operating system was a plus, but not a killer. The drag and drop functionality I occasionally missed didn’t impact the whole user experience.

With mobile devices, everything is different.

Mobile applications are more focused and need to integrate seamlessly with the device: in terms of user experience, but also connectivity and power consumption. That’s what “Thoughts on Flash” was about.

Think of notifications. Notifications for desktop applications are nice, but not a killer. For a mobile application, how it integrates with notifications makes the difference between success and failure. Notifications are becoming the heart of the smartphone experience. You don’t want to suck there.

Or think of ARKit, Apple’s upcoming augmented reality toolkit. Augmented reality hasn’t really hit the mass market yet, and there is lots of potential there. If nothing else, it will make our good old-fashioned ruler obsolete for measuring distances. But such a toolkit relies on specific hardware (sensors, CPU, camera). You don’t want middleware there to slow down adoption.

Platforms diverge and sometimes converge. They diverge when exclusive capabilities are added and converge when a cross-platform standard is adopted.

With HTML5 we have a good standard for regular applications with desktop-like features. The Gmail mobile web application is, for instance, so well done that I prefer it to the native iOS version. But you can only go so far with HTML5. If you want to push the envelope, you need to go native and use the full power of the platform.

For applications in the broader context of digitalization (social media, artificial intelligence, the internet of things), innovation at the platform level will be decisive.

The platform war will intensify.


Technology

10 Tips to Fail with Enterprise Integration

If you want to make enterprise integration needlessly complicated, follow these tips.

1. Model poorly

A poor model is always a nice way to make things more complicated than they should be.

Examples: You can name things badly. You can model everything as strings (keys, lists, etc.). You can reuse overly generic abstractions in multiple contexts instead of defining one abstraction per context. Or you can expose a relational model instead of an entity model.
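To illustrate the stringly-typed flavor of this tip, here is a small hypothetical sketch (the names and fields are made up):

    import java.time.LocalDate;
    import java.util.Map;

    class ModelingExamples {
        // The "fail" version: everything is a string and the keys are cryptic.
        Map<String, String> poorInvoice = Map.of("f1", "4200", "f2", "2024-01-31");

        // A model that names things and uses proper types for its context.
        record Invoice(long amountInCents, LocalDate dueDate) {}
    }

The second version costs a few more lines up front, but consumers of the interface no longer have to guess what “f1” means or how to parse it.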

2. Use immature technologies

Whenever possible, use immature, non-standard, or inappropriate technologies to make the integration complicated.

Example: Don’t use XML; use JSON. Its support in IDEs is still weak, its semantics for the various numeric types are poor, it prevents proper code generation (for class-based languages), and JSON-Schema is still a draft.

3. Assume the network is perfect

Assume the network is perfect: it has infinite bandwidth and zero latency. This is a classic recipe for disaster. Completely ignore the reality of networking. If your interface is sound at the logical level, then it will be fine in production.

Examples: Don’t distinguish between the time of the event you model and the technical time when the message was sent or received (it doesn’t matter, since latency is zero). Or send replies to individual requests on a topic and leave the burden of filtering out the irrelevant replies to the subscriber at the application level (it doesn’t matter, since bandwidth is infinite).
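A minimal sketch of the first point, with hypothetical message fields: the business time of the event and the technical time of transport are two different things, and conflating them only “works” if latency really were zero.

    import java.time.Instant;

    // occurredAt: when the price actually changed (business time).
    // sentAt:     when the message left the producer (technical time).
    record PriceChanged(String instrument, double newPrice,
                        Instant occurredAt,
                        Instant sentAt) {}

With both timestamps in the message, the consumer can at least detect that it is processing stale data.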

4. Make loads and updates asymmetric

It is common for an interface to publish updates on topics but also to provide a means for the consumer to load data upon startup. In such a case, the system should work so that the same data are delivered to the consumer for loads and updates. To introduce subtle data inconsistencies, make it so that loads and updates don’t deliver the same data.

Example: If an entity has multiple statuses, don’t publish all status changes as updates. This way, there is a discrepancy between the data obtained via load requests and via updates.

5. Make the system as stateful as possible

If you find a way to complicate state management, go for it.

Examples: Instead of publishing entities that are self-consistent, publish only a delta of what has changed. The consumer must then carefully ensure that all deltas are applied in order. Or define requests that reference other requests, e.g. to implement paging. The provider will need to do some bookkeeping of the previous requests.
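Here is a small hypothetical sketch of what delta publishing pushes onto the consumer: it has to track sequence numbers per entity and detect gaps, state it wouldn’t need with self-contained messages.

    import java.util.HashMap;
    import java.util.Map;

    record OrderDelta(String orderId, long sequenceNumber, Map<String, String> changedFields) {}

    class OrderCache {
        private final Map<String, Long> lastApplied = new HashMap<>();

        void apply(OrderDelta delta) {
            long expected = lastApplied.getOrDefault(delta.orderId(), 0L) + 1;
            if (delta.sequenceNumber() != expected) {
                // A gap means the cached entity is no longer trustworthy.
                throw new IllegalStateException("Missed a delta, full reload required");
            }
            lastApplied.put(delta.orderId(), delta.sequenceNumber());
            // ...merge changedFields into the locally cached entity...
        }
    }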

6. Leave the protocol vague

If you define the transport technology, the encoding, and the various messages that can go through your interface, most readers of the specification will have a good understanding of what the purpose of the interface is. So stop there. Don’t bother explaining the exact protocol, with its assumptions about the order of messages or about when a given message may or may not be sent. This way, you leave the door open to non-obvious misunderstandings.

Example: Don’t specify which requests can be used anytime and which should be used only occasionally, after a restart or recovery.

7. Don’t properly version your interface

Your interface will need to change. Don’t provide proper versioning. This way, supporting multiple versions will be a pain.

Example: Use XML namespaces, but don’t use them for versioning.

8. Redefine the semantics of data between versions

Make subtle changes to the meaning of the data, so that the semantics change in a non-obvious way.

Example: Redefine what “null” means for a certain attribute.

9. Don’t distinguish between endpoint and tenant

Your interface will be accessible through an endpoint that will probably be used by multiple consumer systems (“tenants”). Define SLAs per endpoint, but not per tenant. This way you will need to deploy multiple endpoints to really guarantee SLAs for specific consumers.

Example: Provide a limit for the frequency of load requests at the endpoint level, independent of the consumer system. If one consumer misbehaves, it will prevent all other consumers from loading data.
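A small sketch of the opposite, per-tenant accounting (hypothetical names; the periodic reset of the counters is left out):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Tracks the load-request budget per tenant, so one misbehaving consumer
    // exhausts only its own budget instead of the whole endpoint's.
    class LoadRequestLimiter {
        private final int maxRequestsPerMinute;
        private final Map<String, Integer> requestsThisMinute = new ConcurrentHashMap<>();

        LoadRequestLimiter(int maxRequestsPerMinute) {
            this.maxRequestsPerMinute = maxRequestsPerMinute;
        }

        boolean allow(String tenantId) {
            int count = requestsThisMinute.merge(tenantId, 1, Integer::sum);
            return count <= maxRequestsPerMinute;
        }
    }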

10. Ignore monitoring needs

Do not provide any meaningful way for the consumer to check whether the provider is healthy or not. Either the consumer will have to guess, or it will have to use features not designed for monitoring to assess the system’s health.

Example: Aggregate data from multiple decentralized subsystems and publish them via a centralized interface, but don’t provide any way for the consumer to figure out which subsystems are healthy and which are not.


Software Architecture

Why a Calendar App is a Great Design Exercise

To check if a salesman is good, one classic is the “Sell me this pen” test. To check if a software designer is good, I propose the “Design me a calendar app” test.

That was one of the topics we chose for the software engineering lab, and I loved the results.

There are several reasons why it works well as a design exercise:

Everybody can relate − The domain is easily understood and familiar. Who hasn’t used a calendar app?

It’s easy but not so easy − Managing events that occur once and are short is easy. But it gets more interesting as soon as events are recurring (series), span multiple days, are entered in different time zones, or have rooms associated with them. The design becomes more complex not because independent features pile up, but because the complexity of the core model increases (see the sketch after this list).

Time is messy − A lot of complexity in business software comes from the fact that business rules are “arbitrary”. They make sense at the business level because of processes, domain knowledge, etc., but it’s hard to capture some clear “logic” behind them in software. Introducing such a business domain for an exercise is possible, but takes time. On the other hand, everybody already knows the idiosyncrasies of the Gregorian calendar. There is little “logic” behind February having only 28 days and occasionally 29. But, yes, it means a month might sometimes span exactly 4 week rows and not 5. Deal with it in your UI.

It’s not just the server − This design exercise raises interesting questions not only in the backend, but also in the frontend. What’s the right model? How can we display it fast? What’s the expectation of the user when the start time of a meeting is changed: to shorten the meeting or to move it? These questions have nothing to do with the technology stack. They are inherent to the product. For questions like the last one, I recommend reading The Math of Easy-to-Use by Terry Crowley, former head of development for Microsoft Office, including Microsoft Outlook. He knows about calendar apps.
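To give an idea of where the core model’s complexity comes from, here is a minimal sketch (hypothetical types, far from complete):

    import java.time.Duration;
    import java.time.LocalDateTime;
    import java.time.ZoneId;

    enum Frequency { DAILY, WEEKLY, MONTHLY }

    record Recurrence(Frequency frequency, int interval, LocalDateTime until) {}

    record Event(String title,
                 LocalDateTime start,      // wall-clock time, as the user entered it
                 Duration duration,
                 ZoneId zone,              // needed to map wall-clock time to instants
                 Recurrence recurrence) {} // null for a single occurrence

Even this toy model already forces decisions: occurrences of a series are anchored to wall-clock time in a zone, so a weekly 9:00 meeting stays at 9:00 across a DST change even though the interval between two occurrences is then not exactly 168 hours.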

In The Mythical Man-Month, Fred Brooks explains that one of his favorite interview questions is “Where is next November?”.

I have long enjoyed asking candidate programmers, “Where is next November?” If the question is too cryptic, then, “Tell me about your mental model of the calendar.” The really good programmers have strong spatial senses; they usually have geometric models of time; and they quite often understand the first question without elaboration.

Mental models of time are cultural. In Western societies, time flows from left to right; the past is behind us and the future in front of us. In other societies, it’s the other way around. So I wouldn’t quite expect a specific answer to this question. But I would agree with Fred Brooks that a good ability to model time is a predictor of good design skills in general.

If you don’t get much else from this exercise, it will at least make you more aware of the pitfalls of dealing with time in computer programs and teach you to use time libraries properly. This is a valuable programming skill on its own. The system I’m working on (a train dispatching system) doesn’t work correctly during the night of the daylight saving time (DST) change in autumn, since time jumps back if the DST offsets aren’t accounted for. If you’ve designed a calendar app once in your life, you are aware of such pitfalls.
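To see how easily this bites, here is a small sketch using java.time; the zone and date are assumptions for the example, not taken from the actual system:

    import java.time.Duration;
    import java.time.LocalDateTime;
    import java.time.ZoneId;
    import java.time.ZonedDateTime;

    public class DstPitfall {
        public static void main(String[] args) {
            // Autumn DST change in Europe: at 03:00 the clocks jump back to 02:00,
            // so the wall-clock hour between 02:00 and 03:00 occurs twice.
            ZoneId zone = ZoneId.of("Europe/Zurich");
            ZonedDateTime before = ZonedDateTime.of(LocalDateTime.of(2021, 10, 31, 2, 0), zone);
            ZonedDateTime after = ZonedDateTime.of(LocalDateTime.of(2021, 10, 31, 3, 0), zone);

            // One hour on the wall clock, but two hours of elapsed time.
            System.out.println(Duration.between(before, after)); // prints PT2H
        }
    }

Any logic that derives elapsed time from wall-clock differences, as a naive timetable computation might, is off by an hour during that night.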

So, please, don’t design todo apps as an exercise. Design calendar apps instead. It develops real design skills and will make real-world software less buggy.


Technology

Living in the Future

The world is constantly changing. From electricity to cars to television to the internet, most generations have seen at least one breakthrough.

This will continue, and it’s certain that my generation will witness another technological shift.

Interestingly, how we react to new technologies itself changes with time. For a lot of new technologies, my first reaction was indifference, entirely missing the new possibilities the technology offered.

The iPhone? I thought it would be a flop. Facebook? I thought it would be a fad. Bitcoin? I thought it would crash.

It seems like I belong to the late majority rather than the early adopters. Maybe Douglas Adams also has a point:

I’ve come up with a set of rules that describe our reactions to technologies:

1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works.

2. Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.

3. Anything invented after you’re thirty-five is against the natural order of things.

Since I’m certain to witness another change, I will have to adapt, whether I like it or not.

For instance, virtual reality might be a thing after all. It seems very much against the natural order of things to me right now, but actually it’s not much crazier than television was back then.

The first versions of new technologies have always sucked: bulky, limited, slow, made just usable enough for a specific niche market. For virtual reality headsets, that niche is the gamers.

With widespread adoption, the usage can completely change, though. I’m writing this post on an iPhone using a third party app, after all. Maybe virtual reality is the future of shopping, who knows.

The talent is to foresee the potential of a mass market, which isn’t always obvious.

I think there is a world market for maybe five computers — Thomas Watson, 1943

Realizing that my ability to predict successful technologies is about as good as Thomas Watson’s, I find it interesting to look at how innovators see the world.

According to Paul Graham, innovators “live in the future.” They are natural early adopters, and their use of technology is such that they simply build what they find missing.

An alternative formulation I like comes from Tim Urban: innovators have an accurate “reality box.” That is, unlike most people, whose understanding of the world and of what technology enables reflects the common wisdom established 10 years ago, the innovator has an accurate and up-to-date understanding of the possibilities offered by technology. This makes it obvious to them to create new products around these capabilities.

Will virtual reality turn out to be the future of shopping, will self-driving cars become mainstream, will bitcoin establish itself as the first digital currency? Whatever the next breakthrough is, there are exciting times ahead.

So I’ve decided to be more open to new ideas and to keep my reality box accurate enough to assess them. But changing one’s way of reacting to new ideas is hard, just like predicting the future.

Wearing a smart watch is still something that doesn’t appeal to me. And it apparently doesn’t appeal to many other people either.

Organisation

10x

Fred Brooks started it all. In The Mythical Man Month, he quotes a study saying

individual difference between low and high performers can vary by an order of magnitude

Since then this myth of 10x productivity difference has persisted in our industry.

Nowadays it’s best seen in the use of words like rockstar, guru or wizard in job descriptions.

But is it really a myth, or reality?

It’s undeniable that individual differences exist. Not everybody can write an operating system kernel, a concurrent collection library, or a cryptocurrency protocol. These achievements are examples of outstanding technical expertise.

Like in sports, the distribution of talent is skewed, and there are outliers that outperform others.

But here’s the catch: the 10x developer isn’t working 10x faster, he’s thinking differently. The 10x developer finds new ways to address problems.

He doesn’t deal with complexity better. He finds ways to avoid complexity altogether. Not occasionally, but systematically, as part of his work ethic.

A 10x developer is also a force multiplier. His actions make the work of several people easier. He inspires others to achieve excellence and adopt his habits. The payoff can go well above 10x.

So, myth or reality?

For me, reality. But such developers are very rare. Over the last 10 years I’ve only met one.


Thinking

The Brain and Probabilities

The brain is a wonderful machine with impressive computing power. We can make sense of complex information effortlessly and almost instantly. But it has one big flaw: it does not understand probabilities.

When presented with information, the brain tries to explain it by building a coherent story out of it. To do so quickly, it relies on shortcuts, which largely ignore probabilities. So the story you get isn’t necessarily the most probable one, but the cheapest one it could construct that remains plausible.

One of the shortcuts is to trade availability of information for probability (the availability heuristic): the information you can recall quickly is deemed more probable than other information. As a result, the perceived probability of sensational events inflates while that of mundane events shrinks.

Another shortcut is to consider only what is visible and extrapolate from there (“what you see is all there is”): only the visible information is considered, without even entertaining the idea that something could be missing from the picture. Somebody who looks nice will be considered a nice person, unless additional negative information about him is given.

The brain tries so hard to build a story that it will see patterns even in random data. It will infer causality very quickly, and from very little evidence. As Kahneman puts it, the brain is “a machine for jumping to conclusions”.

It is important to remember the distinction between plausible and probable when it comes to judgment, because we’re making decisions all day long.

You hear a project used a new methodology and was very successful with it, so you want to use it as well? Beware the survivorship bias. You don’t know how many projects used the methodology and failed…

You think the biggest risk in your project is a distributed attack from China? Beware the availability heuristic. Your biggest risk might be not having proper input validation…

You see bug reports for your teams and start detecting a pattern? Beware the law of small numbers. Your sample size might be too small…

Our brain is hardwired to tell us stories, so it’s very hard to improve our handling of probabilities. Often, we’re simply not aware that probabilities are at play. And even when we are, it’s really hard to change our instincts in some cases: if you flip a coin 9 times and get heads 9 times, the probability of getting tails on the 10th flip is higher, right? Well, actually not. But it’s hard not to feel otherwise.
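If the arithmetic doesn’t convince your gut, a quick simulation might. This is just a throwaway sketch, not from the original post:

    import java.util.Random;

    public class CoinFlips {
        public static void main(String[] args) {
            Random random = new Random();
            int streaks = 0, tailsAfterStreak = 0;
            for (int i = 0; i < 5_000_000; i++) {
                boolean allHeads = true;
                for (int j = 0; j < 9; j++) {
                    if (!random.nextBoolean()) { allHeads = false; break; }
                }
                if (allHeads) {
                    streaks++;                                     // 9 heads in a row
                    if (!random.nextBoolean()) tailsAfterStreak++; // record the 10th flip
                }
            }
            // Prints roughly 0.5: the coin has no memory of the previous 9 flips.
            System.out.println((double) tailsAfterStreak / streaks);
        }
    }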

So, the best remedy is to default to a healthy skepticism and accept that the outcome of many situations in life is simply the result of chance. It might sound like fatalism, but it isn’t at all. Your actions will influence the outcome of many situations, but you don’t know how. Don’t buy the first plausible explanation your brain tries to sell you; it’s probably not the right one.


Software Architecture

Things You Can’t Abstract

The art of programming is to a large extent the art of devising abstractions. Some might be very general and reusable in many contexts, some will be more specialized and applicable only in some domains.

The purpose of abstraction is to hide complexity so that we don’t need to care about details. Using abstractions, we can “raise the abstraction level”.

Data structures, relational databases, file systems or garbage collection are all examples of common programming abstractions. There are of course many more.

Abstracting is not unique to programming. For instance, DNA, the cell, the organ, and the organism are different abstraction levels in biology.

An abstraction defines a contract between a user and a provider. The fewer constraints there are in the contract, the more freedom there is in the implementation. It’s tempting to abstract away all non-functional aspects, but it’s actually a bad idea: you will need to understand them to use the abstraction correctly.

First, you cannot abstract performance. Whether an operation takes O(1) or O(n) is not something you can ignore. Eventually, at some point, you will have to care about the implementation of the abstraction to understand its performance characteristics. Abstracting performance away and letting the runtime figure out the best optimization strategy looks nice on paper but is the source of many headaches. You will need to know how your data structure performs, how your database fetches data, how many files can reasonably exist in a folder, and when your garbage collector kicks in.
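The classic Java example is the List interface; the following sketch is illustrative only, and the exact timings will vary:

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;

    public class ListPerformance {
        public static void main(String[] args) {
            // Same abstraction, same contract, very different performance:
            // get(i) is O(1) on an ArrayList but O(n) on a LinkedList.
            List<Integer> arrayList = new ArrayList<>();
            List<Integer> linkedList = new LinkedList<>();
            for (int i = 0; i < 100_000; i++) { arrayList.add(i); linkedList.add(i); }

            long sum = 0;
            long start = System.nanoTime();
            for (int i = 0; i < arrayList.size(); i++) sum += arrayList.get(i);
            System.out.println("ArrayList:  " + (System.nanoTime() - start) / 1_000_000 + " ms");

            start = System.nanoTime();
            for (int i = 0; i < linkedList.size(); i++) sum += linkedList.get(i);
            System.out.println("LinkedList: " + (System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }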

Second, you cannot abstract failure modes. If something can fail, you cannot ignore it. This is especially true of the network: if something is remote, it can be inaccessible. Attempts to abstract the network away as if everything were local simply do not work. An abstraction can have few failure modes, but there is no abstraction that never fails. You will need to understand how your data structure reacts when it can’t expand, how your database reacts when your commit is so big that its transaction log is full, what happens when your file system is not reachable, and what happens when the garbage collector can’t reclaim space.
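A minimal sketch of how the failure mode shows up in the contract itself (a hypothetical repository, not a real API):

    import java.io.IOException;

    // The abstraction looks innocent, but because the implementation is remote,
    // failure is part of the contract and the caller has to deal with it.
    interface CustomerRepository {
        String findName(long customerId) throws IOException;
    }

    class RemoteCustomerRepository implements CustomerRepository {
        public String findName(long customerId) throws IOException {
            // A real implementation would make a network call here; when the
            // service is unreachable there is nothing sensible to return.
            throw new IOException("customer service not reachable");
        }
    }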

And third, you cannot abstract the consumption of shared resources. Since shared resources are finite and common to the whole system, every component is indirectly related to every other component. You will need to understand how much memory your data structure takes, how much of your data fits in the database cache, how much disk space your system consumes, and how many clock cycles are eaten by garbage collection runs.

That makes a lot of aspects we can’t abstract. Joel Spolsky was right. All non-trivial abstractions eventually “leak”. Barbara Liskov was wrong. In practice, two abstractions with the same functionality cannot be “substituted”, unless they also have the same non-functional characteristics.

It is discouraging to realize we can’t abstract as much as we would like, but it doesn’t mean abstraction doesn’t work. You will need to know a bit more about data structures, databases, file systems, and garbage collection than you thought in order to use them correctly, but you can still ignore a lot of the internal details. The goal of hiding some complexity is achieved, just not of hiding all complexity.
