No More QA

Companies have traditionally organized software-related activities in three silos: Dev, Test/QA, Operations.

The QA effort takes place after a long phase of development, which results in bug spikes and makes it difficult for the development teams to plan their work during this time.

When companies were engineering software “piecewise” this was the only way. Only when all pieces were finished could you integrate them and test features end-to-end. We’ve however now moved to an approach where products and teams are organized so that features can be delivered end-to-end incrementally. The whole product is engineered iteratively.

Evidence suggests that a centralized QA phase does not bring additional quality in this case, but rather actively harms it.

As a result, they hired a VP of QA who set up a QA division. The net result of this, counterintuitively, was to increase the number of bugs. One of the major causes of this was that developers felt that they were no longer responsible for quality, and instead focussed on getting their features into “test” as quickly as they could.

There is no such thing as a devops team, Jez Humble

A similar story is told in The Age of Agile about the Agile transformation at Microsoft.

There was a lot of learning at the start of the Agile transformation at Microsoft. “In the first sprints,” says Bjork, “there was agreement on doing three-week sprints. The leadership signed off on the idea of Agile, but they were anxious as to how it was going to work. They planned for ‘a stabilization sprint’ after five sprints. However, that encouraged some teams to think, ‘No need to worry about bugs, because we have the stabilization sprint!’ A lot of bugs were generated and all the teams had to pitch in to help fix them.

“In effect,” he says, “we had told people to do one thing, but we created an environment that prompted some teams to do the opposite. Who could blame them? The teams told us. ‘Don’t ever do that to us again!’ It was an example of unintended consequences.”

The Age of Agile, Stephen Denning

For once, fixing the problem is easy. Just get rid of your QA phase (not the testers!). Make it clear that there is no additional safety net and that teams must ship features that are “done, done, done.”

Autonomy and Microservices

Discussions about monoliths vs. microservices are hotter than ever. Usually, “monolith” is a synonym for “big ball of mud” in these discussions. Of course, it needn’t be so. A modular monolith is perfectly possible. Microservices aren’t an entirely new idea either. As some say, it’s SOA done right.

The usual argument in favor of microservices is that autonomy is a good thing: teams can pick the most appropriate tools, develop in parallel without friction, and scale services independently of each other. The main drawback is the increased complexity of the overall system, primarily on the operations side but also on the tools side.

The usual argument in favor of a modular monolith is that it’s simple: the code base can be modularised to enable parallel development, the tech stack is standardized for everyone which reduces complexity. The main drawback is that the release cycle is the same for everyone which implies some coordination and possibly reduces the release cadence. The risk of inadvertent coupling is also higher since modularisation boundaries are internal and not external as with microservices.

The distinction between microservices and monolith is a continuum, though. You can for instance have microservices with a standardized tech stack or a distributed monolith with the ability to scale some parts independently.

It’s up to you to decide which levels of autonomy you want.

Autonomy: benefits and perils

Internal quality standards
  Benefits:
  • Better fitness of design principles, coding conventions, or testing strategies to the problem domain
  • Increased productivity
  Perils:
  • Code and people “mobility” is weakened
  • Adherence to conventions is weakened because there are many of them
  • Best practices keep being reinvented; each team goes through the same path of failure and lessons learned
  • Best practices in place turn out to be sub-optimal

Scaling
  Benefits:
  • Scaling of individual parts of the system
  • Elasticity
  Perils:
  • Performance of the system is harder to comprehend
  • Overall operations get harder

Tech stack
  Benefits:
  • Better fitness of the technologies to the problem domain
  • Increased productivity
  Perils:
  • Code and people “mobility” is limited
  • Strategy for long-term support of technologies is harder
  • More fragility to changes of licence models
  • No economy of scale for lifecycle activities; every team must manage its own lifecycle

Release cycle
  Benefits:
  • Shorter time-to-market
  • Shorter feedback loops
  Perils:
  • Versioning hell

The mindset that led to large monoliths is a mindset rooted in economies of scale. Development, testing, database and operations work is organised in silos. The idea is that the effort is reduced if the product is large and infrequently released. You do things once, at large scale, with specialists.

With microservices, the effort for a microservice is small enough that one cross-functional team can undertake development, testing, database and operations work all by itself. There is less economy of scale but also less coordination needed.

“Because you can doesn’t mean you should.” Deviations from established practices or technologies can have attractive payoffs, but also come with some risk. Teams with lots of autonomy should be aware of the long-term consequences of their choices and balance them against short-term benefits.

Services need complete teams when they are actively developed. Over time, some modules will stabilize and their maintenance will be concentrated in fewer teams. Conversely, services might grow and need to be split across multiple teams. In either case, team ownership might change over time. If the technologies are very heterogeneous, this might be more challenging.

Ultimately, how much autonomy you want to give to the teams is an organizational choice, not a technical choice. If you trust your organisation to be able to work with autonomous teams yet converge toward shared goals, microservices might work for you. If the organizational maturity isn’t there, don’t go for microservices: you would only translate your technical issues into people issues, which are even harder to solve.

Do You Need an Architect?

Architects typically do three things: they own, they coordinate, and they mentor.

As an owner, the architect maintains the integrity of the system at a high level. He designs the foundations, identifies tradeoffs, decides on essential changes.

As a coordinator, the architect facilitates work and optimizes the exchange of information. He connects people, gathers information, and plans activities.

As a mentor, the architect provides the intellectual background to understand the system, work autonomously, and improve. He explains concepts and rationale, teaches best practices, and suggests improvements.

It’s a people and technical job.

Which kind of architect you need depends on the project and the team. If the team has enough expertise, they don’t need a mentor. If the team gets along well, they don’t need a coordinator. If the team shares the same view of the system, they can own it collectively.

So maybe you don’t need an architect.

The distinction between architecture and engineering is very blurry anyway. An architect doesn’t do something fundamentally different from an engineer. The three traits exist in every team member. Architects are simply mentoring, coordinating and owning at a different level of scale and responsibility. Some companies (like Google and Amazon) don’t have architects. They only have engineers with different levels of seniority.

And if you think that coding vs. not coding is a fundamental difference in the job, it’s not. Both architects and engineers are doing software design.

The more that happens organically through self-organisation in the team, the better. But self-organisation is hard and it frequently fails. If mentoring, coordination or ownership do not happen as they should, you’re in trouble. Identifying clear responsibilities might help.

So maybe you will need an architect after all.

Conceptual Integrity at Scale

The central argument of The Mythical Man-Month by Fred Brooks is that conceptual integrity is the most important consideration in system design, and that conceptual integrity will only be achieved if the design comes from one mind, or a few resonant minds.

I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.

[…]

Conceptual integrity in turn dictates that the design must proceed from one mind, or from a very small number of agreeing resonant minds.

If you’ve been the creative force in group work, you will have experienced these challenges. Core ideas are misunderstood, inconsistencies start to pop up, and the result is a patchwork.

For my part, I can confirm that consistency erodes quickly if you don’t pay close attention. Maintaining conceptual integrity is hard work.

This doesn’t happen because people are dumb, negligent or malevolent. It happens because as soon as you specialize, you lose sight of the whole. Someone makes a change here, someone else a change there, and both changes end up not being fully consistent with each other.

Unfortunately, contrary to what Brooks suggests, doing all the design work alone is usually not realistic.

With a good review culture you can scale your design team from one head to a few: let people design parts of the system even if their understanding of the whole system is incomplete, and have one central person review how well the contributions fit in.

It’s like having multiple authors for an article but having one person in charge of doing a complete pass on the article at the end to ensure consistency.

But if you want to tackle bigger challenges, you will have to scale your design team even more.

Ensuring conceptual integrity at scale is hard because it requires not only scaling knowledge but also standardizing the decision making process.

This is what guidelines try to achieve. Guidelines encode the principles, maxims, constraints, and goals of the system in a way that different people reach similar decisions. It’s evidently impossible to encode the complete decision making process in guidelines, given that so much is subjective, but they help achieve a basic overall consistency.

As for the subjectivity: just take one of your colleagues and ask yourself “what would he decide?” You might have a hunch about his decision, but chances are, you don’t know enough about all the thinking that went into his previous decisions to predict this one accurately. If you do, well, you’re two “resonant” minds, as Brooks would say.

If you know lots of people will be involved in the design process, you will need more than guidelines and reviews. You will have to decompose the problem into parts that can be solved individually. Each part can be assigned one “mind”. The whole might not be fully consistent, but the solution at each level of abstraction will at least be consistent.

Following the newspaper analogy, a newspaper has an editor in chief who sets the tone of the writing and the overall orientation (these are the guidelines). He or she will review the topics of the individual articles to make sure they fit in the issue of the newspaper, but he or she won’t edit every article personally (the parts).

No large system will be fully consistent (think of Microsoft Office, that our dear journalists might be using), but it doesn’t hurt too much, because no user will ever use all of the system.

Evolution will also bring some inconsistencies into the system. Moving from one system paradigm to the next is like moving from one local maximum to the next one. In between, things will be worse, that is, less consistent. But if you think there’s a superior design paradigm for the whole system, it’s worth challenging the current one to see if there’s a path.

Fred Brooks is right that conceptual integrity is the most important aspect in system design. He’s also right that the more designers there are, the harder it is to ensure consistency. But for large systems that evolve, some inconsistencies are inevitable. Address them like other risks in your project.

In Defense of Design Before Coding

Software design as a separate activity from implementation — “up front” design — got a bad press with agile methods.

Agile advocates say the design should be emergent. They say, design without coding is waterfall. It’s a waste of time.

I understand that you don’t want to design the whole system up front. But at the feature level, a bit of thinking before coding works wonders, I say.

My first argument is visible design. Looking at the code doesn’t reveal the whole design because code only shows the static structure. The design is more than that. To understand how the system works you must run it, but even then the sequencing of events is still invisible. If you want effective feedback on the design, you must make it visible. People who jump directly to code still end up sketching or drawing things for their colleagues to explain their design and get feedback. Designing up front makes the design visible up front.

My second argument is speed of iteration. Even with higher-level programming languages, there is a gap between the concepts and the implementation. There is some work needed to implement the thing for real and take care of all the details. Running the system in your head, or on paper, to challenge the design enables faster iterations on the design. CRC sessions are for instance a nice way to explore the design space effectively, without coding yet.

My third argument is better reasoning. The code level is just one of the many abstraction levels you can use to reason about the system. When you’re trying to identify the main abstractions, what their responsibilities are, and how they play together, this abstraction level is often too low. One abstraction with a clear responsibility might map to several classes. There might be for instance a “scheduler” that will be implemented using several concurrency primitives. These are implementation details (although interesting ones!) irrelevant for now. Working at the code level forces you to think at one specific abstraction level. Working on paper enables you to choose the optimal abstraction level to work out the design.

And finally, my fourth argument is tracking rationale. The code defines how the system works but in itself gives few clues as to why it was designed so. Design is all about trade-offs: what were they? If you never learn to design up front on paper, you will never learn to document software understandably either. And without documentation later, the rationale will be lost.

You should design up front as far as you can. Then switch to a computer.

How far you can design up front depends on your intellect and your knowledge of the problem domain. Learn to assess the confidence in your up front design correctly, and identify when to stop, since there lies the danger: too much time spent designing on paper something that doesn’t work. But some design up front has its place.

How Technology Evolves

We often take for granted the technology we have and forget that it’s the result of a tedious evolutionary process.

A Railroad Track is the Width of Two Horses is one of the first stories about the evolution of technology that I remember reading, maybe ten years ago. It rings more like a colorful story than a true historic account, but it nevertheless left an impression on me.

Later, doing research gave me a better appreciation of how ideas evolve, cross-pollinate and morph over time. True insights are rare. It’s a lot about tweaking existing ideas until the right form that works is found.

Here are some of the most engaging stories about technology history that I’ve read:

Oh boy, innovation is such a messy process.

Platforms and Innovation

I started my career writing Flash applications. Then I moved to Java. Both are middleware technologies that abstract the underlying operating system and enable cross-platform interoperability. I’ve actually never written a professional application that relied directly on a specific operating system.

This was fine with me. “Write once, run everywhere” was great for productivity.

For the kind of applications I was developing, what these middleware stacks provided was enough. Maybe I occasionally wished that drag and drop between the application and its host system was better supported, but that’s it more or less. I didn’t really miss a deeper integration with the rest of the system.

These technologies were also innovative on their own. Flash enabled developers to create rich web applications back in a time when web sites were mostly static. The same was true of Java and its applets, even if the technology never really took off.

But middleware technologies also slow down innovation.

An operating system provider wants developers to adopt its new functionalities as quickly as possible, to innovate and make the platform attractive. Middleware technologies make such adoption harder and slower.

The official Apple memo “Thoughts on Flash” about not supporting Flash on iOS makes it very clear:

We know from painful experience that letting a third party layer of software come between the platform and the developer ultimately results in sub-standard apps and hinders the enhancement and progress of the platform.

The informal post “What really happened with Vista” gives similar arguments against middleware stacks:

Applications built on [cross-platform] middleware tend to target “lowest common denominator” functionality and are slower to take advantage of new OS capabilities.

For desktop applications, a good integration with the operating system was a plus, but not a killer. The drag and drop functionality I occasionally missed didn’t impact the whole user experience.

With mobile devices, everything is different.

Mobile applications are more focused and need to integrate with the device seamlessly–in terms of user experience, but also connectivity and power consumption. That’s what “Thoughts on Flash” was about.

Think of notifications. Notifications for desktop applications are nice, but not a killer. For a mobile application, how the application integrates with notifications makes the difference between success and failure. Notifications are becoming the heart of the smartphone experience. You don’t want them to suck.

Or think of ARKit, Apple’s upcoming augmented reality toolkit. Augmented reality hasn’t yet really hit the mass market and there is lots of potential there. If nothing else, it will make our good old-fashioned ruler obsolete for measuring distances. But such a toolkit relies on specific hardware (sensor, CPU, camera). You don’t want middleware there to slow down adoption.

Platforms diverge and sometimes converge. They diverge when exclusive capabilities are added and converge when a cross platform standard is adopted.

With HTML5 we have a good standard for regular applications with desktop-like features. The Gmail mobile web application is for instance so well done that I prefer it to the native iOS version. But you can only go so far with HTML5. If you want to push the envelope, you need to go native and use the full power of the platform.

For applications in the broader context of digitalization (social media, artificial intelligence, internet of things), innovation at the platform level will be decisive.

The platform war will intensify.

10 Tips to Fail with Enterprise Integration

If you want to make enterprise integration needlessly complicated, follow these tips.

1. Model poorly

A poor model is always a nice way to make things more complicated than they should be.

Examples: You can name things badly. You can model everything as strings (key, list, etc.). Or you can reuse overly generic abstractions in multiple contexts instead of defining one abstraction per context. Or you can expose a relational model instead of an entity model.
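
To see what avoiding the “everything as strings” pitfall could look like, here is a minimal Python sketch. It is only an illustration under assumptions: the `Departure` entity, the `Status` enumeration, and the field names are hypothetical and do not come from any real interface. The idea is to convert the string-typed wire format into a typed model once, at the boundary.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PLANNED = "planned"
    DEPARTED = "departed"


@dataclass(frozen=True)
class Departure:
    """Hypothetical entity; the names are illustrative only."""
    train_id: int
    status: Status
    delay_minutes: int


def parse_departure(raw: dict) -> Departure:
    # Convert the string-typed wire format once, at the boundary, so that
    # invalid values are rejected here instead of deep inside the consumer.
    return Departure(
        train_id=int(raw["train_id"]),
        status=Status(raw["status"]),
        delay_minutes=int(raw["delay_minutes"]),
    )


departure = parse_departure(
    {"train_id": "4711", "status": "departed", "delay_minutes": "5"}
)
print(departure.status, departure.delay_minutes)

try:
    parse_departure({"train_id": "4711", "status": "gone", "delay_minutes": "5"})
except ValueError as error:
    print(error)  # 'gone' is not a valid Status
```

With a typed model, an unknown status fails loudly at parsing time rather than silently travelling through the consumer as just another string.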

2. Use immature technologies

Whenever possible, use immature, non-standard, or inappropriate technologies to make the integration complicated.

Example: Don’t use XML but JSON. Its support in IDEs is still weak, its semantics for the various numeric types are poor, it prevents proper code generation (for class-based languages), and JSON-Schema is still a draft.

3. Assume the network is perfect

Assume the network is perfect. It has infinite bandwidth as well as zero latency. This is a classic recipe for disaster. Ignore completely the reality of networking. If your interface is sound at the logical level, then it will be fine in production.

Examples: Don’t distinguish between the time of the event you model and the technical time when the message was sent or received–it doesn’t matter since latency is zero. Or send replies to individual requests on a topic and leave the burden of filtering out the irrelevant replies to the subscriber at the application level–it doesn’t matter since bandwidth is infinite.
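
If you would rather not fail, the first example can be turned around. The following is a minimal Python sketch, not a definitive design: the `Message` class and its fields (`event_time`, `sent_at`, `received_at`) are hypothetical names chosen for illustration. It keeps the time of the modelled event separate from the technical timestamps, so the consumer can measure latency instead of assuming it is zero.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Message:
    """Hypothetical message envelope; all names are illustrative."""
    payload: str
    event_time: datetime   # when the modelled event happened (business time)
    sent_at: datetime      # when the provider put the message on the wire
    received_at: datetime  # when the consumer took it off the wire

    def transport_latency(self) -> timedelta:
        # Time spent in the network and in the middleware.
        return self.received_at - self.sent_at

    def staleness(self) -> timedelta:
        # Age of the information when the consumer finally sees it.
        return self.received_at - self.event_time


t0 = datetime(2017, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
msg = Message(
    payload="train 4711 departed",
    event_time=t0,
    sent_at=t0 + timedelta(seconds=2),
    received_at=t0 + timedelta(seconds=5),
)
print(msg.transport_latency())  # 0:00:03
print(msg.staleness())          # 0:00:05
```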

4. Make loads and updates asymmetric

It is common for an interface to publish updates on topics but also provide a means for the consumer to load data upon startup. In such cases, the system should work so that the same data are delivered to the consumer for loads and updates. To introduce subtle data inconsistencies, make it so that loads and updates don’t deliver the same data.

Example: If an entity has multiple statuses, do not publish all status changes as updates. This way, there is a discrepancy between the data you obtain via load requests and via updates.

5. Make the system as stateful as possible

If you find a way to complicate state management, go for it.

Examples: Instead of publishing entities that are consistent, publish only the delta with what has changed. The consumer must carefully ensure that all deltas are applied in order. Or define requests that reference other requests, e.g. to implement paging. The provider will need to do some bookkeeping of the previous requests.
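
The bookkeeping that deltas impose on the consumer can be sketched as follows. This is a hedged illustration in Python: the `DeltaConsumer` class, the `GapDetected` exception, and the use of a sequence number per delta are assumptions made for the example, not part of any specific interface.

```python
class GapDetected(Exception):
    """Raised when a delta is missing and a full reload is required."""


class DeltaConsumer:
    """Hypothetical consumer of delta updates; names are illustrative."""

    def __init__(self) -> None:
        self.state = {}    # attribute name -> last known value
        self.last_seq = 0  # sequence number of the last applied delta

    def apply(self, seq: int, delta: dict) -> None:
        if seq <= self.last_seq:
            return  # duplicate or stale delta: ignore it
        if seq != self.last_seq + 1:
            # A delta was lost or reordered; the local state can no longer
            # be trusted.
            raise GapDetected(f"expected {self.last_seq + 1}, got {seq}")
        self.state.update(delta)
        self.last_seq = seq


consumer = DeltaConsumer()
consumer.apply(1, {"status": "planned"})
consumer.apply(2, {"track": "7"})
consumer.apply(2, {"track": "7"})  # duplicate delivery is ignored

try:
    consumer.apply(4, {"status": "departed"})  # delta 3 never arrived
except GapDetected as error:
    print(error)  # expected 3, got 4
```

None of this is needed when the provider publishes complete, consistent entities: the consumer can then simply keep the latest one.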

6. Leave the protocol vague

By defining the transport technology, the encoding, and the various messages that can go through your interface, most readers of the specification will have a good understanding of what the purpose of the interface is. So stop there. Don’t bother explaining the exact protocol with the assumptions about the order of messages or when a given message can be sent or not. This way, you leave the door open to non-obvious misunderstandings.

Example: don’t specify which requests can be used anytime and which should be used only occasionally after a restart or recovery.

7. Don’t properly version your interface

Your interface will need to change. Don’t provide proper versioning. This way, supporting multiple versions will be a pain.

Example: Use XML Namespaces, but don’t use them for versioning.

8. Redefine the semantics of data between versions

Make subtle changes to the meaning of the data, so that the semantics change in a non-obvious way.

Example: Redefine what “null” means for a certain attribute.

9. Don’t distinguish between endpoint and tenant

Your interface will be accessible through an endpoint that will probably be used from multiple consumer systems (“tenants”). Define SLAs per endpoint, but not per tenant. This way you will need to deploy multiple endpoints to really guarantee SLAs for specific consumers.

Example: provide a limit for the frequency of load requests at the endpoint-level, but independent of the consumer systems. If a consumer misbehaves, it will prevent all other consumers from loading data.
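
To avoid this pitfall, the limit can be tracked per tenant instead of per endpoint. Below is a minimal sliding-window sketch in Python. The `PerTenantRateLimiter` class, its parameters, and the tenant names are hypothetical, and a production limiter would need more (persistence, clock handling, concurrency).

```python
from collections import defaultdict, deque


class PerTenantRateLimiter:
    """Hypothetical sliding-window limiter keyed by tenant, not by endpoint."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # tenant -> timestamps of the requests accepted in the current window
        self._history = defaultdict(deque)

    def allow(self, tenant: str, now: float) -> bool:
        history = self._history[tenant]
        # Forget requests that have left the sliding window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True


limiter = PerTenantRateLimiter(max_requests=2, window_seconds=60.0)
print(limiter.allow("tenant-a", now=0.0))   # True
print(limiter.allow("tenant-a", now=1.0))   # True
print(limiter.allow("tenant-a", now=2.0))   # False: tenant-a used up its quota
print(limiter.allow("tenant-b", now=2.0))   # True: tenant-b is unaffected
print(limiter.allow("tenant-a", now=61.0))  # True: earlier requests left the window
```

A misbehaving tenant then only exhausts its own quota, and the other consumers can still load data.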

10. Ignore monitoring needs

Do not provide any meaningful way for the consumer to check whether the provider is healthy or not. Either the consumer will have to guess, or it will have to use features not designed for monitoring to assess the system health.

Example: aggregate data from multiple decentralized subsystems and publish them via a centralized interface, but don’t provide any way for the consumer to figure out which subsystem is healthy or not.
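
Conversely, a centralized interface can carry the health of each subsystem alongside the aggregated data. The following Python sketch only illustrates the idea: the `AggregatedSnapshot` class, the `Health` enumeration, and the subsystem names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Health(Enum):
    UP = "up"
    DOWN = "down"


@dataclass
class AggregatedSnapshot:
    """Hypothetical payload of the centralized interface."""
    data: dict = field(default_factory=dict)    # subsystem -> last known data
    health: dict = field(default_factory=dict)  # subsystem -> Health

    def trusted_data(self) -> dict:
        # Keep only the data from subsystems that currently report as healthy.
        return {
            name: value
            for name, value in self.data.items()
            if self.health.get(name) is Health.UP
        }


snapshot = AggregatedSnapshot(
    data={"north": 12, "south": 7},
    health={"north": Health.UP, "south": Health.DOWN},
)
print(snapshot.trusted_data())  # {'north': 12}
```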

Why a Calendar App is a Great Design Exercise

To check if a salesman is good, one classic is the “Sell me this pen” test. To check if a software designer is good, I propose the “Design me a calendar app” test.

That was one of the topics we chose for the software engineering lab, and I loved the results.

There are several reasons why it works well as a design exercise:

Everybody can relate − The domain is easily understood and everybody can relate. Who hasn’t used a calendar app?

It’s easy but not so easy − Managing events that occur once and are short is easy. But it gets more interesting as soon as events are recurring (series), span multiple days, are entered in different time zones, or have rooms associated with them. The design becomes more complex not because independent features pile up, but because the complexity of the core model increases.

Time is messy − A lot of complexity in business software comes from the fact that business rules are “arbitrary”. They make sense at the business level because of processes, domain knowledge, etc. but it’s hard to capture some clear “logic” behind them in software. Introducing such a business domain for an exercise is possible, but takes time. On the other hand, everybody already knows the idiosyncrasies of the Gregorian calendar. There is little “logic” behind February having only 28 days and occasionally 29. But, yes, it means a month might sometimes overlap exactly 4 weeks and not 5. Deal with it in your UI.

It’s not just the server − This design exercise raises interesting questions not only in the backend, but also the frontend. What’s the right model? How can we display it fast? What’s the expectation of the user when the start time of a meeting is changed: to shorten it or to move it? These questions don’t have to do with the technology stack. They are inherent to the product. For questions like the last one, I recommend reading The Math of Easy-to-Use from Terry Crowley, former head of development for Microsoft Office, including Microsoft Outlook. He knows about calendar apps.
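
The “Time is messy” point about months overlapping a varying number of weeks can be checked with Python’s standard `calendar` module. This is a small sketch: the helper name `weeks_overlapped` is hypothetical, and it assumes weeks start on Monday, which is the module’s default.

```python
import calendar


def weeks_overlapped(year: int, month: int) -> int:
    # monthcalendar() returns one row per week that the month overlaps;
    # weeks start on Monday by default.
    return len(calendar.monthcalendar(year, month))


print(weeks_overlapped(2021, 2))  # 4: February 2021 starts on a Monday and has 28 days
print(weeks_overlapped(2021, 3))  # 5
print(weeks_overlapped(2021, 5))  # 6: May 2021 starts on a Saturday
```

A month view therefore needs between four and six rows, depending on the month and on the first day of the week.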

In The Mythical Man-Month, Fred Brooks explains that one of his favorite interview questions is “Where is next November?”.

I have long enjoyed asking candidate programmers, “Where is next November?” If the question is too cryptic, then, “Tell me about your mental model of the calendar.” The really good programmers have strong spatial senses; they usually have geometric models of time; and they quite often understand the first question without elaboration.

Mental models of time are cultural. In western societies, time flows from left to right; the past is behind us and the future in front of us. In other societies, it’s the other way around. So I wouldn’t quite expect a specific answer to this question. But I would agree with Fred Brooks that a good ability to model time is a predictor of good design skills in general.

If you don’t get much from this exercise, it will at least make you more aware of the problems that exist when dealing with time in computer programs, and teach you to use libraries properly. This is a valuable programming skill on its own. The system I’m working on (a train dispatching system) doesn’t work correctly during the night of the daylight saving time (DST) change in autumn, since time jumps back if the DST offsets aren’t accounted for. If you’ve designed a calendar app once in your life, you are aware of such pitfalls.
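
The autumn DST pitfall can be reproduced in a few lines. This sketch uses Python’s standard `zoneinfo` module (Python 3.9+, and it assumes the IANA time zone database is available on the machine). The choice of Europe/Zurich and of the 2017 changeover is only an example, not a description of the dispatching system mentioned above.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

zurich = ZoneInfo("Europe/Zurich")

# On 2017-10-29, clocks in Zurich went back from 03:00 to 02:00,
# so the local time 02:30 occurred twice that night.
first = datetime(2017, 10, 29, 2, 30, tzinfo=zurich, fold=0)   # still summer time
second = datetime(2017, 10, 29, 2, 30, tzinfo=zurich, fold=1)  # already winter time

print(first.utcoffset())   # 2:00:00
print(second.utcoffset())  # 1:00:00

# Arithmetic between datetimes that share the same tzinfo ignores the offsets,
# so the two occurrences look simultaneous.
print(second - first)  # 0:00:00

# Converting to UTC first reveals the hour that really elapsed.
elapsed = second.astimezone(timezone.utc) - first.astimezone(timezone.utc)
print(elapsed)  # 1:00:00
```

Storing and comparing timestamps in UTC, and converting to local time only for display, is one common way to sidestep this class of bugs.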

So, please, don’t design todo apps as an exercise. Design calendar apps. It develops real design skills and will make real-world software less buggy.

Living in the Future

The world is constantly changing. From electricity to cars to television to the internet, most generations have seen at least one breakthrough.

This will continue, and it’s certain that my generation will witness another technological shift.

Interestingly, how we react to new technologies itself changes with time. For a lot of new technologies, my first reaction was indifference, missing entirely the new possibilities the technology offered.

The iPhone? I thought it would be a flop. Facebook? I thought it would be a fad. Bitcoin? I thought it would crash.

It seems like I belong to the late majority rather than the early adopters. Maybe Douglas Adams also has a point:

I’ve come up with a set of rules that describe our reactions to technologies:

1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works.

2. Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.

3. Anything invented after you’re thirty-five is against the natural order of things.

Since I’m certain to witness another change, I will have to adapt, whether I like it or not.

For instance, virtual reality might be a thing, after all. It seems to me very much against the natural order of things right now, but actually it’s not much crazier than television back then.

First versions of new technologies always sucked. They were bulky, limited, slow, made just usable enough for a specific niche market. For virtual reality helmets, the gamers.

With widespread adoption, the usage can completely change, though. I’m writing this post on an iPhone using a third party app, after all. Maybe virtual reality is the future of shopping, who knows.

The talent is to foresee the potential of a mass market, which isn’t always obvious.

I think there is a world market for maybe five computers — Thomas Watson, 1943

Realizing that my ability to predict successful technology changes is as good as Thomas Watson’s, it’s interesting to try to see how innovators see the world.

According to Paul Graham, innovators “live in the future.” They are natural early adopters, and their use of technology is such that they simply build what they find missing.

An alternate formulation which I like is from Tim Urban: innovators have an accurate “reality box.” That is, unlike most people, whose understanding of the world and of what technology enables reflects the common wisdom established 10 years ago, the innovator has an accurate and up-to-date understanding of the possibilities offered by technology. This makes it obvious to create new products around these capabilities.

Will virtual reality turn out to be the future of shopping, or self-driving cars become mainstream, or bitcoin establish itself as the first digital currency? Whatever the next breakthrough will be, there’s an exciting time ahead.

So I’ve decided to be more open to new ideas and keep my reality box more accurate to assess them. But changing one’s way of reacting to new ideas is hard, just like predicting the future.

Wearing a smart watch is still something that doesn’t appeal to me. And it apparently doesn’t appeal to many other people either.
