Mastering Technology

Things move fast in the IT industry. Half of the technologies I use today didn’t exist ten years ago. There’s a constant push to switch to new technologies. But when it comes to technologies, I’m kind of conservative. I like proven technologies.

The thing that makes me conservative is that mastering a technology takes a lot longer than we think.

Most advocates of new technologies massively underestimate the learning curve. Sure, you get something working quickly. But truly understanding a new technology takes years.

Take object-oriented programming. On the surface it’s easy to grasp quickly and become productive. But it took the industry something like 20 years to figure out that inheritance isn’t such a great idea. The result is that early object-oriented systems overused inheritance in the hope that it would favor reuse, whereas it just made them hard to maintain.

The same holds, for instance, for the idea of distributed objects. It’s easy to grasp and appealing. Yet it took the industry decades to realize that abstracting remote boundaries away is a flawed idea. We instead need to explicitly embrace and expose asynchronous APIs.

Another of my favorite easy-to-grasp-but-hard-to-master technologies is the object-relational mapper (e.g. Hibernate). After 10 years of experience, I’m still struggling with it as soon as the mapping isn’t entirely trivial.

Want another example? XA transactions. Updating a database row and sending a message seems to be the poster child for XA transactions. Well, it turns out that this simple scenario is already problematic. When it didn’t work, I learned that I was experiencing the classic “XA 2-PC race condition”.

There are lots of new technologies being developed right now, like distributed log systems, container schedulers, and web frameworks. I perfectly understand why they are being developed and what problems they supposedly solve.

But don’t try to convince me that they are silver bullets. Every technology choice is a trade-off because you can never fully abstract the underlying complexity away. There’s a price somewhere. Things will break in unexpected ways and nobody will know why. Performance will be hard to figure out. Subtle misuse of the technology will only be detected later and be hard to correct. It will take time to figure these things out.

In the end it’s all about risk management. If the technology might provide a strategic advantage, we can talk. The investment might be worth it. But if it’s not even strategic, I would seriously challenge whether the risk of using new technologies is worth it.


  • Organisation

    The Age of Agile

    This book is about agile management as a way to conduct business, not just software development.

    In the first chapters, the author presents agile management by characterizing it with three “laws”: the law of the customer, the law of the small team, and the law of the network. Roughly, I would sum them up as: companies should put creating value for the customer at the center of all activities, they should embrace decentralisation and autonomous teams that course-correct, and they should embrace fluid communication throughout the organisation as well as leverage network effects.

    The key thesis of the book is that agile management is a fundamental shift from traditional management and is thus akin to a revolution. Whereas traditional management is bureaucratic, top-down, short-term-focused, cost-oriented, and seeks to defend existing innovation to stay competitive, agile management is collaborative, decentralized, long-term-oriented, and seeks to create new innovations to stay competitive.

    The laws capture many ideas. The concept of iteration is addressed in the law of the small team, and I was wondering whether it would have deserved a “law of feedback” of its own. The law of the network also covers two things: fluid communication, and network effects with platforms like Amazon Marketplace, the Apple App Store, or AWS. I also wondered whether this second idea would have been better extracted into a “law of the platform”. But keeping the message simple with three laws makes the conceptual framework easier to grasp.

    After the chapters about the laws of agile management come a few chapters about implementing agile management, in the form of a couple of experience reports. These are mostly chapters of the form “doing X worked for us”. Each chapter draws its material from one primary source. They make the ideas presented earlier a bit more concrete, but they weren’t that memorable.

    Finally come a few chapters where Denning returns to the law of the customer, this time in the form of a discussion of the shareholder value model and its shortcomings. The shareholder value model, in the words of Denning, leads to financial engineering and short-term, cost-oriented management strategies which do not create new customer value, but rather exploit the existing customer value. The chapters read mostly like a rant, which I admittedly enjoyed, just like I enjoy a good rant from Steve Yegge. But in the context of the book, it felt a bit too one-sided or simply too long.

    This book was for me like watching a science-fiction movie with a good idea but lots of plot holes. I liked the framework with the three laws, which presents agile principles in a new way and gives new insights. I also liked that Denning, who doesn’t come from the tech world but from the business world, is ambitious about agile management and sees it at a macro scale through the lens of economic theory and business strategy, not just at the operational level. The conclusion, for that matter, is a good summary of his view. On the other hand, the chapters were lacking a clear connection. There are grandiose announcements about the promise of Agile and these new ideas we should embrace, or return to, but sometimes the content lacks substance. The emphasis on creating customer value felt right though. It’s very much in the spirit of Amazon’s principle of “customer obsession” or Y Combinator’s motto “make something people want”. I will remember the book for this emphasis.



    No More QA

    Companies have traditionally organized software-related activities in three silos: Dev, Test/QA, Operations.

    The QA effort takes place after a long phase of development, which results in bug spikes and makes it difficult to plan the work of the development teams during this time.

    When companies were engineering software “piecewise”, this was the only way. Only when all pieces were finished could you integrate them and test features end-to-end. We have, however, now moved to an approach where products and teams are organized so that features can be delivered end-to-end incrementally. The whole product is engineered iteratively.

    Evidence suggests that a centralized QA phase does not bring additional quality in this case, but rather actively harms quality.

    As a result, they hired a VP of QA who set up a QA division. The net result of this, counterintuitively, was to increase the number of bugs. One of the major causes of this was that developers felt that they were no longer responsible for quality, and instead focussed on getting their features into “test” as quickly as they could.

    There is no such thing as a devops team, Jez Humble

    A similar story is explained in The Age of Agile about implementing agile organization at Microsoft.

    There was a lot of learning at the start of the Agile transformation at Microsoft. “In the first sprints,” says Bjork, “there was agreement on doing three-week sprints. The leadership signed off on the idea of Agile, but they were anxious as to how it was going to work. They planned for ‘a stabilization sprint’ after five sprints. However, that encouraged some teams to think, ‘No need to worry about bugs, because we have the stabilization sprint!’ A lot of bugs were generated and all the teams had to pitch in to help fix them.

    “In effect,” he says, “we had told people to do one thing, but we created an environment that prompted some teams to do the opposite. Who could blame them? The teams told us. ‘Don’t ever do that to us again!’ It was an example of unintended consequences.”

    The Age of Agile, Stephen Denning

    For once, fixing the problem is easy. Just get rid of your QA phase (not the testers!). Make it clear that there is no additional safety net and that teams must ship features that are “done, done, done.”


    Autonomy and Microservices

    Discussions about monoliths vs microservices are hotter than ever. Usually, “monolith” is used as a synonym for “big ball of mud” in these discussions. It of course needn’t be so. A modular monolith is perfectly possible. Also, microservices aren’t an entirely new idea either. As some say, it’s SOA done right.

    The usual argument in favor of microservices is that autonomy is a good thing: teams can pick the most appropriate tools, develop in parallel without friction, and scale services independently of each other. The main drawback is an increased complexity of the overall system, primarily on the operations side but also on the tools side.

    The usual argument in favor of a modular monolith is that it’s simple: the code base can be modularised to enable parallel development, the tech stack is standardized for everyone which reduces complexity. The main drawback is that the release cycle is the same for everyone which implies some coordination and possibly reduces the release cadence. The risk of inadvertent coupling is also higher since modularisation boundaries are internal and not external as with microservices.

    The distinction between microservices and monolith is a continuum, though. You can for instance have microservices with a standardized tech stack, or a distributed monolith with the ability to scale some parts independently.

    It’s up to you to decide which levels of autonomy you want.

    Internal quality standards

    Benefits:
    • Better fitness of design principles, coding conventions, or testing strategies to the problem domain
    • Increased productivity
    Perils:
    • Code and people “mobility” is weakened
    • Adherence to conventions is weakened because there are many of them
    • Best practices keep being reinvented; each team goes through the same path of failure and lessons learned
    • Best practices in place turn out to be sub-optimal

    Scaling

    Benefits:
    • Scaling of individual parts of the system
    • Elasticity
    Perils:
    • Performance of the system is harder to comprehend
    • Overall operations get harder

    Technologies

    Benefits:
    • Better fitness of the technologies to the problem domain
    • Increased productivity
    Perils:
    • Code and people “mobility” is limited
    • Strategy for long-term support of technologies is harder
    • More fragility to changes of licence models
    • No economy of scale for lifecycle activities; every team must manage its own lifecycle

    Release cycle

    Benefits:
    • Shorter time-to-market
    • Shorter feedback loops
    Perils:
    • Versioning hell

    The mindset that leads to large monoliths is a mindset rooted in economy of scale. Development, testing, database and operations work is organised in silos. The idea is that the effort is reduced if the product is large and infrequently released. You do things once, at large scale, with specialists.

    With microservices, the effort for a microservice is small enough that one cross-functional team can undertake development, testing, database and operations work all by itself. There is less economy of scale but also less coordination needed.

    “Because you can doesn’t mean you should.” Deviations from established practices or technologies can have attractive payoffs, but also come with some risk. Teams with lots of autonomy should be aware of the long term consequences of their choice and balance them against short-term benefits.

    Services need complete teams when they are actively developed. Over time, some modules will stabilize and their maintenance will be concentrated in fewer teams. Conversely, services might grow and need to be split across multiple teams. In either case, team ownership might change over time. If the technologies are very heterogeneous, this might be more challenging.

    Ultimately, how much autonomy you want to give to teams is an organizational choice, not a technical choice. If you trust your organisation to be able to work with autonomous teams yet converge toward shared goals, microservices might work for you. If the organizational maturity isn’t there, don’t go for microservices: you would only translate your technical issues into people issues, which are even harder to solve.


    Software Architecture

    Do You Need an Architect?

    Architects typically do three things: they own, they coordinate, and they mentor.

    As an owner, the architect maintains the integrity of the system at a high level. He designs the foundations, identifies tradeoffs, decides on essential changes.

    As a coordinator, the architect facilitates work and optimizes the exchange of information. He connects people, gathers information, and plans activities.

    As a mentor, the architect provides the intellectual background to understand the system, work autonomously, and improve. He explains concepts and rationale, teaches best practices, and suggests improvements.

    It’s a people and technical job.

    Which kind of architect you need depends on the project and the team. If the team has enough expertise, they don’t need a mentor. If the team gets along well, they don’t need a coordinator. If the team shares the same view of the system, they can own it collectively.

    So maybe you don’t need an architect.

    The distinction between architecture and engineering is very blurry anyway. An architect doesn’t do something fundamentally different than an engineer. The three traits exist in every team member. Architects are simply mentoring, coordinating and owning at a different level of scale and responsibility. Some companies (like Google and Amazon) don’t have architects. They only have engineers with different levels of seniority.

    And if you think that coding vs. not coding is a fundamental difference in the job, it’s not. Both architects and engineers are doing software design.

    The more that happens organically through self-organisation in the team, the better. But self-organisation is hard and it frequently fails. If mentoring, coordination or ownership do not happen as they should, you’re in trouble. Identifying clear responsibilities might help.

    So maybe you will need an architect after all.


    Software Architecture

    Conceptual Integrity at Scale

    The central argument of The Mythical Man-Month by Fred Brooks is that conceptual integrity is the most important consideration in system design, and that conceptual integrity will only be achieved if the design comes from one mind, or a few resonant minds.

    I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.


    Conceptual integrity in turn dictates that the design must proceed from one mind, or from a very small number of agreeing resonant minds.

    If you’ve been the creative force in a group project, you will have experienced these challenges. Core ideas are misunderstood, inconsistencies start to pop up, and the result is a patchwork.

    For my part, I can confirm that consistency erodes quickly if you don’t pay close attention. Maintaining conceptual integrity is hard work.

    This doesn’t happen because people are dumb, negligent or malevolent. It happens because as soon as you specialize, you lose sight of the whole. Someone makes a change here, someone else a change there, and both changes end up not being fully consistent with each other.

    Unfortunately, contrary to what Brooks suggests, doing all the design work alone is usually not realistic.

    With a good review culture you can scale your design team from one head to a few: let people design parts of the system even if their understanding of the whole system is incomplete, and have one central person review how well the contributions fit in.

    It’s like having multiple authors for an article but having one person in charge of doing a complete pass on the article at the end to ensure consistency.

    But if you want to tackle bigger challenges, you will have to scale your design team even more.

    Ensuring conceptual integrity at scale is hard because it requires not only scaling knowledge but also standardizing the decision making process.

    This is what guidelines try to achieve. Guidelines encode the principles, maxims, constraints, and goals of the system in a way that different people reach similar decisions. It’s evidently impossible to encode the complete decision making process in guidelines, given that so much is subjective, but they help achieve a basic overall consistency.

    As for the subjectivity: just take one of your colleagues and ask yourself “what would he decide?” You might have a hunch about his decision, but chances are, you don’t know enough about all the thinking that went into his previous decisions to predict this one accurately. If you do, well, you’re two “resonant” minds, as Brooks would say.

    If you know lots of people will be involved in the design process, you will need more than guidelines and reviews. You will have to decompose the problem into parts that can be solved individually. Each part can be assigned to one “mind”. The whole might not be fully consistent, but the solution at each level of abstraction will at least be consistent.

    Following the newspaper analogy, a newspaper has an editor in chief who sets the tone of the writing and the overall orientation (these are the guidelines). He or she will review the topics of the individual articles to make sure they fit in the issue of the newspaper, but he or she won’t edit every article personally (these are the parts).

    No large system will be fully consistent (think of Microsoft Office, that our dear journalists might be using), but it doesn’t hurt too much, because no user will ever use all of the system.

    Evolution will also bring some inconsistencies into the system. Moving from one system paradigm to the next is like moving from one local maximum to the next one. In between, things will be worse, that is, less consistent. But if you think there’s a superior design paradigm for the whole system, it’s worth challenging the current one and seeing if there’s a path.

    Fred Brooks is right that conceptual integrity is the most important aspect in system design. He’s also right that the more designers there are, the harder it is to ensure consistency. But for large systems that evolve, some inconsistencies are inevitable. Address them like other risks in your project.

    Software Architecture

    In Defense of Design Before Coding

    Software design as a separate activity from implementation — “up front” design — got a bad press with agile methods.

    Agile advocates say the design should be emergent. They say, design without coding is waterfall. It’s a waste of time.

    I understand that you don’t want to design the whole system up front. But at the feature level, a bit of thinking before coding works wonders, I say.

    My first argument is visible design. Looking at the code doesn’t reveal the whole design because code only shows the static structure. The design is more than that. To understand how the system works you must run it, but even then the sequencing of events is still invisible. If you want effective feedback on the design, you must make it visible. People who jump directly to code still end up sketching or drawing things for their colleagues to explain their design and get feedback. Designing up front makes the design visible up front.

    My second argument is speed of iteration. Even with higher-level programming languages, there is a gap between the concepts and the implementation. There is some work needed to implement the thing for real and take care of all the details. Running the system in your head, or on paper, to challenge the design enables faster iterations on the design. CRC sessions are for instance a nice way to explore the design space effectively, without coding yet.

    My third argument is better reasoning. The code level is just one of the many abstraction levels you can use to reason about the system. When you’re trying to identify the main abstractions, what their responsibilities are, and how they play together, this abstraction level is often too low. One abstraction with a clear responsibility might map to several classes. There might be for instance a “scheduler” that will be implemented using several concurrency primitives. These are implementation details (although interesting ones!) that are irrelevant for now. Working at the code level forces you to think at one specific abstraction level. Working on paper enables you to choose the optimal abstraction level to work out the design.

    And finally, my fourth argument is tracking rationale. The code defines how the system works but by itself gives few clues as to why it was designed so. Design is all about trade-offs: what were they? If you never learn to design up front on paper, you will never learn to document software understandably either. And without documentation later, the rationale will be lost.

    You should design up front as far as you can. Then switch to a computer.

    How far you can design up front depends on your intellect and your knowledge of the problem domain. Learn to assess the confidence in your up front design correctly, and identify when to stop, since therein lies the danger: too much time spent designing on paper something that doesn’t work. But some design up front has its place.



    How Technology Evolves

    We often take for granted the technology we have and forget that it’s the result of a tedious evolutionary process.

    A Railroad Track is the Width of Two Horses is one of the first stories about the evolution of technology that I remember reading, maybe ten years ago. It sounds more like a colorful story than a true historic account, but it nevertheless left an impression on me.

    Later, doing research gave me a better appreciation of how ideas evolve, cross-pollinate and morph over time. True insights are rare. It’s a lot about tweaking existing ideas until the right form that works is found.

    Here are some of the most engaging stories about technology history that I’ve read:

    Oh boy, innovation is such a messy process.


    Platforms and Innovation

    I started my career writing Flash applications. Then I moved to Java. Both are middleware technologies that abstract the underlying operating system and enable cross-platform interoperability. I’ve actually never written a professional application that relied directly on a specific operating system.

    This was fine with me. “Write once, run anywhere” was great for productivity.

    For the kind of applications I was developing, what these middleware stacks provided was enough. Maybe I occasionally wished that drag and drop between the application and its host system was better supported, but that’s it more or less. I didn’t really miss a deeper integration with the rest of the system.

    These technologies were also innovative on their own. Flash enabled developers to create rich web applications back in a time when web sites were mostly static. The same was true of Java and its applets, even if the technology never really took off.

    But middleware technologies also slow down innovation.

    An operating system provider wants developers to adopt its new functionalities as quickly as possible, to innovate and make the platform attractive. Middleware technologies make such adoption harder and slower.

    The official Apple memo “Thoughts on Flash” about not supporting Flash on iOS makes it very clear:

    We know from painful experience that letting a third party layer of software come between the platform and the developer ultimately results in sub-standard apps and hinders the enhancement and progress of the platform.

    The informal post “What really happened with Vista” gives similar arguments against middleware stacks:

    Applications built on [cross-platform] middleware tend to target “lowest common denominator” functionality and are slower to take advantage of new OS capabilities.

    For desktop applications, a good integration with the operating system was a plus, but not a killer. The drag and drop functionality I occasionally missed didn’t impact the whole user experience.

    With mobile devices, everything is different.

    Mobile applications are more focused and need to integrate seamlessly with the device, in terms of user experience but also connectivity and power consumption. That’s what “Thoughts on Flash” was about.

    Think of notifications. Notifications for desktop applications are nice, but not a killer. For a mobile application, how the application integrates with notifications makes the difference between success and failure. Notifications are becoming the heart of the smartphone experience. You don’t want them to suck.

    Or think of ARKit, Apple’s upcoming augmented reality toolkit. Augmented reality hasn’t yet really hit the mass market and there is lots of potential there. If nothing else, it will make our good old-fashioned ruler obsolete for measuring distances. But such a toolkit relies on specific hardware (sensors, CPU, camera). You don’t want middleware there to slow down adoption.

    Platforms diverge and sometimes converge. They diverge when exclusive capabilities are added and converge when a cross platform standard is adopted.

    With HTML5 we have a good standard for regular applications with desktop-like features. The Gmail mobile web application is for instance so well done that I prefer it to the native iOS version. But you can only go so far with HTML5. If you want to push the envelope, you need to go native and use the full power of the platform.

    For applications in the broader context of digitalization (social media, artificial intelligence, internet of things), innovation at the platform level will be decisive.

    The platform war will intensify.



    10 Tips to Fail with Enterprise Integration

    If you want to make enterprise integration needlessly complicated, follow these tips.

    1. Model poorly

    A poor model is always a nice way to make things more complicated than they should be.

    Examples: You can name things badly. You can model everything as strings (key, list, etc.). Or you can reuse overly generic abstractions in multiple contexts instead of defining one abstraction per context. Or you can expose a relational model instead of an entity model.
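
    To see what the “everything as strings” trap costs, here is a minimal sketch of the opposite approach: the raw message is converted once, at the boundary, into a typed entity that fails early on bad data. The names `Order`, `OrderStatus` and `parse_order` and all field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from enum import Enum


class OrderStatus(Enum):
    """Hypothetical set of allowed statuses, instead of free-form strings."""
    OPEN = "open"
    SHIPPED = "shipped"


@dataclass(frozen=True)
class Order:
    """Hypothetical entity with one proper type per attribute."""
    order_id: int
    status: OrderStatus
    amount: Decimal
    created_at: datetime


def parse_order(raw: dict) -> Order:
    """Convert a stringly-typed message into a typed entity, failing early on bad data."""
    return Order(
        order_id=int(raw["order_id"]),
        status=OrderStatus(raw["status"]),  # raises ValueError for an unknown status
        amount=Decimal(raw["amount"]),
        created_at=datetime.fromisoformat(raw["created_at"]),
    )
```

    With a string-only model, a typo such as a status of “shiped” travels through the whole integration unnoticed; here it is rejected at the first conversion.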

    2. Use immature technologies

    Whenever possible, use immature, non-standard, or inappropriate technologies to make the integration complicated.

    Example: Don’t use XML, use JSON. Its support in IDEs is still weak, its semantics for the various numeric types are poor, it prevents proper code generation (for class-based languages), and JSON-Schema is still a draft.
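
    The point about numeric types can be illustrated with a small sketch using Python’s standard `json` module. JSON itself only has “numbers”, so what the consumer gets depends on its parser. The payload and its field names are made up for the example; `parse_int=float` is used here only to imitate a consumer that reads every number as a double, as JavaScript does.

```python
import json
from decimal import Decimal

# Hypothetical payload: an identifier just above 2**53 and a fractional price.
payload = '{"id": 9007199254740993, "price": 0.1}'

# A consumer that maps every JSON number to an IEEE 754 double.
as_doubles = json.loads(payload, parse_int=float)

# A consumer that keeps integers exact and reads fractions as decimals.
as_exact = json.loads(payload, parse_float=Decimal)

# The double-based consumer silently receives a different identifier.
id_was_altered = int(as_doubles["id"]) != as_exact["id"]
```

    Both consumers read the same valid document, yet they disagree on the identifier. Nothing in the payload tells them which numeric type was intended.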

    3. Assume the network is perfect

    Assume the network is perfect. It has infinite bandwidth as well as zero latency. This is a classic recipe for disaster. Ignore completely the reality of networking. If your interface is sound at the logical level, then it will be fine in production.

    Examples: Don’t distinguish between the time of the event you model and the technical time when the message was sent or received (it doesn’t matter since latency is zero). Or send replies to individual requests on a topic and leave the burden of filtering out the irrelevant replies to the subscriber at the application level (it doesn’t matter since bandwidth is infinite).
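
    For the first example, a minimal sketch of what keeping the two notions of time apart could look like. The `Message` envelope and the helper functions are hypothetical; the only point is that the business time and the transport time are separate fields with separate uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Message:
    """Hypothetical envelope carrying both notions of time."""
    payload: str
    event_time: datetime  # when the modeled event happened in the business domain
    sent_at: datetime     # when the provider put the message on the wire


def transport_delay(message: Message, received_at: datetime) -> timedelta:
    """Latency of the network hop; useful for monitoring, not for business logic."""
    return received_at - message.sent_at


def latest_by_event_time(messages: list) -> Message:
    """Pick the most recent state by event time, whatever the sending order was."""
    return max(messages, key=lambda m: m.event_time)
```

    If messages are delayed or retried, the one sent last is not necessarily the one describing the latest event, which is why ordering by `sent_at` or by arrival would give the wrong answer.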

    4. Make loads and updates asymmetric

    It is common for an interface to publish updates on topics but also provide a means for the consumer to load data upon startup. In such a case, the system should work so that the same data are delivered to the consumer for loads and updates. To introduce subtle data inconsistencies, make it so that loads and updates don’t deliver the same data.

    Example: If an entity has multiple statuses, do not publish all status changes via updates. This way, there is a discrepancy between the data you obtain via load requests and via updates.

    5. Make the system as stateful as possible

    If you find a way to complicate state management, go for it.

    Examples: Instead of publishing entities that are consistent, publish only the delta with what has changed. The consumer must carefully ensure that all deltas are applied in order. Or define requests that reference other requests, e.g. to implement paging. The provider will need to do some bookkeeping of the previous requests.
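
    The bookkeeping that deltas push onto the consumer can be sketched as follows. This assumes that each delta carries a sequence number starting at 1, which is my assumption and not something every interface provides; the `DeltaConsumer` class is hypothetical.

```python
class DeltaConsumer:
    """Hypothetical consumer that rebuilds an entity from ordered deltas."""

    def __init__(self) -> None:
        self.state = {}      # current reconstructed entity
        self.next_seq = 1    # sequence number of the next delta to apply
        self._pending = {}   # deltas that arrived too early, keyed by sequence number

    def receive(self, seq: int, delta: dict) -> None:
        if seq < self.next_seq:
            return  # duplicate of a delta that was already applied
        self._pending[seq] = delta
        # Apply every delta that is now contiguous with what was already applied.
        while self.next_seq in self._pending:
            self.state.update(self._pending.pop(self.next_seq))
            self.next_seq += 1
```

    All of this state (the counter, the buffer, the duplicate check) disappears if the provider publishes complete, consistent entities instead.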

    6. Leave the protocol vague

    By defining the transport technology, the encoding, and the various messages that can go through your interface, most readers of the specification will have a good understanding of what the purpose of the interface is. So stop there. Don’t bother explaining the exact protocol with the assumptions about the order of messages or when a given message can be sent or not. This way, you leave the door open to non-obvious misunderstandings.

    Example: don’t specify which requests can be used anytime and which should be used only occasionally after a restart or recovery.

    7. Don’t properly version your interface

    Your interface will need to change. Don’t provide proper versioning. This way, supporting multiple versions will be a pain.

    Example: Use XML Namespaces, but don’t use them for versioning.
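
    For contrast, a minimal sketch of a consumer when the namespace does carry the version. The namespace URIs, the element names and the default currency are hypothetical; the parsing relies only on Python’s standard `xml.etree.ElementTree`, which exposes the namespace as a `{uri}` prefix on each tag.

```python
import xml.etree.ElementTree as ET


def parse_v1(root, ns):
    # Hypothetical v1 schema: no currency element, so a default is assumed.
    return {"id": root.findtext(f"{{{ns}}}id"), "currency": "EUR"}


def parse_v2(root, ns):
    # Hypothetical v2 schema: currency became an explicit element.
    return {"id": root.findtext(f"{{{ns}}}id"),
            "currency": root.findtext(f"{{{ns}}}currency")}


# One parser per supported interface version, keyed by namespace URI.
PARSERS = {
    "urn:example:order:v1": parse_v1,
    "urn:example:order:v2": parse_v2,
}


def parse_order_message(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    if not root.tag.startswith("{"):
        raise ValueError("message has no namespace, version unknown")
    namespace = root.tag[1:].split("}", 1)[0]
    parser = PARSERS.get(namespace)
    if parser is None:
        raise ValueError(f"unsupported interface version: {namespace}")
    return parser(root, namespace)
```

    The consumer can then support several versions side by side and reject unknown ones explicitly, instead of guessing the version from the shape of the data.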

    8. Redefine the semantics of data between versions

    Make subtle changes to the meaning of the data, so that the semantics change in a non-obvious way.

    Example: Redefine what “null” means for a certain attribute.

    9. Don’t distinguish between endpoint and tenant

    Your interface will be accessible through an endpoint that will probably be used by multiple consumer systems (“tenants”). Define SLAs per endpoint, but not per tenant. This way you will need to deploy multiple endpoints to really guarantee SLAs for specific consumers.

    Example: provide a limit for the frequency of load requests at the endpoint level, but independent of the consumer systems. If a consumer misbehaves, it will prevent all other consumers from loading data.
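
    A per-tenant limit does not need to be complicated. Below is a minimal sketch of a fixed-window limiter keyed by tenant. The class name, the parameters and the fixed-window policy are my own assumptions for illustration, and the current time is passed in explicitly to keep the example deterministic.

```python
class PerTenantRateLimiter:
    """Hypothetical fixed-window limiter that counts requests per tenant."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._windows = {}  # tenant -> (window start time, requests seen in window)

    def allow(self, tenant: str, now: float) -> bool:
        start, count = self._windows.get(tenant, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired, start a new one
        if count >= self.max_requests:
            self._windows[tenant] = (start, count)
            return False  # only this tenant is throttled
        self._windows[tenant] = (start, count + 1)
        return True
```

    A misbehaving tenant exhausts only its own budget, so the other consumers of the same endpoint can still load data.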

    10. Ignore monitoring needs

    Do not provide any meaningful way for the consumer to check whether the provider is healthy or not. Either the consumer will have to guess, or it will have to use features not designed for monitoring to assess the system health.

    Example: aggregate data from multiple decentralized subsystems and publish them via a centralized interface, but don’t provide any way for the consumer to figure out which subsystem is healthy or not.
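
    What the consumer would need instead can be sketched as a health report that keeps the subsystems visible behind the centralized interface. The heartbeat-based check, the 30-second threshold and all names are hypothetical choices for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsystemHealth:
    """Hypothetical record of the last sign of life of one subsystem."""
    name: str
    last_heartbeat: float  # timestamp in seconds of the last heartbeat received


def health_report(subsystems, now: float, max_age_seconds: float = 30.0) -> dict:
    """Report overall health plus the status of each subsystem individually."""
    statuses = {
        s.name: (now - s.last_heartbeat) <= max_age_seconds
        for s in subsystems
    }
    return {"healthy": all(statuses.values()), "subsystems": statuses}
```

    With such a report, the consumer can tell stale data from one subsystem apart from a failure of the whole provider, without abusing a business request as a probe.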