ewernli

Talk: High-available applications for rail control (@BATBern43)

Marc Hoffmann and I were invited to the 43th Berner Architekten Treffen (BATBern) on “Event-Driven Architectures”. We presented the architecture of the Rail Control System, especially the mechanisms for high-availability.

Here are the slides:

batbern43 Rail Control System: hochverfügbare Anwendungen für die Bahnproduktion from BATbern

The Inevitability of Superintelligence

If we assume that the brain is a kind of computer, artificial intelligence is the process of reproducing its functioning. Based on this hypothesis, it’s easy to dismiss the possibibility of above-human intelligence by arguing that we can only program what we understand, which would means the intelligence in the machine is bounded by our own. But it’s also very easy to refute this limitation by arguing that we encode learning processes in the machine. These learning processes would be working at a scale and speed that we can’t match. The machine will beat us.

This later argument definitively seems to hold if we look at recent achievements in deep learning. Computer achieve some tasks that very much ressemble some form of intelligence. Looking more carefully, it’s however questionable whether we should speak of intelligence or simply knowledge. Techniques like deep learning enable computers to learn facts based on large amounts of data. These facts might be very sophisticated, ranging from recoloring images correctly to impersonating the artistic style of a painters. But the computer isn’t intelligent because no reasoning really happen.

This leads actually to an interesting question about intelligence. How much of intelligence is simply about predicting things based on experiences? If an object fall, you predict its position in the future to catch it, based on other experiences with falling objects. If someone asks you a “what’s up?”, you can predict that they expect to learn about what’s going on. With GPT-3, which works according to this principles, you can almost have a conversation. I say almost, because we also see the limit the approach. There are some classes of question that don’t work, like basic arithmetic.

Current artificial intelligence is able to learn, either by analysing large quantities of data (deep learning) or simulating an environment and learning what works and what doesn’t (reinforcement learning). But we’re still far from sentient, thinking machines.

If we assume that our brain is some kind of computer performing a computation, there’s however nothing that prevent us from replicating it. With this line of thought, it’s only a matter of time until we “crack” the nature the intelligence and will find the right way to express this computation. When this breakthrough will happen is unknown – maybe in a decade, maye much later – but there’s nothing that make it impossible. With sufficient perserverence, this breakthrough is inevitable.

Speaking in terms of computation and data, a system becoming smarter can happen in two ways. The first one, is what we have now. Systems that learn over time through the accumulation of data. The computation remains the same though. A deep learning network is programmed once (by human!) and than trained on large quantities of data to adjust its paramter. But maybe a second class of systems exists: system that self-improve by changing their computation. Systems able to inspect and change themselves do exist and are called reflective systems. In such a system, data can be turned in computation and computation into data. The system can thus modify itself.

Some people believe that with artificial intelligence, we risk beeing outsmarted by a “explosition” of intelligence. Systems of the first class learn within the bounds of the computation that defines them – however complex this computation is. The possibility of an explosion is limited. With systems of the second classes, we’re free to speculate, including the possibility of an explosion of intelligence. Such a system could outsmart us and lead to superintelligence.

If we assume that our brain is a computation: is it self-improving or not? Children acquire novel cognitive capabilities over time, which at least give the illusion of self-improvement. But maybe these learnings are only very complex form of data accumulation. Also, the boundary between reflective and non-reflective systems is not black and white. A fully reflective system can change any aspects of its computation, whereas a non-reflective system processes input data according to fixed rules that never changes. A system that is able to infer and defines some rules for itself would fall in between both categories: the rules can change, but to an extent that is limited to some aspect of the computation. The adaptive nature of neural networks could, in some way, be seen as a limited form of rule changing: the rules are fixed, but the “weight” given to them change over time due to feedback loops.

Learning requires data provided by an environment. We’re able to learn only because we interact with the world and other people. If we were to replicate the computation in our brain and the learning process that takes place, we would also need to simulate the environement. The computational complexity of all this is probably enourmous. Maybe we can replicate the computation in our brain, but not the environment, or only limited forms of it. In which case, it’s hard to tell what kind of intelligence could be achieved.

Depending on the computation and environment that we simulate, the intelligence won’t resemble human intelligence much. The algorithm of AlphaGo learns in an enviornment that only consists of the Go rules. We can not even image what this world would be like. Assuming that the artificial intelligence is human-like misjudges the nature of human intelligence. Intelligence is not one quantity that we can weight based on clear criterium. Intelligence has many facets and is contextual.

For some facets, like arithmetics, machines are for sure already superintelligent.

More

Great Articles on Software Engineering

Sometimes, I read an article, and some idea deeply resonates with me and makes a long lasting impression. It changes the way I approach some topic.

Fred Brooks’ essay “no silver bullet” was one of the very first article I read that had this effect. The concepts of esssential and accidental complexity are very powerfull, deeply resonate with me, and shaped the way I see software engineering. This essay is a classic because it had the same effect on many people.

But there are other many great articles that influenced me. Let’s recap some of them:

No silver bullet, Fred Brooks

A software system consists of essential and accidental (or implementation) complexity. We should reduce accidental complexity as much as possible, but essential complexity will still be the dominng factor.

The law of leaky abstractions, Joel spolsky

It’s very hard to devise abstractions that completely hide the underlying complexity. Often, you will need to understand some internal details no matter what.

Simple Made Easy, Rich Hickey

This is a great talk about complexity. The key takeaway is that simplicity comes from not mixing things together that shouldn’t. It’s independent of your prior knowledge. Easiness comes from habits and convention. It depends on prior knowledge.

Choose Boring Technology, Dan McKinley

A reminder that using shiny new tools isn’t always the best option and that established and mature tools has its place if they suffice to get the job done.

There is no now, Justing Sheehy

An exploration of the way to handle time in distributed systems, where there’s no global notion of time or consistency.

Beating the Average, Paul Graham

A classic from Paul Graham where he described how using Lisp and macros gave the company an advantage over their competitors.

Life Beyond Distributed Transaction, Pat Helland

An articles about giving up distributed transactions to design internet-scale systems using simpler data models (e.g. key-value stores)

Everything You Know About Latency Is Wrong, Tyler Treat

In short: using average or percentile hides your outliers, which is an important signal to understand the real beahvior of the system.

A Note on Distributed Computing, S. Kendall et al.

An article explaining that trying to abstract remote boundaries is bound to fail.

Smalltalk: A Reflective Language, F. Rivard

A very nice explanation of Smalltalk and its reflective capabilities showing how to adapt the language to add pre/post conditions. The reification of the stack and the fact that the debugger is just a normal tool is also explained.

Reuse: is the dream dead?, Kirk Knoernschild

An exploration of the use/reuse paradox: “Maximizing reuse complicates use”

Reflection on Trusting Trust, Ken Thompson

A wicked experiment on bootstraping

The Log: What Every Software Engineer Should Know About Real-time Data’s Unifying Abstraction, Jay Kreps

A fanstatic analysis of the distributed log as the basic building block to integrate real-time systems. So good, that it was later converted in a book: I love logs.

Most of these authors have written several other articles that are great as well.

Exploring Team Structures and Responsibilities

A main responsibility of management is to make sure that teams function well. This implies defining adequate organisational structures and finding adequate people.

The Boss Team

Sometimes, the structure is easy. There’s a boss and there are employees. The boss decides how the team should function, what should be done, who should do it, and is finally the one evaluating the employees according to his/her expectations. The employees can be involved in all these aspects, but the boss has the final authority. This model has sometimes a bad press, because if the boss sucks, it’s hell.

Scrum Team

There are other ways to organise teams. In scrum for instance, the product owner decides what should be done. The development team decides which member does what, and how. How the team works is partly given by the scrum framework itself. But the team can also adapt its way of working through retrospectives. Performance evaluation is not part of scrum and there’s often a line manager, not working closely witht the team, in charge of it. There’s less concentration of power on a “boss”.

Self-Organising Team

Following the decentralisation of power, we lend on so called “self-organising” teams. Given a high-level mission, the team is in charge to figure out how to work (“working on the system, not in the system”), what to do, and who does what. Performance evaluation is still outside the team. In any group a power structure will establish itself. In this case, it’s informal.

In any of the above models, there can be more or less specialisation in the team members. If all employees of a boss team have roughly the same skills, tasks can be distributed arbitrary. If a high specialisation exists, who does what is fixed in advance. How specialisation happens is a side-effect of how the team functions. It can be actively encourage (e.g. T-Shirt Skills) or discouraged. It can be formal (e.g. through job title) or informal.

Staffing is usually outside of the power of the team, except for a boss team. But like other budget issue, it could become part of the team responsabilities.

Recap:

	Boss Team	Scrum Team	Self-Organising Team
How the team works	Boss	Scrum + Team (Retro)	Team
What to do	Boss	Product Owner	Team
Who does what	Boss	Team	Team
How to do it	Team	Team	Team
Performance Evaluation	Boss	Line Manager	Line Manager
Staffing	Boss	Line Manager	Line Manager

Needless to say, this isn’t an exhaustive listing of all possible structures we find in organisations. In a complex organisation you might have “what do do” actually beeing split between a project manager, a business analyst, and some high-level managment. Or “how to do it” can be split between someone specialised as architect and the rest of the team.

There might also be certain processes to follow, e.g. for compliance or governance reasons.

Each time you have more than one item in a cell in the table, it means that either responsibilities are diluted or there is some specialisation going on. A bit of both might be justified, like a review process for some decision or having experts in some domain. Dilution of responsibilities and specialisation can go hand in hand if the expert acts as reviewer. But having too many items in the cells is probably a sign of organizational dysfunction.

The number of items in the cells reflects the complexity of the organization, a bit like function points reflects the complexity of a feature. I don’t know what’s an acceptable complexity here. As a rough number for a threshold, I would say there should be no more than twelve points across the column.

More

Academia vs. Industry

A post-doc student contacted me to know more about the difference between academia and industry (in software engineering). That was an interesting converstation. I tried to summarize the main points as I see them:

Day to day work – In academia, you are more alone and focused on one thing for a long streak of time. In industry, chances are that you will work either in a team, working on the stories in your sprint, or outside of a team, in the role of a faciliator/coordinator, switching between many smaller tasks.

Collaboration – In academia, your main contact is your supervisor, then the peers in your group and some peers in some external groups you collaborate with. In industry, your main peers are your teammates, plus all the external stakeholder you must occasionnaly synchronize with.

Skills – In academia, you focus on conceptional work and data analysis. The engineering part comes second (the code of my prototypes wouldn’t pass a code review). In industry, engineering comes first. Software must have a high quality if it’s developped collaboratively. Conceptual work and data analysis of course also happen in industry, just not all the time (except of course, if you’re a data scientist!). There are plenty of conceptional challenges in industry, but the engineering part will take even a more significant bulk of the time than in academia. In both case, academia and industry require a high attention to details—but other details.

Writing – In academia, part of your job is to sell your ideas in the form of papers. This takes a significant time. In industry, writing is less important, unless you work in a role like facilitator/coordinator. Writing skills are still an asset in industry because you will occasionally have to sell you ideas to your collegue too, and writing can help.

Gratification – In academia, results come after many month of work on prototypes or data gathering. The work might get a few citations and resonate with a few people, but the impact is usually limited, except of course, if you’re very tallented and are lucky to do a breakthrough somewhere. Research brings fruits in the long term, through the accumulation of contributions that enrich the body of knowledge. In industry, the result is more immediate. Broadly, you can work either on software products, where your users are end-users, or plattforms, where your users are other teams. Stories and features are shipped regularly and users will use them. Significant impact for your company can be achieved with ambitious projects over a few years. Impact reaching beyond your company is quite rare, like breakthrough in research.

Engagement – In academia, you pick your topics yourself, so this tend to engages you. On the other hand, if you are stuck with your research and don’t progress much, it might be boring if not depressing. In industry, you have less autonomy to do whaterever you want, but in a good company, everybody should have it’s share of challenge and stay engaged. Software products have different phases: incubating, expanding, maintaining, or decommissioning software all have their challanges. A good company takes care to match the expertise of the team to the challange at hand so that people feel they learn something. There are also many types of software, from application developped solo over a couple of weeks, to multi-year, multi-teams systems. I like large-scale software but some prefer smaller systems where you can iterate faster.

Growth – In academia, growing mostly means getting tenure, which requires a substantial committment over a long time and a bit of luck. In industry, you can reach senior positions relatively quickly if you’re talented. In both cases you can switch course over time, say to reoriented your research interests or technologies/industries that you focus on. Many companies provide dual ladders that can help you grow on the technical side or on the management side. Nowadays ther are many opportunities for leadership besides the traditional “manager” role.

Drive

drive The key message of “Drive“, from Daniel H. Pink, is that people are best engaged in a task when they understand the purpose of the job, they can develop the necessary mastery to realise the job, and they have sufficient autonomy to direct things their way.

That’s a very simple framework — purpose, mastery, autonomy — and I very much like it.

The problem with the book is that beside this simple message, there’s little more. The book presents scientific studies to backup the ideas and tries to trace the evolution of these ideas. It’s entertaining but very annectodical. The second half of the book has a “toolkit” to reflect on motivation, summaries of the book, a glossary, and even conversation starters. In themselves, these are creative ideas. But it did feel a bit like milking the content.

One idea that is somehow developped in the book and that differ from the main framework is the difference between intrinsic and extrasinc motivation. Instinsic motivation is way stronger. If the activity, like playing a game, is the reward itself, there’s no need of an external reward. Or maybe the activity is not the reward itself, but we understand the reward ourselves, without external inputs. We know how to kill intrisinc motivation (pay somebody to do what he would do for free), but how to foster it is very challenging.

The question of how to develop engagement is actually a very intersting one. It’s not only interesting in the context of the workplace, but also in education, or social community.

This book has a nice message. I just wished it had more depth.

Mastering Technology

Things move fast in the IT industry. Half of the technologies I use today didn’t exist ten years ago. There’s a constant push to switch to new technologies. But when it comes to technologies, I’m kind of conservative. I like proven technologies.

The thing that makes me conservative is that mastering a technology takes a lot longer that we think.

Most advocates of new technologies massively underestimated the learning curve. Sure, you get something working quickly. But truely understanding a new technology takes years.

Take object-oriented programming. On the surface it’s easy to grasp quickly and become productive. But it took the industry something like 20 years to figure out that inheritance isn’t such a great idea. The result is that early object-oriented systems overused inheritance hoping it will favor reuse, whereas it just led to hard mainteance.

The same holds for instance for the idea of distributed objects. It’s easy to grasp and appealing . Yet it took the industry decades to realize that abstracting remote boundaries is a flawed idea. We instead need to explicitly embrace and expose asynchronous API.

Another one of my favorite easy-to-grasp-but-hard-to-master technology is object-relational mappers (e.g. Hibernate). 10 years of experience and I’m still struggling with it as soon as the mapping isn’t entirely trivial.

Want another example? XA Transaction. Updating a database row and sending a message seems to be the poster child for XA Transactions. Well, it turns out that this simple scenario is already problematic. When it didn’t work I learned that I was experiencing the classic “XA 2-PC race-condition“.

There are lots of new technologies being developed right now, like distributed log systems, container schedulers, web framework. I perfectly understand why they are being developed and what problems they supposedly solve.

But don’t try to convince me that they are silver bullets. Every technology choice is a trade-off because you can never fully abstract the underlying complexity away. There’s a price somewhere. Things will break in unexpected way and nobody knows why. Performance will be hard to figure out. Subtle misuse of the technology will only be detected later and be hard to correct. It will take time to figure these things out.

At the end it’s all about risk management. If the technology might provide a strategic advantage, we can talk. The investment might be worth it. But if it’s not even strategic, I would seriously challenge if the risk of using new technologies is worth it.

More

The Age of Agile

ageofagile This book is about agile management as a way to conduct business — not just software development.

In the first chapters, the author presents agile management by characterizing it with three “laws”: the law of the customer, the law of the small teams, and the law of the network. Roughly, I would sum them up as: companies should put creating value to customer at the center of all activities, they should embrace decentralisation and autonomous teams that course-correct, and they should embrace fluid communication throughout the organisation as well as leverage network effects.

The key thesis of the book is that agile management is a fundamental shift from traditional management and is thus akin to a revolution. Whereas traditional management is bureaucratic, top-down, short-term-focused, cost-oriented, and seeks to defend existing innovation to stay competitive, agile management is collaborative, decentralized, long-term-oriented, and seeks to create new innovations to stay competitive.

The laws capture many ideas. The concept of iteration is addressed in the law of small teams and I was wondering if it would have benefited a “law of feedback” on its own. The law of network is also covering two things: fluid communication and network effects with platforms like Amazon Marketplace, the Apple App Store, AWS. I also wondered if this idea should have been better extracted in a “law of platform”. But keeping the message simple with three laws make the conceptual framework easier.

After the chapters about the laws of agile management come a few chapters about implementing agile management in the form of a couple experience reports. These are mostly chapters of the form “doing X worked for us”. Each chapter draw its material from one primary source. It makes the ideas presented earlier a bit more concrete but weren’t that memorable.

Finally come a few chapters where Denning returns to the law of customer, this time in the form of a discussion of the shareholder value model and its shortcomings. The shareholder value model, in the words of Denning, leads to financial engineering and short-term cost oriented management strategies which do no create new customer value, but rather exploit the existing customer value. The chapters read mostly like a rant, which I addmittly enjoyed, just like I enjoy a good rant from Steve Yegge. But in the context of the book, it felt a bit too one-sided or simply too long.

This book was for me like watching a science-fiction movie with a good idea but lots of plot holes. I liked the framework with the three laws, which presents agile principles in a new way and gives new insights. I also liked that Denning, who doesn’t come from the tech world but form the business world, is ambitious about agile management and sees it at a macro scale through the lense of economy theory and business strategy, not just at the operational level. The conclusion, for that matter, is a good summary of his view. On the other hand, the chapters were lacking a clear connection. There are grandious annoucements about the promise of Agile and these new ideas we should embrace, or return to, but sometimes the content lacks substance. The emphasis on creating customer value felt right though. It’s very much in the spirit of Amazon’s principle of “customer-obession” or Y-Combinators’s motto “make something people want”. I will remember the book for this emphasis.

Related

http://allankelly.blogspot.com/2015/10/large-companies-and-fast-cars.html?m=1
What makes big companies efficient is what prevent them to be agile: size
https://steveblank.com/2016/10/24/why-tim-cook-is-steve-ballmer-and-why-he-still-has-his-job-at-apple/amp/
The difference between operational executinves vs. disruptive visionaries
https://medium.com/@steve.yegge/jeff-bezos-jack-ma-and-the-quest-to-kill-ebay-bb4992dc5020
The Story of Amazon Auctions, zShop and finally Amazon Marketplace, How to win over network effects, The philosophy of innovation of Bezos vs Jobs vs Schmidt.
https://hbr.org/2018/05/agile-at-scale

some hints at how large org have applied agile, but quite vague, much like the book.

No More QA

Companies have traditionally organized software-related activites in three silos: Dev, Test/QA, Operations.

The QA effort is realized after a long phase of development resulting in bug spikes and difficulties to plan the work for the development teams during this time.

When companies were engineering software “piecewise” this was the only way. Only when all pieces were finished could you integrate them and test features end-to-end. We’ve however now moved to an approach where products and teams are organized so that features can be delivered end-to-end incrementally. The whole product is engineered iteratively.

Evidences suggest that a centralized QA phase does not bring additional quality in this case, but rather actively harm quality.

As a result, they hired a VP of QA who set up a QA division. The net result of this, counterintuitively, was to increase the number of bugs. One of the major causes of this was that developers felt that they were no longer responsible for quality, and instead focussed on getting their features into “test” as quickly as they could.

— There is no such thing as a devops team, Jez Humble

A similar story is explained in The Age of Agile about implementing agile organization at Microsoft.

There was a lot of learning at the start of the Agile transformation at Microsoft. “In the first sprints,“ says Bjork, “there was agreement on doing three-week sprints. The leadership signed off on the idea of Agile, but they were anxious as to how it was going to work. They planned for ‘a stabilization sprint’ after ﬁve sprints. However, that encouraged some teams to think, ‘No need to worry about bugs, because we have the stabilization sprint!’ A lot of bugs were generated and all the teams had to pitch in to help ﬁx them.

“in effect,“ he says, “we had told people to do one thing, but we created an environment that prompted some teams to do the opposite. Who could blame them? The teams told us. ‘Don’t ever do that to us again!’ It was an example of unintended consequences.”

— The Age of Agile, Stephen Denning,

For once, fixing the problem is easy. Just get rid of you QA phase (not the testers!). Make it clear that there is no additional safety net and that teams must ship features that are “done, done, done.”

Autonomy and Microservices

Discussions about monolith vs microservice are hotter than ever. Usually, a monolith is synonym for “big ball of mud” in these discussions. It of course needn’t be so. A modular monolith is perfectly possible. Also, microservices isn’t an entirely new idea either. As some says, it’s SOA done right.

The usual argument in favor of microservices is that autonomy is a good thing: teams can pick the best appropriate tools, develop in parallel without friction, and scale services independently of each other. The main drawback is an increased complexity of the overall system, primary on the operations side but also on the tools side.

The usual argument in favor of a modular monolith is that it’s simple: the code base can be modularised to enable parallel development, the tech stack is standardized for everyone which reduces complexity. The main drawback is that the release cycle is the same for everyone which implies some coordination and possibly reduces the release cadence. The risk of inadvertent coupling is also higher since modularisation boundaries are internal and not external as with microservices.

The distinction microservices vs monolith is a continuum though. You can for instance have microservices with a standardized tech stack or a distributed monolith with the ability to scale some parts independently.

It’s up to you to decide which levels of autonomy you want.

Autonomy	Benefits	Perils
Internal quality standards	Better fitness of design principles, coding conventions, or testing strategies to the problem domain Increased productivity	Code and people “mobility” is weakened Adherence to conventions is weakened because there are many of them Best practices keep being reinvented; each team goes through the same path of failure and lesson learned. Best-practices in place turn up to be sub-optimal.
Scaling	Scaling of individual parts of the system Elasticity	Performance of the system harder to comprehend Overall operations gets harder
Techstack	Better fitness of the technologies to the problem domain Increased productivity	Code and people “mobility” is limited Strategy for long term support of technologies is harder More fragility to changes of licence models No economy of scale for lifecycle activities; everybody must do its own lifecycle
Release cycle	Shorter Time-to-Market Shorter Feedback loops	Versioning hell

The mindset that lead to large monoliths is a mindset rooted in economy of scale. Development, testing, database and operations work is organised in silos. The idea is that the effort is reduced if the product is large and infrequently released. You do things once, at large scale, with specialists.

With microservices, the effort for a microservice is small enough that one cross functional team can undertake development, testing, database and operations work all by itself. There is less economy of scale but also less coordination needed.

“Because you can doesn’t mean you should.” Deviations from established practices or technologies can have attractive payoffs, but also come with some risk. Teams with lots of autonomy should be aware of the long term consequences of their choice and balance them against short-term benefits.

Services need complete teams when they are actively developed. With the time, some modules will stabilize and their maintenance concentrated to fewer teams. Inversely, services might grow and require splitting in multiple teams. In either case, teams ownership might change over time. If the technologies are very heterogeneous this might be more challenging.

Ultimately how much autonomy you want to give to the team is an organizational choice, not a technical choice. If you trust your organisation to be able to work with autonomous teams yet converge toward shared goals, microservices might work for you. If the organization maturity isn’t there, don’t go for microservices: you’ve translated your technical issues into people issues, which are even harder to solve.

Links