OOP: past, present, future

Object-Oriented Programming (OOP) has been a mainstream programming paradigm for about 40 years now. That’s quite a bit of time, so it’s worth asking: how did the paradigm evolve? Looking back, I would say that there have been three eras.

Era 1: Modelling the world with objects (1980-1995)

The idea of the object-oriented paradigm is to model the world as objects. The poster child of the object paradigm from this era is Smalltalk, where everything is an object. Objects send messages to each other and live in a permanent, persisted state. This approach is great for modelling stand-alone applications with GUI components. The problem with this approach is that it doesn’t work very well for business entities. In Smalltalk, everything is an object and everything is persistent. In other languages, regular programs are started and stopped, and you only want to persist the business entities. Persisting a subset of the objects is possible with object databases. This is a challenging problem though, similar to the serialization of object graphs. Searching and navigating heaps of objects is also not so easy. There’s also no easy way to share the business entities across instances of the application. For these reasons, business applications have frequently relied on relational databases to persist their state.

Era 2: Objects and enterprise applications (1995-2005)

Using a relational database to persist an object graph leads to the so-called “object-relational impedance mismatch”. It manifests itself in the difficulty of having a rich domain model that is persistent at the same time. The simplest approach to reduce the mismatch is to have a simpler domain model – just data structures – that can be persisted easily. But this in turn means that you move towards a more procedural style of programming again. This style of programming is well captured in Martin Fowler’s description of the “anemic domain model”.

The Java Enterprise Platform is a major technology of this era. It embraced object-orientation with the concept of Enterprise JavaBeans (EJB). Prior to EJB3, entity beans and session beans were objects that were persisted by the application server itself, with a vague resemblance to an object database. Every operation would be carried out as a synchronous operation over the network. The technology proved, however, to be hard to use because of the associated network costs. Another mismatch. Starting with EJB3, entity beans were turned into regular objects persisted with an object-relational mapper.

Other approaches to object-orientation exist. Domain-driven design promotes a rich domain model, in accordance with the concept of “modelling the world as objects”. To solve the object-relational mismatch, the domain model is kept separate from the model used for persistence. So-called repositories take care of dealing with the mismatch.
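To make the repository idea a bit more tangible, here is a minimal sketch in Python. The names (Account, AccountRepository) are hypothetical, and a dictionary of plain rows stands in for a relational table; a real repository would of course talk to a database.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Rich domain object (hypothetical): state plus behaviour."""
    id: str
    balance: int

    def withdraw(self, amount):
        # The business rule lives in the domain model, not in the storage layer.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class AccountRepository:
    """Hypothetical repository: maps domain objects to flat rows and back."""

    def __init__(self):
        # Assumption: this dict of plain rows stands in for a database table.
        self._rows = {}

    def save(self, account):
        # Persistence model: a flat record without behaviour.
        self._rows[account.id] = {"id": account.id, "balance": account.balance}

    def find(self, account_id):
        # Rebuild a fresh domain object from the stored row.
        row = self._rows[account_id]
        return Account(id=row["id"], balance=row["balance"])


repo = AccountRepository()
account = Account("a1", 100)
account.withdraw(30)
repo.save(account)
loaded = repo.find("a1")
```

The domain object never sees the row format, and the row format never carries business rules; the mismatch is confined to the two mapping methods.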

The actor paradigm can be seen as a special form of object-orientation where objects communicate asynchronously. Actors are stateful domain entities. This avoids the problem of networking but doesn’t provide a solution for persistence out of the box. One way to solve it is through event sourcing or object serialisation.
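As a rough illustration of event sourcing for a stateful entity, the sketch below keeps a log of events and rebuilds the state by replaying it. StockActor and its event names are made up for the example, and the asynchronous messaging of a real actor system is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class StockActor:
    """Hypothetical stateful entity whose state is derived from an event log."""
    quantity: int = 0
    events: list = field(default_factory=list)

    def receive(self, kind, amount):
        # Record the event first, then apply it to the in-memory state.
        event = (kind, amount)
        self.events.append(event)
        self._apply(event)

    def _apply(self, event):
        kind, amount = event
        if kind == "added":
            self.quantity += amount
        elif kind == "removed":
            self.quantity -= amount

    @classmethod
    def replay(cls, events):
        # After a restart, the state is rebuilt by re-applying the stored events.
        actor = cls()
        for event in events:
            actor.events.append(event)
            actor._apply(event)
        return actor


actor = StockActor()
actor.receive("added", 100)
actor.receive("removed", 30)
restored = StockActor.replay(list(actor.events))
```

Only the event list needs to be stored durably; the object itself can be thrown away and reconstructed.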

The heavy use of inheritance is also a characteristic of this era. Object-orientation promised reuse, which we thought meant inheritance. This missed the point. Reuse is promoted with good interfaces, which don’t strictly need inheritance. With time, we learned to use interfaces, inheritance, composition and parametric polymorphism (aka generics) in a sane way.

With these lessons learned, the use of object orientation stabilized to a form mixing object-oriented data structures (think lists, data transfer objects, etc.) and object-oriented components (think of a business service, an HTTP server, or a framework). This isn’t the revolution promised in the first era, but it makes good use of objects and encapsulation to improve upon procedural programming.

Era 3: Objects and functional programming (2005-now)

Scala started exploring the combination of object-orientation and functional programming in more detail around 2005. Both paradigms might look contradictory at first (object-oriented programming is about mutability, functional programming about immutability). But they actually blend quite well, at least for the basics like list transformations (think map and flatMap). This isn’t actually a big surprise, given that lambdas have been there from the beginning in Smalltalk; it’s just that Java didn’t have them initially.

This exploration continued with Kotlin and with Java itself. Java finally added lambdas to the language and there are many more explorations going on with incubating projects. For instance, pattern matching and OO play along quite well too. Developers found an appropriate balance between immutable constructs and mutable ones.

What we have now is what we can call “FP in the small, OO in the large”. Objects shine at encapsulating whole components or services. The object-oriented data structures that are used internally don’t necessarily need to be mutable, though. They can be transformed and manipulated using idioms from functional programming.
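A small sketch of what “FP in the small, OO in the large” can look like, written in Python for brevity. Order and OrderService are hypothetical names: the service is an ordinary object that encapsulates its state, while the data it holds is immutable and is processed with filter and map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Immutable value object (hypothetical)."""
    customer: str
    amount: int


class OrderService:
    """OO in the large: a component that hides its state behind methods."""

    def __init__(self):
        # An immutable tuple that is replaced as a whole, never mutated in place.
        self._orders = ()

    def place(self, order):
        self._orders = self._orders + (order,)

    def total_of_large_orders(self, minimum):
        # FP in the small: filter and map over immutable values, then sum.
        large = filter(lambda o: o.amount >= minimum, self._orders)
        return sum(map(lambda o: o.amount, large))


service = OrderService()
service.place(Order("alice", 120))
service.place(Order("bob", 40))
service.place(Order("alice", 300))
large_total = service.total_of_large_orders(minimum=100)
```

Callers only see the service's methods; inside, no Order is ever modified after creation.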

I think that’s where we stand now, and where we will stay for a few more years until we’ve fully explored this space.

Metaphors in Software Engineering

One metaphor frequently used in the field of software development is the metaphor of software architecture. The architecture of a software system consists, like the architecture of a building, of the main structures of the system.

For a software system, the term “structure” could mean structures that are logical or conceptual. They don’t necessarily match tangible system boundaries. But the term “structure” does mean that the metaphor is biased towards expressing the static aspects of the software system.

Unlike buildings, software systems also have dynamic aspects. Information flows in a software system, and systems communicate with each other. Therefore, other metaphors can be useful to explain the nature of software systems.

Here are a few that I find interesting.

A city

As said, the architecture metaphor is limited in that it focuses too much on the statics. The city metaphor is better in this regard, since it evokes both static structures (roads, bridges, buildings) and dynamic aspects (traffic flow, people living in the city). Good city planning deals with both. The metaphor can be used for a software system, but also for collections of software systems.

Enterprise architecture is the field of IT that addresses IT strategy at the enterprise level. The city metaphor is a good one for enterprise architecture. Changes of IT strategy (for instance, moving to the cloud) impact many systems and take years to be achieved. They significantly and durably change the way software systems are built for the enterprise. If Haussmann’s renovations gave a new face to Paris, moving to the cloud will give a new face to the IT of your enterprise.

A garden

The architecture metaphor is also limited in that it conveys the impression that software is built once, and then never changes. It may be true for a building, but isn’t for software systems. According to the laws of software evolution, a software system must constantly be maintained and adapted to the needs of its users, or it will become useless. As software systems are developed and grow, they tend to accumulate inconsistencies that must be actively removed. This is much like a garden, which must be constantly maintained and whose weeds must be removed.

It’s possible to convey something similar with the architecture metaphor too, since buildings suffer wear and tear. We sometimes speak of architecture erosion to denote the degrading quality of the architecture. By the way, buildings do change over time, sometimes quite significantly.

A book

Software is expressed using programming languages and its source code consists of text. A software system can thus be compared to a book, albeit a very special one. You can’t read it linearly and everything is interlinked. But there is a sense of style in a given code base, and code can be more or less elegant. There is something artful to programming. Given that developers spend a lot more time reading code than writing code, taking care of software as text makes sense. With development approaches like literate programming, developers are supposed to write the source code like a story to explain their thoughts. It didn’t catch on, but it’s still worth a look.

A living organism

A running software system can also be compared to a living organism: it needs energy to run and do something useful. In some way, functions of the runtime, like memory management or thread scheduling, can be seen as some form of metabolism. Interestingly, some software systems like blockchains are explicitly designed to have an inefficient metabolism and consume large amounts of energy. A running software system has a health too, which indicates how well the system works. Millions of things can go wrong at run time, degrading its health and behavior. For instance, a memory leak will degrade the performance of the system over time until it simply dies. Some components of a software system have multiple instances at run time. A failure of one instance doesn’t break the whole system, just like we can live with one kidney. A running software system can be compromised by hostile inputs, the equivalent of a pathogen. The immune system of running software consists of mechanisms like SQL sanitization, managed memory, safe pointers, etc., which aim at making software more robust. Usually software systems do not reproduce, though. Except for software viruses.

An asset

IT has long been seen as a cost center, detached from business units that are profit centers. With digitalization, the perception is changing. Software is the enabler for the business, and goes hand in hand with it. It is an asset and generates value. But with software, more code doesn’t mean more returns. More code means more maintenance, and only some features of the system might actually deliver value.

There are of course more metaphors. Just have a look at the links below. The city, the garden, and the book metaphors are somewhat popular. The metaphor with living organisms is surprisingly uncommon. The asset metaphor isn’t really a metaphor, more like a mindset. The architecture metaphor is sometimes critiqued, but if we assume that software development is an engineering discipline, it’s the only metaphor that resonates with engineers. So it’s unlikely to change.

Chasing the Perfect Technology

The goal of pretty much any framework or platform that you use (from a PaaS offering to an application server and everything in between) is to make you more productive by taking care of some technical complexity for you: “Focus on the business logic, not the technology”.

Frameworks and platforms speed up development so that you can ship faster. And it’s true that you can ship faster: with current technologies, you can build an internet-scale service, highly available and able to handle millions of transactions per second, in a few months. It would have been unimaginable a decade ago.

The peak of productivity is achieved when you master your stack completely. You can then spend significant time working on business features with little friction around the technology itself.

Sadly, the peak of productivity is rarely reached.

One of the reasons is that developers get bored too early. Once the technical groundwork is in place and you just have to use it, it becomes boring. It’s fun to set up a whole analytics pipeline to solve this first analytics problem. Using the exact same pipeline to solve another problem? Boring.

Go ask a developer to use your existing infrastructure and stack as is and simply implement new features. I bet they will be lukewarm if they don’t see any technological problem to solve, or at least some technology to learn.

I speak from experience. The project I’m working on is an application for which a dedicated platform was built. This platform provides all sorts of things to write applications: messaging, message processing, fault tolerance, configuration management, and job scheduling. You can reuse these building blocks to design new features. As long as features require new combinations of building blocks, it’s interesting. But once it feels like using the same pattern every time, it becomes boring, even if it’s actually the moment you’re the most productive.

What motivates developers is leveraging technologies to solve a problem. They are interested in figuring out how to use technology for your problem, not in having more time to write business logic. Engineers have studied computer science because they like technology more than other business domains.

Technology platforms and frameworks – app servers, cloud, data pipelines, web frameworks, etc. – are so amazingly complex that you will need to solve several problems with them before you feel like you master them. Also, even if you master the technologies individually, the combination of the technologies might pose some new challenges. At the same time, technology changes very fast. This is another reason why we rarely reach peak productivity: technologies change before we truly master them, and we’re always playing catch-up.

A VM is, for instance, way easier to deal with than physical hardware. Using VMs definitely improves productivity. But as soon as you have VMs, you want to become elastic. And for this, there is a whole new set of tools to learn and master. Progress in technology takes the form of layers that pile up. When you’ve barely mastered the first layer, the second one is already there. These new layers, once mastered, enable new jumps in productivity though.

Not reaching peak productivity isn’t in itself a problem, since productivity grows nevertheless. Curiosity is what makes us push technology. What’s interesting is to realize that productivity and curiosity are actually at odds. It’s because we are curious that we never truly master our technologies and don’t reach peak productivity. But it’s also because we are curious that productivity in the long term always increases.

In fact, we anticipate that there will soon be a whole generation of developers who have never touched a server and only write business logic.

Talk: High-available applications for rail control (@BATBern43)

Marc Hoffmann and I were invited to the 43rd Berner Architekten Treffen (BATBern) on “Event-Driven Architectures”. We presented the architecture of the Rail Control System, especially the mechanisms for high availability.

Here are the slides:

Great Articles on Software Engineering

Sometimes, I read an article, and some idea deeply resonates with me and makes a long-lasting impression. It changes the way I approach some topic.

Fred Brooks’ essay “No Silver Bullet” was one of the very first articles I read that had this effect. The concepts of essential and accidental complexity are very powerful, deeply resonate with me, and shaped the way I see software engineering. This essay is a classic because it had the same effect on many people.

But there are many other great articles that influenced me. Let’s recap some of them:

No Silver Bullet, Fred Brooks

A software system consists of essential and accidental (or implementation) complexity. We should reduce accidental complexity as much as possible, but essential complexity will still be the dominating factor.

The Law of Leaky Abstractions, Joel Spolsky

It’s very hard to devise abstractions that completely hide the underlying complexity. Often, you will need to understand some internal details no matter what.

Simple Made Easy, Rich Hickey

This is a great talk about complexity. The key takeaway is that simplicity comes from not mixing things together that shouldn’t. It’s independent of your prior knowledge. Easiness comes from habits and convention. It depends on prior knowledge.

Choose Boring Technology, Dan McKinley

A reminder that using shiny new tools isn’t always the best option and that established and mature tools have their place if they suffice to get the job done.

There is No Now, Justin Sheehy

An exploration of the way to handle time in distributed systems, where there’s no global notion of time or consistency.

Beating the Averages, Paul Graham

A classic from Paul Graham where he describes how using Lisp and macros gave his company an advantage over its competitors.

Life Beyond Distributed Transactions, Pat Helland

An article about giving up distributed transactions to design internet-scale systems using simpler data models (e.g. key-value stores).

Everything You Know About Latency Is Wrong, Tyler Treat

In short: using averages or percentiles hides your outliers, which are an important signal to understand the real behavior of the system.

A Note on Distributed Computing, S. Kendall et al.

An article explaining that trying to abstract remote boundaries is bound to fail.

Smalltalk: A Reflective Language, F. Rivard

A very nice explanation of Smalltalk and its reflective capabilities showing how to adapt the language to add pre/post conditions. The reification of the stack and the fact that the debugger is just a normal tool is also explained.

Reuse: is the dream dead?, Kirk Knoernschild

An exploration of the use/reuse paradox: “Maximizing reuse complicates use”.

Reflections on Trusting Trust, Ken Thompson

A wicked experiment on bootstrapping.

The Log: What Every Software Engineer Should Know About Real-time Data’s Unifying Abstraction, Jay Kreps

A fantastic analysis of the distributed log as the basic building block to integrate real-time systems. So good that it was later turned into a book: I Heart Logs.
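To make the latency point from the list above concrete, here is a small worked example with made-up numbers: 1000 requests, of which 5 are very slow. The percentile helper is a simple nearest-rank implementation written for this sketch, not a library function.

```python
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(samples)
    rank = -(-p * len(ordered) // 100)  # ceiling of p * n / 100
    return ordered[rank - 1]


# Made-up data: 995 fast requests (10 ms) and 5 very slow ones (3000 ms).
latencies_ms = [10] * 995 + [3000] * 5

summary = {
    "mean": mean(latencies_ms),
    "p99": percentile(latencies_ms, 99),
    "max": max(latencies_ms),
}
```

With these numbers the mean stays just under 25 ms and the 99th percentile is still 10 ms, although five requests took three seconds. Only the maximum reveals them.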

Most of these authors have written several other articles that are great as well.

Do You Need an Architect?

Architects typically do three things: they own, they coordinate, and they mentor.

As an owner, the architect maintains the integrity of the system at a high level. He designs the foundations, identifies tradeoffs, decides on essential changes.

As a coordinator, the architect facilitates work and optimizes the exchange of information. He connects people, gathers information, and plans activities.

As a mentor, the architect provides the intellectual background to understand the system, work autonomously, and improve. He explains concepts and rationale, teaches best practices, and suggests improvements.

It’s a people and technical job.

Which kind of architect you need depends on the project and the team. If the team has enough expertise, they don’t need a mentor. If the team gets along well, they don’t need a coordinator. If the team shares the same view of the system, they can own it collectively.

So maybe you don’t need an architect.

The distinction between architecture and engineering is very blurry anyway. An architect doesn’t do something fundamentally different from an engineer. The three traits exist in every team member. Architects are simply mentoring, coordinating and owning at a different level of scale and responsibility. Some companies (like Google and Amazon) don’t have architects. They only have engineers with different levels of seniority.

And if you think that coding vs. not coding is a fundamental difference in the job, it’s not. Both architects and engineers are doing software design.

The more that happens organically through self-organisation in the team, the better. But self-organisation is hard and it frequently fails. If mentoring, coordination or ownership do not happen as they should, you’re in trouble. Identifying clear responsibilities might help.

So maybe you will need an architect after all.

Conceptual Integrity at Scale

The central argument of The Mythical Man-Month by Fred Brooks is that conceptual integrity is the most important consideration in system design, and that conceptual integrity will only be achieved if the design comes from one, or a few, resonant minds.

I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.

[…]

Conceptual integrity in turn dictates that the design must proceed from one mind, or from a very small number of agreeing resonant minds.

If you’ve been the creative force in group work, you will have experienced these challenges. Core ideas are misunderstood, inconsistencies start to pop up, and the result is a patchwork.

For my part, I can confirm that consistency erodes quickly if you don’t pay close attention. Maintaining conceptual integrity is hard work.

This doesn’t happen because people are dumb, negligent or malevolent. It happens because as soon as you specialize, you lose sight of the whole. Someone makes a change here, someone else a change there, and both changes end up not being fully consistent with each other.

Unfortunately, unlike what Brooks suggests, doing all the design work alone is usually not realistic.

With a good review culture you can scale your design team from one head to a few: let people design parts of the system even if their understanding of the whole system is incomplete, and have one central person review how well the contributions fit in.

It’s like having multiple authors for an article but having one person in charge of doing a complete pass on the article at the end to ensure consistency.

But if you want to tackle bigger challenges, you will have to scale your design team even more.

Ensuring conceptual integrity at scale is hard because it requires not only scaling knowledge but also standardizing the decision making process.

This is what guidelines try to achieve. Guidelines encode the principles, maxims, constraints, and goals of the system in a way that lets different people reach similar decisions. It’s evidently impossible to encode the complete decision-making process in guidelines, given that so much is subjective, but they help achieve a basic overall consistency.

As for the subjectivity: just take one of your colleagues and ask yourself “what would he decide?” You might have a hunch about his decision, but chances are, you don’t know enough about all the thinking that went into his previous decisions to predict this one accurately. If you do, well, you’re two “resonant” minds, as Brooks would say.

If you know lots of people will be involved in the design process, you will need more than guidelines and reviews. You will have to decompose the problem into parts that can be solved individually. Each part can be assigned to one “mind”. The whole might not be fully consistent, but the solution at each level of abstraction will at least be consistent.

Extending the analogy to a newspaper: a newspaper has an editor-in-chief who sets the tone of the writing and the overall orientation (these are the guidelines). He or she will review the topics of the individual articles to make sure they fit in the issue of the newspaper, but won’t edit every article personally (the parts).

No large system will be fully consistent (think of Microsoft Office, that our dear journalists might be using), but it doesn’t hurt too much, because no user will ever use all of the system.

Evolution will also bring some inconsistencies into the system. Moving from one system paradigm to the next is like moving from one local maximum to the next one. In between, things will be worse, that is, less consistent. But if you think there’s a superior design paradigm for the whole system, it’s worth challenging the current one and seeing if there’s a path.

Fred Brooks is right that conceptual integrity is the most important aspect in system design. He’s also right that the more designers there are, the harder it is to ensure consistency. But for large systems that evolve, some inconsistencies are inevitable. Address them like other risks in your project.

In Defense of Design Before Coding

Software design as a separate activity from implementation, so-called “up front” design, got bad press with agile methods.

Agile advocates say the design should be emergent. They say that design without coding is waterfall. It’s a waste of time.

I understand that you don’t want to design the whole system up front. But at the feature level, a bit of thinking before coding works wonders, I say.

My first argument is visible design. Looking at the code doesn’t reveal the whole design because code only shows the static structure. The design is more than that. To understand how the system works you must run it, but even then the sequencing of events is still invisible. If you want effective feedback on the design, you must make it visible. People who jump directly to code still end up sketching or drawing things for their colleagues to explain their design and get feedback. Designing up front makes the design visible up front.

My second argument is speed of iteration. Even with higher-level programming languages, there is a gap between the concepts and the implementation. There is some work needed to implement the thing for real and take care of all the details. Running the system in your head, or on paper, to challenge the design enables faster iterations on the design. CRC sessions are, for instance, a nice way to explore the design space effectively, without coding yet.

My third argument is better reasoning. The code level is just one of the many abstraction levels you can use to reason about the system. When you’re trying to identify the main abstractions, what their responsibilities are, and how they play together, this abstraction level is often too low. One abstraction with a clear responsibility might map to several classes. There might be, for instance, a “scheduler” that will be implemented using several concurrency primitives. These are implementation details (although interesting ones!) that are irrelevant for now. Working at the code level forces you to think at one specific abstraction level. Working on paper enables you to choose the optimal abstraction level to work out the design.

And finally, my fourth argument is tracking rationale. The code defines how the system works but in itself gives few clues as to why it was designed so. Design is all about trade-offs: what were they? If you never learn to design up front on paper, you will never learn to document software understandably either. And without documentation, the rationale will later be lost.

You should design up front as far as you can. Then switch to a computer.

How far you can design up front depends on your intellect and your knowledge of the problem domain. Learn to assess the confidence in your up front design correctly, and identify when to stop, since there lies the danger: too much time spent designing on paper something that doesn’t work. But some design up front has its place.

Why a Calendar App is a Great Design Exercise

To check if a salesman is good, one classic is the “Sell me this pen” test. To check if a software designer is good, I propose the “Design me a calendar app” test.

That was one of the topics we chose for the software engineering lab, and I loved the results.

There are several reasons why it works well as a design exercise:

Everybody can relate − The domain is easily understood and everybody can relate. Who hasn’t used a calendar app?

It’s easy but not so easy − Managing events that occur once and are short is easy. But it gets more interesting as soon as events are recurring (series), span multiple days, are entered in different time zones, or have rooms associated with them. The design becomes more complex not because independent features pile up, but because the complexity of the core model increases.

Time is messy − A lot of complexity in business software comes from the fact that business rules are “arbitrary”. They make sense at the business level because of processes, domain knowledge, etc. but it’s hard to capture some clear “logic” behind them in software. Introducing such a business domain for an exercise is possible, but takes time. On the other hand, everybody already knows the idiosyncrasies of the Gregorian calendar. There is little “logic” behind February having only 28 days and occasionally 29. But, yes, it means a month might sometimes span exactly 4 weeks and not 5. Deal with it in your UI.

It’s not just the server − This design exercise raises interesting questions not only in the backend, but also the frontend. What’s the right model? How can we display it fast? What’s the expectation of the user when the start time of a meeting is changed: to shorten it or to move it? These questions don’t have to do with the technology stack. They are inherent to the product. For questions like the last one, I recommend reading The Math of Easy-to-Use from Terry Crowley, former head of development for Microsoft Office, including Microsoft Outlook. He knows about calendar apps.
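The “Time is messy” point above can be checked with Python’s standard calendar module. The helper weeks_spanned is a name made up for this sketch; it counts how many week rows a month view needs, which also depends on which weekday the UI treats as the first day of the week.

```python
import calendar


def weeks_spanned(year, month, first_weekday=calendar.MONDAY):
    """Number of week rows a month view needs for the given month."""
    weeks = calendar.Calendar(first_weekday).monthdayscalendar(year, month)
    return len(weeks)


# February 2021 starts on a Monday and has 28 days.
feb_monday_first = weeks_spanned(2021, 2)
feb_sunday_first = weeks_spanned(2021, 2, calendar.SUNDAY)
mar_monday_first = weeks_spanned(2021, 3)
may_monday_first = weeks_spanned(2021, 5)
```

With Monday as the first day of the week, February 2021 fits in exactly 4 rows, March 2021 needs 5, and May 2021 needs 6. Switch the first weekday to Sunday and the same February needs 5 rows.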

In The Mythical Man-Month, Fred Brooks explains that one of his favorite interview questions is “Where is next November?”.

I have long enjoyed asking candidate programmers, “Where is next November?” If the question is too cryptic, then, “Tell me about your mental model of the calendar.” The really good programmers have strong spatial senses; they usually have geometric models of time; and they quite often understand the first question without elaboration.

Mental models of time are cultural. In western societies, time flows from left to right; the past is behind us and the future in front of us. In other societies, it’s the other way around. So I wouldn’t quite expect a specific answer to this question. But I would agree with Fred Brooks that a good ability to model time is a predictor of good design skills in general.

Even if you don’t get much else from this exercise, it will at least make you more aware of the problems that exist when dealing with time in computer programs, and teach you to use libraries properly. This is a valuable programming skill on its own. The system I’m working on (a train dispatching system) doesn’t work correctly during the night of the daylight saving time (DST) change in autumn, since time jumps back if the DST offsets aren’t accounted for. If you’ve designed a calendar app once in your life, you are aware of such pitfalls.

So, please, don’t design todo apps as exercises. Design calendar apps. It develops real design skills and will make real-world software less buggy.
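The autumn DST pitfall can be reproduced in a few lines. This is only a sketch: the two fixed offsets below stand in for a real time zone database, and the date is the 2023 switch from summer time to standard time in Central Europe (at 01:00 UTC on 29 October).

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for a real time zone database.
CEST = timezone(timedelta(hours=2), "CEST")  # summer time, before the switch
CET = timezone(timedelta(hours=1), "CET")    # standard time, after the switch

# Two events, 20 minutes apart, on either side of the switch.
first_utc = datetime(2023, 10, 29, 0, 50, tzinfo=timezone.utc)
second_utc = first_utc + timedelta(minutes=20)

first_local = first_utc.astimezone(CEST)    # wall clock shows 02:50
second_local = second_utc.astimezone(CET)   # wall clock shows 02:10

# Comparing with offsets gives the real elapsed time.
real_elapsed = second_local - first_local

# Dropping the offsets (naive wall-clock times) makes time run backwards.
naive_elapsed = second_local.replace(tzinfo=None) - first_local.replace(tzinfo=None)
```

The offset-aware difference is the expected 20 minutes, while the naive wall-clock difference is minus 40 minutes: the second event appears to happen before the first one.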

Things You Can’t Abstract

The art of programming is to a large extent the art of devising abstractions. Some might be very general and reusable in many contexts, some will be more specialized and applicable only in some domains.

The purpose of abstraction is to hide complexity so that we don’t need to care about details. Using abstractions, we can “raise the abstraction level”.

Data structures, relational databases, file systems or garbage collection are all examples of common programming abstractions. There are of course many more.

Abstracting is not unique to programming. For instance, the DNA, the cell, the organ and the organism are different abstraction levels in biology.

An abstraction defines a contract between a user and a provider. The fewer constraints there are in the contract, the more freedom there is in the implementation possibilities. It’s tempting to abstract away all non-functional aspects, but it’s actually a bad idea: you will need to understand them to use the abstraction correctly.

First, you cannot abstract performance. Whether an operation takes O(1) or O(n) is not something you can ignore. At some point you will have to care about the implementation of the abstraction to understand its performance characteristics. Abstracting performance and letting the runtime figure out the best optimization strategy looks nice on paper but is the source of many headaches. You will need to know how your data structure performs, how your database fetches data, how many files can reasonably exist in a folder, and when your garbage collection kicks in.
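A small Python sketch of the performance point: the same membership test (`in`) hides very different costs depending on the container. CountingKey is a helper invented for this example; it counts equality comparisons so the difference is visible without timing anything.

```python
class CountingKey:
    """Hypothetical key type that counts how often it is compared."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return self.value == other.value


def comparisons_for_lookup(container, key):
    # Same abstraction in both cases: "is key in container?"
    CountingKey.comparisons = 0
    _ = key in container
    return CountingKey.comparisons


n = 1000
keys = [CountingKey(i) for i in range(n)]
missing = CountingKey(n)  # not contained in either collection

list_cost = comparisons_for_lookup(list(keys), missing)
set_cost = comparisons_for_lookup(set(keys), missing)
```

The list has to compare the missing key against every element, so list_cost grows with n, while the hash-based set answers without scanning its elements. The interface is identical; the cost is not.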

Second, you cannot abstract failure modes. If something can fail, you cannot ignore it. This is especially true of the network: if something is remote, it can be inaccessible. Attempts to abstract the network as if everything were local simply do not work. An abstraction can have few failure modes, but there is no abstraction that never fails. You will need to understand how your data structure reacts when it can’t expand, how your database reacts when your commit is so big that its transaction log is full, how your file system behaves when it is not reachable, and what happens when the garbage collector can’t reclaim space.
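The following sketch illustrates the failure-mode point. LocalStore, RemoteStore and read_setting are hypothetical names, and the `reachable` flag merely simulates a network outage; no real network call is made.

```python
class LocalStore:
    """In-memory store: a lookup has no network failure mode."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class RemoteStore(LocalStore):
    """Same interface, but pretends to live on another machine."""

    def __init__(self, reachable=True):
        super().__init__()
        self.reachable = reachable  # assumption: simulates the network state

    def get(self, key):
        if not self.reachable:
            raise ConnectionError("store unreachable")
        return super().get(key)


def read_setting(store, key, default):
    # The caller has to decide what an outage means; the interface alone
    # gives no hint that this failure mode exists.
    try:
        value = store.get(key)
    except ConnectionError:
        return default
    return default if value is None else value


local = LocalStore()
local.put("timeout", 30)
remote = RemoteStore(reachable=False)
remote.put("timeout", 30)

from_local = read_setting(local, "timeout", default=5)
from_remote = read_setting(remote, "timeout", default=5)
```

Both stores expose the same `get`, yet only the caller who knows that one of them is remote will think of handling the outage.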

And third, you cannot abstract the consumption of shared resources. Since shared resources are finite and common to the whole system, every component is indirectly related to every other component. You will need to understand how much memory your data structure takes, how much of your data fits in the database cache, how much disk space your system consumes, and how many clock cycles are eaten by garbage collection runs.

That makes a lot of aspects we can’t abstract. Joel Spolsky was right. All non-trivial abstractions eventually “leak”. Barbara Liskov was wrong. In practice, two abstractions with the same functionality cannot be “substituted”, unless they also have the same non-functional characteristics.

It is discouraging to realize we can’t abstract as much as we want, but it doesn’t mean abstraction doesn’t work. You will need to know a bit more about data structures, databases, file systems, and garbage collection than you thought to use them correctly, but you can still ignore a lot of the internal details. The goal of hiding some complexity is achieved, but not that of hiding all complexity.
