Lines Spent

Studies have shown that the productivity of a developer is about 10 LOC/day. Considering that modern software consists of millions of lines of code, this number is appalling.

Measuring productivity with LOC is of course a dangerous thing to do. With programming, quality does not correlate with quantity, and it is wise to remember the words of Dijkstra:

If we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”. — Dijkstra

Indeed, programming is not a production process but a design process. Before a piece of code reaches maturity, various directions might first need to be explored, refined, and analyzed. Qualities like readability, performance, or reliability are competing dimensions in the design space. There is rarely “one way” to program something. Effective programming is about finding the best possible tradeoff in the shortest timeframe.

Writing high-quality code also requires a high level of rigor. Code must obey the established naming conventions, idioms, and patterns of the system; it must be systematically refactored to prevent technical debt from accumulating; and it must be exhaustively tested. This level of discipline pays off in the long term, but requires more time in the short term.

Also, as a system grows, it becomes progressively harder to maintain an accurate mental model of it. Considerable time is thus spent assessing the existing system to temporarily reconstruct enough knowledge for the task at hand. During this time, no code is written.

One could argue that counting total lines of code is useless; what is relevant is the number of lines modified per commit. Many modifications but few additions would suggest a modular system where features can be adapted declaratively without “writing new code”. Unfortunately, metrics tools do not follow this view and treat modifications as second-class citizens.
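For illustration, here is a hypothetical sketch of such a metric (the ChurnStats class and the min(additions, deletions) pairing heuristic are my own assumptions, not an existing tool): it approximates added versus modified lines by parsing git log --numstat, treating a paired deletion/addition as a modification.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // Hypothetical sketch: approximate "added" vs "modified" lines for a repository.
    // Heuristic: a changed line shows up in numstat as one deletion plus one addition.
    public class ChurnStats {
        public static void main(String[] args) throws Exception {
            Process p = new ProcessBuilder("git", "log", "--numstat", "--pretty=format:")
                    .redirectErrorStream(true).start();
            long added = 0, modified = 0;
            try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] parts = line.split("\t");
                    if (parts.length < 3 || parts[0].equals("-")) continue; // blank separator or binary file
                    long add = Long.parseLong(parts[0]);
                    long del = Long.parseLong(parts[1]);
                    long mod = Math.min(add, del); // paired add/delete counted as a modification
                    modified += mod;
                    added += add - mod;
                }
            }
            System.out.printf("added: %d, modified (approx.): %d%n", added, modified);
        }
    }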

So, is a low number of LOC/day good or bad? For an optimist, it might be a sign of code quality: the team designs carefully and takes care to write the minimum amount of code needed. For a pessimist, it might indicate that the project is stalled: you’re not delivering enough code to make the deadline. For a realist, one thing is sure: software engineering is a very expensive activity.

More

Software As Liability

Masterminds of Programming

Masterminds of Programming features exclusive interviews with the creators of popular programming languages. Over 400 pages, the book collects the views of these inventors on topics such as language design, backward compatibility, software complexity, developer productivity, and innovation.

Interestingly, there isn’t that much about language design in the book. The creation of a language seems to happen out of necessity, and the design itself is mostly the realization of an intuitive vision based on gut feelings and bold opinions. The authors’ judgments about trade-offs (e.g. static vs. dynamic typing, or security vs. performance) are surprisingly unbalanced, and when asked to explain the rationale for some design choices, the explanations tend to be rather scarce.

Instead, the authors describe with passion the influences that led them to a particular design. The book thus contains a good deal of historical information about the context in which each language was born.

  • C++ was invented to enable system programming with objects
  • Awk was invented to easily process data in a UNIX fashion
  • BASIC was invented to teach students programming
  • Lua was invented to easily script components
  • Haskell was invented to unify the functional programming language community
  • SQL was invented to query relational databases with an approachable language
  • Objective-C was invented to bring objects to the C world
  • Java was invented to provide a secure language in a networked world
  • C# was invented as the strategic language for the modern Microsoft platform .NET
  • UML was invented as the unification of modeling languages
  • PostScript was invented to enable flexible typesetting and printing
  • Eiffel was invented to make objects robust with contracts

Both the interviewers and interviewees are knowledgeable and articulate. The inventors smoothly distill their experience and insights during semi-structured interviews. Throughout the book, discussions remain mostly general, which is both a plus and a minus: the material is accessible to all, but multiple sections have a low information density. The book could easily have been shortened with better editing.

The discussions about software engineering in general turned out to be the ones I enjoyed most. Some of the interesting ideas touched on in the book were, for instance:

  • Simulating projects helps acquire experience faster, p.254
  • Classes are units of progress in a system, p.255
  • We need an economic model of software, p.266
  • Object-oriented programming and immutability are compatible, p.315 (see the sketch after this list)
  • What UML is good for: useful for data modelling, moderately useful for system decomposition, not so useful for dynamic things, p.342
  • Generating code from UML is a terrible idea, p.339
  • There’s no software crisis; it’s overplayed for shock value, p.354
  • How broken HTML is, and how much better it would have been if the web had started with a typesetting language like PostScript, p.405
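On the compatibility of objects and immutability: here is a minimal sketch of an immutable object in Java (the Point class is my own example, not from the book). The fields are final, and “modifying” the object returns a fresh instance instead of mutating state.

    // Hypothetical example: an immutable object in an object-oriented language.
    public final class Point {
        private final int x;
        private final int y;

        public Point(int x, int y) { this.x = x; this.y = y; }

        public int x() { return x; }
        public int y() { return y; }

        // Returns a new Point instead of mutating this one.
        public Point withX(int newX) { return new Point(newX, y); }
        public Point withY(int newY) { return new Point(x, newY); }
    }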

These points come from the later interviews, but there are similarly nice bits and pieces in all chapters; it just turned out that I started taking notes only halfway through the book.

Amongst the recurring themes, the notion of simplicity stands out and is discussed multiple times, at the language level and at the software level. Several interviewees quote Einstein’s “Simple as possible, but not simpler”, and emphasize the concepts of minimalism and purity, each in their own way.

The book is also very good at instilling curiosity about unknown languages. I was initially tempted to skip the chapters about languages I didn’t know, and am glad that I didn’t. Stack-based languages like Forth and PostScript appear as examples of a powerful but overlooked paradigm; the chapter about Awk almost reconciled me with bash scripting; and the discussion about UML made me reconsider its success: the fact that the whole industry agreed on a common notation for basic language constructs shouldn’t be taken for granted.

In conclusion, this book isn’t essential, but it is enjoyable if you are an all-rounder with some time ahead of you, and you appreciate thinking aloud and good discussions around a cup of coffee.

Software And Tactics

The image of a software engineer is that of a quiet, analytical guy working in isolation on some green-on-black code. There is some truth to this image. Studies have shown that interruptions are bad for programming, and that engineers need long stretches of uninterrupted time to fully immerse themselves in a development activity. Projects are frequently structured into modules that are owned by individual programmers. In this solo view of software engineering, the less communication, the better.

In the agile view of software engineering, people and communication are at the center. You succeed as a team, or fail as a team. The code is not owned by individuals, but collectively by the team (chapter 10, Extreme Programming Explained). Work is organized into short actionable tasks, which prevents multi-tasking and ensures high focus and high reactivity. These strict rules are the key ingredients of hyper-productivity.

Agility is the result of the combination of several elements, such as unit testing, continuous integration, refactoring, etc. Amongst these elements, collective ownership is one of the hardest to implement. In contrast to the other elements, collective ownership requires a change of attitude, not just a change of technical practices. It requires moving from a solo mindset to a collective mindset.

To better understand how a collective mindset can be implemented, one should look at other professions where a collective mindset is critical. This is the case, for instance, for sports teams, firefighters, police officers, or SWAT teams (Special Weapons And Tactics).

A SWAT team’s effectiveness depends on its excellence in several practices:

Communication

SWAT team members communicate the actions they engage in, the risks, and any impediments. Communication must be concise and adhere to a common vocabulary. The team lead oversees and coordinates the activities if necessary.

These considerations apply pretty much as is for software engineering:

“I’m about to launch the stress test of the web portal. Do you copy?”
“Copy that. I’m monitoring the logs.”

Execution

SWAT team members train standard practices and procedures together to improve execution, such as the manipulation of weapons or hardware. Practice makes perfect.

Software engineers should similarly train standard practices to improve execution, and master their tools. Sample practices to train include:

  • Navigating in the IDE
  • Synchronizing and merging code
  • Updating databases (scripts, data, etc.)
  • Deploying software
  • Running various kinds of tests
  • Assessing code quality
  • Gathering performance metrics
  • Keeping the wiki up-to-date
  • Organizing release notes

Pairing

SWAT teams operate in dangerous environments. Mistakes are usually fatal and threats abound. By working in pairs, members can watch each other to prevent mistakes and protect themselves.

Pairing is great for software engineering, too. It reduces the risk of mistakes during coding, deployment, and database updates. “Given enough eyeballs, all bugs are shallow”, as Linus’s Law says. While there are no external threats in software development, pairing favors knowledge transfer, and if a member is sick or leaves the team, the work can still go on smoothly.

The highly dynamic view of collective software engineering is a complete clash with the highly analytical view of solo software engineering.

There are definitely parts of software engineering, such as design, that require quietness and thinking. But a large part of daily software engineering activities aren’t like that: small refactorings, writing unit tests, fixing integration issues, measuring load and response times, etc. do not involve much thinking. They just need to be done.

There is scientific evidence that 80% of what a software developer does in a day—different steps and small microsteps—is not brain work. They do what they have done 50, 100, 1,000 times before. They just apply a pattern to new situations. — Masterminds of Programming, p.336

Lastly, collective software engineering requires redefining working time. In most working environments, individuals can work with their own schedule (hours, rhythm, pace). This is perfectly fine in the solo view of software engineering; however, it breaks the dynamics in collective software engineering. Ideally, team members always work together towards the team’s objective.

Software engineering is not always a creative endeavour. It is a fight against time and code rot. To win this fight, you need clever tactics. The challenge is to work as an effective SWAT task force — where SWAT stands for Software And Tactics.

Your Language is a Start-up

Watching the TIOBE index of programming language popularity is depressing. PHP and Javascript rule the web, despite the consensus that they are horrible; Haskell and Smalltalk are relegated to academic prototyping, but unanimously praised for their conceptual purity. How technology adoption happens is a puzzling question.

The evidence seems to suggest that what matters is to attract a set of initial users, and then expand. The initial offer needn’t be particularly compelling. As long as it wins on one dimension (maybe because it’s embarrassingly simple, or provides a very effective solution to a very specific problem), it might attract early adopters. PHP won because of its simplicity for getting things done; Javascript won because it was the first to provide a solution to make HTML dynamic. After initial success in a niche, the technology can evolve to attract more users. PHP and Javascript both evolved later to fix their initial design flaws. They are both now mature object-oriented languages.

The price for fixing initial flaws is however extraordinarily high. Once a language feature is designed and made available, it’s cast in stone. Evolving a language while maintaining backward compatibility is extremely challenging, but breaking compatibility and dealing with multiple branches isn’t an easy solution either. Notorious examples of evolutions in Java are the Java Memory Model and generics. It took years of research to plan them, and years of availability to reeducate the community. C++ is still trying to catch up, and still lacks features that we take for granted on some platforms, e.g. standardized serialization.

Surprisingly, when adoption happens, it might come from a different audience than the one expected. “Languages designed for limited or local use can win a broad clientele, sometimes in environments and for applications that their designers never dreamed of,” say the authors of Masterminds of Programming. This is definitely true. Java was initially designed for embedded systems, but succeeded instead in the enterprise. Javascript was conceived as a thin veneer for web pages, but now powers the new generation of client-side web apps and is even expanding to the server side.

When a technology starts to decline after the adoption peak, don’t be too quick to declare it dead. It might enjoy an unexpected renaissance. Many have for instance claimed that Java was dead. They failed however to recognize that the underlying JVM is a rocking beast, amazingly fast and versatile for those who know how to tame it. Nowadays, one of the best strategies to implement a new language is to leverage the JVM and provide interoperability with Java libraries. In turn, this massive adoption of the JVM by new language implementors is driving innovation in the JVM itself, which has been extended with a new bytecode for languages other than Java. I doubt that James Gosling anticipated this evolution of the platform.

Clearly, the key characteristics of a language are its syntax and semantics, since together they define its expressive power. Expressivity isn’t however the only force at play in adoption. What a real-world check suggests is that expressivity is only one factor amongst many other technical and social factors. The ease of debugging or the existence of a friendly community could, for instance, turn out to be more important for some users than the ease of writing code. Language designers typically underestimate such factors, severely impeding their chances of success.

To foster adoption, one must also reckon that people are reluctant to change. What people are already familiar with must be taken into consideration: in 2008, mobile users wouldn’t have been ready for the minimalistic iOS 7 interface. They were however ready for the original skeuomorphic interface, and now that they have become familiar with it, they can let go of the skeuomorphic ornaments. People don’t change for the sake of changing; they change to solve a problem, and they change only if the gain outweighs the pain. For programming languages, the problem is productivity and the pain is learning a new platform. In 2013, developers might not be familiar enough with functional programming to adopt a pure functional language like Haskell, but they definitely are ready to adopt a hybrid language like Scala.

Together, these elements might help explain the failure of some great languages, for instance Lisp. Lisp is a beautiful programming language that offers amazing flexibility. For a skilled practitioner, Lisp is a secret weapon. However, Lisp does nothing particularly well out of the box. “Lisp isn’t a language, it’s a building material,” as Alan Kay put it. Clojure, on the other hand, is a Lisp dialect with just enough direction to solve one very painful problem: writing concurrent code. Given that the problem is so painful, people won’t mind a few parentheses to solve it. This choice paid off, and in 2012 Clojure moved into the “Adopt” ring of ThoughtWorks’ Technology Radar.

The language business is a competitive business where idealism won’t prevail. For a language to be adopted, it must solve a problem for some early adopters, who will then create an attractive ecosystem that convinces the late majority. In other words: language designers should think of their language as a start-up.

Thinking in Lifecycle

After having worked hard on a design, I once went to a colleague to ask his opinion about it. “Do you think I’ve decomposed the problem in the right way?” I said. He looked at it and answered, “You are mixing elements with different lifecycles.” I came back to my desk confused and worried.

What I understood later is that every element in a system has a lifecycle, be it objects in memory, data in the database, business logic in the code, build artifacts, design documents, or configuration properties. Too many times, these lifecycles are poorly understood or poorly managed. Elements with different lifecycles need to be able to change independently of each other. Your design should reflect these lifecycles.
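To make this concrete, here is a minimal sketch in Java (the report example and all class names are my own hypothetical illustration): a report template evolves at release time, while report data lives for a single request. Keeping the two lifecycles in separate classes lets each change independently.

    // Hypothetical illustration: two elements with different lifecycles, kept apart.

    // Long-lived: created once at startup, evolves with releases.
    final class ReportTemplate {
        private final String layout;
        ReportTemplate(String layout) { this.layout = layout; }
        String render(ReportData data) { return layout.replace("{body}", data.body); }
    }

    // Short-lived: created per request, discarded afterwards.
    final class ReportData {
        final String body;
        ReportData(String body) { this.body = body; }
    }

    class LifecycleDemo {
        public static void main(String[] args) {
            ReportTemplate template = new ReportTemplate("== Report ==\n{body}"); // application lifecycle
            ReportData data = new ReportData("42 widgets sold");                  // request lifecycle
            System.out.println(template.render(data));
        }
    }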

Thinking in terms of lifecycles is a very powerful design tool. Surprisingly, it isn’t particularly emphasized in the literature about software design and architecture. The closest concepts would be coupling and cohesion, which capture how changes ripple through a system. The single responsibility principle is also close, when formulated as “A class should have only one reason to change”.

I’m happy my colleague made me aware of this particular way of thinking and that I could add this design tool to my intellectual toolbox. It has proved very valuable since then.

The New Digital Age

The New Digital Age explores the impact of internet connectivity and digital media on society. The book documents changes that have already occurred, reviews current trends, and tries to predict some future moves.

Written by Eric Schmidt, a tech executive, and Jared Cohen, a former foreign policy advisor, the book focuses on the impact of technology at the political and societal level, not so much at the individual level (only the first chapter “Our future selves” is about it). I applaud this ambitious agenda.

People interested in technology and cybercrime (e.g. TED talks, Wired) might be familiar with some of the observations and speculations in the book. How much novelty it carries will depend on the background of the reader. Some of the predictions are however unique to the authors, and they do not hesitate to give their personal opinions. This gives a special edge to the material.

The trends and predictions are usually backed up with short anecdotal evidence that is interesting in itself. The overall discussion, however, usually remains quite abstract, which at times gives the impression that it lacks substance. This is to be expected from such a book, though. Prediction and precision don’t match up very well.

My main criticism of the book is that while the chapters tell a consistent story of how society evolves with periods of peacetime, revolution, conflict, and reconstruction, the chapter internals do not enjoy such a coherent treatment. The predictions that they discuss appear to exist more by accident than as the outcome of a thorough analysis. For instance, I do not recall reading anything about electronic voting. This seems to me like an unavoidable topic for such a book.

The book also gives a slight feeling of redundancy. Certain topics are discussed from a different point of view in each chapter. For instance, the tension between privacy and security is discussed from the perspectives of state organization, activism, counterterrorism, etc. An improvement for a second edition would be to provide a roadmap of recurring topics and their treatment in each chapter. That would give a high-level view of the content, and would avoid this unpleasant feeling of redundancy.

While the positions in the book are relatively balanced, the overall tone is inevitably biased towards US policy, which is no surprise given Jared Cohen’s background. Also, the book emphasizes tracking and surveillance a lot, and will make proponents of an anonymous internet uneasy.

Overall, I liked the book. The themes addressed are very relevant and it sharpened my understanding of the role of technology in modern society. What the future will really bring, nobody knows.

The Zen of Oscar

Writing a good research paper or thesis is hard. It can be very intimidating to figure out the scope of the work that needs to be done. Thanks to Oscar, I have been equipped with three words to think about problems of scope: breadth, depth, and completeness.

Breadth and Depth

At an early stage of my thesis, I prepared a list of action items that I proposed to address in my research. I went to see Oscar for confirmation that addressing these points would lead to a good thesis. I described the list of action items to him and asked, “Oscar, if I realize all this, will it make a good thesis?” He looked at me and said, “For a good thesis, you must cover a topic with sufficient breadth and depth.” I came back to my office, confused and worried.

What I had failed to understand is that a thesis needs a frame, and a frame has a breadth and a depth. The breadth characterizes the perspectives on the idea, and the depth the level of detail. Say your thesis is about dynamic updates. Technical feasibility and user adoption are two perspectives that belong to the breadth dimension. Implementation and formalization are different levels of detail that belong to the depth dimension.

Different pieces of work have different ratios of breadth and depth. A position paper might have lots of breadth, but little depth. A paper proposing an optimization of an algorithm has little breadth, but lots of depth. For a thesis, you need a good ratio of both.

Completeness

Later on, I was invited to prepare an extended version of a conference paper for a journal. I prepared a draft of the changes and went to see Oscar. I asked, “Oscar, are my changes enough for an extended journal paper?” He looked at me and said, “For a journal paper, you must do what is necessary for your research to be complete.” I came back to my office, confused and worried.

What I had failed to understand is that the maturity of research is not defined by how much effort has been spent, but by the amount of speculation left. Research is an incremental process, and a piece of research is complete when all that is needed to support the claim (results or analysis) has been provided. Let’s imagine that you have an implementation of a dynamic update algorithm that you claim is efficient. You must show efficiency in both speed and memory use for the research to be complete.

Thinking in terms of breadth, depth and completeness has become a simple technique in my toolkit of planning methods.

Simplicity Prevails

We engineers are masters of self-deception when it comes to our aptitude for handling complexity. We believe we are way better at handling complexity than we actually are. In practice, the level of complexity that people can master (myself included) is disappointingly low.

Instead of deceiving ourselves, we should embrace our limitations and aim for simplicity. This is what all the geniuses of our time have been preaching. Simplicity prevails.

“Simplicity is the ultimate sophistication.” — Leonardo da Vinci

It is common for tools to be too complex. We all know that only a tiny fraction of the features of Word are used; still, the usual trend for a product is towards adding more and more features. Often, the best thing for a product is taking something away from it. Only simple tools prevail.

One problem with simplicity is that it is often confused with easiness or triviality. Easiness and triviality are subjective properties, relative to a user: they refer to a sense of familiarity or ordinariness. Simplicity is an objective property that refers to purity, the absence of mixing distinct elements.

Let us look at some simple features that are clear wins.

Using Venn Diagrams for Access Control

Have you ever been confused by the security settings of an application, not knowing what would or would not be accessible to other users? Venn diagrams are simple and can be used to make such an interface intuitive. Win!

Using Pictures instead of Texts to Log

Have you ever found yourself overwhelmed by the difficulty of comparing tens of print statements to understand state mutations over time? Comparing texts is hard; comparing images is easy. Using a visual log is simple and supports better debugging. Win!

Using Examples to Test Programs

To test your code, you exercise it with chosen inputs that serve as examples? Well, you could have invented unit testing. As Fowler says: “Kent’s framework had a nice combination of absurd simplicity and just the right features for me”. Win!
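For illustration, here is a minimal JUnit 5-style test (the Calculator class is a hypothetical example of code under test): the chosen input is the example, and the assertion pins down the expected behavior.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    // Hypothetical example: the input (2, 3) is the chosen example;
    // the assertion states the expected output.
    class CalculatorTest {
        @Test
        void addsTwoNumbers() {
            assertEquals(5, Calculator.add(2, 3));
        }
    }

    class Calculator {
        static int add(int a, int b) { return a + b; }
    }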

Using Live Code instead of Static Code to Understand Programs

Have you ever felt exhausted trying to run code in your head while inspecting static sources? The underlying question is: what prevented you from running it and seeing it live? Finding a suitable unit test and breaking at the start of the method you are inspecting can be automated to become a one-click operation. Win!

Optionless search

Have you ever been repelled by the sheer number of options in a search form? For instance this one. That’s tackling the problem of search the wrong way. Google, Airbnb, and Facebook got it right, offering essentially a single text input and hiding the magic of relevance matching. Win!
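As a toy illustration of hiding the magic behind a single input, here is a hypothetical sketch (the OptionlessSearch class and its naive token-overlap scoring are my own assumptions, vastly simpler than real relevance engines): one free-text query, and ranking handled behind the scenes.

    import java.util.*;
    import java.util.stream.Collectors;

    // Hypothetical sketch: a single text input, relevance ranking hidden behind it.
    class OptionlessSearch {
        // Naive relevance: number of tokens shared between query and document.
        static long score(String query, String document) {
            Set<String> shared = tokens(query);
            shared.retainAll(tokens(document));
            return shared.size();
        }

        static Set<String> tokens(String text) {
            return Arrays.stream(text.toLowerCase().split("\\W+"))
                         .filter(t -> !t.isEmpty())
                         .collect(Collectors.toCollection(HashSet::new));
        }

        public static void main(String[] args) {
            List<String> docs = Arrays.asList(
                "Cheap flights to Rome", "Roman history books", "Flight simulator");
            docs.stream()
                .sorted(Comparator.comparingLong((String d) -> score("flights rome", d)).reversed())
                .forEach(System.out::println); // most relevant document first
        }
    }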

These examples show what I consider to be the level of complexity we can handle, and should aim at. These features are so simple to use that they immediately look easy and trivial. This is a good thing.

Debunking Object-Orientation

What is an object? What is the essence of the object paradigm? How do objects differ from other abstractions? What are their benefits? What are their pitfalls? Can we encode objects with lower-level building blocks? Should we have objects all the way down?

Some people think they are clever to observe that OOP has no formal, universal definition. Democracy, love and intelligence don’t either. — Tweet from Alain de Botton

Some thoughts:

And for the haters:

The Cost of Volatile

Assessing the scalability of programs and algorithms on multicore hardware is critical. There is a substantial literature on locks and locking schemes, but the exact cost of volatile is less clear.

For the software composition seminar, Stefan Nüsch and I decided to look at this topic. Essentially, we devised a benchmark where multiple threads access objects within a pool. Each thread has a set of objects to work with. To generate contention, the sets could be fully disjoint, partially overlapping, or fully overlapping. The ratio of reads and writes per thread was also configurable.
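To give a flavor of what such a measurement looks like, here is a minimal sketch (not our actual benchmark; the VolatileBench class is a hypothetical illustration): it merely times contended increments of a volatile field versus a plain field. A serious measurement would need warmup, multiple runs, and a harness like JMH.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch contrasting volatile and plain field accesses under
    // contention. Increments may be lost (neither counter is atomic); here we
    // only care about elapsed time, not the final count.
    public class VolatileBench {
        static volatile long volatileCounter = 0;
        static long plainCounter = 0;

        public static void main(String[] args) throws Exception {
            System.out.println("volatile: " + run(true) + " ms");
            System.out.println("plain:    " + run(false) + " ms");
        }

        static long run(boolean useVolatile) throws Exception {
            int threads = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                pool.submit(() -> {
                    for (int i = 0; i < 10_000_000; i++) {
                        if (useVolatile) volatileCounter++;  // volatile read + write
                        else plainCounter++;                 // plain read + write
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            return (System.nanoTime() - start) / 1_000_000;
        }
    }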

On an AMD64 Dual Core, the graph looks as follows:

[Figure: benchmark results, AMD64 Dual Core]

On an i7 Quad Core, the graph looks as follows:

[Figure: benchmark results, i7 Quad Core]

We clearly see that different architectures have different performance profiles.

In future work, we could try to reproduce false sharing and assess the impact of other forms of data locality.

More details about the benchmark and methodology can be found in his presentation. The code is on GitHub.

Here are some links about the semantics of volatile, and memory management in general.