Package Visibility is Broken

In Java, classes and class members have by default package visibility. To restrict or increase the visibility of classes and class members, the access modifiers private, protected, and public must be used.

Modifier Class Package Subclass World
public Y Y Y Y
protected Y Y Y N
no modifier Y Y N N
private Y N N N

(from Controlling Access to Members)

These modifiers control encapsulation along two dimensions: one dimension is the packaging dimension, the other is the subclassing dimension. With these modifiers, it becomes possible to encapsulate code in flexible ways. Sadly, the two dimensions interfere in nasty ways.

Shadowing

A subclass might not see all methods of its superclass, and can thus redeclare a method with an existing name. This is called shadowing or name masking.  For instance, a class and its subclass can both declare a private method foo() without that overriding takes place. This situation is confusing and best to be avoided.

With package visibility, the situation gets worse. Let us consider the snippet below:

package a;
public class A {
int say() {return 1;};
}
package b;
public class B extends a.A {
int say() {return 2;};
}
package a;
class Test {
public static void main(String args[]) {
a.A a = new b.B();
System.out.println(a.say()); // prints 1, WTF!!
} }

 (from A thousand years of productivity: the JRebel Story)

The second method B.say() does not override A.say() but shadows it. Consequently, the static type at the call site defines which method will be invoked.

One could argue that everything works as intended, and that it is clear that B.say() does not override A.say() since there is no @Override annotation.

This argument makes sense when private methods are shadowed. In that case, the developer knows about the implementation of the class and can figure this out. For methods with package visibility, the argument is not acceptable since developers shouldn’t have to rely on implementation details of a class, only its visible interface.

The static types in a program should not influence the run-time semantics. The program should work the same whether the variable “a” has static type “A” or “B”.

Reflection

With reflection, programmers have the ability to inspect and invoke methods in unanticipated ways. Reflections should honor the visibility rules and authorize only legitimate actions. Unfortunately, it’s hard to define what is legitimate or not. Let us consider the snippet below:

class Super {
@MyAnnotation
public void methodOfSuper() {
}
}

public class Sub extends Super {
}

Method m = Sub.class.getMethod("methodOfSuper");
m.getAnnotations(); // WTF, empty list

Clearly, the method methodOfSuper is publicly exposed by instances of the class Sub. It’s legitimate to be able to reflect upon it from another package. The class Super is however not publicly visible, and its annotations are thus ignored by the reflection machinery.

Package visibility is broken

Package-visibility is a form of visibility between private and protected: some classes have access to the member, but not all (only those in the same package). This visibility sounds appealing to bundle code in small packages, exposing the package API using the public access modifier, and letting classes within the package freely access each others. Unfortunately, as the examples above have shown, this strategy breaks in certain cases.

Accessiblitiy in Java is in a way too flexible. The combination of the fours modifiers with the possibility to inherit and “widen” the visibility of classes and class members can lead to obscure behaviors.

Simpler forms of accessibility should then be preferred. Smalltalk supports for instance inheritance, but without access modifiers; methods are always public and fields are always protected. Go, on the other hand, embraces package visibility, but got rid of inheritance. Simple solutions are easier to get right.

NOTES:

  • In “Moderne Software-Architektur: Umsichtig planen, robust bauen mit Quasar” the author argues that method level visibility makes no sense. Instead, components consist of classes, which are either exposed to the outside (the component interface) of belond to the component’s internals and are hidden (the component implementation). This goes in the direction of OSGi and the future Java module system.

Lines Spent

Studies have shown that the productivity of a developer is about 10 LOC/day. Considering that modern software consists in millions of lines of code, this number is appalling.

Measuring productivity with LOC is of course a dangerous thing to do. With programming, quality does not correlate with quantity, and it is wise to remember the words of Dijkstra:

If we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”. — Dijkstra

Indeed, programming is not a production process but a design process. Before a piece of code reaches maturity, various directions might first need to be explored, refined, and analyzed. Qualities like readability, performance, or reliability are competing dimensions in the design space. There is rarely “one way” to program something. Effective programming is about finding the best possible tradeoff in the shortest timeframe.

Writing high-quality code requires also a high level of rigor. Code must obey the established naming conventions, idioms and patterns of the system; it must be systematically refactored to prevent technical debt to accumulate; and it must be exhaustively tested. This level of discipline pays off in the long-term, but requires more time in the short-term.

Also, as a system grows, it becomes progressively harder to maintain an accurate mental model of the system. Considerable time is thus spent assessing the existing system to temporarily reconstruct enough knowledge for the task at hand. During this time, no code is written.

One could argue that counting total lines of code is useless; what is relevant is the number of lines of code modified per commit. Lots of modifications but no additions would suggest a modular system where features can be adapted declaratively without “writing new code”. Unfortunately, metrics tools do not follow this view and treat modifications as second-class citizens.

So, is a low number of LOC/day per day good or bad? For an optimist, it might indicate a sign of code quality; the team designs carefully and takes care to write the minimum amount of code needed. For a pessimist, if might indicate that the project is stalled; you’re not delivering enough code to make the deadline. For a realist, one thing is sure: software engineering is a very expensive activity.

More

Software As Liability

Masterminds of Programming

Masterminds of Programming51-8dA--hLL features exclusive interviews with the creators of popular programming languages. Over 400+ pages, the book collects the views of these inventors over varying topics such as language design, backward compatibility, software complexity, developer productivity, or innovation.

Interestingly, there isn’t so much about language design in the book. The creation of a language seems to happen out of necessity, and the design itself is mostly the realization of an intuitive vision based on gut feelings and bold opinions. The authors’ judgments about trade-offs (e.g. static or dynamic typing, or security vs performance) are surprisingly unbalanced, and when asked to explain the rationale for some design choices, explanation tends to be rather scarce.

Instead, the authors describe with passion the influences that led them to a particular design. The book contains thus a good deal of historical information about the context in which each language was born.

  • C++ was invented to enable system programming with objects
  • Awk was invented to easily process data in a UNIX fashion
  • Basic was invented to teach students programming
  • LUA was invented to easily script components
  • Haskell was invented to unify the functional programming language community
  • SQL was invented to query relational database with an approachable language
  • Objective-C was invented to bring objects to the C world
  • Java was invented to provide a secure language in a networked world
  • C# was invented as the strategic language for the modern Microsoft platform .NET
  • UML was invented as the unification of modeling languages
  • Postscript was invented to enable flexible typesetting and printing
  • Eiffel was invented to make objects robust with contracts

Both the interviewers and interviewees are knowledgeable and articulate. The inventors smoothly distill their experience and insights during semi-structured interviews. Throughout the book, discussions remain mostly general, which both a plus and a minus: the material is accessible to all, but multiple sections have a low information density. The book could be easily shortened with a better editing.

Discussion about software engineering in general turned out to be the one I enjoyed most. Some of the interesting ideas touched in the book were for instance:

  • Simulating projects help acquire experience faster, p.254
  • Classes are units of progress in a system, p.255
  • We need of an economic model of software, p.266
  • Object-oriented programming and immutability are compatible, p.315
  • What UML is good for: useful for data modelling, moderately useful for system decomposition, not so useful for dynamic things, p.342
  • Generating code from UML is a terrible idea, p.339
  • There’s no software crisis; it’s overplayed for shock value, p.354
  • How broken HTML is, and how better it would have been if the web had started with a typesetting language like postscript, p.405

These points come from the late interviews, but there are similarly nice bits and pieces in all chapters; it just turned out that I starting taking notes only half through the book.

Amongst the recurring themes, the notion of simplicity pops out and is discussed multiple times, at the language level and a the software level. Several interviewees quote Einstein’s “Simple as possible, but not simpler”, and emphasize the concepts of minimalism and purity, each in their own way.

The book is also very good at instilling curiosity about unknown languages. I was initially tempted to skip chapters about languages I didn’t know, and am glad that I didn’t. Stack-based languages like Forth and Postscript appear as examples of a  powerful but underlooked paradigm; the chapter about awk almost reconciled me with bash scripting; and the discussion about UML made me reconsider its successthe fact that the whole industry agreed on a common notation for basic language constructs shouldn’t be taken for granted.

In conclusion, this book isn’t essential, but it is enjoyable if you are an all-rounder with some time ahead, you appreciate thinking aloud, and good discussions around a cup of coffee.

Your Language is a Start-up

Watching the TIOBE index of programming language popularity is depressing. PHP and Javascript rule the web, despite the consensus that they are horrible; Haskell and Smalltalk are relegated to academic prototyping, but unanimously praised for the conceptual purity. How technolgy adoption happens is a puzzling question.

Evidences seem to suggest that what matters is to attract a set of initial users, and then expand. The initial offer needn’t be particularly compelling. As long as it wins on one dimension  maybe because it’s ambarassingly simple, or provides a very effective solution to a very specific problem it might attract early adopters. PHP won because of its simplicity to get things done; Javascript won because it was the first to provide a solution to make HTML dynamic. After initial success in a niche, the technology can evolve to attract more users. PHP and Javascript  evolved later to fix their initial design flaws. They are both now mature object-oriented languages.

The price for fixing initial flaws is however extraordinary high. Once a language feature is designed and made available, it’s cast in stone. Evolving a language while maintaining backward compatibility is extremelly challenging, but breaking compatibility and dealing with multiple branches isn’t much of an easy solution neither. Notorious examples of evolutions in Java are the Java Memory Model and generics. It took years of research to plan them, and years of availability to reeducate the community. C++ is still trying to catch up, and still lacks feature that we take for granted on some plateforms, e.g. a standardized serialization.

Surprisingly, when adoption happens, it might be from a difference audience than the one expected.  “Languages designed for limited or local use can win a broad clientele, sometimes in environments and for applications that their designers never dreamed of.”  say the authors of Mastermind of programming. This is definitively true. Java was initially designed for embedded systems, but succeed instead in the enterprise. Javascript was thought as a thin veneer for web pages, but now powers the new generation of client-side web app and is even expanding to the server-side.

When a technology starts to decline after the adoption peak, don’t be too quick to claim it dead. It might enjoy an unexpected renaissance. Many have for instance claimed that Java was dead. They have failed however to recognize the the underlying JVM is a rocking beast, amazingly fast and versatile for those who know how to tame it. Nowadays, one of the best strategy to implement new languages is to leverage the JVM and provide interoperability with Java libraries. In turn, this massive adoption of the JVM by new language implementors is driving innovation in the JVM itself, which has been extended with new bytecode for languages other than Java. I doubt that James Gosling had anticipated this evolution of the platform.

Clearly, the key characteristics of a language are its syntax and semantics, since they define together its expressive power. Expressivity isn’t however the unique force at play for adoption. What a real-world check suggests is that expressivity is only one factor amongst many other technical and social factors. The ease of debugging or the existence of a friendly community could for instance turn out to more important for some users than the ease of writing code. Language designers typically understimate such factors, severly impeding their chances of success.

To foster adoption, one must also rekon that people are reluctant to change. What people are already familiar with must be taken in consideration: in 2008, mobile users wouldn’t have been ready for the minimalistic iOS 7 interface. They were however ready for the original skeuomorphic interface, and now that they have become familiar with it, they can get rid of the skeuomorphic ornaments. People don’t change for the sake of changing, they change to solve or problem, and they change only if the gain outweight the pain. For programming languages, the problem is productivity and the pain is learning a new platform. In 2013, developers might not be familiar enough with functional programming to adopt a pure functional language like Haskell, but they definitively are ready to adopt a hybrid language like Scala.

Together, these elements might help explain the failure of some great languages, for instance Lisp. Lisp is a beautiful programming language that offers amazing flexibility. For a skilled practitionner, lisp is a secret weapon. However, lisp does nothing particularly well out of the box. “Lisp isn’t a language, it’s a building material.”, dixit Alan Kay. Clojure, on the other hand, is a Lisp dialect with just enough direction to solve one very painful problem: writting concurrent code. Given that the problem is so painful, people won’t mind a few parenthesis to solve it. This choice paid off, and in 2012 Clojure moved in the “adopt” quadrant of Thoughwork’s technology radar.

The language business is a competitive business where idealism won’t prevail. For a language to be adopted, it must solve a problem for some early adopters, who will then create an attractive ecosystem that will convince the late majority. In other words: language designers should think of their language like a start-up.

Thinking in Lifecycle

After having worked hard on a design I came once to a colleague to ask him his opinion about it. “Do you think I’ve decomposed the problem in the right way?” I said. He look at it and answered “You are mixing elements with different lifecycles”. I came back to my desk confused and worried.

What I understood later is that  every element in the system always has a lifecycle, be it objects in memory, data in the database, business logic in the code, build artifacts, design documents, or configuration properties. Too many times, the lifecycles are poorly understood or poorly managed. Elements with different lifecycles need to be able to change independently of each other’s. Your design should reflect these lifecycles.

Thinking in terms of lifecycle is a very powerfull design tool. Surprisingly, it isn’t particularly emphazised in the literature about software design and architecture. The closest would be the concepts of coupling and cohesion, which capture how changes ripple in the system. The single responsibility principle is also close, when formulated as “A class should have only one reason to change“.

I’m happy my colleague made me aware of this particular way of thinking and that I could add this design tool to my intelectual toolbox. It’s proved very valuable since then.

Debunking Object-Orientation

What is an Object? What is the essence of the object pardigm? How do objects differ from other abstractions? What are their benefits? What are their pitfalls? Can we encode objects with lower building blocks? Should we have objects all the way down?

Some people think they are clever to observe that OOP has no formal, universal definition. Democracy, love and intelligence don’t either. — Tweet from Allain de Boton

Some thoughts:

And for the haters:

The Cost of Volatile

Assessing the scalability of programs and algorithms on multicore is critical. There is an important literature on locks and locking schemes, but the exact cost of volatile is less clear.

For the software composition seminar, Stefan Nüsch and I decided to look at this topic. Essentially, we devised a benchmark where multiple threads would access objects within a pool. Each thread has a set of objects to work with. To generate contention, the sets could be fully disjointed, have partial overlap, of have a full overlap. The ratio of reads and writes per thread was also configurable.

On an AMD 64 Dual Core, the graph looks as follows:

bench_amd64On a i7 Quad Core, the graph looks as follows:

bench_i7We clearly see that different architectures have different performance profiles.

In future work, we could try to reproduce false sharing and assess the impact of other forms of data locality.

More details about the benchmark and methodology can be found in his presentation. The code in on github.

Here a some links about the semantics of volatile, and mememory management in general.

Natural Queries

Programming language research is a quest for expressivity. The aim it enable the expression of complex computations in concise and intuitive ways.

The problem is that conciseness and intuitivity are usually conflicting. Expressivity is enabled by higher-order constructs which are hard to reason about. On one hand, functional languages enable the expression of complex computations concisely, but their intuitivity is low. Integrated query languages can help, though. On the other hand, imperative languages are very intuitive, but their expressivity is low.

Let’s assume you have time series. A sentence like “pick the average of values by intervals of 5 seconds” describes a non-trivial computation. This sentence in a natural language is both intuitive and concise. Dealing with such natural queries is challenging, but feasible. This is what facebook does. An important aspect of the features is that the system gives a feedback to the user about how it understood the query. The user can then rephrase or refine the query.

The same could be integrated into development environments. Instead of expressing computations with functions, developers would first give a shot at a natural description. It would be translated into functional code by the environment. Sample data would be generated to exemplify the query, serving both as immediate feedback for the correctness of the query, and later for unit testing. If the generated code is incorrect, it is refined by the developer. The icing on the cake: the original natural query is kept as a code comment.

Debuggers: Theory and Practice

In “Not on the shelves“, Greg Wilson writes reviews about non-existing books, including one called “Debuggers: Theory and Practice”. It’s an entertaining and insightful read.

It made me curious about how such a book could look like and I gathered a selection of interesting articles about debugging and tried to organize them in a few sections. I wish I were more knowledable on the topic and had enough references to split section III into one section “exploring executions” and another one “strategies for debugging”.

Part I: Foundation

Chapter 1: Failures, Errors, Faults
Chapter 2: What is Debugging?
Chapter 3: Debuggers are Reflective Programs

Part II: Reproducing Bugs (Reproducing Failures)

Chapter 1: Every bug is a test not yet written
Chapter 2: Record and Replay
Chapter 3: Execution Synthesis
Chapter 4: Reproducing Crashes with Recrash

Part III: Fixing Bugs (Identifying Faults)

Chapter 1: The Art of System.out
Chapter 2: Asking Why Questions with the Whyline
Chapter 3: Scriptable Time-Travel Debugging with First-Class Trace
Chapter 4: Back-in-Time Alias Debugging
Chapter 5: Object-centric Debugger
Chapter 6: Debugging Concurrent Programs

Why Smalltalk?

This is a question is occasionally asked by students and here is the answer.

We are not religious. This choice is not dogmatic. We do both research in programming languages and tool support for software evolution. In both cases Smalltalk is handy:

  • Programming language — Smalltalk is extermely uniform. Experimenting with a language change is faster in Smalltak, than say, Java or Ruby. Their syntax and sets of rules are bigger, which implies more work.
  • Tool Support — If you want to extend the environment with more browsers/views/features, you can do it easily. Also, tool support and meta-programming go well together. You don’t have a separation between application code and environment code. This makes the whole system very malleable to experiement with.

There exist other research platforms out there to ease experimentations in either category. They don’t match however with the versatility of Smalltalk, which remains thus a very competitive choice to consider. Usually, we mature our project to Java or Eclipse only after initial success in Smalltalk.

Smalltalk Anthology