How Spec-Driven Development Could Play Out

In Climbing the Abstraction Ladder with LLMs, I shared some early thoughts on how large language models might push software development toward a more spec-driven approach. With the rise of agentic AI, this shift is no longer theoretical. We are starting to see it take shape in practice, often under the label spec-driven development (SDD).

If we think of SDD as a new paradigm—on par with object-oriented or functional programming—we can look at the past to try to anticipate the future. New tools and platforms will emerge that are designed specifically for this way of working, much like Smalltalk or Lisp were for earlier paradigms. Early adopters will build real systems with them, and some of those systems will look very impressive.

What will take much longer is understanding how well SDD holds up over the entire software lifecycle. The long-term impact of a paradigm, especially on maintainability and evolution, is hard to judge early on. Organizations can reasonably try SDD on non-critical projects, but for critical systems the risks are still significant. Too many questions remain unanswered.

History also suggests some caution. If OOP and FP are any indication, a new paradigm rarely succeeds in its pure form. What we call OOP today is a pragmatic version that still contains a lot of procedural code. Functional programming entered the mainstream in an even weaker form—immutability and lambdas, but rarely full-blown higher-order functional design. SDD will likely follow the same path. The version that survives in industry probably won’t be fully spec-driven.

As usual, non-functional requirements are where things get complicated. New paradigms tend to work well for simple cases, but persistence made OOP messy, and performance constraints often make FP hard to apply in practice. There is no reason to expect SDD to be different. Once non-functional requirements dominate, architectural decisions become central—and that’s where abstractions start to leak.

It is worth considering some critiques of this view. First, SDD may not be a full paradigm but rather a technique layered on top of existing paradigms, in which case expecting a long evolution from “pure” to pragmatic may be misleading. Second, non-functional requirements might be incorporated into the specs themselves, changing the traditional failure points. Finally, architecture might become more of a search or simulation problem for agents, rather than a human-only domain, which could make SDD more effective than expected. These critiques suggest that SDD’s trajectory could be very different from past paradigms, and that its eventual form may be more disruptive than a cautious hybrid would imply.

Assuming these critiques don’t hold, SDD should not be seen as a replacement for software design, but as a new abstraction that shifts where effort is spent. Over time, the industry will likely settle on a hybrid approach: partly spec-driven, partly traditional, and pragmatic above all. As with earlier paradigm shifts, success will come from understanding both the power and the limits of the abstraction.

Alternatives to US Tech Exist

Since the beginning of the new Trump administration, anxiety about U.S. tech dominance has been on the rise. Over the past few decades Europe’s position as a technology leader has slipped, and most of the continent’s biggest corporations now operate outside the core tech arena. Today the two unmistakable European tech giants are ASML and SAP. The internet revolution was largely handed over to American firms, with perhaps the lone notable exception of Spotify. Even though talent is abundant across Europe, many promising entrepreneurs choose to relocate to the United States. Still, a number of home-grown companies are showing real promise; examples include Mistral, a cutting-edge AI startup, and OVHcloud, a cloud-infrastructure provider. If Europe wants to stay relevant on the global stage, it must rebuild a vibrant technology ecosystem.

One practical way we can contribute is by adopting local alternatives whenever possible. Switching costs for many services are surprisingly low—for instance, moving away from the Google Workspace suite can be done with little effort (except maybe for the email address). Proton.me is a compelling alternative. I plan to transition all of my non‑email workflows to Proton’s suite, and if the experience proves solid I’ll eventually move my email there as well. There’s much to like about Proton: The company was founded by scientists, which gives it a research‑driven, privacy‑first mindset. Its security‑focused architecture protects data from the ground up. Strong branding adds to its appeal. The name “Proton” is well chosen, and the Lumo mascot (the cat) whimsically recalls the vibe of games like Monument Valley.

A sizable portion of Europe’s tech talent already works for U.S. firms that have European offices—think Google in Zürich or Microsoft in Dublin. If a truly competitive European tech sector emerges, offering salaries and growth prospects on par with Silicon Valley, those professionals could switch quickly. The bottleneck isn’t a lack of talent; it’s the absence of European companies that appeal to this talent. That could change dramatically once the sector gains momentum.

Europe still possesses the expertise, the research institutions, and the entrepreneurial spirit needed to compete globally. Europe has a fragmented market and more regulations than the US, which is a structural disadvantage. However, a new structural advantage may be emerging: greater predictability and stability.

There may also be an opportunity to rethink services and products for AI. Doing this with existing mature products may be harder than with new ones. The ideal outcome of a European Tech Renaissance would be new, AI-first products built from the ground up.

GDP: A Brief but Affectionate History

I’ve been reading the book GDP: A Brief but Affectionate History, and it’s been a great way to deepen my understanding of what GDP really measures—and, perhaps more importantly, what it doesn’t.

Gross Domestic Product (GDP) is the standard metric used to assess a country’s economic performance. It sums up the total value of all goods and services produced within a country over a specific time period. The most widely used formula is:
GDP = C + I + G + (E – M)
Here, C stands for consumption, I for investment, G for government spending, E for exports, and M for imports. This equation captures total demand for a nation’s output, whether from households, businesses, the government, or foreign buyers.
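To make the formula concrete, here is a minimal sketch in Python. The figures are invented round numbers for illustration only; they do not come from the book or from any real country.

```python
def gdp_expenditure(c: float, i: float, g: float, e: float, m: float) -> float:
    """GDP by the expenditure approach: C + I + G + (E - M)."""
    return c + i + g + (e - m)


# Hypothetical economy, in billions of an arbitrary currency (made-up numbers).
gdp = gdp_expenditure(c=600.0, i=200.0, g=250.0, e=150.0, m=200.0)
print(gdp)  # 1000.0
```

Imports are subtracted because C, I, and G already include spending on foreign goods, which were not produced domestically.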

While GDP is a useful snapshot of economic activity, the book explores its many limitations. One way to understand these issues is through concrete examples.

Imagine you spent $100,000 in 2010 and $110,000 in 2020. Does that mean you consumed more, or did prices simply rise? In other words, how much of the increase reflects real growth, and how much is just inflation? Adjusting for inflation is relatively straightforward for basic goods like bread or gasoline, where price data is readily available. But for complex goods or services—like smartphones, healthcare, or education—it becomes much harder. How do you measure quality improvements or innovation? A phone that costs the same but does ten times more than it did a decade ago complicates the picture. This kind of change is a kind of “disinflation,” but it’s tricky to capture in the numbers.
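The easy part of that adjustment can be sketched in a few lines of Python. The spending figures are the ones from the example above; the two price indexes for 2020 are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
def real_spending(nominal: float, price_index: float, base_index: float = 100.0) -> float:
    """Express nominal spending in base-year prices."""
    return nominal * base_index / price_index


spent_2010 = 100_000.0  # base year, from the example above
spent_2020 = 110_000.0

# Hypothetical price indexes for 2020 (2010 = 100); both values are assumptions.
for index_2020 in (110.0, 105.0):
    real_2020 = real_spending(spent_2020, index_2020)
    real_growth = real_2020 / spent_2010 - 1.0
    print(f"index {index_2020:.0f}: real spending {real_2020:,.0f}, real growth {real_growth:.1%}")
```

If prices rose by 10%, the whole increase is inflation and real consumption did not grow. If prices rose by only 5%, real consumption grew by roughly 4.8%. The hard part, which this sketch ignores, is deciding what the price index should be when quality changes.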

Another challenge is the value of “invisible” services. If you clean your own house or use a free service like Google Search, your contribution isn’t included in GDP. But if you pay someone to clean, suddenly that activity becomes “economic output.” Even if you tried to include such contributions, how would you price them? Using market equivalents is problematic because gift economies operate under different dynamics than market economies.

More broadly, GDP struggles with distinctions between productive vs. unproductive activity, or cost vs. investment. For instance, in its early days, GDP didn’t include government spending—it was considered a cost, not output. Over time, certain types of spending, like software development, shifted from being recorded as a cost to being treated as an investment. Government services are especially tricky: we can measure how much they cost, but not easily value their outcomes, since they’re not sold on the market.

Then there’s the issue of consumption itself. Not all consumption improves welfare. Spending on heavily processed, unhealthy foods raises GDP but may also lead to long-term health costs. And GDP is silent on environmental degradation. Cutting down forests or burning fossil fuels adds to GDP in the short term, but may reduce the planet’s ability to support future prosperity.

On top of all these conceptual issues, there’s the practical challenge of data quality. GDP depends on large-scale surveys and statistical estimates. No matter how rigorous the methods, there’s always some uncertainty baked into the numbers.

These and other issues are explored in the book, which weaves together history, economics, and policy into an engaging narrative. In the end, understanding the limits of GDP helps us recognize that while it measures activity, it doesn’t necessarily measure progress or well-being. For that, we need to look beyond the numbers.

Talk: Inside SAFe Principle #1 (@SBB DevDay’24)

I gave a talk at the yearly SBB DevDay conference. It was about SAFe’s principle #1: Take an economic view. The main idea was to compare the economics of software development with the economics of manufacturing to better understand concepts like capex, opex, maintenance, and debt.

Understanding ChatGPT

ChatGPT has surprised everyone. We now have systems that produce human-like texts. Without much fanfare, ChatGPT actually passed the Turing test.

While we don’t fully comprehend how and why large language models work so well, they do. Even if we don’t fully understand them, it’s worth building an intuition about how they function. This helps avoid misunderstandings about their capabilities.

In essence, ChatGPT is a system that learns from millions of texts to predict sentences. If you start a sentence, it tries to predict the next word. Taking the sentence augmented with one word, it tries again to predict the next word. This way, word after word, it can complete sentences or write whole paragraphs.

Interestingly, the best results are achieved when introducing randomness in the process. Instead of always selecting the most probable word each time, it’s best to sometimes pick alternatives with lower probabilities. It makes the sentences more interesting and less redundant.
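One common way to control this randomness is a "temperature" setting. The sketch below is a toy illustration in Python, not a description of how ChatGPT is actually implemented: the word probabilities are made up, and the function names are hypothetical.

```python
import random


def apply_temperature(probs: dict[str, float], temperature: float) -> dict[str, float]:
    """Reshape a next-word distribution; temperature must be greater than 0."""
    weights = {word: p ** (1.0 / temperature) for word, p in probs.items()}
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def sample_next_word(probs: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one word at random according to the reshaped distribution."""
    reshaped = apply_temperature(probs, temperature)
    words = list(reshaped)
    return rng.choices(words, weights=[reshaped[w] for w in words], k=1)[0]


# Made-up probabilities for the word following "The cat sat on the".
next_word = {"mat": 0.6, "sofa": 0.3, "moon": 0.1}
rng = random.Random(0)
for t in (0.1, 1.0, 2.0):
    print(t, [sample_next_word(next_word, t, rng) for _ in range(8)])
```

A low temperature makes the most probable word win almost every time, while a higher temperature gives the less likely words more of a chance.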

What’s also kind of amazing is that this approach works to answer questions. If you start with a question, it tries to predict a reasonable answer.

Thinking of ChatGPT as a text predictor is useful, but it’s even more useful to think of it as a form of compression. When neural networks learn from examples, they try to identify regularities and extract recurring features. The learning is lossy: the neural network doesn’t remember the examples it was fed with exactly. But it remembers the key features of them. When ChatGPT generates text, it “interpolates” between the features.

Impressive examples of “interpolation” are prompts that mandate an answer “in the style of,” for instance, “in the style of a poem.” ChatGPT not only gives a coherent answer content-wise but also applies a given style.

But ChatGPT is, in essence, interpolating all the time. It’s like a clever student who didn’t study a topic for the course but has access to the course material during the exam. The student may copy-paste elements of the answer and tweak the text to sound plausible, without having any real understanding of the matter.

What ChatGPT shows us is that you can go very far without a true understanding of anything. And I believe that this applies to how we behave too. On many topics, we can discuss based on facts we heard, without actually understanding the underlying topic. Many people who sound smart are very good at regurgitating things they learned somewhere. They wouldn’t necessarily be particularly good at reasoning on a topic from first principles. To a certain degree, we conflate memory with intelligence.

At the same time, ChatGPT can do some reasoning, at least of a simple kind. It has probably extracted features that capture some logic. This works for simple things like basic arithmetic. But it fails when things become more complicated.

Fundamentally, when predicting the next word, ChatGPT does one pass through the neural network, which is effectively a single function evaluation. One pass cannot perform a proper computation that would involve, for instance, a loop. It fails the prompt “Starting with 50, double the number 4 times. Do not write intermediate text, only the end result.”, giving back 400. But when asked to write the intermediate steps, it correctly computes 800. You can help ChatGPT with multi-step computation by asking it to write out the intermediate steps, because it then goes through the neural net several times. This pattern is known as “chain-of-thought prompting.”
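For comparison, here is the same task as an ordinary Python program. This is only an analogy, not a model of what happens inside the neural network: it shows that the task is naturally a loop, and that writing down each intermediate value is what makes the final result easy to get right.

```python
def double_repeatedly(start: int, times: int) -> list[int]:
    """Return every intermediate value, like writing out the steps."""
    steps = []
    value = start
    for _ in range(times):
        value *= 2
        steps.append(value)
    return steps


steps = double_repeatedly(50, 4)
print(steps)      # [100, 200, 400, 800]
print(steps[-1])  # 800
```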

We don’t fully understand ChatGPT yet—how it works and what it really can do. But clearly, it can do more than we expected, and it will bring all kinds of exciting insights about cognition.

What is Apple?

Apple has grown into a fascinating company with a diverse range of offerings. Initially known for its computers, it has expanded into various industries. With Apple being the most valuable company in the world by market capitalization, it’s worth asking: what actually is Apple now?

It’s a fashion company – Apple has become a fashion staple, with its accessories blending seamlessly into our daily lives. We don’t just use Apple products; we wear them (EarPods, iPhone).

It’s a luxury company – Apple’s products command a premium price tag, appealing to consumers seeking quality and prestige. Interestingly, both the affluent and the everyday consumer use Apple products.

It’s a technology company – At its core, Apple remains a technology powerhouse, continuously pushing the boundaries of innovation with advancements like the M1 chip or Vision Pro, a testament to its ongoing commitment to cutting-edge design and functionality.

It’s an entertainment company – Venturing into entertainment with Apple TV and original content production, Apple has diversified its portfolio beyond hardware alone.

It’s a finance company – Apple has financial services like the Apple Store and Apple Pay, showing its ambition to capitalize on its robust market presence. Apple is increasingly becoming a bank.

So basically, Apple does a bit of everything, which is pretty interesting!

While many of these products were launched under the leadership of Steve Jobs, others have been initiated under the stewardship of Tim Cook. Cook may not have spearheaded groundbreaking innovations like the iPhone, but his management has propelled Apple to new heights.

Renowned investor Warren Buffett’s strategy typically avoids technology investments due to their intense competition and difficulties in sustaining lasting competitive edges. Instead, he prefers investments in durable goods and services with robust brand loyalty and high switching costs. The evolution of Apple’s identity is noteworthy, particularly as Buffett began investing in the company. Today, Apple stock comprises a significant 70% of his portfolio, marking a notable departure from his traditional investment approach.

Apple’s story is unique and one thing remains certain: its ability to surprise, inspire, and redefine industries will continue to captivate audiences worldwide.

(Style improved with ChatGPT)

Neuromancer

Amidst all the excitement about AI and the metaverse, I recently decided to dive into Neuromancer, the classic sci-fi novel that sparked the cyberpunk genre.

I’d heard it was a big inspiration for The Matrix, so I was curious to see how they compared. While there are definitely similarities, like the matrix itself and the tough, leather-clad female character, the plot diverges distinctly.

Instead of focusing on freeing humanity from the matrix, the book revolves around jailbreaking an AI and merging it with another AI to create a superintelligence. It kind of reminded me of Transcendence, with its futuristic vibe similar to Blade Runner, which came out around the same time as the book.

The writing style is pretty unique, almost like poetry in places, and the story feels like a wild ride. It’s not the easiest read, but it captures the characters’ crazy journey well.

One thing that stood out to me was how the book portrays the matrix/cyberspace—it’s abstract and undefined, somewhere between VR and a system for visualizing information, kind of like augmented reality today.

It’s also somewhat ironic that, despite its visionary themes, Neuromancer didn’t foresee wireless communication. The protagonist constantly “jacks in” and “jacks out”, relying on physical cables.

It’s pretty wild to think that this book was published back in 1984, considering it tackles themes like AI and the metaverse that are becoming such big topics in 2024. Apple released its first VR headset, AI went mainstream, and discussions about the risks of AI are hotter than ever. Neuromancer’s foresight is pretty impressive, making it a classic worth revisiting.

(Blog post style improved with ChatGPT)

Scarcity is the Mother of Invention

The original proverb is “Necessity is the mother of invention.” But as we explore the ways we innovate, it’s clear that scarcity rather than necessity plays a big role in sparking creativity. Indeed, if you’re in need of something abundant, you won’t be innovative. It’s scarcity that prompts us to think differently and find new ways to solve problems.

Scarcity affects many parts of our lives: from time and labor to energy, food, and attention. Each scarcity challenges us to think creatively and come up with new solutions.

Time is something we all wish we had more of. Anything that helps us save time or use it better becomes really valuable. Tools like ChatGPT make communication and problem-solving faster and easier. And platforms with good content save us from wasting time on things we don’t enjoy.

When there aren’t enough people to do the work, organizations have to find ways to be more productive. Digitalization, for example, helps streamline processes and automate tasks. In fields like transportation, where I work, automation helps deal with staffing shortages, for example in traffic dispatching.

Other resources, like energy and attention, are also scarce because we only have so much time and focus to go around. Using energy efficiently saves us money and time, while managing attention effectively helps us stay focused on what’s important.

When resources are scarce, we naturally start looking for other options. We switch from human to machine labor, look for renewable energy sources, find sustainable food options, and use technology to help manage our attention better.

While necessity might kickstart our creativity, it’s scarcity that really pushes us to innovate.

(The style of the blog post has been improved with ChatGPT, of course)

Working with AI – A First Experiment

AI will be an inevitable tool in the future. To get a first impression of what it is like to work with AI, I decided to build a very small project using ChatGPT as an assistant.

The small project would be a webpage that charts the performance of a portfolio of stocks. I haven’t written a webpage in a long time (15 years!), so I would have to catch up using ChatGPT. I also decided to explore AWS Lambda at the same time.

The architecture is very simple: the webpage is a static file, and historic stock quotes are stored on AWS S3. A Lambda function fetches the stock quotes every night and stores the output in S3. The computation of the portfolio is done on the client side. Because only the Lambda function calls the stock API, the API key is never exposed publicly, and I don’t need a real backend to serve data.
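The nightly job can be sketched roughly as follows. This is a hypothetical Python outline, not the code from the project: the function names, the S3 key layout, and the prices are all assumptions, and the API call and the S3 upload are passed in as plain functions so that the sketch runs without AWS or an API key.

```python
import json
from datetime import date
from typing import Callable


def run_nightly_job(
    symbols: list[str],
    fetch_close: Callable[[str], float],
    store: Callable[[str, str], None],
    today: date,
) -> str:
    """Fetch one closing price per symbol and store them as a single JSON object."""
    quotes = {symbol: fetch_close(symbol) for symbol in symbols}
    key = f"quotes/{today.isoformat()}.json"  # hypothetical key layout
    store(key, json.dumps(quotes, sort_keys=True))
    return key


# Stand-ins so the sketch runs locally.
fake_prices = {"AAPL": 190.0, "SAP": 180.0}  # made-up values
bucket: dict[str, str] = {}                  # plays the role of the S3 bucket

key = run_nightly_job(
    symbols=["AAPL", "SAP"],
    fetch_close=fake_prices.__getitem__,
    store=bucket.__setitem__,
    today=date(2024, 1, 15),
)
print(key, bucket[key])
```

In a real deployment, `fetch_close` would call the stock API using the secret key, and `store` would wrap an S3 upload, for example the boto3 client's `put_object`. The static webpage then only needs to read the stored JSON files.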

For charting, ChatGPT suggested Chart.js, which was fine. For the stock API, ChatGPT’s suggestions were less useful. I had to compare the various sites myself. Finally, I settled on marketstack. That’s the best free tier I could find. Unfortunately, it doesn’t provide an API for currency rates. For hosting, ChatGPT gave me a handy hint: you can upload your static website to AWS S3 and make it publicly accessible.

With the help of ChatGPT, it took me a couple of hours to build the first version of the webpage using Chart.js and plain JavaScript.

Key learnings:

  • AI productivity boost is real. ChatGPT is quite amazing. It can give good suggestions about technological options. The quality of the code is also surprisingly good. You need to double-check the answers, but it provides a lot of good insights. Definitely a productivity boost.
  • Good onboarding experience helps win clients. There are many stock APIs, and their quality differs a lot. Onboarding is a make-or-break point for any technical product. I chose marketstack because it was the simplest option to get something working, even though I know it doesn’t have a currency API, which I will need later on.
  • Domain knowledge is always an asset. As with most business domains, things are never as simple as they seem. Computing the performance of a portfolio seems like a no-brainer. But stocks can split and pay dividends. Therefore, the nominal historic price is misleading for long-term historical analysis. Instead, APIs provide adjusted closing prices.
  • Designing framework APIs is an art. There are many charting libraries, and the way they are designed differs a lot. This reminded me of Why a Calendar App is a Great Design Exercise. Designing a chart API would be a great exercise, too.
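The point about adjusted closing prices can be illustrated with a small Python sketch. It handles stock splits only (real adjusted prices usually also account for dividends), and the prices and function name are made up for the example.

```python
def adjust_for_splits(closes: list[float], split_ratios: list[float]) -> list[float]:
    """Back-adjust closing prices for stock splits.

    split_ratios[i] is the split taking effect on day i (2.0 for a 2-for-1
    split, 1.0 for no split). Prices before a split are divided by its ratio.
    """
    adjusted = []
    factor = 1.0
    # Walk backwards so each day is scaled by all the splits that came after it.
    for close, ratio in zip(reversed(closes), reversed(split_ratios)):
        adjusted.append(close / factor)
        factor *= ratio
    return list(reversed(adjusted))


# Made-up prices: a 2-for-1 split takes effect on the third day.
closes = [100.0, 102.0, 52.0, 53.0]
splits = [1.0, 1.0, 2.0, 1.0]

adjusted = adjust_for_splits(closes, splits)
print(adjusted)  # [50.0, 51.0, 52.0, 53.0]
```

On nominal prices, the stock appears to have lost almost half its value (from 100 to 53). On adjusted prices, it gained 6% (from 50 to 53), which is what the shareholder actually experienced.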

As for the webpage, I see lots of ways to improve it further. From the domain point of view, I could add support for comparison with various indexes. From the technical point of view, being able to edit the portfolio would be nice. Supporting several users with login would also be a nice experiment. Figuring out what a delivery pipeline for Lambda looks like would also be interesting. So far, it has all been manual uploads to S3.

If I have enough time, I may continue the project with ChatGPT. For the technical points, ChatGPT helps a lot and has proved to be a valuable assistant.

GDP as Proxy for Progress

The gross domestic product (GDP) measures how many goods and services have been produced in a year. It measures the economic activity of a country and is used as a proxy to track the standard of living. The higher the GDP per capita, the more goods and services are accessible to the population, and the higher the living standard.

The very good post “What is economic growth?” from ourworldindata.org makes GDP more tangible:

Have a look around yourself right now. Many of the things you see are products that were produced by someone so that you can use them: the trousers you are wearing, the device you are reading this on, the electricity that powers it, the furniture around you, the toilet that is nearby, the sewage system it is connected to, the bus or car or bicycle you took to get where you are, the food you had this morning, the medications you will receive when you get sick, every window in your home, every shirt in your wardrobe, and every book on your shelf.

Over time, the cost of goods and services decreases thanks to improvements in production. The utility of a good or service remains constant, though. Conversely, for a given price, the utility increases. An affordable car today is far more comfortable, safe, and efficient than a car from ten years ago at the same price. GDP and progress therefore don’t correlate exactly. A constant GDP could still represent constant, modest progress. But in practice, progress shows up as increased GDP per capita.

Using GDP as a proxy for standard of living is subject to debate. Besides access to goods and services, standard of living also encompasses dimensions like access to education, access to nature, or access to health care. Different countries and social systems with similar GDP may fare differently on these points. As a crude measure for the whole country or per capita, GDP says little about distribution. Inequality isn’t well captured by GDP.

The cousin of GDP for tracking standard of living is life expectancy. Life expectancy also aggregates and proxies several aspects like education, access to health care, or overall well-being. Interestingly, while both metrics usually correlate, there are discrepancies that remind us that the metrics are only proxies and have flaws.

Ourworldindata.org has an interactive chart to explore GDP and life expectancy.

GDP is influenced by household structures. Depending on how GDP is computed, it may or may not contain “services” provided by family members, such as cooking, child care, or taking care of the elderly.

On the one hand, it makes sense to include personal activities in GDP. If you grow your vegetables yourself, you’re working as a farmer with one consumer, yourself. You’re producing goods for yourself. On the other hand, it’s not economic activity – it’s not traded on a market. Because this activity is not on the market, it will not “benefit” from market mechanisms (or rather be “driven” by market forces), such as market-driven specialisation, allocation of resources, or pricing.

In economic logic, growing your own vegetables isn’t rational, since you use time that you could have invested in some other, more rewarding economic activity, depending on your profession. The same holds for child care: working part-time to raise children isn’t economically rational if you have a high-paying profession.

The reason people are willing to sacrifice profit for such activities at the moment is leisure and fulfillment. They find an increased happiness at some other level in these activities.

Leisure and fulfillment work at the personal level. Doing more outside of the formal economy also brings benefits to the economy itself: it increases resilience and sustainability. These values are not accounted for in classical economics, but maybe they should be.

Given the flaws of GDP, it’s no surprise that other metrics have been developed to track development, for instance the human development index. But the simplicity of GDP (or life expectancy for that matter) is very attractive. GDP will remain the prevalent metric for the years to come.