Will AI Kill Agile Software Development?

Will AI kill agile software development? Atlassian’s recent stock plunge suggests investors believe the usual agile setup—Jira, stories, sprint boards—could face major disruption. That makes many wonder whether the old way of working with stories and features will disappear.

At its core, agile is about evolving requirements iteratively. Stories and features aren’t the real requirements; they’re just a way for people to organise work. Even if machines write the code, we still need a clear set of goals. Those goals will still exist, but they might live somewhere else.

One option is to keep requirements inside the codebase itself, either as additional artifacts or in the form of tests. Large or regulated projects will still need dedicated requirement‑management tools (for example, Polarion) to keep versions, approvals and audit trails. Those tools will keep feeding AI the context it needs while providing the governance many companies require.
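To make that concrete, a requirement living in the codebase could simply be an acceptance test. Here is a minimal sketch, assuming a Jest-style test runner; the requirement ID and the `applyDiscount` module are hypothetical:

```typescript
// Requirement PRC-42 (hypothetical): orders above 100 EUR receive a 5% discount.
// The test doubles as a versioned, executable specification.
import { describe, expect, it } from "@jest/globals";
import { applyDiscount } from "./pricing"; // hypothetical module under test

describe("Requirement PRC-42: volume discount", () => {
  it("grants 5% off for orders above 100 EUR", () => {
    expect(applyDiscount(200)).toBeCloseTo(190);
  });

  it("leaves small orders unchanged", () => {
    expect(applyDiscount(80)).toBe(80);
  });
});
```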

AI can actually help the agile process. A product owner can give a high‑level feature description to an AI, and the AI can break it into small implementation steps (stories), write acceptance criteria and draft a first version of the code. The backlog stays, but the heavy lifting of splitting work is done by the machine, possibly asking for clarification interactively.
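If those AI-generated stories are persisted, they could be plain data that existing tools track. A rough sketch of what such an artifact might look like, with all names hypothetical:

```typescript
// Hypothetical shape of an AI-generated story, persisted for observability.
interface GeneratedStory {
  id: string;                   // e.g. "STORY-1203"
  parentFeature: string;        // the product owner's high-level description
  title: string;
  acceptanceCriteria: string[]; // drafted by the AI, reviewed by humans
  draftBranch?: string;         // branch holding the AI's first code draft
  status: "proposed" | "clarifying" | "approved" | "done";
}

// The AI may return several stories for one feature, plus the
// clarification questions it wants to ask interactively.
type DecompositionResult = {
  stories: GeneratedStory[];
  openQuestions: string[];
};
```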

Whether the implementation steps are persisted when the AI, rather than a human, does the work will depend on our observability needs. Since persisting them costs little, I expect we will keep storing them to track progress and flag problems.

The three main parts of software delivery—requirement management, work organisation and development environments—will still be separate, but AI will tie them together more tightly. Requirement‑management platforms will keep governance, work‑organisation tools will continue to help teams prioritise and visualise work, and IDEs such as IntelliJ will still provide debugging, testing and refactoring, now with extra AI‑generated code suggestions.

Overall, AI is unlikely to kill agile — at least not yet. Instead, it will change where requirements live, how stories are created and how tools interact with each other.

Thinking and Retrieval in the AI Stack

The modern AI stack can be viewed as three logical layers: Compute, Model, and Agent. While thinking (reasoning) and retrieval (fetching external information) may span multiple layers, separating them helps us reason about trade‑offs such as latency, observability, and cost.

| Layer | Primary Function | Typical Primitive |
|---|---|---|
| Compute | Executes the heavy‑weight operations that power the stack. | GPU/TPU kernels for transformer forward passes; ANN‑search kernels; CPU‑based HTTP calls to external services. |
| Model | Performs core reasoning and, optionally, internal retrieval. | Native chain‑of‑thought: the model generates step‑by‑step reasoning within a single forward pass (e.g., “Let’s think step‑by‑step…”). Built‑in retriever: the model invokes a search tool (e.g., GPT‑4o browsing, Claude “search”, Gemini grounding) and conditions its output on the returned snippets. |
| Agent | Orchestrates complex workflows, decides when to call the model, and handles external data sources. | Agent‑orchestrated reasoning: the agent decomposes a problem, builds prompts, may run meta‑reasoning loops, and determines when to invoke the model again. External retrieval: the agent queries a vector store, a web‑search API, or any custom data source, then injects the retrieved passages into the next model prompt. |
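To illustrate the Agent row, here is a minimal sketch of agent-side external retrieval: search first, then inject the passages into the prompt. The vector-store and model clients are hypothetical stubs standing in for whatever services you actually run:

```typescript
// Stubs standing in for a real vector store and model endpoint (both hypothetical).
async function searchVectorStore(query: string, topK: number): Promise<string[]> {
  return [`passage about "${query}" #1`, `passage about "${query}" #2`].slice(0, topK);
}

async function callModel(prompt: string): Promise<string> {
  return `model answer for:\n${prompt}`;
}

// Agent-side retrieval: fetch passages first, then condition the model on them.
// Because the agent mediates both steps, it can log, filter, or rate-limit each one.
async function answerWithRetrieval(question: string): Promise<string> {
  const passages = await searchVectorStore(question, 5); // external retrieval
  const prompt = [
    "Answer using only the context below.",
    ...passages.map((p, i) => `[${i + 1}] ${p}`),
    `Question: ${question}`,
  ].join("\n");
  return callModel(prompt); // single inference call; retrieval stayed outside the model
}
```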

Whether thinking or retrieval happens in the Model or the Agent has concrete implications.

| Dimension | Thinking – Model | Thinking – Agent | Retrieval – Model | Retrieval – Agent |
|---|---|---|---|---|
| Latency | One forward pass → minimal overhead (unless the model also does internal search). | Multiple orchestrated calls → higher latency, but sub‑tasks can run in parallel. | Single endpoint (e.g., POST /v1/chat/completions with a built‑in tool) → low latency. | Two‑step flow (search → prompt → model) → added round‑trip time, but search can run in parallel with other work. |
| Control / Policy | Model decides autonomously when to fetch external data → harder to audit or enforce policies. | Agent mediates every external call → straightforward throttling, redaction, logging, and policy enforcement. | Retrieval baked into the model → policy changes require a new model version. | Agent can enforce dynamic policies (rate limits, content filters) on each external request. |
| Resource Use | GPU must handle both inference and any ANN‑search kernels; higher compute density. | Retrieval can be off‑loaded to cheaper CPUs or dedicated search services; GPU used mainly for inference. | GPU handles only inference; no extra search kernels needed. | CPU or specialised search services handle retrieval, freeing GPU capacity for inference. |
| Observability | Reasoning is embedded in the token stream → debugging is indirect; limited visibility. | Agent logs each sub‑task, providing a clear, structured trace of why and when calls were made. | Limited visibility beyond token usage; retrieval is opaque to the caller. | Agent records search queries, responses, and any filtering applied, giving end‑to‑end traceability. |

If you are building an agentic system, you’ll need to decide which responsibilities belong to the model and which to the agent.

If you are merely a user of such a system, the distinction is mostly invisible, showing up only as differences in answer quality, latency, and cost.

How Spec-Driven Development Could Play Out

In Climbing the Abstraction Ladder with LLMs, I shared some early thoughts on how large language models might push software development toward a more spec-driven approach. With the rise of agentic AI, this shift is no longer theoretical. We are starting to see it take shape in practice, often under the label spec-driven development (SDD).

If we think of SDD as a new paradigm—on par with object-oriented or functional programming—we can look at the past to try to anticipate the future. New tools and platforms will emerge that are designed specifically for this way of working, much like Smalltalk or Lisp were for earlier paradigms. Early adopters will build real systems with them, and some of those systems will look very impressive.

What will take much longer is understanding how well SDD holds up over the entire software lifecycle. The long-term impact of a paradigm, especially on maintainability and evolution, is hard to judge early on. Organizations can reasonably try SDD on non-critical projects, but for critical systems the risks are still significant. Too many questions remain unanswered.

History also suggests some caution. If OOP and FP are any indication, a new paradigm rarely succeeds in its pure form. What we call OOP today is a pragmatic version that still contains a lot of procedural code. Functional programming entered the mainstream in an even weaker form—immutability and lambdas, but rarely full-blown higher-order functional design. SDD will likely follow the same path. The version that survives in industry probably won’t be fully spec-driven.

As usual, non-functional requirements are where things get complicated. New paradigms tend to work well for simple cases, but persistence made OOP messy, and performance constraints often make FP hard to apply in practice. There is no reason to expect SDD to be different. Once non-functional requirements dominate, architectural decisions become central—and that’s where abstractions start to leak.

It is worth considering some critiques of this view. First, SDD may not be a full paradigm but rather a technique layered on top of existing paradigms, in which case expecting a long evolution from “pure” to pragmatic may be misleading. Second, non-functional requirements might be incorporated into the specs themselves, changing the traditional failure points. Finally, architecture might become more of a search or simulation problem for agents, rather than a human-only domain, which could make SDD more effective than expected. These critiques suggest that SDD’s trajectory could be very different from past paradigms, and that its eventual form may be more disruptive than a cautious hybrid would imply.

Assuming these critiques don’t hold, SDD should not be seen as a replacement for software design, but as a new abstraction that shifts where effort is spent. Over time, the industry will likely settle on a hybrid approach: partly spec-driven, partly traditional, and pragmatic above all. As with earlier paradigm shifts, success will come from understanding both the power and the limits of the abstraction.

Alternatives to US Tech Exist

Since the beginning of the new Trump administration, anxiety about U.S. tech dominance has been on the rise. Over the past few decades Europe’s position as a technology leader has slipped, and most of the continent’s biggest corporations now operate outside the core tech arena. Today the two unmistakable European tech giants are ASML and SAP. The internet revolution was largely handed over to American firms, with perhaps the lone notable exception of Spotify. Even though talent is abundant across Europe, many promising entrepreneurs choose to relocate to the United States. Still, a number of home‑grown companies are showing real promise; examples include Mistral, a cutting‑edge AI startup, and OVHcloud, a cloud‑infrastructure provider. If Europe wants to stay relevant on the global stage, it must rebuild a vibrant technology ecosystem.

One practical way we can contribute is by adopting local alternatives whenever possible. Switching costs for many services are surprisingly low—for instance, moving away from the Google Workspace suite can be done with little effort (except maybe for the email address). Proton.me is a compelling alternative. I plan to transition all of my non‑email workflows to Proton’s suite, and if the experience proves solid I’ll eventually move my email there as well. There’s much to like about Proton: The company was founded by scientists, which gives it a research‑driven, privacy‑first mindset. Its security‑focused architecture protects data from the ground up. Strong branding adds to its appeal. The name “Proton” is well chosen, and the Lumo mascot (the cat) whimsically recalls the vibe of games like Monument Valley.

A sizable portion of Europe’s tech talent already works for U.S. firms that have European offices—think Google in Zürich or Microsoft in Dublin. If a truly competitive European tech sector emerges, offering salaries and growth prospects on par with Silicon Valley, those professionals could switch quickly. The bottleneck isn’t a lack of talent; it’s the absence of European companies that appeal to this talent. That could change dramatically once the sector gains momentum.

Europe still possesses the expertise, the research institutions, and the entrepreneurial spirit needed to compete globally. Its fragmented market and heavier regulation compared with the US are a structural disadvantage. However, a new structural advantage may be emerging: greater predictability and stability.

There may also be an opportunity to rethink services and products for AI. Doing this on existing mature products may be harder than on new ones. The ideal outcome of a European Tech Renaissance would be new, AI‑first products built from the ground up.

Don’t Jump to the Solution

We have all watched someone jump to conclusions. Less obvious, but just as common, is jumping to solutions.

You jump to conclusions when you overlook facts and alternative explanations. You jump to solutions when you overlook the problem itself and the many ways it could be approached. Both come from the same impulse to move fast and feel done.

Jumping to a solution can be costly. There is the obvious opportunity cost. Are you solving the most valuable problem? Is there even a real problem there? On top of that, the solution you rushed into might be the wrong one, or a decent one that crowds out a much better option you never explored.

You also miss a lot of the fun. Exploring the problem space is often the most interesting part of the work. That is where insights appear and where you actually learn something. If you jump too fast, you skip that phase entirely and turn the work into execution only.

Finally, you risk losing buy-in. When you arrive with a fully formed solution, especially if you built it alone, people rarely feel ownership. At best they disengage. At worst they push back. Most people want a voice in how problems are framed and solved.

This is a lesson I keep relearning. Bring problems, not solutions.

Go Chronological

For many years I tried to come up with a clever system to organize my digital life.

I explored systems with folder structures, categories, labels, and nomenclature. Now, I’m convinced that the simplest and most effective approach is also the least ambitious one. Go chronological.

Forget about building complex structures that should work now and in ten years. Just store everything by year. Inside each year, create a handful of buckets using whatever grouping makes sense at the time. Taxes, pictures, projects, information, things that matter at the moment. Usually you end up with five to twenty buckets.
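For illustration only, a year with a handful of buckets could even be scaffolded with a throwaway script; the bucket names are just examples, not a prescription:

```typescript
// Scaffold one year of buckets; names are illustrative, not prescriptive.
import { mkdirSync } from "node:fs";
import { join } from "node:path";

const year = "2026";
const buckets = ["taxes", "pictures", "projects", "info", "home"];

for (const bucket of buckets) {
  // Creates 2026/taxes, 2026/pictures, and so on.
  mkdirSync(join(year, bucket), { recursive: true });
}
```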

Life changes, and this approach deals with that naturally. You do not need a future proof system. Each year stands on its own and can use its own categories. Time provides the only stable axis.

At first, retrieval sounds inefficient. In practice it works surprisingly well. Most of the time you know the year. Once you have that, finding the right bucket is either straightforward or a matter of a few clicks. Within one year, misclassification isn’t an issue. Even if you need to check a couple of nearby years whose buckets look slightly different, it is still no big deal. The approach trades classification purity for simplicity.

Searching across many years can be less convenient, but this happens rarely. And when it does, full text search exists.

Archiving is trivial. A year is done, you close it, and you move on.

Truly long lasting projects can live outside of this structure, but they should be the exception. Recurring administrative topics do not qualify.

There is also something quietly profound in all of this. Even in a long career, you will likely end up with sixty to eighty yearly folders. That is a small enough number to manage without much effort. Your life is actually short.

With 2026 around the corner, it is time to create a new folder and start bucketing again.

Mentoring AI

Working with GitHub Copilot to develop software, I was struck by how surprisingly human AI can feel.

When you give Copilot a task, it does not produce a perfect answer in one step. It makes a plan, follows it, checks its own work, notices mistakes, and tries again. This loop of planning, acting, evaluating, and adjusting is the same way humans work.1

For a long time we imagined AI as something that would make no mistakes. Early hallucinations challenged that idea. But with agent-style workflows, the problem becomes manageable in the same way it is with humans. We create checks for correctness, break big problems into smaller pieces, and work around limited memory or context.

We also like to think that humans reason from first principles. In reality, we mostly reuse ideas we have already heard. AI works in a similar way.

The main differences are speed, endurance, and focus. AI does not get tired or distracted.

Working with AI agents also feels similar to delegating work to coworkers. First you make sure you both understand the task. Then you set guardrails so things do not go in the wrong direction. You do not want to micromanage, but you also do not want to discover too late that everything has drifted off course. If you have ever delegated work to a junior colleague, you already have an advantage when working with AI.

In fact, working with AI is teaching techies a new skill: mentoring. What was once a soft skill is now a hard skill.

The unsettling part will come when AI is no longer the junior partner. When Copilot starts taking real initiative and becomes your mentor, what will that look like?

More

https://www.oneusefulthing.org/p/three-years-from-gpt-3-to-gemini

In Promoting AI Agents, DHH refers to “supervised collaboration”; I’m fine with that wording too.

  1. Note that it’s not that surprising: agent mode was designed this way by humans. The loop isn’t an emergent property of the LLM. ↩︎

SAFe: What’s a System Architect?

After five years in the role and experience with two different ARTs, it feels like a good time to reflect on what the System Architect role really means in SAFe.

The main responsibility of the System Architect is to align teams around a shared architectural vision and to plan the necessary architecture work together with the teams through enablers.

In SAFe, the Product Manager acts more like a project manager who plans the work but doesn’t own the roadmap content. Business features and epics usually come from the business side. The System Architect, on the other hand, owns the architectural enablers and works closely with the PM and POs to plan their realization.

Architecture itself is a large and multifaceted area. I find the definitions of enablers and the architectural runway somewhat unclear. Platforms are clearly enablers, but it’s less obvious whether something like an interface qualifies as one. The term “architectural runway” is essentially another way to describe architectural foundations. In practice, architectural work often relates to technology—such as platforms, services, and cross-cutting concerns—but it can also include non-technical elements, like core data models.

In modern architecture, application architecture and application delivery are tightly connected. SAFe includes the promotion of DevOps practices as part of the architect’s work. The framework does not prescribe specific architectural artefacts, so documents like system architecture descriptions, quality reports, or technical debt tracking sheets must be shaped by the context and the needs of the ART.

While the System Architect is responsible for communicating the architectural vision and owning the enablers, the way this work is carried out can vary a lot. The architect may act as a facilitator who brings people together to clarify architectural needs and focuses on managing the architectural effort. The architect may also take a designer role, setting a clear direction and focusing on the overall system design. In both cases, during implementation the architect may work closely with the teams as a partner or delegate the work fully, depending on the situation.

Unlike the PM and RTE, who have clear counterparts at the team level (PO and Scrum Master), the System Architect does not. In Scrum, the entire development team shares responsibility for architecture, which can make alignment and collaboration more challenging.

Part of the System Architect’s work is to identify the right challenges for the ART and address them in the right way. These challenges can look very different depending on how the ART is set up. If several teams work on a single product, there are shared architectural foundations that cut across teams, and coordinating changes becomes essential. If the ART has teams working on separate products but using shared platforms, alignment is still needed but in a more subtle way, since teams have more freedom in how they work. If teams build independent products on different platforms, the main challenge becomes managing that diversity and keeping the overall architecture coherent where it matters.

What success means for a System Architect depends on the product portfolio, the IT strategy, the architect’s skills, the teams’ maturity, and the culture within the ART. But in every case, success comes from shaping the architecture in a way that helps the ART deliver real value, not from following a fixed recipe.

The Great AI Buildout

The ongoing AI buildout has similarities with the railroad expansion of the 19th century. Both are capital-intensive undertakings with the potential to reshape the entire economy. Just as railroads transformed how we navigate physical space, AI is poised to transform how we navigate the information space.1 Railroads were obviously useful, and AI is no different.

During the railway boom, railroads proliferated amid intense competition. Overcapacity was common, some companies went bankrupt, and the industry took years to consolidate. Eventually, railroads became commoditized.

The same dynamics may play out with AI. Semiconductors and datacenters are the tracks and rolling stock. AI applications are the railway companies operating the lines. The coming years will reveal which segments of the AI ecosystem are truly profitable.

At the peak of the railroad era, rail companies accounted for roughly 60 percent of market capitalization. Today, AI makes up about 30 percent of the stock market. Such valuations are only justifiable if AI adoption becomes widespread. For semiconductors and datacenters, this means continuing infrastructure buildout. For AI applications, this means acquiring enough users to finance that growth.

The investment in AI is enormous—around $220 billion per year. But it does not need to replace all labor to be justified. Global labor is about $60 trillion per year, and information work accounts for roughly 10–20 percent of that. By this math, AI only needs to replace 1.8 to 3.7 percent of information work per year to pay off the investment.
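Spelling out the arithmetic behind those percentages:

```latex
% Annual AI spend vs. the value of information work it must replace
\[
\frac{\$220\,\text{B}}{0.20 \times \$60\,\text{T}} \approx 1.8\%,
\qquad
\frac{\$220\,\text{B}}{0.10 \times \$60\,\text{T}} \approx 3.7\%
\]
```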

At the individual level, that is about one or two days of work saved per information worker per year. With AI agents, improving information work—searching, aggregating, writing, and generating information—is already within reach. This means the current investment is economically justified even if AI only captures a small portion of information work.

More

  1. The metaphor is not as stretched as it seems. Large language models literally encode information in multi-dimensional vector spaces, computing distances between vectors to find similarities. ↩︎

Working with AI – Second Experiment

My first “experiment” is now 1.5 years old, which feels like a lifetime in AI.

To get a sense of where we are with Copilot today, I decided to revisit that project. The goal was to upgrade the one-page web app into a proper Angular application using TypeScript.

I also took on the “Copilot challenge”: use only Copilot, no manual edits.

Here’s what I learned this time:

Refactoring with AI works
I was able to split code into Angular components, convert interactions to RxJS, move logic around, add proper typing, convert promises to async functions, extract services, remove dead code, and clean up naming. Copilot handled all of this well.
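As one example of this kind of mechanical refactoring, here is a promise-to-async conversion; the fetch-based helper is a made-up stand-in for the real code, not something Copilot literally produced:

```typescript
// Before: promise chaining (hypothetical data-loading helper).
function loadUserBefore(id: string): Promise<string> {
  return fetch(`/api/users/${id}`)
    .then((res) => res.json())
    .then((user) => user.name as string);
}

// After: the async/await form such a conversion typically yields.
async function loadUserAfter(id: string): Promise<string> {
  const res = await fetch(`/api/users/${id}`);
  const user = await res.json();
  return user.name as string;
}
```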

It handles technical code reliably
Parsing data, building user interfaces, caching with local storage, customizing chart behavior, adding compression: Copilot also handles all of this very well. It got confused by API behavior across chart library versions, but a real developer might too. It made UI iteration fast and smooth.

Agent mode is far better than edit mode
Edit mode often produced code that wouldn’t compile or had broken imports. Agent mode fixed those issues automatically. Not having to think about context so much is also a relief. Using only edit mode at first helped me see how much better agent mode is for real-world use.

Feels like working with a junior developer
Copilot gets things done, but may take shortcuts. Sometimes it ties logic too closely to rendering or makes structural choices that aren’t ideal. It helps, but you still need to guide it.

The code quality is generally good
Its output is usually clean, readable, and idiomatic. Not always how I’d write it, but solid. It consistently handles edge cases like null values. Over time, though, consistency degrades. One feature might follow one style, another a different one. You still need to set and enforce coding standards. Comments appear inconsistently, sometimes helpful, sometimes missing.

Mixed experience with CSS styling
Copilot is good at suggesting layout ideas, but maintaining a consistent visual style across the app was difficult.

It can write basic business logic
If your specification is clear and specific, it can generate useful logic. But for the code to match your expectations, you need to put in the effort to write a precise specification. I didn’t investigate generating tests from the specification; that is definitely a subject to explore further.

Temporal data types remain a weak spot
Handling dates was frustrating. I started by converting strings to date objects, thinking it would be more robust. But JavaScript’s Date isn’t well suited for this. Copilot didn’t flag the issue. Only later, when I asked directly, did it suggest sticking with strings. It often confused timestamps and date objects.
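A typical instance of the confusion, with illustrative values:

```typescript
// Date-only strings are parsed as UTC midnight, so in any timezone west of
// UTC the local calendar day shifts back by one.
const d = new Date("2024-03-10");
console.log(d.toISOString()); // "2024-03-10T00:00:00.000Z"
// In e.g. UTC-5, toLocaleDateString() reports March 9, not March 10.

// Mixing epoch milliseconds and Date objects compounds the problem:
const ts: number = d.getTime(); // a timestamp, not a Date

// Keeping ISO date strings and comparing them lexicographically avoids all of it:
const days = ["2024-03-10", "2024-03-09"].sort(); // ["2024-03-09", "2024-03-10"]
```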

Data manipulation is a strong point
Tasks like changing the structure of JSON files or merging them worked well. No major issues here.
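For flavor, a shallow JSON merge of the kind it produced without trouble; the file names are made up:

```typescript
// Merge two JSON documents, with the second overriding the first at the top level.
import { readFileSync, writeFileSync } from "node:fs";

const base = JSON.parse(readFileSync("base.json", "utf8"));
const patch = JSON.parse(readFileSync("patch.json", "utf8"));

// Shallow merge via spread; enough for simple restructuring jobs.
const merged = { ...base, ...patch };
writeFileSync("merged.json", JSON.stringify(merged, null, 2));
```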

Copilot enables much faster iteration, significantly lowering the cost of programming and shifting the effort toward software design. The improved reliability of agent mode compared with edit mode is a major cognitive relief. Iterating through chat, or even “negotiating” a solution before asking Copilot to implement it, feels fundamentally different from classic development.

Programming is an activity that taxes your short-term memory. Usually, if I have less than half an hour, I won’t engage in programming; it’s too short a window to switch context and produce working code. Something interesting happened during this second experiment: even with only 15–20 minutes, I could quickly try out a new idea with Copilot.

There’s no question that it’s a more productive way to work.