Will AI Kill Agile Software Development?

Will AI kill agile software development? Atlassian’s recent stock plunge suggests investors think the usual agile setup (JIRA, stories, sprint boards) could face a big disruption. That makes many wonder whether the old way of working with stories and features will disappear.

At its core, agile is about evolving requirements iteratively. Stories and features aren’t the real requirements; they’re just a way for people to organise work. Even if machines write the code, we still need a clear set of goals. Those goals will still exist, but they might live somewhere else.

One option is to keep requirements inside the codebase itself, either as additional artifacts or in the form of tests. Big or regulated projects will still need dedicated requirement‑management tools (for example, Polarion) to keep versions, approvals and audit trails. Those tools will keep feeding AI the context it needs while providing the governance many companies require.
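A requirement kept as a test might look like the following sketch. The `transfer` function and the "no overdraft" rule are invented for illustration; the point is that the requirement lives in the codebase as an executable check rather than in a story.

```python
def transfer(accounts, src, dst, amount):
    # Illustrative implementation of the invented requirement:
    # a transfer must never overdraw the source account.
    if accounts[src] < amount:
        raise ValueError("insufficient funds")
    accounts[src] -= amount
    accounts[dst] += amount
    return accounts

def test_transfer_rejects_overdraft():
    """The requirement, expressed as an executable acceptance test."""
    accounts = {"alice": 50, "bob": 0}
    try:
        transfer(accounts, "alice", "bob", 100)
        assert False, "expected the transfer to be rejected"
    except ValueError:
        pass
    # Balances must be untouched after a rejected transfer.
    assert accounts == {"alice": 50, "bob": 0}

test_transfer_rejects_overdraft()
```

A test like this is both the requirement and its regression check, which is exactly the context an AI coding tool can consume directly.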

AI can actually help the agile process. A product owner can give a high‑level feature description to an AI, and the AI can break it into small implementation steps (stories), write acceptance criteria and draft a first version of the code. The backlog stays, but the heavy lifting of splitting work is done by the machine, possibly asking for clarification interactively.
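That flow can be sketched as follows. The `ask_llm` stub stands in for whatever model API you use, and the prompt and returned JSON structure are assumptions for the demo, not a real vendor interface.

```python
import json

def ask_llm(prompt: str) -> str:
    """Stub for a real model call; returns canned stories for the demo."""
    return json.dumps([
        {"story": "Add 'export as CSV' button to the report page",
         "acceptance": "Clicking the button downloads a CSV of the visible rows"},
        {"story": "Stream large exports server-side",
         "acceptance": "A 1M-row report exports without a timeout"},
    ])

def split_feature(feature: str) -> list[dict]:
    # The product owner supplies a high-level feature description;
    # the model breaks it into stories with acceptance criteria.
    prompt = (
        "Break this feature into small implementation steps (stories), "
        f"each with an acceptance criterion, as JSON:\n{feature}"
    )
    return json.loads(ask_llm(prompt))

backlog = split_feature("Users can export reports")
for item in backlog:
    print(f"- {item['story']} (AC: {item['acceptance']})")
```

The backlog items land in the same tool the team already uses; only their authorship changes.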

Whether implementation steps are persisted when the AI, rather than a human, does the work will depend on our observability needs. Since persisting them costs little, I expect we will keep storing them to track progress and flag problems.

The three main parts of software delivery—requirement management, work organisation and development environments—will still be separate, but AI will tie them together more tightly. Requirement‑management platforms will keep governance, work‑organisation tools will continue to help teams prioritise and visualise work, and IDEs such as IntelliJ will still provide debugging, testing and refactoring, now with extra AI‑generated code suggestions.

Overall, AI is unlikely to kill agile — at least not yet. Instead, it will change where requirements live, how stories are created and how tools interact with each other.

Thinking and Retrieval in the AI Stack

The modern AI stack can be viewed as three logical layers: Compute, Model, and Agent. While thinking (reasoning) and retrieval (fetching external information) may span multiple layers, separating them helps us reason about trade‑offs such as latency, observability, and cost.

| Layer | Primary Function | Typical Primitive |
| --- | --- | --- |
| Compute | Executes the heavy‑weight operations that power the stack. | GPU/TPU kernels for transformer forward passes; ANN‑search kernels; CPU‑based HTTP calls to external services. |
| Model | Performs core reasoning and, optionally, internal retrieval. | Native chain‑of‑thought: the model generates step‑by‑step reasoning within a single forward pass (e.g., “Let’s think step‑by‑step…”). Built‑in retriever: the model invokes a search tool (e.g., GPT‑4o browsing, Claude “search”, Gemini grounding) and conditions its output on the returned snippets. |
| Agent | Orchestrates complex workflows, decides when to call the model, and handles external data sources. | Agent‑orchestrated reasoning: the agent decomposes a problem, builds prompts, may run meta‑reasoning loops, and determines when to invoke the model again. External retrieval: the agent queries a vector store, a web‑search API, or any custom data source, then injects the retrieved passages into the next model prompt. |
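Agent‑side external retrieval can be sketched with a toy in‑memory vector store. The bag‑of‑words "embeddings" below are hand‑rolled purely for illustration; a real agent would call an embedding model and a proper vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(store: list[str], query: str, k: int = 1) -> list[str]:
    # The agent ranks stored passages against the query.
    q = embed(query)
    ranked = sorted(store, key=lambda doc: cosine(embed(doc), q), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # The agent injects the retrieved passages into the next model prompt.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"

store = ["the billing service retries failed charges three times",
         "the search index is rebuilt nightly"]
print(build_prompt("how often do we retry charges",
                   retrieve(store, "retry charges")))
```

Nothing in this loop touches the model until the final prompt is built, which is what gives the agent full control over the retrieval step.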

Whether thinking or retrieval happens in the model or in the agent affects latency, control, resource use, and observability:

| Dimension | Thinking – Model | Thinking – Agent | Retrieval – Model | Retrieval – Agent |
| --- | --- | --- | --- | --- |
| Latency | One forward pass → minimal overhead (unless the model also does internal search). | Multiple orchestrated calls → higher latency, but sub‑tasks can run in parallel. | Single endpoint (e.g., POST /v1/chat/completions with built‑in tool) → low latency. | Two‑step flow (search → prompt → model) → added round‑trip time, but search can run in parallel with other work. |
| Control / Policy | Model decides autonomously when to fetch external data → harder to audit or enforce policies. | Agent mediates every external call → straightforward throttling, redaction, logging, and policy enforcement. | Retrieval baked into the model → policy changes require a new model version. | Agent can enforce dynamic policies (rate limits, content filters) on each external request. |
| Resource Use | GPU must handle both inference and any ANN‑search kernels; higher compute density. | Retrieval can be off‑loaded to cheaper CPUs or dedicated search services; GPU used mainly for inference. | GPU handles only inference; no extra search kernels needed. | CPU or specialised search services handle retrieval, freeing GPU capacity for inference. |
| Observability | Reasoning is embedded in the token stream → debugging is indirect; limited visibility. | Agent logs each sub‑task, providing a clear, structured trace of why and when calls were made. | Limited visibility beyond token usage; retrieval is opaque to the caller. | Agent records search queries, responses, and any filtering applied, giving end‑to‑end traceability. |
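The control and observability advantages of agent‑side mediation boil down to a wrapper like the following sketch. The class name, policy thresholds, and log format are invented for illustration; the `search_fn` lambda stands in for a real search API.

```python
class MediatedRetriever:
    """Sketch of an agent mediating every external call:
    it throttles, filters, and logs each request."""

    def __init__(self, search_fn, max_calls: int, blocked_terms: set[str]):
        self.search_fn = search_fn
        self.max_calls = max_calls
        self.blocked_terms = blocked_terms
        self.log = []  # structured trace of every call

    def search(self, query: str):
        # Dynamic policy: rate limit, then content filter, then log.
        if len(self.log) >= self.max_calls:
            raise RuntimeError("rate limit exceeded")
        if any(t in query.lower() for t in self.blocked_terms):
            self.log.append({"query": query, "status": "blocked"})
            return []
        results = self.search_fn(query)
        self.log.append({"query": query, "status": "ok", "hits": len(results)})
        return results

retriever = MediatedRetriever(
    search_fn=lambda q: [f"doc about {q}"],  # stand-in for a real search API
    max_calls=10,
    blocked_terms={"password"},
)
retriever.search("deployment checklist")
retriever.search("admin password")
for entry in retriever.log:
    print(entry)
```

Because every external request flows through one choke point, changing policy is a code change in the agent, not a new model version.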

If you are building an agentic system, you’ll need to decide which responsibilities belong to the model and which to the agent.


If you are merely a user of such a system, the distinction is mostly invisible, showing up only as differences in answer quality, latency, and cost.