Software engineering has always been about raising the level of abstraction.
We started with assembly language, then moved to procedural and structured programming (Pascal, C). Object-oriented programming (C++) introduced encapsulation, while automatic memory management (Smalltalk) improved reliability. Later, portability became a focus with language runtimes like Java and C#.
This progression has been captured through the concept of programming language generations—1st, 2nd, 3rd, 4th, and 5th generations. The 4th generation was envisioned as a major shift: instead of manually managing implementation details like data structures and persistence, developers would write high-level specifications, and the language or environment would handle the rest. The 5th generation would rely on automated problem-solving. However, despite various attempts, mainstream programming remains at the 3rd generation, and 4th- or 5th-generation programming has yet to materialize—until now.
Large language models (LLMs) might be the first real step toward achieving this vision. For the first time, we can provide human-level specifications and let an AI generate working code. LLMs attempt to resolve ambiguities and infer intent—something they do surprisingly well. Interestingly, generating code that strictly adheres to a precise specification might turn out to be the real challenge.
Right now, with tools like Copilot, we use LLMs “in the small” to generate local sections of code. But what happens when we use them “in the large”—to generate most of a system? We may soon be able to integrate architecture documentation, business specifications, and UX mockups to produce functional software, but how such documentation should be structured remains an open question. As a colleague once put it, we will need an entirely new “theory of software documentation” for the age of LLMs.
This shift also echoes some of the ideas behind literate programming, introduced by Donald Knuth, which aimed to make code more readable by structuring it like natural language text. Throughout the history of software engineering, the primary medium for expressing and evolving programs has been text. LLMs, trained on vast amounts of textual data, take this evolution to its next logical step, blurring the lines between documentation and implementation.
Between small-scale and large-scale code generation, one particularly interesting application of LLMs is refactoring. Automating local modifications across large codebases has traditionally been handled by tools like OpenRewrite. However, writing precise transformation rules is tedious. Given LLMs’ ability to extrapolate from examples, they seem promising in this area.
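To make the contrast concrete, here is a minimal sketch of the rule-based approach: a tiny AST transformer, in the spirit of tools like OpenRewrite, that mechanically renames a deprecated function across source code. The names `legacy_sum` and `fast_sum` are hypothetical, chosen only for illustration.

```python
import ast


class RenameIdentifier(ast.NodeTransformer):
    """Mechanically rename every occurrence of an identifier.

    A deliberately tiny stand-in for the kind of transformation
    rule that tools like OpenRewrite let you express declaratively.
    """

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node


# Hypothetical legacy code to migrate.
source = "total = legacy_sum(xs)\nreport(legacy_sum(ys))"
tree = RenameIdentifier("legacy_sum", "fast_sum").visit(ast.parse(source))
print(ast.unparse(tree))  # ast.unparse requires Python 3.9+
```

An LLM, by contrast, could perform the same migration from a single before/after example, without anyone writing the traversal logic; verifying that its output preserves behavior, however, remains an open problem.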
With large-scale code generation, it’s still unclear how much of our code will be AI-generated in the future, and how much will require manual intervention. It’s also unclear how we will manage and distinguish between the two. Traditional methods rely on inheritance or annotations to link generated and manually written code. LLMs, however, are not bound by these constraints: they can generate, rework, or extend code seamlessly. This isn’t just a technical problem but a methodological one.
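As an illustration of the traditional separation, here is a minimal sketch of the generated-base-class pattern (all names are hypothetical): the scaffold lives in a file the generator owns, and manual logic extends it through inheritance, so regeneration never overwrites hand-written code.

```python
# --- generated file (owned by the code generator; regenerated at will) ---
class UserBase:
    """Machine-generated data-access scaffold."""

    def __init__(self, name: str):
        self.name = name

    def to_dict(self) -> dict:
        return {"name": self.name}


# --- hand-written file (never touched by regeneration) ---
class User(UserBase):
    """Manual extensions live in the subclass, cleanly separated."""

    def greeting(self) -> str:
        return f"Hello, {self.name}!"


user = User("Ada")
print(user.greeting())  # Hello, Ada!
print(user.to_dict())   # {'name': 'Ada'}
```

LLM-generated code respects no such boundary: it can rework the base and the subclass alike, which is precisely what makes tracking provenance a methodological question rather than a purely technical one.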
The foundations—both technical and methodological—for engineering software with LLMs are still being developed. The best way forward is through experimentation and collective learning. The software industry is already embarking on this journey, and it’s exciting to be part of such a profound shift.