The AI Force Multiplier: What DORA Actually Found


In December 2024, the tenth annual DORA report shipped, and one finding broke containment. A 25% increase in AI adoption correlated with a 1.5% drop in delivery throughput, a 7.2% drop in stability, and a 2.6% drop in time spent on valuable work. Most coverage shipped one sentence (AI hurts software delivery) and moved on.

The same report found, on the same page, that documentation quality went up 7.5%, code quality up 3.4%, productivity up 2.1%, job satisfaction up 2.2% (2024 DORA). Different deltas, different directions. Individual measures up. System measures down. DORA reported a gap. The internet reported a verdict.

The qualifier most coverage dropped is the entire interesting question. DORA didn’t say AI is bad. DORA said the individual moved one direction, the system moved another, and offered a hypothesis (they called it the vacuum hypothesis) for the gap. Drop the qualifier and the data becomes a slogan. Keep it and the data points somewhere more useful: at the surrounding system that absorbed the saved time.

The hub piece in this series gave the headline on a decade of DORA findings. This article gives the receipts on three years of the AI ones. The thesis, stated plainly: AI is a force multiplier. Force is indifferent to the result. The tool is the force; the surrounding system is the direction. That framing isn’t ideological. It’s what 2024’s paradox and 2025’s reversal both pointed at, and it’s the strongest reading the data supports.


The 2024 paradox, in DORA’s own words

The 2024 sample was around 3,000 respondents across 104 countries (the tenth annual DORA report), built on Bayesian structural equation modeling (SEM), multiple targeted models, and a qualitative interview supplement. Observational, self-reported, well-instrumented. Not telemetry. Survey research, taken seriously.

AI adoption that year was already near-universal: 81% of organizations had made AI a higher priority; 75.9% of practitioners relied on AI in daily work, with writing code (74.9%) and summarizing content (71.2%) the top use cases. The numbers per 25% increase in AI adoption, restated in full because most coverage didn’t:

| Outcome | Estimated change |
| --- | --- |
| Documentation quality | +7.5% |
| Job satisfaction | +2.2% |
| Flow | +2.6% |
| Productivity | +2.1% |
| Code quality | +3.4% |
| Delivery throughput | −1.5% |
| Delivery stability | −7.2% |
| Valuable work | −2.6% |

Two factors pulling in opposite directions. The 2024 report itself made the split explicit by separating delivery into throughput and stability as distinct latent factors and adding rework rate as a fifth metric. The split mattered. It’s why the report could say “throughput down, stability down” as separate findings rather than collapsing them into “performance down.”

The hypothesis DORA offered for the gap is worth the full quote. “AI expedites valuable, high-priority work. But instead of producing sustained free time, a vacuum is created that gets filled by less-valued, non-promotable work. Net effect on delivery metrics is negative.” The framing is mechanical, not moral. Time saved → time refilled with lower-value work → no delivery gain.

For a reader who’s spent time with one-piece flow, the translation is automatic. A team that saves an hour and immediately fills it with another in-flight item has not gained an hour of throughput. It has gained an hour of queue depth. The individual feels faster because they finished one thing faster. The system delivers slower because the freed time is now in someone else’s queue. Little’s Law doesn’t care how the work entered the system. It cares how much is in it.
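
Little’s Law makes the arithmetic checkable. A minimal sketch, with illustrative numbers rather than DORA data, assuming the AI-saved hour goes into pulling one more item into flight:

```python
# Little's Law: average cycle time = WIP / throughput.
# Illustrative numbers, not DORA data.

throughput = 2.0    # items finished per week; unchanged by the saved hour
wip_before = 10     # items in flight before the AI-saved time
wip_after = 11      # the freed hour spent pulling one more item into flight

print(f"cycle time before: {wip_before / throughput:.1f} weeks/item")  # 5.0
print(f"cycle time after:  {wip_after / throughput:.1f} weeks/item")   # 5.5

# The individual finished one task faster. The system now delivers every
# item half a week later, because the saved time became queue depth.
```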

The same 2024 report carried a second, easily-missed finding. The high-performance cluster shrank from 31% to 22%; the low cluster grew from 17% to 25%; medium performers showed lower change failure rates than high performers for the first time since 2016. The industry got measurably worse at delivery in 2024, and the report linked AI adoption to the slip without claiming AI caused it. That is the right epistemic register for observational data, and exactly the qualifier most coverage dropped.

If 2024 had been the last word, the doom takes would have been right. 2024 wasn’t the last word.


The 2025 reversal

The 2025 report doubled down on the AI question. 4,867 respondents across 100+ countries (the largest sample in the program’s history), plus 100+ hours of qualitative data and 78 in-depth interviews informing the new DORA AI Capabilities Model. The report also shifted its language from “effects” to “comparisons” to better reflect the observational nature of the work (2025 DORA).

Three things flipped between 2024 and 2025:

| Outcome | 2024 | 2025 |
| --- | --- | --- |
| Software delivery throughput | Negative | Positive |
| Valuable work | Negative | Positive |
| Product performance | Neutral | Positive |
| Software delivery instability | Negative | Still negative |
| Burnout | No relationship | No relationship |
| Friction | No relationship | No relationship |

Three reversed. Three didn’t. The picture is not “AI was bad and now it’s good.” The picture is more interesting than that.

What changed in the population, more than anything else, was experience. AI adoption rose from 75.9% to 90% in twelve months. Median user experience reached 16 months. Median daily AI interaction reached two hours, around 25% of an eight-hour day. The respondents in 2025 weren’t just using AI. They had a year of practice and were integrating it.

DORA’s explanation for what reversed and what didn’t is precise. “These three outcomes (instability, burnout, and friction) are properties of the sociotechnical system, not just the individual keyboard. They reside beyond the individual’s purview.” On instability specifically: “AI accelerates code generation faster than delivery pipelines, testing, and governance structures can handle. Without foundational delivery capabilities (CI, loosely-coupled architecture, fast feedback loops), AI makes instability worse.” The individual adapted. The system didn’t.

Gene Kim’s foreword frames it the same way: “Teams with great engineering practices (fast feedback, loosely-coupled architecture, culture of learning) get outstanding AI benefits. Teams without them will have a very bad time.” That’s the cleanest one-line summary of the 2024-to-2025 arc. Teams with the surrounding system in place adapted; teams without it accelerated their existing dysfunction.

Three years of data, two reversals, one common thread. AI’s effect tracked the surrounding system, every time.


Force is indifferent to the result

The 2025 report’s central thesis sentence is the strongest single claim DORA has made about AI to date. “AI’s primary role in software development is that of an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.”

Amplifier is close. Force multiplier is sharper, and the difference matters. An amplifier connotes something good made larger. A force multiplier is morally indifferent. The force is whatever the team is already producing, and the multiplier is the AI. A team producing well-aimed work gets more well-aimed work, faster. A team producing the wrong thing gets more of the wrong thing, faster. The tool is unchanged in either case.

The tool is the force. The surrounding system (capabilities, culture, platform quality, user-centricity, batch size, version control discipline) is the direction. Same tool, opposite outcomes, observable across three years of data and roughly ten thousand respondents. The point is not that AI is good or bad. The point is that AI is not the variable. The system around it is.

Eirini Kalliamvakou (PhD, GitHub) contributed a chapter to the 2025 report titled “The AI Mirror,” and her statement of the thesis triangulates from a different author and a different program: “Deploying AI tools alone will not produce transformation. Without intentional changes to workflows, roles, governance, and cultural expectations, AI tools are likely to remain isolated boosts in an otherwise unchanged system: a missed opportunity.”

Honest causation note. This is observational. DORA cannot prove that adding the seven capabilities causes the positive AI outcome. They can show that high-capability teams plus AI yields positive outcomes, and absence of capabilities plus AI yields neutral or negative ones. The pattern holds across two years, two methodological iterations, and the qualitative corpus. It is the strongest reading the data supports. It is not a proof.

If the system is the direction, the question is which parts of the system. DORA’s 2025 answer is seven.


The seven capabilities

The DORA AI Capabilities Model emerged from 4,867 survey respondents and 78 in-depth interviews. Seven capabilities; each one moderates AI’s impact on at least one outcome. They are not aspirational. They are the variables the 2025 data identified as the difference between AI helping and AI not helping.

User-centric focus (the surprising one)

The 2023 baseline is striking on its own terms: user-centric teams showed +40% organizational performance versus non-user-centric teams (2023 DORA). The 2025 finding sharpens it further: “In the absence of user-centric focus, AI adoption has a negative impact on team performance.” Not “fails to help.” Not stagnation. Degradation. AI accelerates the production of features; if the team’s existing direction is shipping features without a clear sense of what users need, AI accelerates that direction. This is the cleanest illustration of “force is indifferent to the result” anywhere in the dataset, and it earns its own beat below.

Quality internal platforms (the prerequisite)

The platform finding is the second-loudest: “An investment in AI without a corresponding investment in high-quality platforms is unlikely to yield significant returns at the organizational level.” Low-quality platforms reduce AI’s organizational benefit to negligible. The 2025 report measures platform quality across eleven dimensions, and the most strongly correlated with overall positive experience is the simplest one: does the platform tell users clearly what succeeded and what failed.

Cross-program triangulation matters here. DORA’s 2025 platform adoption number is 90%. Puppet’s 2024 report found most organizations had platform teams for at least three years (2024 Puppet). Platform engineering is no longer a frontier capability. The question is not “do you have a platform” but “is yours good.” For organizations whose answer to the second question is “no,” the AI investment is not going to pay back at the system level.

Working in small batches

Smaller PRs, fewer changes per release, shorter task cycles. The 2025 finding closes the vacuum hypothesis: smaller batches absorb the freed time into shipped work, not queue. Worth flagging: small batches slightly reduce the individual-effectiveness perception of AI, because perception is driven in part by generating large volumes of code quickly. The right move is to tolerate the perception cost. Product performance and friction reduction are the system-level wins. Anyone who’s tried to defend small batches against a change approval board knows the next argument by heart. That one is its own piece.
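
What watching batch size looks like in a repo, as a minimal sketch: median lines changed per commit over the last 90 days. It assumes git is on PATH and the script runs inside a checkout; the window and the metric are illustrative choices, not DORA’s.

```python
# Median lines changed per commit in the last 90 days of a local repo.
# Assumes git is on PATH and the working directory is inside a checkout.
import statistics
import subprocess

log = subprocess.run(
    ["git", "log", "--since=90 days ago", "--pretty=", "--shortstat"],
    capture_output=True, text=True, check=True,
).stdout

sizes = []
for line in log.splitlines():
    if "changed" not in line:
        continue
    churn = 0
    for part in line.split(", "):
        if "insertion" in part or "deletion" in part:
            churn += int(part.split()[0])
    sizes.append(churn)

if sizes:
    print(f"commits in last 90 days: {len(sizes)}")
    print(f"median lines changed:    {statistics.median(sizes)}")
    # Watch this number through an AI rollout. If it climbs, freed time
    # is going into bigger batches, not faster delivery.
```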

Strong version control practices

Frequent commits and proficiency with rollback. AI generates code at a higher rate than pre-AI workflows assumed; the safety net of version control matters more, not less. Frequent commits amplify AI’s benefit to individual effectiveness; rollback fluency amplifies AI’s benefit to team performance. The practitioner-level rules are in The Iron Laws of Agentic Coding.

Clear and communicated AI stance

The pull-quote on this one is unusual: “The capability measures clarity and awareness of the policy, not the content of the policy itself.” Clarity matters more than content. Without clarity, two failure modes emerge in parallel: developers act too conservatively (using AI less than they could) or too permissively (using AI in ways they shouldn’t). Either is expensive. The fix is two paragraphs, written down, that every team member can recite. The policy doesn’t need to be elaborate. It needs to exist.

AI-accessible internal data and healthy data ecosystems

The last two capabilities are joined at the hip. AI tools connected to internal repos, documentation, decision records, and data sources amplify individual effectiveness and code quality. Generic models produce generic results. And the data ecosystem the AI references (quality, accessibility, unification) determines the quality of the output. Garbage in, accelerated garbage out.
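
What “AI-accessible internal data” means at the prompt level, as a hypothetical sketch. The file paths and send_to_model() are placeholders, not a real API:

```python
# Hypothetical sketch: the same question with and without internal context.
# The paths and send_to_model() are placeholders, not a real API.
from pathlib import Path

def build_prompt(question: str, context_files: list[Path]) -> str:
    """Prepend internal docs (ADRs, runbooks, conventions) to the question."""
    context = "\n\n".join(
        f"--- {p} ---\n{p.read_text()}" for p in context_files if p.exists()
    )
    return f"{context}\n\nQuestion: {question}" if context else question

# Generic prompt: the model answers from training data alone.
generic = build_prompt("How do we name feature flags?", [])

# Grounded prompt: the model sees this team's actual conventions.
grounded = build_prompt(
    "How do we name feature flags?",
    [Path("docs/adr/0042-feature-flags.md"), Path("docs/conventions.md")],
)

# send_to_model(generic)  -> a plausible generic answer
# send_to_model(grounded) -> an answer that matches this codebase
```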

Seven capabilities. None of them AI-specific. They were predicting delivery performance before AI existed. The thesis isn’t “these capabilities make AI work.” It’s “these capabilities make anything work, and AI exposes the gap fastest.”


The user-centricity finding, sharpened

The user-centricity warning earns its own beat because it is the single capability where the sign of the AI effect flips. The other six modulate the magnitude of a positive effect. User-centricity flips it. Add the capability, AI is positive. Subtract it, AI is negative. The tool is unchanged.

Worth restating the finding in its loudest form: in the absence of user-centric focus, AI adoption has a negative impact on team performance. The cleanest illustration of “force is indifferent to the result” in three years of DORA data on AI. Same tool, same week, same team, and the surrounding system determines whether the tool helps or harms.

The 2023 user-centricity construct measured three things specifically: understanding user needs, aligning team success with user value, and using user feedback to reprioritize. None of those are AI-specific. All three are upstream of the AI question. If the team has a clear feedback loop with users, AI accelerates features that map to that signal. If the team doesn’t, AI accelerates features that don’t.

The practical implication is uncomfortable. If a team doesn’t know what users want and AI is in the team’s hands, the team is producing the wrong thing faster than before. Velocity up; output worse. Every AI rollout strategy that treats user-centricity as a downstream concern (something to fix after the productivity gain materializes) has the order of operations exactly backwards.

If the user-centricity finding is the article’s loudest argument that AI is a force multiplier, the sociocognitive findings are the quietest, and the most counterintuitive.


The not-doom findings

The discourse around AI and software craft predicted six things would erode under AI assistance: meaning, pride, ownership, mental engagement, peer connection, and the skills that defined a developer’s identity. Reasonable hypotheses. Empirically testable. Tested.

Daniella Villalba (PhD) contributed a chapter to the 2025 report investigating exactly these six dimensions. Authentic pride: up in AI adopters, with the proposed mechanism being more time on valuable work. Meaning of work: unchanged. Need for cognition: unchanged. Existential connection: unchanged. Psychological ownership (“this is my code”): unchanged at 78% credible interval. Skill reprioritization: only prompt engineering rated as newly more important; everything else, including syntax memorization, unchanged.

The interpretation is that developers, in 2025, treated AI like a compiler: a tool that works on their behalf, not a co-author. A small eye-tracking sub-study from UC Berkeley, contributed to the same report, found that students paid less than 1% visual attention to AI chat during interpretive tasks, versus around 19% during mechanical tasks. They ignored AI when deep understanding was needed. Suggestive, not definitive, but consistent with the survey data.

The honest qualifier matters. This is one survey, one year, self-reported, with a median user experience of 16 months. The data does not prove AI is harmless to meaning forever. It proves that, at the median, in 2025, the predicted collapse hadn’t shown up yet. The doom takes were wrong as of 2025; that doesn’t make them wrong forever. Hold both things at once.


The honest limits

Three things the article depends on, named directly.

The data is self-reported. No git logs, no telemetry, no objective deployment records, just people reporting their own deployment frequency, AI usage, and team practices, processed through Bayesian SEM and triangulated with qualitative interviews. The longer treatment of every epistemic concern with DORA’s research program is in DORA Metrics: What the Research Actually Says.

Three years of history on a technology that updates weekly. The AI in respondents’ hands in summer 2025 is not the AI on the market in spring 2026. The thesis survives this only because the seven capabilities are not AI-specific. They were predicting delivery performance before AI existed. The capabilities are durable even if the tools aren’t.

Observational, not experimental. DORA cannot prove that adding the capabilities causes the positive AI outcome. What the research can tell you: across roughly ten thousand respondents over three years, these seven capabilities correlate with positive AI outcomes, and their absence correlates with neutral or negative ones. What it cannot tell you: that adding the capabilities will make AI work for your team. The strongest observational read on a fast-moving question is not a guarantee. It is, however, far better than the alternatives currently on offer.


What this means in practice

If you’re an IC

Use AI to close items already open, not to open new ones. The vacuum hypothesis is the failure mode the IC can refuse on Monday. AI saves you an hour; close an in-flight item with that hour. Don’t pull a new ticket.

Smaller commits. AI assistance increased PR sizes substantially across 2024 and 2025, and larger changesets are inherently riskier. Resist. Frequent commits, fluent rollback. The practitioner-level constraints are in The Iron Laws of Agentic Coding.

Treat AI output as draft, not done. This is not an insult to the tool; it’s a recognition that code review is the bottleneck AI exposed. The Slop Code Taxonomy names what unreviewed AI output looks like once it’s in production.

If you’re a team lead

Instrument batch size before you instrument AI usage. Track PR size, lead time, deployment frequency. If you can see batches grow with AI adoption, you have the lever. AI usage as a metric is a proxy for the wrong thing.
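
A minimal sketch of the two delivery numbers worth watching, computed from (merge time, deploy time) pairs. The records here are invented; export real ones from your own CI/CD tooling:

```python
# Median lead time and deploy frequency from (merge, deploy) timestamp pairs.
# The records are invented; export real ones from your CI/CD tooling.
from datetime import datetime
from statistics import median

deploys = [
    (datetime(2025, 6, 2, 10, 0), datetime(2025, 6, 2, 15, 0)),
    (datetime(2025, 6, 4, 9, 30), datetime(2025, 6, 5, 11, 0)),
    (datetime(2025, 6, 9, 14, 0), datetime(2025, 6, 10, 9, 0)),
]

lead_times = [deploy - merge for merge, deploy in deploys]
span_days = (deploys[-1][1] - deploys[0][1]).days or 1

print(f"median lead time: {median(lead_times)}")
print(f"deploys per week: {len(deploys) / span_days * 7:.1f}")

# Track these alongside PR size. If AI adoption grows batches while these
# numbers flatline, the vacuum hypothesis is playing out locally.
```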

Establish the AI policy. Two paragraphs, written down, that every team member can recite. Clarity matters more than content. The 2025 finding is unambiguous on that.

Defend code review capacity. AI adoption increased code review time substantially across 2024 and 2025. Adding AI tools without adding review capacity creates the exact bottleneck the vacuum hypothesis describes.

If you’re a VP or director

Do not let an AI rollout be your platform-quality strategy. Low-quality platforms neutralize AI’s organizational benefits. Fix the platform first or fix it in parallel. Don’t expect AI to compensate.

Underwrite user-centricity explicitly. The single capability that determines whether AI helps or harms team performance. If teams don’t have a clear user-feedback loop, AI accelerates their existing direction, and without user focus, that direction is wrong.

Pair delivery metrics with value metrics. DORA’s 2022 finding remains in force: delivery alone doesn’t predict organizational success. AI exposes the gap fastest, because AI accelerates whatever the team is already producing, including features that no one wanted.

The next spoke in this series asks the question the small-batches capability raises every time it lands in a regulated organization: should I kill my change approval board? The data has an answer. It earns its own piece.


The thesis, restated

AI is a force multiplier. Force is indifferent to the result. The tool is the force; the surrounding system is the direction.

Three years, roughly ten thousand respondents, two methodological iterations, one consistent finding: AI’s effect tracks the surrounding system. Same tool, opposite outcomes, every time. A team with strong delivery practices, healthy culture, a quality platform, and a real user-feedback loop sees AI accelerate good outcomes. A team without those things sees AI accelerate bad ones.

The doom takes were wrong as of 2025. The hype takes were wrong from the start. Pride didn’t collapse, meaning didn’t erode, throughput recovered, and the seven capabilities the 2025 report named were predicting delivery performance before any of these tools existed. Hold the framing lightly: three years of history on a technology that updates weekly is what the research can offer. The tools will keep changing. The seven capabilities won’t.