The State of the State of DevOps
Between 2014 and 2025, two research programs surveyed over 45,000 software professionals across more than 100 countries. The DORA program (DevOps Research and Assessment, now at Google Cloud) published nine annual Accelerate State of DevOps reports. Puppet Labs - later Puppet by Perforce - published five companion State of DevOps reports covering overlapping but distinct ground. Together, they constitute the longest-running, largest-scale research effort in software delivery history.
I read all fourteen reports. This is what the accumulated evidence actually says - the findings that held, the ones that didn’t, and what it means if you’re building software today.
This is not a DORA tutorial or a metrics primer. If you want that, I wrote a research evaluation of the DORA program separately. This piece is the synthesis - the sum total of what a decade of data proved, where the research contradicted itself, and the uncomfortable conclusion the whole body of work keeps pointing toward.
What the Decade Proved
Six findings survived repeated testing across both research programs, multiple methodologies, and demographic shifts. They are as close to durable truths as survey research gets.
Speed and stability are not a tradeoff
This is the founding finding and still the most important. Before DORA, the industry assumption was a forced choice: move fast and break things, or slow down and stay stable. Pick one.
Every year from 2014 through 2025, both research programs found the opposite: high performers achieve speed and stability simultaneously. The gap between top and bottom performers widened over time - from 30x more frequent deployments (2014) to 182x (2024), with a 2,293x advantage in recovery time. In 2025, DORA replaced the traditional performance tiers with seven team archetypes. The two highest-performing clusters - representing 40% of respondents - achieved both high throughput and high stability with the lowest burnout (2025 DORA).
This is not a statistical artifact. It reflects a fundamentally different operating model: small batches, fast feedback, automated safety nets. Organizations that batch changes into large, infrequent deployments are not being careful. They are accumulating risk.
Evidence strength: Replicated - confirmed across every dataset in both programs.
Culture predicts everything else
Westrum’s generative culture typology - high trust, information flow, shared risk, learning from failure - is the single most durable predictor in the entire corpus. It predicts software delivery performance, operational performance, security practice adoption (2022 DORA), burnout reduction, job satisfaction, and even which technical practices teams actually adopt (2023 DORA).
Both programs found this independently. Puppet’s reports traced it through team identity and interaction clarity (2021 Puppet). DORA tracked it through psychological safety, belonging, and inclusion. The specific frameworks shifted - Westrum in 2014-2018, Project Aristotle in 2019, back to Westrum plus belonging from 2021 onward - but the underlying finding never wavered: how people treat each other predicts how well their software works.
The 2022 report found that generative culture was 1.6x more predictive of security adoption than any tool or process. The 2023 report found it drove substantial improvements in every technical capability studied. Culture is not the soft stuff you do after the real engineering. It is the engineering, measured at the organizational level.
Evidence strength: Replicated - consistent across both programs for a decade.
Loosely coupled architecture is the highest-impact technical capability
Emerged as a factor in 2015, formalized as the #1 predictor of continuous delivery success in 2017 (2017 DORA), and confirmed every year since. In 2023, it was the single highest-impact technical capability across team performance, organizational performance, delivery, operations, burnout reduction, and job satisfaction (2023 DORA).
The mechanism is Reverse Conway’s Law: system architecture shapes organizational structure. Loosely coupled systems enable teams to design, deploy, test, and release independently - without fine-grained coordination, without waiting for other teams, without deploying outside business hours. Tightly coupled systems create the opposite: cross-team dependencies that make tasks 10-12x slower (2020 Puppet).
This finding is durable, practical, and underappreciated. Most architecture conversations focus on technology choices. The research says the highest-leverage architecture decision is whether teams can ship independently.
Evidence strength: Replicated - consistent finding since 2015 across both programs.
Change approval boards hurt both speed and stability
In 2014, the first DORA report found that external change approval boards reduce throughput with no measurable improvement to stability (2014 DORA). In 2019, the finding strengthened: CABs also increase failure rates, and teams with heavyweight change processes were 2.6x more likely to be low performers (2019 DORA). In 2020, Puppet quantified the inefficiency: organizations with orthodox approval processes were 9x more likely to report high inefficiency (2020 Puppet).
Every dataset says the same thing. Peer review plus automated testing plus deployment automation outperforms committee-based approval in every measurable dimension. The common objection - “but regulatory compliance requires separation of duties” - was addressed directly in the 2020 Puppet report: automated deployment with peer review and audit trails satisfies SOX and SOC 2 requirements without external boards.
Evidence strength: Replicated - consistent across both programs, six years of data.
Documentation is a force multiplier, not overhead
This one surprised the researchers. Documentation quality was first measured in 2021 and immediately proved to be one of the strongest predictors in the model - correlating with 3.8x more integrated security, 3.5x more SRE adoption, and 2.5x better cloud leverage (2021 DORA).
By 2023, the amplification effect was quantified: high-quality documentation made trunk-based development 12.8x more impactful on organizational performance, continuous integration 2.4x, and continuous delivery 2.7x (2023 DORA). Teams with well-written documentation saw new hires perform at 130% of the productivity of new hires on poorly documented teams.
The nuance matters: documentation does not directly improve software delivery performance. It amplifies everything else. It is infrastructure for organizational learning.
Evidence strength: Promising - confirmed across three consecutive years (2021-2023) in DORA data; not independently replicated outside the program.
Platforms scale what individuals cannot
Puppet introduced internal platform teams as a primary scaling mechanism in 2020. By 2025, platform adoption reached 90% across both research programs (2025 DORA, 2024 Puppet). The finding is consistent: platforms treated as products - with roadmaps, user research, feedback loops, and dedicated product management - deliver meaningful gains in productivity, team performance, and organizational performance.
But the research is equally clear that platforms deployed as infrastructure mandates can harm delivery performance. The 2024 DORA report documented platform engineering’s J-curve: short-term throughput and stability losses during adoption, with payoff only for organizations that persist (2024 DORA). The 2025 report elevated platform quality to a prerequisite for organizational AI value - low-quality platforms neutralize AI’s benefits entirely.
Evidence strength: Promising - confirmed across both programs 2020-2025; the platform-as-product distinction is consistent but relies primarily on self-reported survey data.
What Changed Along the Way
The research programs’ willingness to contradict their own findings is a feature, not a bug. Here are the meaningful reversals.
The metrics themselves kept evolving
The “four key metrics” - deployment frequency, lead time, change failure rate, and mean time to restore - were never as fixed as conference talks implied. In 2018, DORA added availability as a fifth metric. In 2021, availability was replaced by reliability (SLI/SLO/error budgets). In 2024, the four keys were split into two distinct factors: throughput and stability, with rework rate added. In 2025, DORA abandoned the Elite/High/Medium/Low performance tiers entirely in favor of seven team archetypes (2025 DORA).
Anyone treating the framework as settled is already behind. The researchers know this. The industry often doesn’t.
The Elite cluster appeared, vanished, and was abandoned
Elite performers were introduced in 2018 at 7% of respondents. They grew to 26% by 2021. In 2022, the cluster vanished entirely - the data couldn’t support it (2022 DORA). It returned in 2023 at 18%. By 2025, DORA replaced the tier system with seven archetypes, acknowledging that a single linear ranking couldn’t capture the diversity of how teams actually perform.
The taxonomy couldn’t hold because performance isn’t one-dimensional. A team can have high throughput and low stability, or high reliability and low delivery speed. The 2025 archetypes - from “Foundational Challenges” to “Harmonious High-Achievers” - are an honest admission that team performance is more complex than a four-tier leaderboard.
Trunk-based development isn’t universally positive
Positive in every dataset from 2014 through 2021. Then in 2022, for the first time in eight years, trunk-based development showed a negative impact on software delivery performance (2022 DORA). In 2023, it was still associated with increased burnout. The explanation: it requires discipline, experience, and supporting practices (documentation, fast CI, small batches) to work. Without those, it breaks.
This is a useful corrective to the “just do trunk-based development” advice that circulates in conference talks. The practice is sound. The preconditions matter.
Cloud without transformation is worse than no cloud
Early reports (2018-2019) found that meeting NIST’s five cloud characteristics correlated with elite performance. By 2023, the finding had sharpened: public cloud without infrastructure flexibility - lift-and-shift without operational model change - predicted decreased performance (2023 DORA). The 2024 report was blunt: partial cloud adoption is worse than staying on-premises (2024 DORA).
The variable was never “cloud.” It was flexibility - on-demand self-service, rapid elasticity, measured service. You can achieve that in a data center. You can fail to achieve it on AWS.
AI reversed its own findings within one year
In 2024, DORA found that a 25% increase in AI adoption correlated with 1.5% decreased throughput, 7.2% decreased stability, and 2.6% less time on valuable work - even as 75% of practitioners reported personal productivity gains. The “vacuum hypothesis” explained the paradox: AI expedites high-value work, but the freed time gets absorbed by lower-value tasks (2024 DORA).
One year later, throughput reversed to positive. Valuable work reversed to positive. Product performance shifted from neutral to positive. But delivery instability persisted (2025 DORA). The explanation: pipelines, testing processes, and governance structures haven’t kept pace. AI generates code faster than systems can absorb it. The individual adapted. The system hasn’t.
User-centricity went from unmeasured to the strongest predictor
Not in any report before 2023. Then it was instantly the most powerful driver of organizational performance: user-centric teams scored 40% higher than their non-user-centric peers. By 2025, the finding had sharpened to a warning: AI adoption without user-centric focus has a negative impact on team performance (2025 DORA). Teams that ship fast without understanding what users need produce the worst outcomes in the dataset.
The fact that this wasn’t measured until year nine of a decade-long program is itself a finding worth naming.
What It Means If You’re Building Software Today
The accumulated evidence, compressed into practitioner decisions.
Track DORA metrics, but as diagnostics, not targets. The 2023 report issued an explicit Goodhart’s Law warning. Use the four keys at the application level to find bottlenecks. Never use them to compare teams or individuals. Never make them performance goals.
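To make “diagnostic, not target” concrete, here is a minimal sketch of computing the four keys for a single application from its deployment history. The `Deployment` record and its field names are my assumptions for illustration - the reports define the metrics, not a schema - and a real pipeline would pull these events from the deploy system and incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class Deployment:
    # Hypothetical event record - field names are assumptions, not a DORA schema.
    commit_at: datetime                  # when the change was committed
    deployed_at: datetime                # when it reached production
    failed: bool                         # did this deploy degrade service?
    restored_at: datetime | None = None  # when service was restored, if it failed

def four_keys(deploys: list[Deployment], window_days: int = 30) -> dict:
    """The four key metrics for ONE application over a time window."""
    if not deploys:
        return {}

    def hours(delta):
        return delta.total_seconds() / 3600

    failures = [d for d in deploys if d.failed]
    restored = [d for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(
            hours(d.deployed_at - d.commit_at) for d in deploys
        ),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore_hours": median(
            hours(d.restored_at - d.deployed_at) for d in restored
        ) if restored else None,
    }
```

Used this way - per application, trended over time - the numbers point at bottlenecks: a rising lead time or a climbing failure rate tells you where to look. The moment they become a team’s goal, Goodhart’s Law applies and they stop measuring anything.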
Adopt AI tools, but track instability as your canary. AI is an amplifier (2025 DORA) - it magnifies strong foundations and weak ones equally. Establish a clear, communicated AI policy. Connect tools to internal context. Work in small batches. If your instability metrics are rising, the system hasn’t caught up to the individual.
Build a platform team if you’re past ~50 engineers. Treat it as a product with real users. Assign product management. Expect the J-curve. Fund it explicitly - don’t pull engineers off it for urgent requests. 90% of organizations already have one; the question is whether yours is good.
Stop calling it “culture.” Name the specific blockers: unclear team responsibilities, risk aversion masquerading as governance, insufficient feedback loops, chaotic priorities. The 2024 report found that priority stability is the hidden variable - even strong leadership and good documentation can’t compensate for constantly shifting goals. Fix governance and roadmap processes before investing in culture programs.
Build security in from the start. 70% of platform teams now embed security from inception (2024 Puppet). Culture - not tooling - is the #1 predictor of security adoption. CI/CD is the foundational infrastructure for supply chain security scanning. Without it, consistent scanning is nearly impossible.
Kill the CAB. Replace it with peer review, automated testing, and deployment automation. Every dataset across both programs says this. If regulatory compliance is the objection, the 2020 Puppet report addressed it directly: automated deployment with peer review and audit trails satisfies separation-of-duties requirements.
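As a sketch of what the replacement looks like in code rather than committee - the names and record shape here are hypothetical, and in practice the same checks are enforced by the code host and CI system rather than a hand-rolled script:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class Change:
    # Hypothetical change record - fields are assumptions for illustration.
    change_id: str
    author: str
    approvers: list[str]   # peer reviewers who approved
    tests_passed: bool     # result of the automated test suite

def deploy_gate(change: Change, audit_log: str = "deploy_audit.jsonl") -> bool:
    """Automated stand-in for a CAB: peer review + green tests + audit trail."""
    # Separation of duties: at least one approver who is not the author.
    peer_approved = any(a != change.author for a in change.approvers)
    allowed = peer_approved and change.tests_passed

    # Append-only audit record: the evidence trail an auditor asks for.
    with open(audit_log, "a") as log:
        log.write(json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "decision": "deploy" if allowed else "reject",
            **asdict(change),
        }) + "\n")
    return allowed
```

The author-cannot-self-approve check plus the append-only log is the separation-of-duties evidence the 2020 Puppet report argues satisfies auditors - a decision in seconds instead of a weekly board meeting.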
Invest in documentation. Not as a compliance exercise. As organizational infrastructure. It amplifies every technical capability, accelerates onboarding, and - unlike most investments - compounds over time.
The Uncomfortable Conclusion
Here is the intellectual arc of the decade:
- 2014-2017: Can teams have speed and stability? Yes. And here are the practices that predict it.
- 2017-2020: Which capabilities predict performance? These twenty-four. Here’s how they interact.
- 2020-2022: But does context matter? Yes - delivery without reliability produces no organizational benefit. Security requires culture. SRE has a J-curve.
- 2022-2023: What else are we missing? User-centricity. Documentation. The fact that underrepresented team members experience these findings differently.
- 2024-2025: Does AI change the model? It amplifies it. Everything that was conditionally true is now more conditional.
Each era discovered that the previous era’s answers were conditional. The practices work - but only in context. The metrics are useful - but only as diagnostics. The capabilities predict performance - but only when combined with the right culture, the right architecture, the right leadership, and the right focus on users. Every “best practice” has a J-curve, a precondition, or an equity dimension that the conference talk didn’t mention.
The research keeps circling the same finding it can’t quite formalize: the organizations that win are the ones that adapt fastest. Not the ones that adopt the most practices, or hit the best metrics, or buy the right tools. The ones that detect problems early, respond without bureaucratic friction, learn from failure without blame, and continuously adjust course.
This is not a capability you can measure on a Likert scale. It is human judgment, emotional intelligence, improvement mindset, and the willingness to stay uncomfortable permanently - what James Carse would call playing the infinite game. The research can observe its effects. It can measure its correlates (generative culture, user-centricity, loosely coupled architecture). But it cannot capture the thing itself, because the thing itself is the capacity to respond to situations that haven’t happened yet.
The tension in every organization is between those who want systems of control - repeatable, measurable, auditable processes that provide the illusion of predictability - and those who want quality and velocity, which require the autonomy to make judgment calls, take calibrated risks, and change direction when the situation demands it. A decade of data consistently shows that the second group outperforms the first. But the second group’s advantage is precisely the thing that resists systematization.
Use the research. It is the best evidence we have on what makes software teams perform. Track the metrics. Build the platforms. Invest in culture, documentation, and architecture. But hold it all lightly. The most important finding from ten years of data is the one that can’t be reduced to a dashboard: the organizations that thrive are the ones that never stop adapting, and adaptation is a human skill that no framework can replace.
The Reports
All fourteen reports that informed this synthesis, linked to full summaries:
| Year | Report | Primary Contribution |
|---|---|---|
| 2014 | Puppet/DORA State of DevOps | First scientific link between IT performance and business outcomes |
| 2015 | Puppet/DORA State of DevOps | Stability gains without throughput loss; lean management; burnout |
| 2016 | Puppet/DORA State of DevOps | ROI formulas; lean product management; eNPS |
| 2017 | Puppet/DORA State of DevOps | Transformational leadership; loosely coupled architecture as #1 predictor |
| 2018 | DORA Accelerate State of DevOps | Elite performers emerge; availability as 5th metric; J-curve introduced |
| 2019 | DORA Accelerate State of DevOps | Productivity construct; CABs hurt stability too; psychological safety |
| 2020 | Puppet State of DevOps: Platform Edition | Platform teams as scaling mechanism; change management taxonomy |
| 2021 | DORA Accelerate State of DevOps | Reliability replaces availability; SRE; documentation as capability |
| 2021 | Puppet State of DevOps | Team Topologies; the persistent 79% mid-tier stagnation |
| 2022 | DORA Accelerate State of DevOps | Elite cluster vanishes; conditionality; supply chain security |
| 2023 | DORA Accelerate State of DevOps | User-centricity; Bayesian methods; documentation amplifiers; equity |
| 2024 | DORA Accelerate State of DevOps | AI paradox; vacuum hypothesis; platform J-curve; priority stability |
| 2024 | Puppet Platform Engineering | Platform maturity; security as foundational; the “builder” developer |
| 2025 | DORA AI-Assisted Development | AI as amplifier; seven capabilities model; seven team archetypes; VSM |
This is the hub article in a series. Deep dives on specific findings - change management, AI adoption, platform engineering, culture, and documentation - are forthcoming as individual pieces.