Theory of Constraints for Engineering Teams


QA was the visible bottleneck. Code review piled up upstream. Staging waited downstream. Every standup featured “blocked on QA” as a chorus. The team was six engineers and two QA, and the math wasn’t working. The intuitive move, hiring two more QA engineers, was already approved in principle. The pull was strong.

The actual move was different. First, a fan-out analysis of what QA was doing on every feature, sorting each manual check by what it was checking and whether it had been the same check last sprint. Second, automated test specification work to convert manual checks into PR-runnable tests, written by engineers from QA’s spec and run on every commit. Third, a hot-seat rotation for QA-lead duties (two-week shifts pulling an IC into the role), distributing the bandwidth across the team rather than concentrating it on two people. Eight weeks in, QA was no longer the visible bottleneck. The constraint had moved to integration testing in staging, where coupling to a shared environment with two other teams was the new wall.

The hire didn’t happen.

Most engineering teams do not have a productivity problem. They have a constraint they have not named, and an improvement budget pointed everywhere except it. That is what the manufacturing novel everyone cites and almost nobody reads correctly is about.

This article closes Four on the Floor, the quartet inside the From the Floor lean-applied-to-software thread. One-Piece Flow argued the unit of flow is a story, not a commit. The Andon Cord argued the cord is the cheap part; the culture is load-bearing. WIP Limits Are Not Suggestions argued the limit is a wall or it is nothing. Theory of Constraints is the fourth beat, the diagnostic that names what the previous three are surfacing, signaling, and routing around.


The 1984 manufacturing novel

Eli Goldratt’s The Goal (1984) is a business novel about a fictional plant manager named Alex Rogo who saves his factory by finding the bottleneck and refusing to optimize anything else until it moves. The book has sold over seven million copies and became required reading in industrial engineering. It almost never gets read by software people, and when it does, it gets translated badly.

The novel’s clearest single illustration is the heat-treat oven. Alex’s plant has two bottlenecks; the oven is one. A foreman named Mike Haley figures out how to operate it differently: re-order how batches travel through, fill the oven completely on every run, outsource overage. The oven moves from constraint to non-constraint. The point of the scene, which the book has spent two hundred pages setting up, is that every shop-floor improvement Alex’s plant had been making for months had been making things worse, because the improvements were piling work in front of the heat-treat oven faster than it could clear. Translation: the heat-treat oven is your code review queue, your QA window, your staging environment, your security review. The shop-floor improvements are everything else you’ve been doing.

The bad translation goes “every system has a bottleneck, find it and fix it,” which is technically true and operationally useless. It misses the actual mechanism: that a non-bottleneck improved is wasted work, that utilization at the constraint is the only utilization that matters, and that every improvement made anywhere except the constraint pushes more work into a queue the constraint cannot serve. The actual mechanism is five steps. The story at the top walked all five. Now name them.


The five focusing steps, walked

Goldratt’s Five Focusing Steps, in the canonical formulation:

  1. IDENTIFY the system’s constraint(s).
  2. Decide how to EXPLOIT the system’s constraint(s).
  3. SUBORDINATE everything else to the above decision.
  4. ELEVATE the system’s constraint(s).
  5. If a constraint has been broken, go back to step 1, but do not allow inertia to cause a system’s constraint.

The five steps are not a checklist. Each one has a real failure mode. Walk them through the QA story.

Identify

The constraint is where work piles up, not where work is loud. Loud is where engineers complain. Piled-up is where the column on the board ages. In the QA story, “blocked on QA” was loud. The data was on the board: the ready for QA column had three weeks of stories aging in it, and the in development column was empty.

Failure mode: identifying the loudest team as the constraint, instead of the column with the most cards. The loudest team usually isn’t the constraint; they’re the team noisily aware that they’re being constrained by something else. The actual constraint is often quiet, because it’s keeping its head down trying to clear the queue.
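
A minimal sketch of the identify step, assuming the board can be exported as a list of cards with a column name and the date each card entered that column. The Card fields, the column names, and the sample dates are all hypothetical. The rule it encodes is the one above: most cards first, oldest card as the tie-breaker.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Card:
    title: str
    column: str            # hypothetical column name from a board export
    entered_column: date   # day the card arrived in its current column


def column_stats(cards, today):
    """Return {column: (card_count, oldest_age_in_days)} for non-empty columns."""
    stats = {}
    for card in cards:
        age_days = (today - card.entered_column).days
        count, oldest = stats.get(card.column, (0, 0))
        stats[card.column] = (count + 1, max(oldest, age_days))
    return stats


def likely_constraint(cards, today):
    """Pick the column with the most cards; ties go to the oldest card."""
    stats = column_stats(cards, today)
    # Tuples compare element by element: count first, then oldest age.
    return max(stats, key=lambda column: stats[column])


# Made-up sample data, for illustration only.
today = date(2024, 3, 22)
sample_cards = [
    Card("story A", "ready for QA", date(2024, 3, 1)),
    Card("story B", "ready for QA", date(2024, 3, 8)),
    Card("story C", "ready for QA", date(2024, 3, 15)),
    Card("story D", "in review", date(2024, 3, 20)),
]

print(likely_constraint(sample_cards, today))
```

Standup volume never enters the calculation. The queue that ages is the queue in front of the constraint, so the answer names the stage that column feeds.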

Exploit

Get every drop out of the constraint before adding capacity to it. In the QA story, this is the fan-out analysis. What was QA actually doing? Some of it was things engineers should have been verifying on their PRs and weren’t, because they assumed QA would catch it. Some of it was repeated checks of behaviors that hadn’t changed in months. Some of it was load-bearing manual exploration that needed a human in the loop. Once the work was categorized, two-thirds of it didn’t need to be done by QA; it needed to be done in tests.

The next move was the automated test specification work: engineers writing tests from QA’s spec, running on every PR. The QA team kept the work that benefited from a human; the rest got pushed upstream where it could fail fast and cheap.
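
The categorization can be sketched as a small routing rule. This is an illustration under assumptions, not the team’s actual tooling: the ManualCheck fields, the route function, and the six sample checks are hypothetical, chosen so the tally reproduces the two-thirds split described above.

```python
from collections import Counter
from dataclasses import dataclass

KEEP = "keep with QA"
AUTOMATE = "automate as a regression test"
ON_PR = "engineer verifies on the PR"


@dataclass
class ManualCheck:
    name: str
    needs_human_judgment: bool  # exploratory work that benefits from a person
    behavior_changed: bool      # did the behavior under test change recently?
    verifiable_on_pr: bool      # could the author have verified it on the PR?


def route(check):
    """Decide where a manual check belongs after the fan-out analysis."""
    if check.needs_human_judgment:
        return KEEP
    if not check.behavior_changed:
        return AUTOMATE   # same check as last sprint, same answer
    if check.verifiable_on_pr:
        return ON_PR
    return KEEP           # unclear cases stay with QA until someone looks closer


# Made-up sample checks, for illustration only.
sample_checks = [
    ManualCheck("explore new checkout flow", True, True, False),
    ManualCheck("judge layout on small screens", True, True, False),
    ManualCheck("login still works", False, False, False),
    ManualCheck("password reset email arrives", False, False, False),
    ManualCheck("new field validates input", False, True, True),
    ManualCheck("new endpoint rejects a bad id", False, True, True),
]

tally = Counter(route(check) for check in sample_checks)
moved_off_qa = 1 - tally[KEEP] / len(sample_checks)
print(tally, round(moved_off_qa, 2))
```

The useful output is not the labels but the ratio: how much of the constraint’s time is spent on work that only the constraint can do.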

Failure mode: skipping straight to “elevate” because exploitation is unglamorous and slow. Hiring is approvable in one meeting. Fan-out analysis takes a sprint. Most teams cannot make themselves do steps 2 and 3 first because the alternative is easier to authorize.

Subordinate

This is the longest step, because it is the one that breaks engineering culture’s intuitions and the one most teams botch.

Subordinate means everything else aligns to the constraint, including, explicitly, slowing down non-constraints on purpose. In the QA story, this is the hot-seat rotation. Engineers were pulled out of their feature work for two-week shifts to absorb QA-lead duties. The team’s velocity on new features dropped during the rotation. Local utilization on individual ICs dropped. That was the point. The team was deliberately leaving capacity on the table so the constraint could clear.

Goldratt’s hammer is on page 211 of The Goal: “Activating a non-bottleneck to its maximum is an act of maximum stupidity.” The activate/utilize distinction holds it up. Utilizing a resource means using it in a way that moves the system toward the goal. Activating a resource means running it because it can run. The “we don’t want anyone idle” objection confuses these; they are different variables, and optimizing the wrong one breaks the system.

The math underneath is Reinertsen’s Principle Q3, the Principle of Queueing Capacity Utilization: capacity utilization increases queue size exponentially, not linearly. Below ~85% utilization, queues are manageable. Above 85%, queue length explodes. Approaching 100%, cycle time approaches infinity. The reason “we don’t want anyone idle” feels responsible is that it sounds like good management. The reason it isn’t is the math.
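
The shape of that curve is easy to check. The sketch below assumes the simplest textbook model, a single server with random arrivals and random service times (M/M/1); that model choice is an assumption made here, not something the principle requires. In it, the average number of items waiting is ρ²/(1 − ρ), where ρ is utilization. Strictly, that curve is hyperbolic rather than exponential, but the practical shape is the same: flat, then a wall.

```python
def mm1_queue_length(utilization):
    """Average number of items waiting (not yet in service) in an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be at least 0 and below 1")
    return utilization ** 2 / (1 - utilization)


def mm1_cycle_time(utilization):
    """Average time in the system, as a multiple of the average service time."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be at least 0 and below 1")
    return 1 / (1 - utilization)


for rho in (0.50, 0.85, 0.95, 0.99):
    waiting = mm1_queue_length(rho)
    slowdown = mm1_cycle_time(rho)
    print(f"utilization {rho:.2f}: {waiting:.2f} waiting, {slowdown:.2f}x service time")
```

Under these assumptions, a resource at 50% has about half an item waiting on average; at 85%, about 4.8; at 95%, about 18; at 99%, about 98. The last few points of utilization cost more than all the points before them.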

The 2025 evidence that drives this home: DORA’s most recent report, on AI-assisted software development, found that adding AI-generated code without elevating review capacity makes things worse. Pull-request size went up 154%. Code review time went up 91%. Bug rate went up 9%. RedMonk’s one-line explanation is the cleanest summary anyone has written: “code generation isn’t the bottleneck.” AI is a free engineer added to the input side of an unchanged system. The system queued the work in front of the actual constraint, and the constraint got worse.

Failure mode: refusing to slow down. The team that says “we can’t subordinate, the business needs the features” is the team that ships features into a queue six weeks deep. Subordination is the most efficient possible response when capacity is the issue. It feels like the least efficient response when measured by local utilization. Most teams need management air cover to do step 3 honestly, and most management does not provide it.

Elevate

Add capacity. Now and only now.

In the QA story, hiring would have been the first move. By the time the team got to step 4, it wasn’t necessary; the constraint had moved. This is what makes step 4 dangerous: most teams do it first, and most of the time it’s the wrong move. When a team does steps 1–3 properly, step 4 frequently turns out to be unnecessary, because the constraint either moves or shrinks under exploitation.

Failure mode: doing it first.

Repeat

Goldratt’s actual phrasing matters, and most translations drop it. The fifth step is not “repeat.” It is “if a constraint has been broken, go back to step 1, but do not allow inertia to cause a system’s constraint.” The anti-inertia clause is load-bearing. The constraint moves. The framework doesn’t have an end state.

The QA story’s coda is the structural payoff: the team didn’t celebrate clearing QA. They started over at step 1 with the new constraint: integration testing in staging, where coupling to a shared environment with two other teams was the new wall. Step 5 is what kept them honest. Most teams treat it as a footnote. That is exactly when inertia causes the next constraint.


Where engineering constraints actually live

The constraint is rarely “we don’t have enough engineers.” More often it’s downstream of writing code. Three patterns the team should know how to recognize.

Code review. DORA’s 2025 report found that adding AI to code generation, without changing surrounding workflow, increased pull-request size 154% and code review time 91%. The report’s framing uses explicit constraint language: “When developers use AI tools and write code faster, the code still needs to go through testing and review queues, followed by integration and deployment processes. The overall pace of delivery is unlikely to change significantly unless the surrounding workflows are updated.” How to find it: the in review column ages disproportionately. PR size grows over time without anyone deciding to grow it. Engineers ask in standup whether anyone has bandwidth to review.

Staging and shared environments. From 2017 onward, every DORA report has found loosely coupled architecture to be the highest-impact technical capability for continuous delivery, bigger than test automation, bigger than deployment automation. Teams whose architectures couple them to other teams’ deployment cadences cannot move faster than the slowest team they’re coupled to. The 2023 report found loosely coupled architecture had a substantial positive effect on every dimension measured, the only capability with that profile. How to find it: deploys are synchronized across teams because they share staging. “Reserve staging” is a calendar event. Multiple teams’ release schedules collapse to the slowest one.

Release coordination and Change Approval Boards. DORA 2019, verbatim: CABs “provide no evidence of reducing change failure rates; they slow delivery and force larger batches, which increases failure risk.” Heavyweight change processes correlated with 2.6x more likelihood of being a low performer. How to find it: deployments cluster around scheduled change windows. The CAB meeting is the de facto pull signal for what ships. Teams pre-batch work to fit the window.

The hiring trap is step 4 done out of order. “We need more engineers” is the wrong answer when the constraint is downstream of writing code. More engineers means more PRs in review, longer QA queues, longer staging contention. Hiring comes after steps 1–3 are exhausted. The reason teams hire first isn’t analytical. It’s that hiring is easy to authorize and easy to point to, and steps 1–3 are neither.
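
A toy simulation makes the arithmetic visible. It assumes a deterministic two-stage pipeline in which PRs are opened at a fixed daily rate and reviewed at a fixed daily rate. The function name and the rates are hypothetical, and real arrival and review times vary, which only makes the queue worse.

```python
def simulate(days, prs_opened_per_day, reviews_per_day):
    """Return (shipped, still_waiting) after running the pipeline for `days`."""
    waiting = 0
    shipped = 0
    for _ in range(days):
        waiting += prs_opened_per_day          # upstream pushes work in
        done = min(waiting, reviews_per_day)   # the constraint clears what it can
        waiting -= done
        shipped += done
    return shipped, waiting


before_hiring = simulate(days=20, prs_opened_per_day=4, reviews_per_day=4)
after_hiring = simulate(days=20, prs_opened_per_day=6, reviews_per_day=4)
print(before_hiring, after_hiring)
```

With these made-up rates, both runs ship 80 PRs in twenty days. The second run also leaves 40 PRs waiting for review. The extra capacity bought inventory, not throughput.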


Drum-buffer-rope, software-translated

When the team can name the constraint, the next question is whether the rest of the system can route around it. Goldratt called the mechanism drum-buffer-rope. The constraint sets the cadence (drum). A small buffer protects it from upstream variability so it never starves (buffer). Upstream is throttled to match the constraint’s rate (rope).

The translation. Software’s slowest gate sets the team’s effective deploy cadence. A small ready-for-review or ready-for-QA queue feeds the gate without overflowing it. PR creation throttles to that rate; when the buffer is full, you don’t open more PRs, you go help clear it. Every team that gets this right ships smaller batches by default, because the rope makes large batches mechanically impossible.
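
The rope can be sketched the same way. This is a toy model under stated assumptions: a deterministic pipeline, a fixed review rate as the drum, and a hypothetical buffer_limit on how many PRs may wait in front of review. The function and parameter names are illustrative, not taken from any tool.

```python
def simulate_with_rope(days, prs_wanted_per_day, reviews_per_day, buffer_limit):
    """Return (shipped, still_waiting, redirected) after `days`.

    `redirected` counts PRs that were not opened because the buffer was full,
    standing in for effort sent to help clear the queue instead.
    """
    waiting = 0
    shipped = 0
    redirected = 0
    for _ in range(days):
        room = max(buffer_limit - waiting, 0)    # rope: only fill the buffer
        opened = min(prs_wanted_per_day, room)
        redirected += prs_wanted_per_day - opened
        waiting += opened
        done = min(waiting, reviews_per_day)     # drum: the gate sets the pace
        waiting -= done
        shipped += done
    return shipped, waiting, redirected


result = simulate_with_rope(days=20, prs_wanted_per_day=6,
                            reviews_per_day=4, buffer_limit=6)
print(result)
```

With these made-up numbers the run ships 80 PRs, the same as an unthrottled team with the same review rate, but ends with 2 PRs waiting instead of a growing pile, and 38 PR-sized units of effort were freed to help at the gate. The buffer never empties before review has taken its fill, so the constraint never starves.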

The history-of-ideas footnote that earns one paragraph: software’s most popular flow framework (kanban, in David Anderson’s 2010 book) started life in 2004 as a Drum-Buffer-Rope deployment at Microsoft. Anderson and Dragos Dumitriu took a struggling IT team in Microsoft’s XIT business unit from worst to best in nine months. Anderson later said both DBR and what he ultimately called “kanban” “would have brought identical results.” He preferred “kanban” because the term was more accessible. Every reader who has used a kanban board has been operating a piece of Goldratt’s machinery without anyone calling it that.


The pop-business baggage

ToC has acquired a thick layer of consulting-brand baggage since The Goal, and ducking it would soften the article. So name it.

Goldratt’s later books, It’s Not Luck (1994), Critical Chain (1997), Necessary But Not Sufficient (2000), and Isn’t It Obvious? (2009), extended the framework into project management, distribution, marketing, and life advice with diminishing rigor and sharper consulting wrap. Critical Chain Project Management generated its own consulting market. Goldratt founded the Goldratt Group, TOCICO emerged as the certification body, and the thinking-process tools (Current Reality Tree, Future Reality Tree, Evaporating Cloud) drifted toward a guided-Socratic-method consulting product. None of that is a property of the 1984 mechanism. All of it is a fact about the marketplace that grew up around it.

The substantive academic critiques, with attribution: D. Trietsch (Auckland) argues that drum-buffer-rope is inferior to competing scheduling methodologies. Linhares (Getulio Vargas Foundation) has shown that the TOC approach to optimal product mix is unlikely to yield optimum results, as it would imply that P=NP. Nave (2002) argues that TOC fails to account for employees and fails to address unsuccessful policies as constraints. Gupta and Snyder (2009) argue that TOC has not demonstrated effectiveness in academic literature. Duncan, and separately Noreen, Smith, and Mackey, note that the framework’s debts to Forrester’s system dynamics, statistical process control, PERT/CPM, and management-accounting literature predating 1984 are under-acknowledged.

The distinction: the 1984 mechanism (five focusing steps, drum-buffer-rope, the activate/utilize distinction, the throughput-accounting argument that traditional cost accounting incentivizes inventory accumulation) is operationally robust and maps cleanly onto software because the underlying queueing-theory math is the same in both domains. The wrapper is a fact about the marketplace, not a property of the theory. The Lean Enterprise Institute’s framing on the framework war is the cleanest defuser anyone has written: “TOC methods fit nicely into the lean thinking five-step change framework, between steps two and three.” Same goals, different vocabularies. The reader’s job is not to buy the consulting. The mechanism survives the wrapper.


The quartet synthesis

Four mechanisms. Four specific roles.

One-piece flow is the unit of flow. A story, not a commit. Move one well-understood unit of value through the entire delivery pipeline before pulling the next one. This is the what: what is moving through the system.

The Andon Cord is the defect signal. Stop the line on a single anomaly before it compounds with three others into a release nobody can deliver. This is the quality control at the constraint: how the system detects problems where they hurt most.

The WIP limit is the constraint surfacer. The limit is a wall or it is nothing. This is the forcing function: the discipline that makes the constraint visible by refusing to let the team paper over it.

Theory of Constraints is the diagnostic. Identify, exploit, subordinate, elevate, repeat. This is the response: what to do once the WIP limit has surfaced the constraint.

WIP limits surface the constraint. The Andon Cord signals defects at it. One-piece flow moves work through it. ToC names it and tells you what to do. The four are not interchangeable. They are complementary. A team that has installed any one without the others has installed a partial system that mostly does not work.

The most common configuration in software is all four installed as decoration: a WIP limit nobody enforces, a #andon Slack channel nobody posts in, a “story” that’s actually a task batch, and a ToC reference that’s actually “find the bottleneck and add headcount.” That is lean theater. The team has the rituals. The team does not have the system. The math doesn’t work. The data doesn’t move. The constraint doesn’t get named. The improvement budget continues to be spent everywhere except the constraint.

Four on the Floor is a callback to the four-on-the-floor drum pattern: four steady beats holding the rhythm together. Drop one beat and the rhythm falls apart. The four mechanisms are the four beats. ToC is the fourth, and this article closes the quartet because once you can name the constraint, the other three are the operational answers.


Where ToC breaks

Three honest limits, because softening the rest of the article with caveats would be its own failure mode.

Knowledge work isn’t a uniform repeatable manufacturing step. Goldratt’s plant made the same parts over and over. Software has stories that are different every time. Donald Reinertsen wrote Principles of Product Development Flow in 2009 specifically because Goldratt’s mechanism needed extension to handle variability. The constraint moves more, the work is harder to size, and “exploit the constraint” is harder when the work itself reshapes the constraint. The math holds. The application is harder.

“Our constraint is the market, the customer, leadership.” Sometimes true. Then ToC says the same thing: exploit and subordinate to that constraint. The framework doesn’t fail when the constraint is outside engineering. It just produces uncomfortable answers, which are usually the correct ones.

Subordination cuts against engineering culture. The math is settled; the practice is contested. ToC works in environments where leadership can defend the team from its own metrics. It struggles where leadership cannot, because step 3 looks like underperformance against the local-utilization measures management itself is graded on.


Monday-morning actions

Concrete, by role.

If you’re an IC. Stop solving the loudest problem. Solve the queue with the most cards. The loudest problem is what your team complains about; the actual constraint is the column with three weeks of stories aging in it. They are usually different. Spend ninety minutes this week looking at the board, not at standup chatter, and name where the cards are piling up. That is your constraint. Everything else is decoration.

If you’re a team lead or agile coach. Pick one constraint. Spend a sprint on steps 1–3 only. Do not elevate. No hiring requests. No “we just need one more reviewer.” No reaching for budget. Force yourself, and the team, to do exploit and subordinate first, even when they feel slow. The pain of step 3 (leaving non-constraint capacity on the table on purpose) is the data. If, after a sprint of honest steps 1–3, the constraint has not moved or shrunk, then discuss elevation.

If you’re a VP or director. Stop approving headcount until the team can name the constraint and show what they did at steps 1–3. This is the same operational message as the WIP-limits Monday action (“stop measuring ‘in-flight initiatives’ as a count”) and the Andon Monday action (“underwrite the cord, out loud”). Leadership’s job is to make the system possible, not to outpace it. The team that hires before doing fan-out analysis is the team that, six months from now, will have a longer queue and a bigger payroll.


The honest summary

A plant manager finds his bottleneck and refuses to optimize anything else until it moves. A manufacturing novel sells seven million copies. Two practitioners at Microsoft apply the same mechanism to a struggling IT team in 2004, and one of them ships a kanban book six years later. A team of six engineers and two QA, twenty years after that, runs fan-out analysis instead of opening a job req.

WIP limits surface the constraint. The Andon Cord signals defects at it. One-piece flow moves work through it. Theory of Constraints names it and tells you what to do. Four steady beats. Drop one and the rhythm falls apart. Most teams have all four installed as decoration: a WIP number nobody enforces, an Andon channel nobody posts in, a story that’s actually a task batch, and a ToC reference that’s actually “find the bottleneck and add headcount.” That is lean theater. The mechanism is older than the rituals, and the rituals don’t work without the mechanism.

The constraint you cannot name is the constraint you cannot move. The improvement budget you spend everywhere except the constraint is the budget you spend on nothing at all. Goldratt was writing about a manufacturing plant in 1984. He was also writing about your team.

Four on the Floor ends here.


Sources

For the prior three articles in this quartet: One-Piece Flow in Software Delivery, The Andon Cord in Software Teams, and WIP Limits Are Not Suggestions.