WIP Limits Are Not Suggestions
Six engineers, eighteen things in flight, a board with no number above any column (or its functional equivalent, a chat channel of “hey, can you also look at…”) that nobody calls a kanban board because nobody calls it anything. Standup updates all start with “still working on” and no card has moved in nine days. Somewhere in week three, what used to be a quick context switch costs an hour, then another, then a sprint. Six weeks later, cycle time has doubled and nobody can explain why.
That team was every team without an enforced limit, including yours. Most teams do have a number above the column. The number is decoration. A WIP limit you can step over is not a WIP limit; it is a suggestion. And suggestions don’t change behavior, don’t expose constraints, don’t shorten cycle time, and don’t surface the bottlenecks they were designed to surface.
The thesis of this piece is one line. The limit is a wall or it is nothing. The math doesn’t work otherwise.
This is the third article in Four on the Floor, the lean-applied-to-software quartet inside the From the Floor thread. One Piece Flow argued the unit of flow is a well-sized story, not a commit. The Andon Cord argued the cord is the cheap part; the culture around it is the load-bearing element. The same shape applies here. The number above the column is the cheap part. The discipline to stop pulling new work when the column is full is the load-bearing element, and most teams skip it.
Before we get into why teams step over the limit, the math has to be on the table. Not as background. As the floor.
Little’s Law, restated for skeptics
In May 1961, MIT professor John D.C. Little published a short proof paper in Operations Research that has since become the most cited result in queueing theory. It said:
L = λW
In words: for any queueing system observed long enough, the mean number of items in the system equals the mean arrival rate times the mean time each item spends in the system. The practitioner restatement, the one that lives on kanban boards and in DORA reports, is:
Cycle Time = WIP / Throughput
WIP is stories started but not finished. Throughput is stories finished per unit time. Cycle time is the elapsed time from start to finish per story. That is the entire vocabulary.
Walk it. A team of six. Throughput of three stories per week. WIP of fifteen. Cycle time is fifteen divided by three: five weeks. Halve the WIP to seven and a half, hold throughput constant, and cycle time halves to two and a half weeks. This is not aspirational. It is a theorem. The math doesn’t ask permission.
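The arithmetic above, in runnable form. The numbers are the ones from the worked example; nothing else is assumed.

```python
# Little's Law, practitioner form: cycle time = WIP / throughput.
# Units: WIP in stories, throughput in stories per week, cycle time in weeks.

def cycle_time(wip: float, throughput: float) -> float:
    return wip / throughput

print(cycle_time(15, 3))   # 15 stories in flight, 3 finished per week -> 5.0 weeks
print(cycle_time(7.5, 3))  # halve WIP, hold throughput constant -> 2.5 weeks
```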
The honest practitioner objection is the next one: software is not a manufacturing line, the arrivals are not stationary, the work items are not uniform, the assumptions don’t hold. Little himself addressed this directly. In 2011, on the 50th anniversary of his original proof, he showed how weak the required assumptions actually are. Stationarity is not required (“LL.1 holds under nonstationary conditions,” p. 544). Queue discipline is irrelevant. For a system observed over any finite interval, “L = λW holds” exactly (p. 546). Little explicitly applied the law to emergency-room queues, supermarket lines, server requests, and financial portfolios. Software stories are not a harder case than emergency rooms. There is no honest way to argue the math doesn’t apply.
Daniel Vacanti, who managed the world’s first commercial Kanban project in 2007, gives the precise version of the assumptions Little’s Law actually requires. In Actionable Agile Metrics for Predictability, he reduces them to five: items entering must eventually exit, arrival rate must roughly match departure rate, work item age must be bounded, WIP must stay roughly steady, and units must be consistent. These are weaker than most practitioners assume, and the first one will matter when we get to “we’re blocked, so we can’t count it.”
Now the empirical hook the math earns. The Tasktop / Planview Project to Product program (Mik Kersten’s data set across 3,600+ teams and 38,000+ engineers) finds that most software value streams operate at 15% to 40% flow efficiency. Translate the number. Sixty to eighty-five percent of the time a story is “in progress,” nobody is touching it. Most readers reflexively assume “in progress” means “being worked on.” The data says otherwise. The column is full of stalled work.
Which is the section’s whole point. If the math is settled (and it is), then “WIP” is not a soft variable you tune for vibes. It is the operational handle on cycle time. As Vacanti puts it, in the line that earns its place on a poster: if you have a problem with predictability, you have a problem with WIP. That is the same claim as the section’s title, in fewer syllables. The limit is a wall or it is nothing. The math doesn’t work otherwise.
If the math is settled, the question is why teams step over the limit anyway. There are five common answers. Each one is a different way of refusing to look at the constraint the limit exists to expose.
Five reasons teams step over the limit
“We’re blocked, so we can’t count it”
A team marks a card “blocked” and visually dims it on the board. The unspoken move: it no longer counts toward the column’s WIP. It is parked. The column is “really” two cards under the limit, see, plenty of room.
Vacanti’s first Little’s Law constraint: items entering must eventually exit. A blocked card has entered the system. The math does not care that you have given it a different color. The cost of the card sitting there (context loaded into someone’s head, dependencies it is creating, partial state in the codebase or the staging environment) is still being paid. As the ProKanban article puts it, “WIP is everything started but not finished.” Hidden WIP doesn’t go away when you stop counting it. It just stops being a planning input. The cost surfaces later, in cycle time.
“Blocked” is a state, not an exemption.
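The accounting error is easy to make concrete. A minimal sketch, with a hypothetical card model (not any real tool’s API):

```python
from dataclasses import dataclass

@dataclass
class Card:
    title: str
    state: str  # "in_progress", "blocked", or "done"

def wip(board: list[Card]) -> int:
    # Everything started but not finished. A blocked card has entered the
    # system and has not exited, so it counts -- whatever color it is dimmed to.
    return sum(1 for card in board if card.state != "done")

board = [
    Card("auth refactor", "in_progress"),
    Card("flaky pipeline", "blocked"),   # dimmed on the board, still WIP
    Card("search index", "blocked"),
    Card("billing fix", "done"),
]
print(wip(board))  # -> 3, not 1
```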
“Just one more and we’ll catch up”
The team is one card over the limit. The cards in flight are “almost done.” Pulling one more feels like throughput. It isn’t. It is the planning fallacy in a kanban skin.
Kahneman and Tversky named it in 1979; Buehler, Griffin, and Ross replicated and extended it in 1994. People systematically underestimate task completion times even on tasks they have done many times before. The mechanism is the inside view: you focus on the specifics of this task and ignore the base rate of how your team performs. Your team’s data is the outside view. The data is older, more bored, and more accurate than your sense that this card is almost finished.
If the column is full, “just one more” is not throughput. It is a bet against the data your own team has already collected.
“The PM (or customer) is pushing”
Stakeholder anxiety arrives expressed as more requests. The team’s response is to take the work on, out of politeness. The board now has another card on it that nobody is working on. Anxiety did not decrease. It just changed coats.
The push is the symptom, not the problem. Stakeholders push because they don’t trust the pipeline; they have learned, often correctly, that work disappearing into the team is a signal of nothing in particular about when it will reappear. Smaller batches build trust because deliveries get more predictable. The 2021 Puppet State of DevOps report named the inversion directly: organizations claiming to discourage risk practice infrequent, large-batch deployments, which are demonstrably riskier than frequent, small changes. The perceived safety of caution is the actual source of risk.
David Anderson’s framing in Kanban (2010) is the technical version of the same point: the WIP limit is the mechanism that converts a push system into a pull system. Removing the limit doesn’t keep stakeholders happy. It guarantees they will be unhappy six weeks later, on schedule.
“We don’t want anyone idle”
This is the central failure mode, and it deserves the longest treatment of the five. The objection sounds reasonable. Engineers are expensive. Nobody should be sitting around. Therefore: pull more work to keep people busy. This is the local utilization trap. It is the same trap One Piece Flow named, just at a different level of the system.
The math is unkind. Donald Reinertsen’s Principles of Product Development Flow (2009) states this as Principle Q3, the Principle of Queueing Capacity Utilization: capacity utilization increases queue size exponentially, not linearly. The underlying proof is Kingman’s formula, published the same year as Little’s Law: cycle time equals a variability factor times a utilization factor times a mean service time, where the utilization factor goes to infinity as utilization approaches one. Below about 85% utilization, queues are manageable. Above 85%, queue length explodes. Approaching 100%, cycle time approaches infinity. Troy Magennis has a clean Observable visualization of the curve in software-team terms; if you want to see what the math actually looks like before the next argument about idle engineers, that is the link.
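The shape of that curve is easy to see without the Observable link. A sketch of Kingman’s approximation; the variability coefficients and unit service time here are illustrative assumptions, not team data:

```python
def kingman_wait(utilization: float, ca2: float = 1.0, cs2: float = 1.0,
                 service_time: float = 1.0) -> float:
    # Kingman: mean queue wait ~= (rho / (1 - rho)) * ((Ca^2 + Cs^2) / 2) * tau.
    # The first factor is the one that explodes as utilization approaches 1.
    rho = utilization
    return (rho / (1.0 - rho)) * ((ca2 + cs2) / 2.0) * service_time

for u in (0.50, 0.70, 0.85, 0.95, 0.99):
    print(f"utilization {u:.0%}: wait ~ {kingman_wait(u):6.2f}x service time")
```

Going from 50% to 99% utilization multiplies the wait roughly a hundredfold. The 85% breakpoint the paragraph names is where the curve starts going vertical.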
Eliyahu Goldratt put the same idea in different language in The Goal (1984), p. 211: “Activating a non-bottleneck to its maximum is an act of maximum stupidity.” Goldratt’s distinction: activating a resource means running it; utilizing a resource means using it in a way that moves the system toward the goal. They are not the same thing. Pulling more cards to keep an engineer typing is activation. It is not utilization. It is the confusion of the two that makes “we don’t want anyone idle” feel like the responsible objection.
Tasktop’s flow efficiency comes back here. If 60% to 85% of “in progress” time is wait time across the industry’s data, the queue already exists. Most of the column is stalled work. Adding more cards does not increase throughput. It increases queue. It increases the number of half-built things stalled in front of the actual constraint, while the engineers you wanted busy are now context-switching between three of them.
So restate the central claim. The limit is a wall or it is nothing. The math doesn’t work otherwise. The reason “we don’t want anyone idle” feels like the responsible objection is that it confuses utilization with throughput. They are different variables. Optimizing the wrong one breaks the system.
“The limit is wrong”
Probably true. Almost certainly true on day one. Doesn’t matter.
The point is not to find the perfect limit. The point is to make the limit binding and adjust it deliberately as the team learns where the system actually constrains. Anderson’s starting heuristic is “two items in progress per knowledge worker” (his words; closer to industry observation than replicated finding, as the One Piece Flow piece notes when it cites the ACM/IEEE kanban study showing the relationship is more complex than Little’s Law alone suggests). Vacanti’s column-level heuristic is two-thirds to three-quarters of team size (a 9-person team uses 6). They are starting points, not findings.
Override silently and you destroy the experiment. The number is not sacred. The discipline is.
Five different objections, one shared move: stepping over the limit. Every one of them is wrong for the same reason: the WIP limit isn’t there to throttle the work. It’s there to surface the constraint. Two things every WIP-limits writeup underspecifies: what the limit is actually for, and what enforcement actually looks like. The next two sections take them in order.
What WIP limits are actually for
The reframe is the load-bearing move of the article: the WIP limit is not a throttle. It is a forcing function. It is how you find the bottleneck, not how you avoid it.
The clean illustration. If your “code review” column is always full and your “in progress” column is always empty, you have not failed at WIP. You have just discovered your bottleneck. Reviewers are the constraint. The limit is doing its job by making that visible. If you raise the review limit “to keep things moving,” you have not fixed the problem. You have just hidden it again.
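A toy simulation makes the pattern visible. The rates and the limit are invented numbers, not a claim about any real team:

```python
def simulate(weeks: int, dev_rate: int = 3, review_rate: int = 2,
             review_limit: int = 4) -> tuple[int, int]:
    # Two columns, enforced limit on review. Dev finishes 3 cards a week,
    # review clears 2. Returns (cards sitting in review, dev-card-slots of
    # capacity the wall freed up to go help clear the column instead).
    review_queue, freed_capacity = 0, 0
    for _ in range(weeks):
        review_queue = max(0, review_queue - review_rate)     # reviews complete
        handoff = min(dev_rate, review_limit - review_queue)  # the wall
        review_queue += handoff
        freed_capacity += dev_rate - handoff
    return review_queue, freed_capacity

print(simulate(10))  # -> (4, 8): review pinned at its limit, week after week
```

The review column pins at its limit within two weeks and stays there. That steady pressure is the diagnosis. Raising `review_limit` would not change the review rate; it would only let the queue grow quietly.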
David Anderson, on the load-bearing role of the limit in his revised statement of the kanban method:
It is the WIP limit that ultimately stimulates conversations about process problems. When WIP constraints are reached, teams must either break limits, ignore issues, or collaboratively address blockages.
That is the bridge from “what the limit is” to “what the limit does.” The conversations the team needs to have do not happen without the limit. The limit creates the friction that produces the conversation that identifies the constraint. Remove the limit and the friction goes with it, and so does the diagnosis.
Reinertsen frames the same mechanism mathematically as Principle W2, the Rate-Matching Principle. WIP constraints align input rate with output rate. That is the technical mechanism by which a pull system functions. Without the limit, the team is taking input at whatever rate it arrives, and the math goes where the math goes: toward the queue length explosion that Q3 describes.
DORA tied this to delivery performance directly. 2015 was the first year the Puppet/DORA State of DevOps program treated WIP limits as a measured construct in their own right, not just batch size as a proxy. The finding: “Teams that limit WIP and use those limits to drive process improvement achieve higher throughput.” Not throttle the work. Drive process improvement. That is the entire point.
The limit is also where the team gets paid back for setting it. The limit creates the conversations. The conversations identify the constraint. The constraint, once identified, is a thing you can actually work on. That is the loop the limit closes. Knowing what the limit is for is half. The other half is what enforcement actually looks like, because most teams have never seen it. Toyota has.
What enforcement actually looks like
The hard rule, stated plain: column full means you do not pull. The next thing you do is help clear the column. Not “you can pull if it’s important.” Not “you can pull with team agreement.” Not “you can pull because the PM said so this morning.” Column full means you do not pull. The limit is a wall.
The Toyota analog is jidoka. The Andon Cord article covered it; the short version is that any worker on a Toyota line can stop the line when an abnormality appears. Coworkers swarm the station; if the problem is solved within a fixed response window the line continues, if not the line stops at the end of the window for a broader swarm. The mechanism is canonical Toyota; see the Lean Enterprise Institute Lexicon entry on jidoka and Toyota’s official global site. The popular software-community phrase “swarm or stop” is a paraphrase, not Toyota’s own wording, but the mechanism it describes is the canonical one.
Andon and the WIP limit are siblings. Both are stop-the-line mechanisms. The cord is the signal; the WIP limit is the constraint. Same family of decision. The Andon Cord article argued the cord is the cheap part: installing a Slack channel called #andon is the cheap part, building a culture that responds when somebody pulls is the load-bearing element. Same shape applies here. The number above the column is the cheap part. The discipline to actually stop pulling is the load-bearing element. Most teams skip it.
The awkward case. The reviewer column is full. You are in the “in progress” column. You don’t review code in this stack (maybe it’s the iOS code and you are a backend engineer, maybe it’s the data pipeline and you are a frontend engineer). Now what.
The honest answer is that you still do not pull another card. The point of the limit is not that you specifically must clear the bottleneck. The point is that the team clears it. If you can review, review. If you can pair with a reviewer, pair. If neither works, the right move is still not to pull a new card; it is to use the time to cross-skill, write a test that prevents the next instance of this backlog, or document something that is currently tribal knowledge. What you do not do is keep activating yourself on a card the system can’t move yet. That is, in Goldratt’s vocabulary, activation without utilization. It increases queue. It does not increase throughput.
Enforcement is the entire point. A limit that gets stepped over is a number, not a limit.
The compliance and coordination objection
Enforce a limit and stakeholders get nervous. They shouldn’t. The data on what small batches do to stakeholder anxiety is the most replicated finding in twelve years of DORA research.
The objection: “We have to keep working on multiple things or stakeholders complain.” This is the same anxiety that produced “the PM is pushing,” now framed as a structural constraint instead of a personal one. The anxiety is not invented. It is just misdirected.
The data is direct. The 2023 Accelerate State of DevOps report, in the chapter on flow:
Reducing batch size of changes is the universal technique to improve all four delivery metrics.
That is the cleanest single sentence in the entire DORA corpus on this question. Universal and all four are not hedge words. Twelve years of research, surveyed across tens of thousands of practitioners, with whatever methodology caveats apply (and they apply; see the State of the State of DevOps synthesis for the full methodology context), and the headline finding on batch size is one sentence long.
Speed and stability are not a tradeoff. Small batches make stakeholders less anxious because deliveries are more predictable, not because the team got faster. The 2018 DORA report’s framing on the alternative (“Making large-batch and infrequent changes introduces risk to the deployment process. When failures occur, it can be difficult to understand what caused the problem and then restore service”) is the actual cost of the coordination objection. The “safer” path is not safer.
The 2025 update gives the 2026 reader a reason to act now. The 2025 DORA report on AI-assisted development identified seven organizational capabilities that determine whether AI adoption helps or harms team performance. Working in small batches is one of them. Fewer lines per commit, fewer changes per release, shorter task cycles correlate with higher product performance and lower friction in AI-assisted teams. The report notes the apparent tension (small batches slightly reduce the individual-effectiveness perception that comes from AI generating large amounts of code quickly), but at the team and product level the effect is unambiguous. If your AI-assisted teams are working in large batches, you are paying for AI and getting less throughput.
If the limit is binding and the math says small is better, the practical question is: where do you set the limit? Not perfectly. Deliberately.
Setting the limit
Heuristics, not formulas. The limit is an experiment, not a forecast.
Anderson’s “two items per knowledge worker” is the most-quoted starting point (the heuristic, not the finding). Vacanti’s column-level heuristic of two-thirds to three-quarters of team size is the most useful default for an “in progress” column: a 9-person team starts at a column limit of 6. Team size minus one is a defensible alternative. The exact number is wrong, almost certainly, on day one. That is fine. The number being wrong is not the failure mode. Stepping over it is.
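The starting heuristics, side by side. These are the published defaults named above; which one to start from, and how to round, is a judgment call:

```python
def starting_limits(team_size: int) -> dict[str, int]:
    return {
        "anderson_per_person": 2 * team_size,            # 2 items per knowledge worker
        "vacanti_column_low":  round(team_size * 2 / 3), # ~2/3 of team size
        "vacanti_column_high": round(team_size * 3 / 4), # ~3/4 of team size
        "team_minus_one":      team_size - 1,
    }

print(starting_limits(9))
```

Anderson’s number is a whole-board budget; the other three are column-level. A 9-person team lands at a column limit of 6 to 8 whichever default it picks, and the exact value matters less than refusing to step over it.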
Vacanti’s additions are worth installing alongside the limit itself. Pull from right to left: focus on the rightmost column first, since work flows by being pulled from upstream, never pushed. Anti-aging policy: when the column has multiple in-flight items, pull the oldest first; a reasonable default is “if any item is older than four days, move it.” Aging is the leading indicator that flow has broken.
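The anti-aging policy is one comparison and one threshold. A sketch, with illustrative card ages:

```python
def next_pull(ages_in_days: list[float], max_age_days: float = 4.0) -> tuple[int, bool]:
    # Among in-flight items, pull the oldest first; flag it if it has
    # breached the four-day default, since aging is the leading indicator
    # that flow has broken.
    oldest = max(range(len(ages_in_days)), key=lambda i: ages_in_days[i])
    return oldest, ages_in_days[oldest] > max_age_days

print(next_pull([1.5, 6.0, 3.0]))  # -> (1, True): item 1 is oldest and over the threshold
```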
Then reduce the limit until pain emerges. Pain is the signal. It is exactly the constraint surfacing. Don’t run from it; work it. Once you have it, decide deliberately whether the limit changes. Sometimes the right answer is to raise it because the team has invested in clearing the actual bottleneck. Sometimes the right answer is to leave it and keep working the constraint. Either way, the change is deliberate, and the team’s data is the input.
Vacanti documents what happens when teams remove WIP limits entirely: cycle times “increase in an identical trend to before.” The math reasserts itself, on schedule, every time. The limit is not a preference. It is the operational handle on the math.
None of this changes anything until somebody acts on it. Three roles, three actions, this week.
Monday-morning actions
If you’re an IC. Refuse the next ticket when the column is full. Be the person who says it. The math is on your side. The right script is one sentence: the column is full; let me help clear it before I pull another. Say it once and the team will say it back to you.
If you’re a team lead or agile coach. Enforce one column’s limit for two weeks. Pick one column. Don’t touch the others. Watch what surfaces. Resist (actively resist) the urge to raise the limit because the team is uncomfortable. The discomfort is the data. The pain is the constraint, becoming visible. Same energy as the Andon Cord IC action: pull a cord visibly, especially the personal one, especially in front of a team that has never seen you do it. The behavior is faster to install than the explanation.
If you’re a VP or director. Stop measuring “in-flight initiatives” as a count. It is the wrong number, and it is rewarding the wrong behavior. Replace it with cycle time and throughput at the team and value-stream level. The 2025 DORA finding on small batches as an AI capability makes this urgent: if your AI-assisted teams are working in large batches because the executive scorecard rewards “more initiatives in flight,” you are paying for AI and getting less throughput, against the published finding that says exactly that will happen.
The honest summary
A loom in 1896 stopped itself when a thread broke. A team of six on a Wednesday morning does or does not enforce the number above the column. The cord is the cheap part. The number is the cheap part. The decision to actually stop pulling, that’s the load-bearing element, and most teams skip it.
Little’s Law isn’t aspirational. Reinertsen’s queue isn’t a metaphor. DORA’s twelve years of data are unambiguous. Reduce WIP, cycle time falls. Step over the limit, and you have not made the team faster; you have hidden the bottleneck and traded it for a longer queue. The limit is a wall or it is nothing.
The next article in Four on the Floor asks the obvious follow-up. Once the WIP limit surfaces the constraint, what do you actually do with it? Goldratt has been waiting his turn. We’ll close out the quartet there.
Sources
- Little, John D.C. “A Proof for the Queuing Formula: L = λW.” Operations Research, Vol. 9, No. 3, May–June 1961, pp. 383–387. The original paper. The mathematical floor of every WIP/cycle-time argument.
- Little, John D.C. “OR FORUM: Little’s Law as Viewed on Its 50th Anniversary.” Operations Research, Vol. 59, No. 3, May–June 2011, pp. 536–549. Little’s own restatement: weakened assumptions, software-applicable.
- Reinertsen, Donald G. The Principles of Product Development Flow: Second Generation Lean Product Development. Celeritas, 2009. The synthesis treatise. Q3 (utilization → exponential queue) and W2 (rate matching) are the principles cited here.
- Kingman, John F.C. “The single server queue in heavy traffic.” Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 57, No. 4, 1961. The math behind the 85% breakpoint. See AllAboutLean’s summary for the practitioner-level walkthrough.
- Anderson, David J. Kanban: Successful Evolutionary Change for Your Technology Business. Blue Hole Press, 2010. The software adaptation. Six general practices, with Limit WIP as the second and load-bearing one. See also the revised practices essay at DJAA for the “stimulates conversations” quote.
- Goldratt, Eliyahu M. and Jeff Cox. The Goal: A Process of Ongoing Improvement. North River Press, 1984 (4th ed. 2014). “Activating a non-bottleneck to its maximum is an act of maximum stupidity,” p. 211. The next Four on the Floor article will do the deeper Theory of Constraints work.
- Vacanti, Daniel S. Actionable Agile Metrics for Predictability: An Introduction. ActionableAgile Press, 2015 (10th anniversary update 2025). The five-constraint precise version of Little’s Law. “If you have a problem with predictability, you have a problem with WIP.”
- Vacanti, Daniel S. “Don’t just limit WIP, optimize it.” ProKanban, January 2021. The 2/3 to 3/4 heuristic, the anti-aging policy, the right-to-left pull rule, the math-reasserts-itself observation when WIP limits are removed.
- Brown, Paul. “WIP: What It Is, What It Isn’t, and Why It Still Matters.” ProKanban, May 2025. Vacanti’s restatement: WIP is a system condition, not a number above a column.
- Kersten, Mik. Project to Product: How to Survive and Thrive in the Age of Digital Disruption with the Flow Framework. IT Revolution Press, 2018. The Flow Framework. Tasktop’s flow-efficiency data: 15–40% across 3,600+ teams.
- Magennis, Troy. “How Does Utilization Impact Lead-time of Work?” Observable notebook. The interactive visualization of Kingman’s formula in software-team terms.
- Poppendieck, Mary and Tom. Lean Software Development: An Agile Toolkit. Addison-Wesley, 2003. The 2003 statement of Little’s Law for software audiences and the 80% utilization breakpoint that Reinertsen later refined to 85%.
- Lean Enterprise Institute: Jidoka Lexicon entry and the Toyota Motor Corporation official global site. The canonical Toyota mechanism behind the cord-and-swarm response.
- Kahneman, Daniel and Amos Tversky. “Intuitive Prediction: Biases and Corrective Procedures.” Management Science, 1979. The planning fallacy, named.
- Buehler, Roger; Griffin, Dale; Ross, Michael. “Exploring the ‘Planning Fallacy’: Why People Underestimate Their Task Completion Times.” Journal of Personality and Social Psychology, 1994. The 1994 replication and extension.
- DORA / Accelerate State of DevOps Reports, 2014–2025. Specifically cited in this article: 2015 Puppet/DORA (WIP limits enter the construct), 2018 DORA (large-batch risk quote), 2021 Puppet (risk-aversion paradox), 2023 DORA (the universal-technique quote), 2025 DORA on AI-assisted development (small batches as one of seven AI capabilities). Methodology context: State of the State of DevOps and DORA Metrics: What the Research Actually Says.
For the rest of the Four on the Floor quartet, see One Piece Flow in Software Delivery and The Andon Cord in Software Teams. The closing piece on Theory of Constraints is forthcoming.