The Validation Gap Is Costing You More Than You Think

Our latest State of Software Delivery report analyzed more than 28 million CI workflows and found a pattern that should give engineering leaders pause. Average throughput grew 59% year over year. Main branch activity for the median team declined 7%. Teams are generating more code than ever before. Less of it is reaching production.

The cost of poor validation used to show up mostly in developer hours: debugging, blocked deployments, context switching. That cost hasn’t gone away. But there is a second bill now. Every failed build means agent retries. Every slow pipeline is compute burning while an agent waits. Main branch success rates have fallen to a five-year low of 70.8% against a 90% benchmark, and the AI spend attached to every failed cycle is climbing alongside it.

The teams doing well are catching failures earlier and keeping their pipelines healthier. They are running the same tools as everyone else. What they have structured differently is where and when validation happens.

For most of our careers, the inner loop, the local cycle of writing, running and checking code, was something the developer managed and optimized. Once AI-generated code entered the picture, the inner loop as we knew it could not handle the volume. Today, the inner loop is agentic. It’s where the agent is actively working: writing, iterating and checking before anything is committed or pushed. The outer loop is CI: shared infrastructure, integration, the final gate before shipping. Most teams have invested heavily in the outer loop. Until recently, the inner loop didn’t need much attention. That has changed.

CI was designed around a human pace of development: one engineer, one branch, one push at a time. Agents generate changes in parallel, across multiple tasks, at a volume that makes the push-wait-fix cycle a serious drag on throughput. By the time CI returns feedback, the agent has moved on to the next task. Context is gone. Fixing the failure means starting a new cycle: reloading context, re-examining the change, potentially redoing work that was already completed.

Human review was always the backstop before code reached shared infrastructure. For most teams today, the volume of AI-generated change has simply outpaced what any reviewer can meaningfully assess before it hits CI. The code arrives faster than the review process was designed to handle. Most teams end up making a choice without fully realizing it: either throttle the agents to match available review capacity, or let the volume through and absorb the failures downstream.

Validation needs to happen earlier, while the agent is still working on the change. CI still matters. System-level validation, integration, packaging and deployment belong in the outer loop and always will. But by the time code reaches CI, it should have already passed basic scrutiny. The inner loop is where that confidence gets built, before anything touches shared infrastructure. Getting validation right at this stage is also how teams close the gap between a 70.8% success rate and the 90% benchmark the data points to.
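To put a rough number on that gap, one can treat each pipeline run as an independent trial that is repeated until it passes. This is a simplifying assumption, not something the report states: real failures are correlated, and the 70.8% figure describes main branch runs rather than agent retries specifically. Under that assumption the expected number of runs per successful change is 1/p, where p is the success rate. The function name below is illustrative.

```python
def expected_runs(success_rate: float) -> float:
    """Expected runs until the first success, assuming independent runs
    with a fixed success probability (mean of a geometric distribution)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


current = expected_runs(0.708)   # success rate cited in the report
benchmark = expected_runs(0.90)  # benchmark cited in the report

print(f"runs per successful change at 70.8%: {current:.2f}")
print(f"runs per successful change at 90%:   {benchmark:.2f}")
print(f"extra runs relative to benchmark:    {current / benchmark - 1:.0%}")
```

Under these assumptions, a team at 70.8% needs roughly 27% more pipeline runs per successful change than a team at the benchmark, and each extra run carries both compute and agent cost.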

The requirements for inner loop validation are specific to how agents work. Feedback has to arrive within the window the agent is still operating in. Tests need to be scoped to the relevant change, not the full suite. Failures should surface one at a time: an agent given a long list of problems fills its context window and stops being productive. These constraints are different from what CI was designed to satisfy, which is why existing tooling often doesn’t fit.
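Those three requirements can be sketched in a few lines. This is a minimal illustration, not CircleCI's implementation: the `test_map` lookup from source files to tests, the `run_test` callback and the 30-second default budget are all hypothetical stand-ins for whatever a real setup would provide.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Result:
    status: str                    # "passed", "failed", or "budget_exceeded"
    test: Optional[str] = None     # the single test being reported, if any
    message: Optional[str] = None  # failure detail for that one test


def select_tests(changed_files: Sequence[str],
                 test_map: Dict[str, List[str]]) -> List[str]:
    """Return only the tests mapped to the changed files, without duplicates."""
    selected: List[str] = []
    for path in changed_files:
        for test in test_map.get(path, []):
            if test not in selected:
                selected.append(test)
    return selected


def validate(changed_files: Sequence[str],
             test_map: Dict[str, List[str]],
             run_test: Callable[[str], Optional[str]],
             budget_seconds: float = 30.0,
             clock: Callable[[], float] = time.monotonic) -> Result:
    """Run scoped tests and stop at the first failure or when the budget ends.

    run_test returns None on success or a failure message (hypothetical API).
    """
    deadline = clock() + budget_seconds
    for test in select_tests(changed_files, test_map):
        if clock() >= deadline:
            # Feedback must arrive while the agent is still on the change.
            return Result("budget_exceeded", test=test)
        message = run_test(test)
        if message is not None:
            # Surface exactly one failure instead of a long list.
            return Result("failed", test=test, message=message)
    return Result("passed")


# Illustrative data only: file and test names are made up.
example_map = {
    "billing/invoice.py": ["test_invoice_total", "test_invoice_tax"],
    "auth/login.py": ["test_login"],
}
outcome = validate(["billing/invoice.py"], example_map, lambda name: None)
print(outcome.status)
```

Returning a single `Result` rather than a list is the design choice that matters here: the agent gets one actionable item per cycle, and the untouched `auth/login.py` tests never run.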

When the two loops share context, the picture changes. When the inner loop draws on what CI has historically flagged in a codebase, it runs smarter checks before anything is pushed. When CI sees changes that have already been validated locally, it can focus on what actually needs system-level verification. The two stages inform each other and the system gets better at catching the right things over time.
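One way to picture that exchange, as a sketch under stated assumptions rather than a description of any product: the inner loop reorders its checks using a log of what CI has failed on before, and CI skips checks that already passed locally unless they are system-level. The function names, the check identifiers and the idea that local results are recorded for the exact same commit are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set


def prioritize_local_checks(checks: List[str],
                            ci_failure_log: Iterable[str]) -> List[str]:
    """Order local checks so those that failed most often in CI run first.

    sorted() is stable, so checks with equal counts keep their given order.
    """
    failures = Counter(ci_failure_log)
    return sorted(checks, key=lambda check: -failures[check])


def plan_ci_checks(ci_checks: List[str],
                   system_level: Set[str],
                   passed_locally: Set[str]) -> List[str]:
    """Always keep system-level checks; skip others that passed locally.

    Assumes passed_locally was recorded for the exact commit CI is building.
    """
    return [c for c in ci_checks
            if c in system_level or c not in passed_locally]


# Illustrative data only.
history = ["lint", "unit:billing", "unit:billing", "unit:auth"]
local_order = prioritize_local_checks(
    ["lint", "unit:auth", "unit:billing"], history)
ci_plan = plan_ci_checks(
    ["lint", "unit:billing", "integration", "deploy-smoke"],
    system_level={"integration", "deploy-smoke"},
    passed_locally={"lint", "unit:billing"},
)
print(local_order)
print(ci_plan)
```

In this example the billing tests move to the front of the local run because CI has failed on them most often, and CI is left with only the integration and deployment checks.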

By the time CI returns a failure, is the agent that introduced it still working on that change?

This is where CircleCI has been focusing: building validation that spans both stages and learns from how a codebase actually builds. How validation is split across the two loops is an infrastructure question that teams will need to answer if they want to improve their throughput and success rates.

Every model release increases agent velocity and the volume of code flowing into delivery pipelines. Teams that have solved the validation gap absorb that increase and get faster. Teams that haven’t will find the gap between them and the top performers widening with every upgrade.
