Technology

AI Code Review Only Catches Half of Your Bugs—Fix the Intent

AI tools can miss bugs when code “looks right” but violates the real purpose. Requirements that capture intent—what must not happen—are key.

AI code review can be impressive—until it confidently verifies the wrong thing.

That gap is what made a recent, very human moment sting: I tried to “vibe code” a simple bus-tracker app and let an AI build the logic end to end. It produced a clean web UI, parsed the transit API response flawlessly, and even suggested a departure time that matched my on-foot schedule. But when I stepped outside, the bus never showed up—because my app was tracking the route in the opposite direction. On paper, nothing about the code looked broken. In practice, the app was doing the wrong job.

The incident wasn’t just an inconvenience in Park Slope. It was a precise illustration of a bigger engineering problem: many bugs aren’t “in” the code in a way static analysis can see. They live in the mismatch between what the system was supposed to do and what it actually does. Structural checks—syntax correctness, typical safety patterns, JSON parsing, even many “does this run?” validations—can all pass while the software violates the real intent. That’s the blind spot AI code review often inherits when requirements are treated as optional rather than foundational.

In the author’s broader work on agentic engineering and AI-driven development, the argument is straightforward: if AI can only verify what you explicitly give it, then the quality of verification depends on the quality of the “why.” Spec-driven development (SDD) has helped teams formalize implementation details—things like retry mechanisms, paginated endpoints, or duplicate-key checks. But SDD tends to focus on the how. For code quality, and especially for security, what you really need is the why: the behavior tied to purpose, users, and the consequences of failure.

This matters because structural analysis and pattern detection hit a ceiling. Security research has long pointed out that a large share of vulnerabilities don’t come from obvious implementation flaws like injection patterns. Instead, they come from design-level intent violations—cases where the system fails to enforce a security property because no one ever wrote down the requirement that would have forced that enforcement. The result is that an endpoint might validate inputs, build queries safely, and still allow destructive actions because the authorization rule was never specified as a requirement worth checking.
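A minimal sketch of that failure mode, with hypothetical names (`delete_note`, `NOTES`): the first handler validates its input and never crashes, so structural review passes, yet it violates an authorization boundary that was never written down as a requirement. The second handler shows how the written negative requirement becomes an enforceable check.

```python
# Hypothetical in-memory store; in a real system this would be a database.
NOTES = {
    101: {"owner": "alice", "text": "groceries"},
    102: {"owner": "bob", "text": "train times"},
}

def delete_note(note_id, requesting_user):
    # Input validation passes: type check and existence check both succeed.
    if not isinstance(note_id, int) or note_id not in NOTES:
        return "not found"
    # Bug by omission: nothing enforces "users must not delete other
    # users' notes" -- the negative requirement was never specified.
    del NOTES[note_id]
    return "deleted"

def delete_note_with_requirement(note_id, requesting_user):
    if not isinstance(note_id, int) or note_id not in NOTES:
        return "not found"
    # The requirement made explicit: only the owner may delete.
    if NOTES[note_id]["owner"] != requesting_user:
        return "forbidden"
    del NOTES[note_id]
    return "deleted"
```

Nothing in the first function looks like a typo or an injection pattern; the defect only exists relative to a rule stated outside the code.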

Readers will recognize the real-world version of this. When teams rely on vibes, institutional memory, or scattered notes, the system can ship with “correct-looking” behavior that doesn’t cover the boundaries that actually protect users. A missing authorization check doesn’t announce itself in the code like a typo does. It looks like normal logic—until a test, a penetration attempt, or a breach exposes that the software never had an enforceable rule for what it must prevent.

In the bus app, the problem was essentially an intent requirement that wasn’t captured: I didn’t want “any bus that matches the stop identifier.” I wanted a specific direction for a specific destination. The AI assembled something structurally valid from the API and UI logic, but it couldn’t infer which purpose mattered. Without that purpose, the review process becomes a checklist of correctness rather than a verification of outcomes.
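The bus-tracker bug can be sketched in a few lines. The field names (`stop_id`, `direction`, `minutes_away`) are assumptions for illustration, not taken from any real transit API: the first function is exactly the kind of structurally valid code an AI might generate, and it happily returns a bus going the wrong way.

```python
# Simplified stand-in for a parsed transit API feed.
departures = [
    {"stop_id": "307", "direction": "northbound", "minutes_away": 4},
    {"stop_id": "307", "direction": "southbound", "minutes_away": 11},
]

def next_departure(feed, stop_id):
    # Structurally valid: the soonest bus at the stop -- in ANY direction.
    matches = [d for d in feed if d["stop_id"] == stop_id]
    return min(matches, key=lambda d: d["minutes_away"]) if matches else None

def next_departure_with_intent(feed, stop_id, direction):
    # The captured intent requirement: only the direction I actually ride.
    matches = [
        d for d in feed
        if d["stop_id"] == stop_id and d["direction"] == direction
    ]
    return min(matches, key=lambda d: d["minutes_away"]) if matches else None
```

Both functions parse cleanly and run without error; only the second encodes the requirement that would have made the review catch the mismatch.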

The most useful shift in approach is moving from “specs” (implementation instructions) to “requirements” (purpose-backed guarantees and prohibitions). A requirement doesn’t just say what to implement; it describes what users depend on and what must never be allowed. That difference is what enables AI to evaluate edge cases that structural reviewers won’t reach, because those edge cases aren’t about code form—they’re about behavioral contracts under real conditions.

The Quality Playbook described here proposes a practical method to make this work with AI tooling. Many teams try to ask an AI to do two hard jobs at once: infer behavioral contracts and then write requirements for them. Attention runs thin. The approach instead separates the pipeline. First, have the AI observe and list behavioral contracts from the code and surrounding documentation. Next, derive requirements from those contracts plus the “why” that lives in design context. Then check coverage—whether every contract is actually backed by a requirement—and treat uncovered contracts as visible gaps to close. The key idea is that “external memory” prevents the AI from forgetting observations it previously noticed, which is exactly where many verification workflows silently fail.
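The coverage step above can be sketched with plain data structures standing in for the “external memory.” The IDs and record shapes here are illustrative assumptions, not the Playbook’s actual format: contracts and requirements are written down as artifacts, and a mechanical pass surfaces any contract no requirement backs.

```python
# External memory: observed behavioral contracts, recorded so the AI
# (and the team) cannot silently forget them between steps.
contracts = {
    "C1": "departure times are shown for the requested stop",
    "C2": "only departures in the rider's direction are shown",
    "C3": "stale feed data is flagged, not displayed as current",
}

# Requirements derived from contracts plus design context ("why").
requirements = [
    {"id": "R1", "covers": ["C1"],
     "text": "users depend on seeing times for their own stop"},
    {"id": "R2", "covers": ["C2"],
     "text": "wrong-direction buses must never be suggested"},
]

def uncovered_contracts(contracts, requirements):
    # Coverage check: every contract must be backed by a requirement;
    # anything left over is a visible gap to close, not a silent hole.
    covered = {c for r in requirements for c in r["covers"]}
    return sorted(set(contracts) - covered)
```

Running the check against this data flags `C3` as uncovered, turning a would-be blind spot into an explicit work item.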

There’s a broader lesson for anyone using AI in development right now: requirements hygiene is becoming a security feature, not just a project-management exercise. If your intent is trapped in chat threads, issue discussions, and support tickets, AI can’t verify it reliably. But if you extract that intent into positive requirements (“users depend on X to reject ambiguous input”) and negative requirements (“ambiguous input must not be silently accepted,” “unauthorized users must not delete other users’ data”), you give the model something concrete to test against. For teams, that’s the difference between an AI that accelerates implementation and an AI that meaningfully reduces defect risk.
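One way such requirements become concrete is by mapping each “must not” to code that fails loudly instead of passing silently. This is a sketch under assumed names (`parse_stop_query`, `AmbiguousQueryError` are hypothetical), not a prescribed API:

```python
class AmbiguousQueryError(ValueError):
    """Raised when a query matches more than one stop."""

def parse_stop_query(query, known_stops):
    matches = [s for s in known_stops if query.lower() in s.lower()]
    # Positive requirement: a query must resolve to a known stop.
    if not matches:
        raise KeyError(f"no stop matches {query!r}")
    # Negative requirement: ambiguous input must not be silently
    # accepted -- refuse rather than guess.
    if len(matches) > 1:
        raise AmbiguousQueryError(f"{query!r} matches {matches}")
    return matches[0]
```

An AI reviewer given the written requirement can now check a behavioral contract (“ambiguity raises, never guesses”) instead of only checking that the string matching is syntactically sound.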

Looking ahead, the most effective AI-assisted code review workflows are likely to evolve beyond “linting with language models” toward “intent-aware verification.” That doesn’t replace structural analysis—it complements it. But it changes what teams measure success by: fewer intent violations, more security boundaries enforced, and fewer cases where code passes checks yet fails the job it was supposed to do. If the AI can only catch what you tell it to care about, then getting the “care” right—through requirements—is the lever that moves outcomes.