Tags: code-review, developer-productivity, ai, vibe-coding, engineering

Triage Before You Review: The Skill That AI Made Critical

Aloisio Mello
6 min read

There's a moment every developer knows. You open a pull request, start reading line one, and 40 minutes later you realize you've been reviewing the wrong thing. The blast radius is 3x what the description said. The architecture decision buried in file 7 invalidates everything you just annotated.

That moment has a name: it's a triage failure. And in 2026, it's the most expensive failure in software development.

Two Phases, One Mistake


Code review has always had two distinct phases that most teams conflate into one:

Phase 1 — Triage: Is this PR worth careful review? What's the actual scope? What's the blast radius? Which files carry the real risk? How long will this actually take?

Phase 2 — Review: Line-by-line analysis, logic correctness, edge cases, style consistency.

Phase 2 gets all the tooling attention. Linters, static analysis, AI comment generators — the entire industry is building better Phase 2 tools. Phase 1 gets a description field that the author writes and nobody audits.

This was always a gap. AI-generated code turned it into a crisis.

What Vibe Coding Did to the Review Queue


The data from Q1 2026 is stark:

  • 98% more PRs merged per developer per week compared to 2024 (GitLab Engineering Velocity Report)
  • 91% increase in PR review time per engineer (Hatica Developer Productivity Index, Q1 2026)
  • 1.7x higher defect rate in AI-generated code versus human-authored code (same study)
  • Code review fatigue is now the #3 driver of developer burnout, up from #7 in 2024

The throughput went up. The quality signal went down. Review time went up. Something has to give — and right now, what's giving is reviewer attention.

Developers are doing what humans do under sustained cognitive load: they're pattern-matching instead of reading. They're approving on vibe. They're skimming the files they understand and skipping the ones that look unfamiliar. This is rational behavior under irrational conditions.

And the irrational condition is this: AI writes code faster than humans can review it. That's not going to change.

Why AI-Generated PRs Break Your Triage Heuristics

Before AI-generated code, developers had reliable heuristics for triage:

  • Clean syntax = probably fine to skim
  • Detailed PR description = author thought it through
  • Small diff = low risk
  • No obvious errors in the first file = good sign

AI-generated code breaks every one of these. The syntax is always clean. The description is always detailed (and usually AI-generated too). The diff can be small and still touch a critical path. There are no obvious errors in file 1, but the logic error is in file 4, which is a consequence of an architectural assumption in file 7.

The traditional triage heuristics were shortcuts that worked because human code had predictable error signatures. AI-generated code has different error signatures — and your existing instincts are calibrated to the wrong data.

The Five Questions That Actually Matter


Effective triage answers five questions before you open a single diff:

  1. What is the actual blast radius? Not what the description says — what do the changed files actually touch? Which downstream systems depend on them?
  2. Where is the highest-risk code? Authentication, payment processing, data migrations, public APIs — these need careful review regardless of diff size.
  3. Does anything here break existing contracts? API signatures, database schemas, interface implementations — breaking changes that downstream callers depend on.
  4. How long will careful review actually take? SmartBear's research established 400 lines of code as the upper cognitive limit for effective review. At 400+ LOC, defect detection drops sharply. You need to know this before you start.
  5. Is the description accurate? For AI-generated PRs specifically: does what the code does match what the description says it does? These diverge more often than you'd expect.

These five questions take 5 minutes when done manually. They can take 30 seconds with the right tooling. Either way, they save the 40-minute wrong-direction review.
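The manual version of those five minutes can be sketched in a few lines of code. This is a hypothetical illustration, not MergePilot's implementation: the risk patterns, the 400-LOC session ceiling, and the ~500 LOC/hour review pace are assumptions you'd tune to your own team.

```python
import math

# Paths whose changes deserve careful review regardless of diff size.
# These patterns are illustrative; tune them to your own codebase.
HIGH_RISK_PATTERNS = ("auth", "payment", "migration", "api/")

def triage(numstat):
    """Answer triage questions 1, 2, and 4 from `git diff --numstat`
    style tuples of (lines_added, lines_deleted, path)."""
    total_loc = sum(added + deleted for added, deleted, _ in numstat)
    risky = [path for _, _, path in numstat
             if any(pat in path for pat in HIGH_RISK_PATTERNS)]
    # SmartBear's ~400 LOC ceiling per careful-review session.
    sessions = max(1, math.ceil(total_loc / 400))
    # Rough pace assumption: ~500 LOC reviewed carefully per hour.
    minutes = round(total_loc / 500 * 60)
    return {
        "files": len(numstat),
        "loc": total_loc,
        "high_risk_files": risky,
        "review_sessions": sessions,
        "estimated_minutes": minutes,
    }

report = triage([
    (120, 30, "src/auth/session.py"),
    (400, 80, "src/api/handlers.py"),
    (60, 10, "docs/usage.md"),
])
# report["loc"] == 700, report["review_sessions"] == 2
```

Even this crude version answers the question that matters most before you open the diff: is this a 10-minute skim or a multi-session commitment?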

The 400-Line Finding

The SmartBear finding deserves its own section because it's counterintuitive and developers consistently underestimate it.

At 400 lines of code reviewed in a single session, defect detection rates drop sharply — not gradually. It's not that you get slightly worse at reviewing large PRs; it's that cognitive load crosses a threshold and your ability to hold the full context collapses. You start seeing trees and losing the forest.

AI-generated PRs routinely exceed 400 LOC. They're comprehensive, they handle edge cases, they add tests, they update documentation. A 1,200-line AI-generated PR is not three 400-line PRs — it's one PR where effective review requires deliberate decomposition.

Knowing this before you start reviewing changes what you do. You break the review into sessions. You prioritize which 400 lines carry the most risk. You explicitly decide what you're going to skim and what you're going to read carefully — rather than discovering this implicitly, 40 minutes in, when you're already fatigued.
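That decomposition step can itself be sketched: pack the changed files into sessions of at most ~400 reviewed lines, highest-risk files first, so careful attention lands where it matters. The greedy packing and the risk scores below are a hypothetical illustration, not a prescribed algorithm.

```python
def plan_sessions(files, limit=400):
    """Greedily pack changed files into review sessions of at most
    `limit` LOC each, taking the highest-risk files first.

    files: list of (path, changed_loc, risk_score) tuples, where
    risk_score is any comparable number (higher = riskier).
    A single file larger than `limit` still gets its own session;
    splitting within a file is left to the reviewer.
    """
    ordered = sorted(files, key=lambda f: f[2], reverse=True)
    sessions, current, used = [], [], 0
    for path, loc, _risk in ordered:
        if current and used + loc > limit:
            sessions.append(current)
            current, used = [], 0
        current.append(path)
        used += loc
    if current:
        sessions.append(current)
    return sessions

plan_sessions([
    ("src/auth/tokens.py", 300, 0.9),
    ("src/api/handlers.py", 250, 0.7),
    ("docs/usage.md", 100, 0.1),
])
# → [["src/auth/tokens.py"], ["src/api/handlers.py", "docs/usage.md"]]
```

The point of the sketch is the ordering, not the packing: the riskiest 400 lines get your freshest attention, and the documentation diff gets reviewed after the authentication change, not before it.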

Tooling That Helps

Manual triage at scale doesn't work. With 98% more PRs per week, spending 5 minutes on triage per PR becomes a non-trivial part of the review budget. The answer isn't to skip triage — it's to automate it.

The tools that help are the ones that answer the five questions before you open the diff: blast radius, risk concentration, breaking changes, estimated review time, description accuracy. Not tools that generate more comments on the diff you've already opened.
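Much of that pre-review data is already sitting in git. As a minimal sketch, `git diff --numstat` emits per-file added/deleted counts that a triage script can consume; binary files report `-` for both counts, which is handled below. The helper names here are illustrative.

```python
import subprocess

def parse_numstat(text):
    """Parse `git diff --numstat` output into (added, deleted, path)
    tuples. Binary files report '-' for both counts; count them as 0."""
    rows = []
    for line in text.strip().splitlines():
        added, deleted, path = line.split("\t", 2)
        rows.append((
            0 if added == "-" else int(added),
            0 if deleted == "-" else int(deleted),
            path,
        ))
    return rows

def pr_numstat(base="main"):
    """Diff stats for the current branch against `base`, using the
    triple-dot range so only the branch's own changes are counted."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat(out)
```

Feeding this into a triage function answers blast radius and review-time questions locally, before anyone opens the PR page.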

This is what MergePilot does — pre-review PR analysis that runs locally on macOS. It maps the blast radius, flags high-risk files, identifies breaking changes, and estimates review time before you commit to a full review. No code leaves your machine.

The insight behind it is simple: the most valuable moment in code review is the 60 seconds before you start. That's when you decide whether to review carefully or skim. Right now, most developers make that decision with the PR description as their only input. That's not enough.

The Skill That Compounds

Triage is a skill. Like all skills, it compounds. Developers who triage well catch more defects in less time, accumulate less review fatigue, and maintain their ability to do careful review later in the week when it matters most.

The developers who skip triage burn their cognitive budget on PRs that didn't need it, arrive at the high-risk PRs already fatigued, and approve on vibe. This is how defects ship.

In 2026, with AI writing code faster than humans can review it, the developers who triage well are the ones who stay effective. It's not about being smarter or working harder. It's about spending the first 60 seconds correctly.

The code review queue is not going to get shorter. The question is whether you'll be the one deciding how to route it, or whether it'll route you.
