How do you review AI-generated pull requests safely?

Run the PR against a fixed checklist before reading the diff. Cover behaviour, dependencies, types, and surface area in that order. Treat anything outside the prompt's stated scope as a red flag, not a bonus.

What should you look for in AI-generated React or Vue code?

Hallucinated APIs, silent prop or emit renames, new packages added without justification, and any or as casts that paper over a real type mismatch. In Vue, also watch for defineModel and Pinia store shape drift; in React, watch for hook dependency arrays that hide stale closures.

Can you automate the review of AI-generated code?

Parts of the checklist are automatable today: dependency diffs, bundle size deltas, lockfile drift, and type-coverage regressions. The behaviour and surface-area items still need a human, because they depend on intent, not syntax.

What is the biggest risk of merging AI-generated code without review?

Silent surface-area expansion. The agent often touches files outside the prompt, renames public exports, or adds a dependency to solve something the codebase already solves. Each one compounds over weeks.

AI code review vs human code review, what is different?

Human reviewers assume the author intended every change. With AI-generated code, that assumption breaks. You review the prompt and the diff together, and you treat scope drift, hallucinated APIs, and casual any types as the default failure modes rather than rare slips.

AI Code Review Checklist for React and Vue Teams

Reviewing AI-generated code is a different discipline from reviewing human code, and a written checklist closes the gap. This article gives React 19.x and Vue 3.5.x teams a 12-item pre-merge checklist for 2026, organised in four bands (behaviour, dependencies, types, surface area), plus a paste-ready PR-template snippet. The agent does not share your assumptions about scope, so the diff often answers a question you did not ask.

TL;DR, the 12-item AI code review checklist

The recommendation is short: put the checklist into your pull-request template, attach an ai-assisted label to any PR where an agent wrote more than a few lines, and require every item ticked before merge. Each line is a yes/no question scoped so a mid-level reviewer can apply it without escalating.

Does the diff implement exactly what the prompt asked for, and nothing else?
Does every imported library function actually exist in the installed version?
Are the edge cases the prompt named (empty, error, loading, boundary) all handled?
Are side effects placed where the framework expects them (effects, watchers, event handlers)?
Is every new package justified, and would an existing helper do the job?
Does the lockfile diff contain only the changes the new dependency requires?
Has the route or page bundle grown by less than the team's gzip threshold?
Are there zero new any, unknown, or as casts that silence a real type error?
Do client and server types still agree on the shapes the diff touches?
Does the diff avoid renaming or removing any public export the prompt did not ask about?
Are there zero dead branches, if (false) guards, or "for safety" second code paths?
If the diff touches authentication, payments, or file uploads, has a second human reviewed it?

Paste the list into .github/PULL_REQUEST_TEMPLATE.md. The rest of this article walks through each item with concrete React and Vue examples.

Human review focus vs AI-assisted review focus

The bands above shift the reviewer's attention away from style and toward intent. A side-by-side table makes the contrast explicit.

Review focus	Human-authored PR	AI-assisted PR
Primary failure mode	Misunderstanding the requirement	Drifting outside the prompt's scope
Trust default	Assume intent behind every change	Assume nothing outside the prompt was intentional
API calls	Spot-check unfamiliar ones	Verify every external API against the installed version
Dependencies	Discuss when a new package appears	Require justification by default
Type casts	Read for correctness	Grep the diff, treat each cast as guilty
Surface area	Trust the author's restraint	Diff the public exports explicitly
Dead branches	Rare, often deliberate	Common, almost always accidental

The shift is uncomfortable at first because it feels adversarial. It is not. The agent has no skin in the game, so the reviewer has to bring the skin.

Why reviewing AI code is its own discipline

The failure mode that breaks traditional review is the assumption of intent. When a colleague renames an export, you assume they thought about the downstream consumers. An agent doing the same thing probably did not, because the prompt never mentioned them. Every review heuristic built on "the author thought about this" stops working.

We have audited teams where the same agent, prompted twice with the same task, returned two diffs that differed in scope by 40 percent. One touched two files. The other touched eleven. The reviewer who waved the eleven-file version through caused three weeks of downstream cleanup, because half the changes solved problems nobody had asked the agent to solve.

The fix is structural. A written checklist forces the reviewer to look at the same set of failure modes every time, in the same order, regardless of how the diff is shaped. The four bands (behaviour, dependencies, types, surface area) move from cheapest to most expensive to apply. Behaviour catches the embarrassing bugs in a minute. Surface area catches the long-tail bugs that erode a codebase over a quarter.

The other change worth making early is the contract with the agent itself. A written instruction file (Cursor rules, CLAUDE.md, or the equivalent for whichever tool the team uses) is the analogue of an onboarding document for a new human teammate. The fewer surprises the agent generates, the fewer the reviewer has to catch.

Band 1: behaviour, does it do the right thing for the right reasons

This band asks whether the diff implements the prompt, not whether it compiles. It is the cheapest band to apply and catches the most embarrassing bugs, because most behaviour failures show up in the first few seconds of running the file or in a single careful reading.

Check 1: hallucinated APIs and library functions

Agents are pattern-matchers. When asked to call a library, they sometimes return a method that sounds plausible but does not exist on the installed version, or that was removed two majors ago. Spot it by searching the library's documentation, then node_modules, then running the file.

src/api/posts.ts

import { useQuery } from '@tanstack/react-query'

export function usePost(id: string) {
  return useQuery({
    queryKey: ['post', id],
    queryFn: () => fetch(`/api/posts/${id}`).then((r) => r.json()),
    onSuccess: (data) => {
      console.log('loaded', data.title)
    },
  })
}

The onSuccess callback was removed from useQuery in TanStack Query v5. Agents reach for v4 muscle memory often, and TypeScript does not always catch it: excess-property checks only apply to fresh object literals, so an options object assembled elsewhere and passed in can carry the stale key unnoticed. The PR comment to leave: "TanStack Query v5 dropped onSuccess/onError from useQuery. Move the side effect into a useEffect keyed on data, or use the query's select if you only need a transform." That comment names the failure, the version, and the fix in one line, which is the shape every AI-review comment should take.

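The Vue equivalent has the same shape. An agent asked to add reactive state inside a composable sometimes wraps a primitive value in reactive(), which returns the value unchanged and tracks nothing (Vue logs a warning in development only); ref() is the right call for primitives. Verify that the API exists, and behaves the way the agent assumes, on the installed package version, not on the version the agent remembers.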

Check 2: off-by-one and edge-case omissions

Agents skew toward the happy path. The prompt says "paginate the list" and the diff handles pages one through N but silently breaks on page zero or an empty result. The fix is to read the prompt and the diff side by side and ask: which boundaries were named, and which were ignored?

src/components/PaginatedList.tsx

import { useEffect } from 'react'
import { usePosts } from '../hooks/usePosts'

type PaginatedListProps = {
  page: number
}

export function PaginatedList({ page }: PaginatedListProps) {
  const { data } = usePosts(page)

  useEffect(() => {
    document.title = `Posts page ${page}`
  }, [])

  return (
    <ul>
      {data?.items.map((post) => (
        <li key={post.id}>{post.title}</li>
      ))}
    </ul>
  )
}

The useEffect dependency array is empty, so the title only updates on mount. What the agent shipped matches the prompt (a page-aware title) but ties it to a render-once side effect that captures the initial page value forever. The PR comment: "Dependency array should be [page]. Right now the title sticks on whatever page first rendered." A junior reviewer can apply that check by scanning every effect and confirming each closed-over variable is in the deps list.

While we are here: empty results, error states, and loading states are the three edge cases agents skip most often. If the prompt mentioned them, the diff must handle them visibly. If the prompt did not, the comment is "what does this render when data is undefined and error is set?"
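
One way to make those three cases hard to skip is to resolve the query result into an explicit view state before rendering. The sketch below is illustrative only: QueryResult and resolveViewState are hypothetical names, and the result shape is a simplified stand-in for whatever the project's data hook actually returns.

```typescript
// Hypothetical, simplified shape of a data hook's result.
type QueryResult<T> = {
  data?: { items: T[] }
  error?: Error | null
  isLoading: boolean
}

type ViewState<T> =
  | { kind: 'loading' }
  | { kind: 'error'; message: string }
  | { kind: 'empty' }
  | { kind: 'ready'; items: T[] }

// Resolve every edge case up front so the render path has to handle each one.
function resolveViewState<T>(result: QueryResult<T>): ViewState<T> {
  if (result.error) return { kind: 'error', message: result.error.message }
  if (result.isLoading || !result.data) return { kind: 'loading' }
  if (result.data.items.length === 0) return { kind: 'empty' }
  return { kind: 'ready', items: result.data.items }
}
```

A component that switches on kind cannot compile a "ready" branch without also deciding what the other three render, which turns the reviewer's question into a type error.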

Check 3: side effects in unexpected places

Both React and Vue have firm rules about where side effects live. Render functions read state; effects mutate it. Computed properties derive values; watchers react to changes. Agents bend these rules to satisfy a prompt quickly, and the bug surfaces hours later as a console warning or, worse, a silent loop.

src/composables/useCartSummary.ts

import { computed } from 'vue'
import { useCartStore } from '../stores/cart'

export function useCartSummary() {
  const cart = useCartStore()

  const total = computed(() => {
    const sum = cart.items.reduce((acc, item) => acc + item.price * item.quantity, 0)
    cart.lastSummaryRequestedAt = Date.now()
    return sum
  })

  return { total }
}

The computed mutates the Pinia store while deriving the total. Nothing is guaranteed to flag this at runtime, because Pinia allows direct state writes; at best a lint rule such as eslint-plugin-vue's vue/no-side-effects-in-computed-properties catches some variants. The bigger problem is that any component reading total triggers a store write, which triggers any watcher subscribed to lastSummaryRequestedAt, which can re-render the cart, which reads total again. The PR comment: "Computed values must be pure. Move the timestamp write into the action that triggered the summary request." If the prompt asked the agent to track when summaries were requested, the right home for that is the action, not the computed.
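
The separation that PR comment asks for can be sketched without any framework. The names below (CartState, cartTotal, requestSummary) are hypothetical; in a real Pinia store the first function would sit behind a getter or computed and the second would be an action.

```typescript
type CartItem = { price: number; quantity: number }
type CartState = { items: CartItem[]; lastSummaryRequestedAt: number | null }

// Pure derivation: reads its input, writes nothing, safe to call from a computed.
function cartTotal(items: CartItem[]): number {
  return items.reduce((acc, item) => acc + item.price * item.quantity, 0)
}

// Explicit action: the timestamp write happens only when something asks for a summary.
// `now` is injectable so the behaviour can be tested deterministically.
function requestSummary(cart: CartState, now: number = Date.now()): number {
  cart.lastSummaryRequestedAt = now
  return cartTotal(cart.items)
}
```

Reading the total any number of times leaves the state untouched; only the caller that invokes requestSummary pays for the write.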

The React analogue is fetching during render. An agent asked for a "fresh dashboard" sometimes writes const data = await fetch(...) directly inside a client component body. React 19 supports async components only on the server, so in a client component the pattern is unsupported and fails in ways that are hard to trace. In client code, effects, event handlers, and route loaders are the homes for fetches; the diff that fetches in render needs a one-line comment pointing to the right home. For a worked example of intentional reactive plumbing across multiple props, see the article on syncing multiple v-models with Composition API.

The behaviour band is the cheapest of the four because every check fits on one screen of code. If the reviewer cannot answer all three checks for a hunk in two minutes, the hunk is too big and the PR needs splitting.

Band 2: dependencies and footprint

The second band catches the silent supply-chain creep agents introduce when they reach for a familiar package instead of the project's existing helpers. Dependencies are forever in a way most other code is not, and "one small library" compounds across dozens of PRs into a measurably heavier bundle and a measurably wider attack surface.

Check 4: new packages added without justification

The first question for any package.json diff is "could we do this with what we already have?" Agents do not naturally inventory the existing utilities, so they reach for lodash.debounce, date-fns, axios, or whichever package they have seen most often, regardless of whether the project already ships an equivalent.

package.json (diff)

   "dependencies": {
     "react": "19.2.6",
     "react-dom": "19.2.6",
+    "axios": "1.7.9",
     "@tanstack/react-query": "5.62.0"
   }

The project already uses fetch wrapped in a typed client. Adding axios to make one request brings in its own transitive dependencies, roughly 13 KB gzip, and a divergence in error-handling shape that future maintainers have to learn. The PR comment: "We have a typed fetch wrapper at src/api/client.ts that already handles retries and error shape. Use that instead." A useful sanity check is to drop the package name into Bundlephobia and check the gzip size and the dependency count before the PR even lands in your inbox.

The Vue equivalent often involves state libraries. An agent adds Pinia ORM or a query cache because the prompt mentioned caching, while the codebase already has a clean Pinia store doing the job. The check is the same: name the existing helper in the comment. We covered the underlying decision pattern in the article on picking between Zustand and Redux Toolkit; the same restraint applies before agreeing to a new dependency.
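
The "did this PR add a package" question is simple enough to script. The following is a minimal sketch, assuming the base and head package.json files have already been read and parsed; Manifest and addedPackages are hypothetical names.

```typescript
type Manifest = {
  dependencies?: Record<string, string>
  devDependencies?: Record<string, string>
}

// Collect every package name declared in either dependency section.
function packageNames(manifest: Manifest): string[] {
  return Object.keys({ ...manifest.dependencies, ...manifest.devDependencies })
}

// Names present in the PR's manifest but absent from the base branch's manifest.
function addedPackages(base: Manifest, head: Manifest): string[] {
  const before = new Set(packageNames(base))
  return packageNames(head).filter((name) => !before.has(name)).sort()
}
```

A CI step could post the returned names as a comment that asks for the justification, so the reviewer starts from the question rather than having to spot the line in the diff.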

Check 5: version drift in the lockfile

A clean dependency add touches only the lockfile entries for the new package and whatever it pulls in. An agent that ran npm install in a pnpm project, or that upgraded a package alongside its sibling, can change hundreds of lines, including transitive bumps that the prompt did not ask for. Each one is a behaviour change in code the reviewer has not read.

pnpm-lock.yaml (suspicious diff)

   '@tanstack/query-core':
-    specifier: 5.62.0
-    version: 5.62.0
+    specifier: 5.66.4
+    version: 5.66.4

The PR was meant to add a new mutation. Instead, the lockfile shows the core query package bumped four minor versions, which means new cache behaviour, new dev-warnings, and possibly new bugs. The PR comment: "Lockfile shows @tanstack/query-core jumped from 5.62 to 5.66. Was that intentional? If not, please revert the lockfile and re-install with --frozen-lockfile." That comment treats the lockfile as code, which is what it is.

The blast radius of a casual lockfile bump is hard to overstate. We have seen one agent-run install invalidate a Sentry source-map upload because a transitive source-map-support minor bump changed its hashing. Three days of bug reports later, the team added a lockfile policy: lockfile changes outside the targeted package require a paragraph in the PR description.
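
A policy like that is easier to hold when a script lists the offending bumps. The sketch below assumes the resolved versions have already been extracted from the old and new lockfiles into plain name-to-version maps (the extraction itself is not shown), and unexpectedVersionChanges is a hypothetical name. Packages that are new in the PR are ignored here, since they may legitimately arrive with the targeted dependency.

```typescript
// Package name -> resolved version, extracted from a lockfile by some earlier step.
type ResolvedVersions = Record<string, string>

// List packages whose resolved version changed even though the PR did not target them.
function unexpectedVersionChanges(
  before: ResolvedVersions,
  after: ResolvedVersions,
  targeted: string[],
): string[] {
  const allowed = new Set(targeted)
  return Object.keys(after)
    .filter((name) => !allowed.has(name))
    .filter((name) => name in before && before[name] !== after[name])
    .map((name) => `${name}: ${before[name]} -> ${after[name]}`)
    .sort()
}
```

An empty result means the lockfile diff stayed inside the targeted change; anything else is the paragraph the PR description owes the reviewer.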

Check 6: bundle size delta

Frontend dependencies are paid for by every visitor. A 30 KB gzip increase on a route-level chunk is worth questioning. A 100 KB increase is worth blocking. CI bundle-size budgets enforce this automatically, and analysis tools (Nuxt's analyze mode, or rollup-plugin-visualizer under Vite) show which modules account for the growth, but in the absence of automation the reviewer runs pnpm build locally and compares.

.github/workflows/bundle.yml (excerpt)

- name: Build and report bundle size
  run: |
    pnpm build
    npx size-limit --json > size.json
    node scripts/compare-bundle.js size.json

A scripted check turns the conversation from "did the bundle grow" to "the bundle grew by 47 KB; here is the diff of which chunks". The PR comment then writes itself: "Route /checkout grew from 142 KB to 189 KB. The new dependency is responsible. Could we lazy-load it on submit, or pick a smaller alternative?" Without the script, the reviewer has to ask the PR author to run the build, which often does not happen.
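
The comparison step in a script like scripts/compare-bundle.js could look roughly like this. It is a sketch under assumptions: the entry shape (an array of objects with a name and a size in bytes) is how this example treats size-limit's JSON output and should be checked against the installed version, and findRegressions is a hypothetical name.

```typescript
// Assumed entry shape for the size report; verify against your tool's real output.
type SizeEntry = { name: string; size: number }

type Regression = { name: string; before: number; after: number; delta: number }

// Report every entry that grew by more than `thresholdBytes` against the baseline.
// Entries with no baseline are treated as growing from zero, so new chunks show up too.
function findRegressions(
  baseline: SizeEntry[],
  current: SizeEntry[],
  thresholdBytes: number,
): Regression[] {
  const baseSizes: Record<string, number> = {}
  for (const entry of baseline) baseSizes[entry.name] = entry.size

  return current
    .map((entry) => {
      const before = baseSizes[entry.name] ?? 0
      return { name: entry.name, before, after: entry.size, delta: entry.size - before }
    })
    .filter((regression) => regression.delta > thresholdBytes)
}
```

Failing the job when the returned array is non-empty, and printing each regression, gives the reviewer the numbers for the PR comment without a local build.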

The dependency band is the most automatable of the four, which is what makes it the easiest band to retire from the manual checklist once CI catches up.

Band 3: types and contracts

TypeScript is where agent-generated code most often paints over a real mismatch with a cast. This band is short but high-signal: two checks, both grep-friendly, both worth treating as guilty until the diff proves them innocent.

Check 7: any, unknown, and as casts

Each new any or as in the diff is a silenced compiler, and an unknown that is immediately cast onward (as unknown as T) is the same thing in disguise. Sometimes the silencing is correct (narrowing at a boundary the codebase controls, asserting after a typeof check). More often it is the agent giving up on a real type error and moving on.

src/api/users.ts

import type { User } from '../types/user'

export async function fetchUser(id: string): Promise<User> {
  const response = await fetch(`/api/users/${id}`)
  const json = await response.json()
  return json as User
}

The as User cast asserts a shape the runtime has not validated. If the server starts returning id as a number, the type system says the code is fine and the UI crashes the first time it renders user.id.toUpperCase(). The PR comment: "Replace the as User with a runtime parse, e.g. userSchema.parse(json) using the Zod schema we already have at src/schemas/user.ts. The cast hides a real failure mode at the network boundary." A grep recipe for the diff makes this check fast. Filter to added lines so that removed casts do not show up as findings: git diff main -- '*.ts' '*.tsx' '*.vue' | grep -E '^\+.*(\bas\s+[A-Z]|:\s*any\b|:\s*unknown\b)'. The \s and \b escapes assume GNU grep.
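
For teams that would rather run this check from a script than a shell pipeline, the same idea fits in a few lines. This is a rough sketch: the pattern loosely mirrors the grep recipe, findSuspiciousAdditions is a hypothetical name, and a regular expression will produce false positives (comments, strings) that a lint rule would not.

```typescript
// Matches `as SomeType` (capitalised, so `as const` is skipped), `: any`, and `: unknown`.
const CAST_PATTERN = /\bas\s+[A-Z]|:\s*any\b|:\s*unknown\b/

// Return the added lines of a unified diff that contain a cast or an escape-hatch type.
// Lines starting with "+++" are file headers, not additions, so they are skipped.
function findSuspiciousAdditions(diff: string): string[] {
  return diff
    .split('\n')
    .filter((line) => line.startsWith('+') && !line.startsWith('+++'))
    .map((line) => line.slice(1).trim())
    .filter((line) => CAST_PATTERN.test(line))
}
```

Each returned line is a prompt for the reviewer to ask what error the cast was hiding, not an automatic rejection.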

The acceptable use of as is narrow. Using as const to narrow literal inference, casting after a runtime check that the type system cannot follow, and casting at a boundary the team owns are all fine. Casting the result of JSON.parse or fetch().json() to a domain type is not, because the runtime cannot honour the promise.
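
To make the difference concrete without pulling in a dependency, here is a hand-written runtime parse. It swaps in a plain type guard where the article's codebase would use its Zod schema, and the User fields shown are assumptions for illustration, since the real type is not part of this article.

```typescript
// Hypothetical domain type; the project's real User may have different fields.
type User = { id: string; name: string }

// Validate the untyped payload at the network boundary instead of asserting its shape.
function parseUser(json: unknown): User {
  if (typeof json !== 'object' || json === null) {
    throw new Error('User payload is not an object')
  }
  // This cast follows a runtime check and claims nothing about the field types.
  const { id, name } = json as { id?: unknown; name?: unknown }
  if (typeof id !== 'string' || typeof name !== 'string') {
    throw new Error('User payload has missing or mistyped fields')
  }
  return { id, name }
}
```

If the server starts sending id as a number, this version fails at the fetch with a message that names the problem, instead of failing later inside a component.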

Check 8: schema drift between client and server

The most insidious type bug is the one where both sides compile cleanly but disagree. A server renamed customer_email to email, the agent updated the React side, and the Vue admin app still expects the old field. Both type-check. Neither works.

src/schemas/order.ts

import { z } from 'zod'

export const orderSchema = z.object({
  id: z.string(),
  email: z.string().email(),
  totalCents: z.number().int(),
})

export type Order = z.infer<typeof orderSchema>

src/stores/order.ts (Vue)

import { defineStore } from 'pinia'
import { orderSchema, type Order } from '../schemas/order'

export const useOrderStore = defineStore('order', {
  state: () => ({
    current: null as Order | null,
  }),
  actions: {
    async load(id: string) {
      const response = await fetch(`/api/orders/${id}`)
      const json = await response.json()
      this.current = orderSchema.parse(json)
    },
  },
})

Both the React component and the Vue store share orderSchema as the single source of truth, parsing the response before storing it. The PR comment when the agent introduces a parallel type definition: "We already export Order from src/schemas/order.ts. A second type definition will drift. Import the existing one." Whenever a diff includes both client and server changes, line them up: the same field name, the same casing, the same nullability, on both sides.
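
Lining the two sides up can also be done mechanically in a contract test. The sketch below compares the field names a client expects with the keys of a recorded server response; diffKeys is a hypothetical helper, and it checks names only, not types or nullability.

```typescript
type KeyDiff = { missing: string[]; unexpected: string[] }

// Compare the keys a client expects with the keys present in a sample server payload.
function diffKeys(expected: string[], payload: Record<string, unknown>): KeyDiff {
  const actual = Object.keys(payload)
  return {
    // Fields the client reads but the server no longer sends.
    missing: expected.filter((key) => actual.indexOf(key) === -1),
    // Fields the server sends that the client does not know about.
    unexpected: actual.filter((key) => expected.indexOf(key) === -1),
  }
}
```

A rename such as customer_email to email shows up as one entry in each list, which is usually enough to point the reviewer at the app that was not updated.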

The two checks above are easy to grep for and easy to teach. Together they catch most of the type-shaped bugs an agent will ship.

Band 4: surface area

The fourth band is the one most reviewers skip and the one that causes the worst long-tail bugs. Surface area means everything the PR exposes beyond the prompt's stated goal: renamed exports, new public files, dead branches, and security-sensitive code paths. The diff for these checks is often short. The cost of missing them is not.

Check 9: public API changes the prompt did not ask for

An agent asked to "update the modal to support a confirmation step" sometimes renames the Modal component to BaseModal while it is in the file, because the agent has seen that naming convention more often. Every downstream import breaks. TypeScript catches the obvious cases, but barrel files, dynamic imports, and JSX strings can hide the rename until production.

src/components/Modal.tsx (diff)

-export function Modal({ open, onClose, children }: ModalProps) {
+export function BaseModal({ open, onClose, children }: ModalProps) {
   return open ? <dialog onClose={onClose}>{children}</dialog> : null
 }

The PR comment: "The prompt asked for a confirmation step. Renaming Modal to BaseModal is out of scope and breaks every consumer. Revert the rename or open a separate PR with the migration plan and a codemod." The rule generalises: any rename of an exported symbol that was not in the prompt is a separate PR.

The Vue equivalent is renaming a composable's return shape, e.g. switching { user, login, logout } to { currentUser, signIn, signOut }. Every component that destructured the old names breaks at runtime if the codebase uses dynamic property access, and the type errors can be hard to map back to the rename across a large diff. For patterns that intentionally widen a component's public surface (children rendering, slot-style injection), the article on JSX injection via the Context API walks through how to do it on purpose; an agent's accidental rename is not that.
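
Diffing the public exports explicitly does not need a full parser to be useful. The following is a deliberately rough sketch: it uses a regular expression that only recognises declarations of the form export function/const/let/class/type/interface/enum Name, it ignores export lists and re-exports, and the function names are hypothetical.

```typescript
// Extract the names of directly exported declarations from a source file's text.
function exportedNames(source: string): string[] {
  const pattern =
    /^export\s+(?:default\s+)?(?:async\s+)?(?:function|const|let|class|type|interface|enum)\s+([A-Za-z_$][\w$]*)/gm
  const names: string[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(source)) !== null) {
    const name = match[1]
    if (name) names.push(name)
  }
  return names
}

// Exports that existed before the PR and are gone (removed or renamed) after it.
function removedExports(before: string, after: string): string[] {
  const remaining = exportedNames(after)
  return exportedNames(before).filter((name) => remaining.indexOf(name) === -1)
}
```

Run against the base and head versions of each touched file, a non-empty result is the cue to check whether the prompt asked for that rename.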

Check 10: dead branches and unreachable code

A common pattern: the agent adds a second code path "for safety" or wraps the new logic in if (process.env.NODE_ENV !== 'production') because the prompt mentioned production caution. Dead branches age into confusion. Six months later, nobody remembers whether the branch was disabled on purpose or by mistake, and the new feature ships with the safety net silently removed.

src/utils/featureFlags.ts

import { config } from '../config'
import { rolloutPercentage } from './rollout' // assumed location of the helper

export function isCheckoutV2Enabled(userId: string): boolean {
  if (config.env === 'staging') {
    return true
  }
  if (false) {
    return rolloutPercentage(userId) < 25
  }
  return false
}

The if (false) branch is dead, so rolloutPercentage is never called. A useful PR comment: "Remove the if (false) branch and either delete rolloutPercentage or wire it up. Dead code looks like an in-progress migration to the next reader." The same rule covers // TODO: enable this later, commented-out blocks left in the diff, and unused imports added "just in case".
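
If the team picks the "wire it up" option, the result might look like the sketch below. Everything here is an assumption made for illustration: the environment is passed in as a parameter rather than read from the config module, and rolloutPercentage is a stand-in that hashes the user id into a stable bucket, not the project's real helper.

```typescript
type Env = 'staging' | 'production'

// Stand-in helper: map a user id to a stable bucket in the range 0..99.
function rolloutPercentage(userId: string): number {
  let hash = 0
  for (let i = 0; i < userId.length; i++) {
    hash = (hash * 31 + userId.charCodeAt(i)) % 100
  }
  return hash
}

// Every branch is reachable: staging is always on, everyone else follows the rollout.
function isCheckoutV2Enabled(userId: string, env: Env, rolloutPercent: number = 25): boolean {
  if (env === 'staging') return true
  return rolloutPercentage(userId) < rolloutPercent
}
```

The rollout threshold is now a parameter with a visible default, so turning the feature off is a reviewed change to a number rather than a branch nobody can explain.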

Check 11: security-sensitive surfaces (auth, payments, file uploads)

This is the one non-negotiable check. Any diff that touches authentication, payment processing, or file uploads requires a second human reviewer, regardless of the diff size. Agents are not reliable on the threat models behind these surfaces, and the consequences of a wrong call here are not bugs but incidents.

The minimum check covers two categories from the OWASP Top 10 by name (the numbering below follows the 2021 edition). Broken Access Control (A01) is the failure mode where the diff added a new route or API handler without re-checking the user's permission. Injection (A03) is the failure mode where the agent built a SQL or HTML string by concatenation because the prompt asked for "a query that filters by name".

src/server/handlers/uploadAvatar.ts

import { writeFile } from 'node:fs/promises'
import { join } from 'node:path'
import type { Request, Response } from 'express'

export async function uploadAvatar(req: Request, res: Response) {
  const { filename, data } = req.body
  const path = join('./uploads', filename)
  await writeFile(path, Buffer.from(data, 'base64'))
  res.json({ path })
}

The handler trusts the client to name the file. A request with filename = '../../etc/passwd' writes outside the uploads directory. The PR comment: "Path traversal via filename. Strip directory separators, generate the on-disk name server-side, and verify the resolved path stays inside ./uploads with path.resolve and a prefix check. Also enforce a size and MIME-type limit." Even a one-line handler that touches uploads earns this comment.
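
The path part of that comment can be sketched with Node's path module. This is a minimal illustration, not a complete upload handler: resolveUploadPath is a hypothetical name, it still derives the on-disk name from the client's filename (a server-generated name is the stronger choice the comment recommends), and size and MIME-type limits are not covered.

```typescript
import { basename, resolve, sep } from 'node:path'

// Resolve a client-supplied filename to a path that stays inside uploadsDir.
function resolveUploadPath(uploadsDir: string, filename: string): string {
  const root = resolve(uploadsDir)
  // Normalise backslashes, then keep only the final path segment,
  // so "../../etc/passwd" is reduced to "passwd".
  const safeName = basename(filename.replace(/\\/g, '/'))
  if (safeName === '' || safeName === '.' || safeName === '..') {
    throw new Error('Invalid filename')
  }
  const target = resolve(root, safeName)
  // Defence in depth: confirm the resolved path is still under the uploads root.
  if (!target.startsWith(root + sep)) {
    throw new Error('Resolved path escapes the uploads directory')
  }
  return target
}
```

The handler would then write to the returned path instead of join('./uploads', filename), and reject the request when the function throws.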

The rule has a downstream benefit: it slows down the diffs that need slowing. When the team learns that auth and payment PRs need a second pair of eyes, the agent stops being trusted to ship them solo, and the prompts get tighter as a result.

Cross-cutting check 12: prompt and diff reviewed together

This twelfth check sits outside the four bands because it applies to every band at once. The reviewer reads the prompt that produced the PR, not just the diff. Without the prompt, "out of scope" is not a sentence the reviewer can write. PR descriptions on AI-assisted PRs should include the prompt verbatim (or a link to the session), the model and version, and any rule files in play. If the team is not in that habit yet, this check is the moment to ask for it.
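
Asking for the prompt is easier when a bot does the asking. The sketch below checks a PR description for the three disclosure fields and reports the ones that are absent or blank. The label names match the template in this article, but missingDisclosures itself is a hypothetical helper, and how the PR body reaches it (webhook, CI job) is left out.

```typescript
// Labels of the disclosure fields; each is expected on its own line as "Label ...: value".
const REQUIRED_LABELS = ['Tool and model', 'Prompt', 'Rule file']

// Return the labels whose line is missing from the PR body or has nothing after the colon.
function missingDisclosures(prBody: string): string[] {
  const lines = prBody.split(/\r?\n/).map((line) => line.trim())
  return REQUIRED_LABELS.filter((label) => {
    // Optional list dash, the label, any parenthetical hint, a colon, then the value.
    const pattern = new RegExp('^-?\\s*' + label + '\\b[^:]*:(.*)$')
    const filled = lines.some((line) => {
      const match = pattern.exec(line)
      return match !== null && (match[1] ?? '').trim() !== ''
    })
    return !filled
  })
}
```

A non-empty result on a PR carrying the ai-assisted label is a reasonable trigger for an automated comment requesting the missing fields before review starts.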

The PR-template snippet you can paste today

Pull-request templates live at .github/PULL_REQUEST_TEMPLATE.md on GitHub, and GitLab has an equivalent path; on Bitbucket, check how your edition configures default pull-request descriptions. The snippet below embeds the 12-item checklist as a task list and adds the prompt-disclosure section. The policy: any PR with an ai-assisted label must have every box ticked or a one-line "N/A because..." note next to the unticked box.

.github/PULL_REQUEST_TEMPLATE.md

## What this PR does

<!-- One paragraph. Link the issue. -->

## AI assistance

- [ ] This PR was written or substantially edited by an AI agent (add `ai-assisted` label)
- Tool and model:
- Prompt (verbatim or link to session):
- Rule file in effect (CLAUDE.md, .cursor/rules/*, etc.):

## Reviewer checklist

### Behaviour
- [ ] Implements the prompt and nothing else
- [ ] Every imported library function exists in the installed version
- [ ] Empty, error, loading, and boundary cases handled
- [ ] Side effects live in effects, watchers, or event handlers

### Dependencies
- [ ] New packages justified; no existing helper does the job
- [ ] Lockfile diff contains only the targeted change
- [ ] Bundle size delta within the team's threshold

### Types
- [ ] Zero new `any`, `unknown`, or `as` casts that silence a real error
- [ ] Client and server types agree on every touched shape

### Surface area
- [ ] No renamed or removed public exports outside the prompt
- [ ] No dead branches, `if (false)` guards, or unused imports
- [ ] Auth, payment, or file-upload changes have a second reviewer

The Vue or Nuxt equivalent path is identical, since GitHub reads the same .github/PULL_REQUEST_TEMPLATE.md regardless of the project's framework. If the team uses GitLab, the path is .gitlab/merge_request_templates/Default.md. Adopt the snippet on a Friday, audit how many boxes are ticked in the following week's PRs, and iterate.

How to retire items from the checklist as the team matures

The checklist is not forever. Each item should graduate to either an automated check, a cultural default, or a deprecated check the team no longer needs. A quarterly review of which boxes are always ticked without thought (good candidate for retirement or automation) and which are always unticked (good candidate for a tighter conversation) keeps the document honest.

Three retirements are worth planning for first. Lockfile drift is largely automatable today: a CI step that runs pnpm install --frozen-lockfile on the PR branch fails when the lockfile is out of sync with package.json. Pair it with a check that flags lockfile changes outside the targeted package, and once both are green, check 5 leaves the checklist. Bundle size delta is automatable with size-limit, bundlewatch, or the framework's built-in budgets, and the PR comment becomes a CI comment. Check 6 leaves once it stops generating findings. The any and as cast count is enforceable with a lint rule such as @typescript-eslint/no-explicit-any, eslint-plugin-total-functions, or a project-specific rule. Compiler flags alone are not enough, because tsc --strict (which already includes noImplicitAny) does not flag an explicit any or an as cast. Once the lint rule blocks in CI, the manual grep stops earning its keep and check 7 retires.

What does not automate is the behaviour band and the surface-area band. "Does it implement the prompt" and "did the agent rename an export outside the prompt" both require a human reading the prompt and the diff together. Those checks stay manual indefinitely, which is why they sit at the top and bottom of the list, and why a written rule file for the agent is the highest-impact place to invest after the checklist is in place. The fewer surprises the agent generates, the cheaper the human review.

The next phase of this work is turning the checklist into a team workflow with rituals and metrics. My vibe-code-to-production book builds on this article, adding the onboarding material that gets a new reviewer applying the 12 items consistently, the metrics that show whether the AI-assisted PRs are landing cleaner over time, and the prompt patterns that prevent the most common failures from reaching the diff at all.

Conclusion

Treat the next AI-assisted PR your team merges as the first data point in a longer experiment, not the moment the policy is finished. A written checklist makes the new failure modes visible, gives a mid-level reviewer the confidence to push back, and turns the conversation about AI-assisted code from "is the team comfortable with this" into "did the diff pass the gate". Adopt the snippet, run it for a quarter, and revisit which boxes still earn their keep.