AI code review has moved from experiment to production tooling. Claude Code Action and Gemini Code Assist can review every pull request automatically, catching logic bugs, security issues, and missing error handling that linters and static analysis miss entirely.

This tutorial sets up both models on the same repository. Claude handles logic and security review. Gemini handles style and documentation. A quality gate parses severity from both reviews and blocks merges when critical issues are found.

AIOps on a VPS: AI-Driven Server Management with Open-Source Tools

What does AI code review catch that linters miss?

Linters check syntax and formatting. Static analysis tools like Semgrep match known patterns. AI code reviewers do something different: they read the diff in context and reason about what the code does. They flag race conditions, missing error handling paths, insecure defaults, and business logic errors that pattern-matching tools cannot detect.

A Milvus study tested five AI models on real bug detection in production PRs. The best individual model caught 53% of bugs. When multiple models reviewed the same PR and debated findings across rounds, detection jumped to 80%. The hardest bugs, ones requiring system-level understanding of call chains and error propagation, hit 100% detection in multi-model debate mode.

That study is why this tutorial uses two models instead of one.

Concrete examples of what AI review catches:

Unvalidated user input flowing through three function calls before reaching a database query
Missing null checks where a function returns Optional<T> but the caller assumes it always succeeds
Hardcoded secrets in configuration files that look like placeholder values but are real credentials
Error handling gaps where a try/catch catches a generic exception and silently swallows it
Race conditions in concurrent code where shared state is modified without synchronization

How do you set up Claude Code Action for pull request review?

Claude Code Action is a GitHub Action maintained by Anthropic that runs Claude on your GitHub runner. It reads PR diffs, posts inline comments, and can also implement fixes. Setup takes about five minutes.

Install the Claude GitHub App

Go to github.com/apps/claude and install it on your repository or organization. Grant it access to the repositories you want reviewed. The app needs these permissions:

Contents: Read
Pull Requests: Read & Write
Issues: Read & Write (optional, for issue triage)

Add the API key

In your repository, go to Settings > Secrets and variables > Actions and create a new repository secret:

Name: ANTHROPIC_API_KEY
Value: Your Anthropic API key (starts with sk-ant-)

Never commit API keys to your repository. If you need organization-wide access, use environment secrets instead of repository secrets.

Create the workflow

Create .github/workflows/claude-review.yml:

name: Claude Code Review

on:
  pull_request:
    types: [opened, synchronize]
    paths:
      - "src/**"
      - "lib/**"
      - "app/**"
      - "config/**"
    paths-ignore:
      - "**/*.md"
      - "**/*.txt"
      - "docs/**"

concurrency:
  group: claude-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    permissions:
      contents: read
      pull-requests: write
      id-token: write

    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 1

      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            REPO: ${{ github.repository }}
            PR NUMBER: ${{ github.event.pull_request.number }}

            Review this pull request. Focus on:
            1. Security: injection flaws, auth bypass, hardcoded secrets, insecure defaults
            2. Logic errors: off-by-one, null dereference, race conditions, resource leaks
            3. Error handling: swallowed exceptions, missing retries, unclear error messages
            4. Performance: N+1 queries, unbounded loops, unnecessary allocations

            Skip cosmetic issues (formatting, naming style). Another reviewer handles those.

            Rate each finding as CRITICAL, HIGH, MEDIUM, or LOW.

            Use inline comments for specific code issues.
            Use a summary PR comment for overall assessment.

          claude_args: |
            --allowedTools "mcp__github_inline_comment__create_inline_comment,Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*)"

The paths filter limits reviews to source code changes. Documentation-only PRs skip the workflow entirely, which saves API costs. The concurrency block cancels in-progress reviews when a new commit is pushed to the same PR, so you do not pay for reviewing outdated code.

The --allowedTools restriction is a security measure. It limits Claude to reading diffs and posting comments. It cannot modify files, run arbitrary commands, or access other repositories.

How do you add Gemini code review to the same repository?

You have two options for Gemini-powered review: the Gemini Code Assist GitHub App (managed by Google, zero YAML) or the run-gemini-cli GitHub Action (self-managed, full control). This tutorial uses the GitHub Action because it gives you control over the prompt and integrates with the quality gate workflow.

Get a Gemini API key

Create a key at Google AI Studio. Add it as a repository secret:

Name: GEMINI_API_KEY
Value: Your Gemini API key

For enterprise use with Vertex AI authentication, see the run-gemini-cli documentation for Workload Identity Federation setup.

Create the Gemini workflow

Create .github/workflows/gemini-review.yml:

name: Gemini Code Review

on:
  pull_request:
    types: [opened, synchronize]
    paths:
      - "src/**"
      - "lib/**"
      - "app/**"
      - "config/**"
    paths-ignore:
      - "**/*.md"
      - "**/*.txt"
      - "docs/**"

concurrency:
  group: gemini-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      contents: read
      pull-requests: write

    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 1

      - uses: google-github-actions/run-gemini-cli@main
        with:
          gemini_api_key: ${{ secrets.GEMINI_API_KEY }}
          prompt: |
            Review this pull request. Focus on:
            1. Code style: naming conventions, function length, complexity
            2. Documentation: missing docstrings, outdated comments, unclear variable names
            3. Test coverage: untested edge cases, missing assertions, test quality
            4. Maintainability: code duplication, tight coupling, violation of SOLID principles

            Skip security and logic analysis. Another reviewer handles those.

            Rate each finding as CRITICAL, HIGH, MEDIUM, or LOW.

            Post your review as a single PR comment with structured sections.

The run-gemini-cli action is currently in beta, which is why the workflow pins to @main. For production use, pin to a specific commit SHA (e.g., @abc1234) to avoid unexpected changes when the main branch updates.

Why two separate workflows?

Running Claude and Gemini in separate workflow files means they execute in parallel. A failure in one does not block the other. You can also disable one model temporarily without touching the other workflow.

The role separation is intentional. Claude tends to walk call chains top-to-bottom and is good at catching error handling gaps and security flaws. Gemini is strong on code style, documentation completeness, and structural patterns. Overlap is fine. Agreement between models increases confidence in findings.

How does multi-model review work with Claude and Gemini together?

The two-workflow setup above already implements multi-model review. Both models review the same PR independently and post separate comments. This is the simplest pattern and works well for most teams.

For teams that want a unified view, add an aggregation step. This third workflow runs after both reviews complete and posts a combined summary:

name: AI Review Summary

on:
  workflow_run:
    workflows: ["Claude Code Review", "Gemini Code Review"]
    types: [completed]

jobs:
  aggregate:
    if: github.event.workflow_run.event == 'pull_request'
    runs-on: ubuntu-latest
    timeout-minutes: 5
    permissions:
      pull-requests: write
      actions: read

    steps:
      - name: Collect review comments
        id: collect
        uses: actions/github-script@v7
        with:
          script: |
            const pr_number = context.payload.workflow_run.pull_requests[0]?.number;
            if (!pr_number) return;

            const comments = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: pr_number,
            });

            const aiComments = comments.data.filter(c =>
              c.body.includes('CRITICAL') ||
              c.body.includes('HIGH') ||
              c.body.includes('MEDIUM')
            );

            const critical = aiComments.filter(c => c.body.includes('CRITICAL')).length;
            const high = aiComments.filter(c => c.body.includes('HIGH')).length;

            const summary = `## AI Review Summary\n\n` +
              `| Severity | Count |\n|----------|-------|\n` +
              `| CRITICAL | ${critical} |\n` +
              `| HIGH | ${high} |\n\n` +
              (critical > 0 ? '⛔ **Merge blocked:** Critical findings require human review.\n' :
               high > 0 ? '⚠️ High-severity findings detected. Review recommended before merge.\n' :
               '✅ No critical or high-severity findings.\n');

            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: pr_number,
              body: summary,
            });

            core.setOutput('critical_count', critical);

      - name: Set status check
        if: steps.collect.outputs.critical_count > 0
        run: exit 1

This is a starting point. Adapt the comment parsing logic to match the exact format your prompts produce. The severity keywords (CRITICAL, HIGH, MEDIUM) in the review prompts make parsing straightforward.

How do you block merges when AI review finds critical issues?

The aggregation workflow above sets exit code 1 when critical findings exist. To make this a merge blocker, configure branch protection:

Go to Settings > Branches > Branch protection rules
Select or create a rule for your main branch
Enable Require status checks to pass before merging
Search for and add "AI Review Summary" as a required status check

Severity	Action	Merge blocked?
CRITICAL	Fail status check	Yes
HIGH	Warning in summary	No (advisory)
MEDIUM	Listed in summary	No
LOW	Omitted from summary	No

A word of caution: AI reviewers produce false positives. If you make the status check required, developers will occasionally need to dismiss findings that are wrong. Keep CRITICAL as the only blocking severity. If you block on HIGH as well, expect friction.

To override a blocked merge, a maintainer can use the admin bypass on the branch protection rule, or you can add a /dismiss-ai-review comment handler that re-runs the summary workflow with a force-pass flag.

How do you reduce false positives in AI code review?

False positives are the primary complaint from teams using AI code review. Noisy reviews erode trust quickly. Three techniques help.

Scope the prompt tightly

The prompts above already do this: Claude reviews security and logic, Gemini reviews style and docs. A model asked to "review everything" produces more noise than one given a specific mandate.

Add project-specific context to reduce false positives further. Create a CLAUDE.md file in your repository root (Claude Code Action reads this automatically):

# Project Context

This is a Django REST API. Python 3.12. PostgreSQL.

## Review Guidelines
- We use `rest_framework.exceptions` for all error handling. Do not flag bare `except` blocks in middleware.
- `settings.py` contains environment variable references, not hardcoded secrets. Do not flag `os.environ.get()` calls.
- We intentionally use `Any` type hints in the serializer layer. Do not flag these.
- Test files use `pytest` fixtures. Do not flag unused function parameters in test files.

For Gemini, create a GEMINI.md file in the repository root with equivalent project context.

Filter files

The paths and paths-ignore filters in the workflow YAML prevent reviews of files that generate noise:

paths-ignore:
  - "**/*.md"
  - "**/*.txt"
  - "**/*.lock"
  - "**/*.generated.*"
  - "migrations/**"
  - "vendor/**"
  - "dist/**"
  - "__snapshots__/**"

Lock files, generated code, vendored dependencies, and migration files produce false positives because they are machine-generated or follow patterns the model does not understand in context.

Set a severity threshold

If you only surface CRITICAL and HIGH findings in the summary, MEDIUM and LOW noise never reaches the developer. This is a better default than showing everything and asking developers to ignore the noise.

What are the security risks of AI code review on pull requests?

AI code reviewers process untrusted input. The PR diff, commit messages, issue titles, and file contents are all attacker-controlled in open-source repositories or when accepting external contributions. This creates prompt injection risk.

The Clinejection attack

In February 2026, an attacker exploited the Cline issue triage bot through indirect prompt injection. The attack was straightforward: a malicious GitHub issue title contained instructions disguised as an error message. The AI agent's workflow interpolated the issue title directly into the prompt. The agent followed the injected instructions, ran npm install on a malicious package, and exfiltrated the ANTHROPIC_API_KEY from the CI environment.

The attack compromised roughly 4,000 developer machines before the malicious package was pulled.

Hardening your workflows

Restrict tool permissions. The --allowedTools flag in the Claude workflow above limits what Claude can do. It can read diffs and post comments. It cannot run arbitrary shell commands, write files, or access secrets. This is the single most effective mitigation.

claude_args: |
  --allowedTools "mcp__github_inline_comment__create_inline_comment,Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*)"

Without this restriction, a crafted PR diff could instruct Claude to run commands on the runner.

Never interpolate untrusted input into prompts. Do not use ${{ github.event.issue.title }} or ${{ github.event.pull_request.body }} inside the prompt field. Pass repository and PR number, and let the action fetch content through the GitHub API instead.

Handle fork PRs carefully. Fork PRs run with reduced GITHUB_TOKEN permissions by default, but your secrets are still available to the workflow if triggered by pull_request_target. Use pull_request (not pull_request_target) for AI review workflows. This means fork PRs cannot access your API keys and the review will not run, which is the safe default.

on:
  pull_request:        # Safe: fork PRs cannot access secrets
    types: [opened, synchronize]
  # pull_request_target:  # Dangerous: fork PRs CAN access secrets

Rotate API keys. If a key is exposed through a CI log or compromised runner, the blast radius is limited to the time between rotation. Rotate quarterly at minimum.

Audit workflow runs. Check the Actions tab periodically for unusual run times or unexpected tool invocations. A review that takes 45 minutes instead of the usual 3 minutes may indicate the model is being manipulated.

How much does AI code review cost per pull request?

Cost scales with PR size. Both Claude and Gemini charge per token processed. The input tokens include the PR diff, file context, and your prompt. Output tokens are the review comments.

PR Size	Typical diff (lines)	Estimated input tokens
Small	50-100	2,000-5,000
Medium	200-500	8,000-20,000
Large	500-1,500	20,000-60,000

Token counts vary by language. A 200-line Python diff uses fewer tokens than a 200-line Java diff because Java is more verbose.

Running two models doubles the token cost but not the dollar cost, because pricing differs between providers. Check the current per-token rates on the Anthropic pricing page and Google AI pricing page. Both use per-token pricing with different rates for input and output tokens.

To estimate your monthly budget: multiply your average PR size (in tokens) by the number of PRs per month, then multiply by the per-token rate for each model. A team merging 50 PRs per week with medium-sized diffs can calculate:

weekly_cost = 50 * avg_tokens_per_pr * (claude_input_rate + gemini_input_rate)
              + 50 * avg_output_tokens * (claude_output_rate + gemini_output_rate)

Reducing costs

Path filters prevent reviews of docs, lock files, and generated code. This is the biggest cost saver.
Concurrency cancellation stops reviewing outdated commits when a new push arrives.
Skip draft PRs by not including ready_for_review in your trigger types, then adding it only when you want reviews on draft-to-ready transitions.
Use smaller models for style review. Gemini Flash is cheaper than Gemini Pro for style and documentation checks where deep reasoning is unnecessary.

Comparison: AI code review tools

Feature	Claude Code Action	Gemini Code Assist	CodeRabbit	GitHub Copilot
Setup method	GitHub Action (YAML)	GitHub App or Action	GitHub App	Built-in
Pricing model	Per-token (API)	Per-token or free tier	Per-repository subscription	Per-seat subscription
Inline comments	Yes	Yes	Yes	Yes
Custom prompts	Full control	Full control (Action)	Config file	Limited
Self-hosted runner	Yes	Yes (Action)	No	No
Multi-model support	Combine with others	Combine with others	Single model	Single model
Open source	Yes (MIT)	Yes (Action)	No	No

Claude Code Action and the Gemini CLI Action are open source and run on your own runners. Your code never leaves your infrastructure except for the API call to the model provider. CodeRabbit and Copilot are managed services where code is processed on their infrastructure.

What are the limitations of AI code review?

AI code review is not a replacement for human review. It is a first pass that catches common issues and frees human reviewers to focus on architecture, design, and business logic decisions.

Context window limits. Large PRs (1,500+ lines changed) may exceed the model's context window or produce shallow reviews because the model cannot hold the entire diff in context. Split large PRs into smaller ones. This is good practice regardless of AI review.

No runtime understanding. AI reviewers see static code. They cannot detect issues that only manifest at runtime: memory leaks under load, timing-dependent race conditions, or performance degradation at scale.

False positives are unavoidable. Even with tight prompts and project context files, models will flag code that is correct. Budget 10-20% of review findings as false positives. If the rate is higher, tighten your prompts and add more context to CLAUDE.md or GEMINI.md.

No institutional knowledge. The model does not know your team's unwritten conventions, historical decisions, or domain-specific patterns unless you document them in the context files. Invest time in writing good CLAUDE.md and GEMINI.md files. This pays off across every future review.

Determinism. The same PR reviewed twice may produce different findings. AI review is probabilistic. Do not treat "no findings" as "no bugs."

Troubleshooting

Claude review workflow never triggers. Check the paths filter. If your source code lives outside src/, lib/, or app/, adjust the paths to match your project structure. Also verify the Claude GitHub App is installed on the repository.

Gemini returns empty reviews. Confirm the GEMINI_API_KEY secret is set and the key is valid. Test it locally:

curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=YOUR_KEY" | head -20

If you see a list of models, the key works.

Reviews take too long. The timeout-minutes: 15 setting kills the workflow if it exceeds 15 minutes. Large PRs with 1,000+ lines can take 5-10 minutes. If timeouts are frequent, tighten the paths filter to reduce the diff size.

Too many false positives. Add project context to CLAUDE.md and GEMINI.md. Be specific about patterns the model should ignore. "Do not flag X" is more effective than "be lenient."

Status check stays pending. The aggregation workflow triggers on workflow_run completion. If one of the two review workflows is skipped (because no matching files changed), the aggregation may not trigger. Add a paths filter to the aggregation workflow that matches the union of both review workflows, or use a separate always-run status check.