
Best AI Code Review Tools and Skills [2026]

We installed and scored 23 code review skills. These 5 stood out — with real checklists, multi-agent workflows, and 1,500+ lines of review patterns.

We installed 23 code review skills and read every SKILL.md. Most were thin wrappers — a few paragraphs telling the AI to “be thorough” and “check for bugs.” Useful in the same way that “write good code” is useful. The five skills below are different. They contain actual checklists, multi-agent dispatch logic, language-specific pitfall lists, and structured severity frameworks. We scored each on five dimensions and we’ll show you exactly what’s inside.

How We Scored

Each skill was scored across five dimensions, 0–10 each, for a maximum of 50 points:

  • Relevance — Does it address real code review concerns or just restate the obvious?
  • Depth — How much actual content is in the skill files? Specific patterns, not vague directives.
  • Actionability — Can a developer follow the output and do something concrete with it?
  • Structure — Is the workflow well-organized? Does it handle edge cases?
  • Adoption — Install count as a proxy for real-world validation.

We scored by reading the actual skill files — not descriptions, not README summaries.

Quick Comparison

| Skill | Score | Key Feature | Programming Languages | Installs |
| --- | --- | --- | --- | --- |
| @sanyuan0704/code-review-expert | 42/50 | 7-step workflow + P0–P3 severity | Language-agnostic | 8,926 |
| @obra/receiving-code-review | 41/50 | How to respond to review feedback | Language-agnostic | 4,168 |
| @wshobson/code-review-excellence | 41/50 | Timed 4-phase process + language pitfalls | Python, TypeScript | 5,797 |
| @thebushidocollective/code-review | 40/50 | 5 parallel agents + confidence scoring | Language-agnostic | 7,134 |
| @yyh211/local-diff-review | 40/50 | 612-line quality standards (Chinese) | TypeScript, Node.js | 7,900 |

1. @sanyuan0704/code-review-expert — 42/50

Score: 42/50 | Relevance: 9 · Depth: 9 · Actionability: 9 · Structure: 8 · Adoption: 7

This is the most complete code review skill in the registry. Six files, 595 lines, and a workflow that actually thinks about process rather than just checklist-dumping on every PR.

The 7-step workflow goes: preflight git context → SOLID analysis → removal candidates → security scan → quality scan → structured P0–P3 output → user confirmation. That sequencing matters. The preflight step pulls in the PR diff, commit history, and changed file context before any analysis starts — which means the AI isn’t reviewing code in a vacuum. The removal-candidates step is unusual and valuable: it explicitly looks for code that can be deleted, not just improved.

The reference files are where this skill earns its score. security-checklist.md is 118 lines covering XSS, injection, SSRF, JWT misuse, CORS misconfiguration, and race conditions — not as categories to be aware of, but as specific patterns to check. code-quality-checklist.md runs 130 lines covering complexity, naming, error propagation, and test coverage. solid-checklist.md (65 lines) treats SOLID principles as actionable questions rather than philosophical principles.

The P0–P3 severity framework in the output stage is worth highlighting. P0 is a blocking security issue. P3 is a style nit. The skill instructs the model to group findings by severity before presenting them, which means the developer sees what’s critical first and what’s optional last — the opposite of what unsorted review output usually looks like.
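The severity-first ordering is easy to encode. Here's a minimal sketch of grouping findings before output — the `Finding` shape and field names are illustrative, not the skill's actual schema:

```python
from collections import defaultdict

# Hypothetical findings; the skill's real output schema may differ.
findings = [
    {"severity": "P2", "msg": "Function exceeds 50 lines"},
    {"severity": "P0", "msg": "SQL built via string concatenation"},
    {"severity": "P3", "msg": "Inconsistent variable naming"},
    {"severity": "P1", "msg": "Unhandled promise rejection"},
]

def group_by_severity(findings):
    """Group findings so P0 (blocking) prints first and P3 (nits) last."""
    groups = defaultdict(list)
    for f in findings:
        groups[f["severity"]].append(f["msg"])
    # Emit severities in fixed order, skipping empty tiers.
    return {sev: groups[sev] for sev in ("P0", "P1", "P2", "P3") if groups[sev]}

for sev, msgs in group_by_severity(findings).items():
    print(sev, "->", "; ".join(msgs))
```

The point is that the ordering decision is made once, in the skill, instead of being improvised per review.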

Edge case handling is explicit: empty diffs get a short confirmation output, diffs over 500 lines trigger a chunked review mode, and mixed-concern PRs (feature + refactor in the same commit) are flagged with a recommendation to split.
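That routing logic fits in a few lines. A sketch using the skill's stated thresholds (the mode names and function are illustrative):

```python
def review_mode(diff_lines: int) -> str:
    """Route a review by diff size, mirroring the skill's edge cases.
    Thresholds come from the article; labels are hypothetical."""
    if diff_lines == 0:
        return "confirm-empty"   # empty diff: short confirmation, no review
    if diff_lines > 500:
        return "chunked"         # large diff: review chunk by chunk
    return "full"                # normal single-pass review

print(review_mode(800))  # chunked
```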

skillsafe install @sanyuan0704/code-review-expert

2. @obra/receiving-code-review — 41/50

Score: 41/50 | Relevance: 10 · Depth: 8 · Actionability: 9 · Structure: 8 · Adoption: 6

One file, 213 lines, and a completely different take on what “code review skill” means. This skill is not about how to give a review. It’s about how to respond to one — and it’s the most original skill in this roundup.

The core framework is READ → UNDERSTAND → VERIFY → EVALUATE → RESPOND → IMPLEMENT. The EVALUATE step is where most AI assistants fail: given a list of review comments, the default behavior is to implement all of them immediately. @obra/receiving-code-review explicitly forbids this. It instructs the model to assess each piece of feedback on technical merit before touching any code.

The skill has a specific rule worth quoting: it prohibits performative agreement — phrases like “You’re absolutely right!” or “Great catch!” before implementing suggestions. The reasoning is that performative compliance signals to the reviewer that their feedback was correct when it may not have been, which erodes the reviewer’s calibration over time and makes future reviews less useful.

There’s a decision tree for when to push back versus implement. If a suggestion improves correctness or removes a real bug, implement it. If it’s a style preference presented as a principle, note the disagreement and ask before changing. If suggestions conflict with each other, surface the conflict explicitly rather than picking one arbitrarily.
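The decision tree reduces to a small dispatch. A sketch, assuming hypothetical category labels (the skill expresses this in prose, not code):

```python
def triage_feedback(kind: str) -> str:
    """Illustrative encoding of the push-back-vs-implement decision tree.
    Categories and return values are made-up labels, not the skill's API."""
    if kind == "correctness":          # fixes a real bug or improves correctness
        return "implement"
    if kind == "style-as-principle":   # preference dressed up as a rule
        return "note-disagreement-and-ask"
    if kind == "conflicting":          # suggestions contradict each other
        return "surface-conflict"
    return "evaluate-on-merit"         # default: assess before touching code
```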

This skill comes from github.com/obra/superpowers (353 stars), which is a larger collection of review-related skills. The install count of 4,168 is lower than the others in this list, but the star count suggests the community that’s found it values it highly.

For developers who receive review from multiple stakeholders with varying levels of authority and correctness, this skill is genuinely useful in a way that generic review skills are not.

skillsafe install @obra/receiving-code-review

3. @wshobson/code-review-excellence — 41/50

Score: 41/50 | Relevance: 8 · Depth: 9 · Actionability: 9 · Structure: 8 · Adoption: 7

Two files, 555 lines: a 40-line SKILL.md and a 515-line implementation-playbook.md. The playbook is the real product. It treats code review as a timed, phased process — which sounds bureaucratic but is actually one of the most practical things you can encode in a review skill.

The four phases: Context (2–3 min), High-Level (5–10 min), Line-by-Line (10–20 min), Summary (2–3 min). The time allocations are notional when an AI is doing the work, but they encode something more important: what deserves the most attention. The High-Level and Line-by-Line phases together are allotted 15–30 minutes, meaning the skill deprioritizes both the quick skim and the endless bikeshedding in favor of actual analysis.

The language-specific sections are unusually concrete. For Python: check for mutable default arguments (def fn(items=[])), broad exception clauses (except Exception), and missing __all__ in public modules. For TypeScript: flag any type usage, unhandled async/await rejections, and missing null checks on optional chaining. These aren’t invented — they’re the bugs that actually show up in code review queues.
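The mutable-default pitfall is worth seeing concretely, since it looks harmless until two calls share state (function names here are illustrative):

```python
# The classic pitfall the playbook flags: the default list is created once
# at function definition time, so every call without an argument shares it.
def add_item_buggy(item, items=[]):
    items.append(item)
    return items

add_item_buggy("a")
print(add_item_buggy("b"))  # ['a', 'b'] — 'a' leaked from the first call

# The standard fix: default to None and create a fresh list per call.
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

add_item("a")
print(add_item("b"))  # ['b']
```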

The skill also distinguishes between blocking issues and suggestions using explicit formatting conventions in the output, which makes the review easier to act on. Blocking issues are prefixed with [BLOCK]; suggestions with [SUGGEST]; nitpicks with [NIT].

From github.com/wshobson/agents (65 stars), with 5,797 installs.

skillsafe install @wshobson/code-review-excellence

4. @thebushidocollective/code-review — 40/50

Score: 40/50 | Relevance: 8 · Depth: 8 · Actionability: 8 · Structure: 9 · Adoption: 7

One file, 481 lines, and the most sophisticated multi-agent workflow of any skill in the registry. Where most review skills run a single pass over the diff, this one dispatches five parallel agents and aggregates their findings.

The five agents: (1) CLAUDE.md compliance checker — verifies the PR follows project-specific conventions defined in the repo’s CLAUDE.md file; (2) shallow bug scanner — fast pass for obvious logic errors and null dereferences; (3) git blame analyzer — checks whether the changed code has a pattern of recent churn, which is a proxy for instability; (4) past PR pattern detector — looks for issues that were raised in previous reviews on the same files; (5) code comment compliance checker — verifies inline comments meet the project’s documentation standards.
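The dispatch pattern is concurrent fan-out, then flatten. A minimal sketch with stubbed agents (the stub bodies and finding shapes are invented for illustration; the real skill dispatches sub-agents, not coroutines):

```python
import asyncio

# Hypothetical agent stubs standing in for the skill's five checks.
async def claude_md_compliance(diff): return []
async def shallow_bug_scan(diff):
    return [{"issue": "possible null deref", "confidence": 85}]
async def git_blame_churn(diff): return []
async def past_pr_patterns(diff): return []
async def comment_compliance(diff): return []

async def review(diff):
    """Run the five checks concurrently and flatten their findings."""
    agents = [claude_md_compliance, shallow_bug_scan, git_blame_churn,
              past_pr_patterns, comment_compliance]
    results = await asyncio.gather(*(agent(diff) for agent in agents))
    return [f for findings in results for f in findings]

print(asyncio.run(review("<diff text>")))  # one finding, from the bug scanner
```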

The deduplication and confidence scoring logic is where this skill shows its design. Each agent produces findings with a 0–100 confidence score. Findings below 80 are filtered before output. Duplicate findings across agents are merged with the highest confidence score, not listed separately. The net effect is a review output that’s shorter and less noisy than a naive union of all agent findings.
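The merge-then-filter step can be sketched in a few lines, using the 80-confidence threshold the skill states (field names and the dedup key are assumptions):

```python
def aggregate(findings, threshold=80):
    """Merge duplicate findings across agents, keeping the highest
    confidence per issue, then drop anything below the threshold."""
    merged = {}
    for f in findings:
        key = (f["file"], f["line"], f["issue"])  # assumed dedup key
        if key not in merged or f["confidence"] > merged[key]["confidence"]:
            merged[key] = f
    return [f for f in merged.values() if f["confidence"] >= threshold]

# Two agents flag the same issue; the merge keeps one copy at confidence 92,
# and the 64-confidence finding is filtered out.
raw = [
    {"file": "api.ts", "line": 40, "issue": "null deref", "confidence": 92},
    {"file": "api.ts", "line": 40, "issue": "null deref", "confidence": 81},
    {"file": "db.ts",  "line": 12, "issue": "n+1 query",  "confidence": 64},
]
print(aggregate(raw))  # one finding survives
```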

There’s also an explicit false positive list — patterns the skill has been trained to ignore because they reliably generate noise without catching real issues. And the skill explicitly skips draft PRs, closed PRs, and PRs over 1,000 lines (the latter get a summary-only pass with a note that line-by-line review is not feasible).

With 7,134 installs, this is one of the most-adopted skills in this roundup. The architecture is more complex than the others, which means it’s more likely to behave unexpectedly in edge cases — but the explicit handling of drafts, large diffs, and confidence thresholds shows that the author has been running this in production.

skillsafe install @thebushidocollective/code-review

5. @yyh211/local-diff-review — 40/50

Score: 40/50 | Relevance: 8 · Depth: 10 · Actionability: 8 · Structure: 7 · Adoption: 7

Three files, 1,541 lines. The most reference material of any code review skill we analyzed, and the second-highest install count in this roundup at 7,900.

One important note: the primary content file is in Chinese. If you don't read Chinese, you'll be relying on the AI's translation of the review standards, which works but means you can't audit what the skill actually says. Factor that into your decision.

The centerpiece is code-quality-standards.md at 612 lines, organized across six dimensions: Security, Correctness, Performance, Testability, Maintainability, and Documentation. Each dimension has its own section with specific patterns, not just category headings. The Security section covers injection, authentication, and cryptographic misuse. The Performance section covers algorithmic complexity, database query patterns, and caching strategy. Each section includes TypeScript and Node.js code examples showing bad patterns versus good replacements — not just descriptions of what to avoid.

The three-tier severity system (must-check, recommended, optional) maps cleanly onto how engineering teams actually triage review feedback. Must-check findings block the PR. Recommended findings are expected to be addressed before merge but can be deferred with a comment. Optional findings are style or preference items the author can ignore.

The skill also handles local diffs specifically — the name isn’t decorative. It’s designed for reviewing uncommitted changes via git diff, not just PR-based review. That makes it useful as a pre-commit pass before you even push.
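A pre-commit pass needs only the uncommitted diff. A minimal sketch of capturing it (the `local_diff` helper and its parameters are illustrative, not part of the skill):

```python
import subprocess

def local_diff(repo: str = ".", staged: bool = False) -> str:
    """Capture uncommitted changes as text for a pre-commit review pass.
    With staged=True, only changes already added to the index are included."""
    cmd = ["git", "-C", repo, "diff"]
    if staged:
        cmd.append("--cached")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Usage: feed the returned diff text to the review skill before pushing.
# diff_text = local_diff(staged=True)
```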

At 1,541 lines, this is the deepest skill in the registry for code review. Whether that depth translates to better output depends on how well the AI internalizes the standards, but the raw material is there.

skillsafe install @yyh211/local-diff-review

Frequently Asked Questions

What makes a good AI code review skill?

Specificity. The worst skills tell the AI to “be thorough” or “check for common issues” — which produces the same sycophantic output you’d get with no skill at all. The best skills contain concrete checklists, explicit severity tiers, language-specific pitfall lists, and structured output formats. The skills in this roundup score well because they make decisions that the AI would otherwise have to improvise: what counts as a blocking issue, when to skip a review, how to handle a 1,000-line diff, what Python anti-patterns actually show up in real code.

Do these skills work with Claude Code, Cursor, and Windsurf?

Yes. All five skills use the SKILL.md format, which is supported by Claude Code, Cursor, Windsurf, and other compatible runtimes. Install once with skillsafe install, and the skill is available across tools. The multi-agent workflow in @thebushidocollective/code-review requires a tool that supports parallel agent dispatch — Claude Code handles this natively. For Cursor and Windsurf, the dispatching logic may run sequentially rather than in parallel, which produces the same output but takes longer.

How were these skills scored?

We read the actual skill files for each of the 23 skills we evaluated — not just the descriptions. Scores were assigned across five dimensions (Relevance, Depth, Actionability, Structure, Adoption) at 0–10 each for a max of 50 points. Depth was scored based on line counts, number of reference files, specificity of patterns, and presence of concrete examples. Adoption was based on install count in the SkillSafe registry. Skills that described what they did without providing the content to do it were penalized on Depth regardless of install count.

Conclusion

If you install one skill from this list, start with @sanyuan0704/code-review-expert — the 7-step workflow and P0–P3 severity framework alone are worth it. If your team also receives code review and you want the AI to help evaluate feedback rather than just implement it blindly, pair it with @obra/receiving-code-review.

For teams that need the deepest reference material and don’t mind Chinese content, @yyh211/local-diff-review has more raw coverage than anything else in the registry. For parallel agent workflows with automatic confidence filtering, @thebushidocollective/code-review is technically the most sophisticated option.

skillsafe install @sanyuan0704/code-review-expert
skillsafe install @obra/receiving-code-review

Related roundups: Browse all Best Of roundups