Best AI Security Auditing Skills for Developers [2026]
Top 5 security auditing skills from our scored review — 146 vulnerability vectors, 11 footgun databases, and a real-time GitHub supply chain auditor.
We installed 18 security auditing skills and read every file inside them. Most fell into two camps: either a short SKILL.md telling the AI to “check for vulnerabilities” with no specifics, or a full application bundled as a skill (complete with Dockerfiles, test suites, and API clients) that happens to address security but isn’t practical as an AI coding skill. The five skills below sit in the productive middle. They contain structured vulnerability criteria, language-specific detection patterns, and workflows that produce actionable audit reports. We scored each on five dimensions and we’ll show you exactly what’s inside.
How We Scored
Each skill was scored across five dimensions, 0-10 each, for a maximum of 50 points:
- Relevance - Does it address real security auditing needs (vulnerability detection, OWASP coverage, supply chain risk)?
- Depth - How much content is in the skill files? Specific vulnerability patterns, detection rules, not generic advice.
- Actionability - Can a developer run this and get actionable security findings with specific file locations and remediation steps?
- Structure - Well-organized workflow with clear security coverage? Handles different threat models and codebase sizes?
- Adoption - Install count as a proxy for real-world validation.
We scored by reading the actual installed skill files — not registry descriptions, not GitHub READMEs.
Quick Comparison
| Skill | Score | Key Feature | Tools / Standards | Installs |
|---|---|---|---|---|
| @ghostsecurity/scan-code | 44/50 | 146 vulnerability vectors + 5-step verification | OWASP, SAST, language-agnostic | 8,102 |
| @trailofbits/sharp-edges | 43/50 | 18 files with footgun databases for 11 languages | Python, Go, Rust, C, Solidity, JS + 5 more | 8,340 |
| @trailofbits/differential-review | 42/50 | 6-phase workflow with adversarial modeling | Git, GitHub PRs, language-agnostic | 3,566 |
| @trailofbits/supply-chain-risk-auditor | 40/50 | Real-time GitHub queries for dependency risk | GitHub CLI, npm, PyPI, crates.io | 8,515 |
| @trailofbits/semgrep | 39/50 | Parallel Semgrep orchestration with third-party rulesets | Semgrep, SARIF, CI/CD pipelines | 8,210 |
1. @ghostsecurity/scan-code — 44/50
Score: 44/50 | Relevance: 10 · Depth: 9 · Actionability: 9 · Structure: 9 · Adoption: 7
This is the most comprehensive SAST skill in the registry. Twelve files totaling 164 KB, including four YAML criteria files that define 146 distinct vulnerability vectors across backend, frontend, mobile, and library codebases. The criteria files alone are worth the install.
The criteria/backend.yaml (45 KB) is the centerpiece. It covers 10 vulnerability categories — injection (command, SQL, NoSQL, XPath, XML, template, prompt), authorization (BOLA, BFLA, privilege escalation, mass assignment), authentication (8 vectors from broken-authn to missing-authn), cryptographic failures, data exposure, SSRF, API security, serialization, file handling, and business logic (race conditions, workflow bypass, password reset flaws). Each vector includes candidate file patterns, CWE mapping, three-tier severity definitions, and specific validation criteria that must all be true for a finding to be surfaced — that last part matters because it’s what separates this from skills that simply tell the AI “look for SQL injection.”
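For a sense of the shape involved, here is a hypothetical vector entry rendered as a Python dict for illustration. The real criteria files are YAML, and the field names below are our guesses, not the actual schema:

```python
# Hypothetical shape of one vulnerability vector entry. The real criteria
# files are YAML; these field names are illustrative guesses.
sql_injection_vector = {
    "id": "injection-sql",
    "cwe": "CWE-89",
    "candidate_files": ["**/*repository*", "**/*dao*", "**/models/**"],
    "severity": {  # three-tier severity definitions
        "critical": "untrusted input reaches a raw query unsanitized",
        "high": "partial sanitization that can be bypassed",
        "medium": "unsafe query construction from internal input",
    },
    # every criterion must hold before a finding is surfaced
    "validation_criteria": [
        "user-controlled input flows into the query string",
        "no parameterized query or prepared statement is used",
        "no framework-level escaping applies on this path",
    ],
}

def is_reportable(evidence: dict, vector: dict) -> bool:
    """Surface a finding only if all validation criteria are satisfied."""
    return all(evidence.get(c, False) for c in vector["validation_criteria"])
```

The all-criteria-must-hold gate is the part that keeps "looks like SQL near a string" from becoming a reported finding.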
The workflow operates in five steps: setup (project analysis and cache), plan (scan strategy by project type), nominate (select candidate files per vector), analyze (deep inspection), and verify (false positive elimination). The verification step runs against each finding individually, which is uncommon — most security skills produce findings and leave triage to the developer.
The criteria/index.yaml maps all 146 vectors to four project types. For mobile alone there are 28 vectors, including insecure data storage (5), insecure communication (3), authentication (3), crypto (3), code tampering (4), reverse engineering (4), and platform-specific issues (4). The library criteria cover prototype pollution, zip-slip, ReDoS, and unsafe deserialization — categories that backend-focused scanners routinely miss.
The one limitation: this skill depends on a repo-context caching step that requires a separate Ghost Security skill (ghost-scan-context). You can work around this, but the onboarding friction is real.
skillsafe install @ghostsecurity/scan-code
2. @trailofbits/sharp-edges — 43/50
Score: 43/50 | Relevance: 9 · Depth: 10 · Actionability: 8 · Structure: 9 · Adoption: 7
Eighteen files, 156 KB total. This is not a vulnerability scanner — it is a systematic methodology for finding API designs where the easy path leads to insecurity. The concept is “pit of success” analysis: does the API make the secure choice the default, or does it require developers to read documentation and remember special rules?
The SKILL.md (292 lines) defines six sharp edge categories: algorithm/mode selection footguns (the JWT alg: none pattern), dangerous defaults (what happens when timeout=0?), primitive vs. semantic APIs (Libsodium bytes vs. Halite typed keys), configuration cliffs (one wrong boolean disables all security), silent failures (verification functions that return True on missing keys), and stringly-typed security (permissions as comma-separated strings).
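To make the categories concrete, here is a hypothetical API of our own construction (analogous to the JWT alg: none pattern, not code from the skill) that exhibits both a dangerous default and a silent failure, alongside its pit-of-success counterpart:

```python
import hashlib
import hmac
from typing import Optional

# A hypothetical token-verification API (not from any real library) showing
# two categories at once: a dangerous default (omitting the key quietly
# disables checking) and a silent failure (returning True instead of raising).
def verify_token_sharp(payload: bytes, signature: str,
                       key: Optional[bytes] = None) -> bool:
    if key is None:      # dangerous default: no key means no verification
        return True      # silent failure: caller sees success, not an error
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# The pit-of-success variant: the key is mandatory and a bad signature
# raises, so a lazy or confused developer cannot skip verification by accident.
def verify_token_safe(payload: bytes, signature: str, key: bytes) -> None:
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("invalid signature")
```

The sharp version type-checks, reads plausibly, and passes a happy-path test; that is exactly the kind of API this skill is built to flag.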
The reference files are where the depth shows. There are 11 language-specific guides — C, Go, Rust, Swift, Java, Kotlin, C#, PHP, JavaScript, Python, and Ruby — each roughly 6-7 KB and covering the footguns specific to that language’s security-relevant APIs. references/crypto-apis.md (5 KB) covers cryptographic API misuse patterns. references/config-patterns.md (9.7 KB) documents unsafe constructor parameters, environment variable overrides, and YAML/TOML parsing pitfalls. references/case-studies.md (8 KB) walks through real-world sharp edge failures in OpenSSL, GMP, and production authentication systems.
The four-phase analysis workflow — surface identification, edge case probing, threat modeling, finding validation — is structured around three adversary personas: the Scoundrel (actively malicious), the Lazy Developer (copy-pastes examples), and the Confused Developer (misunderstands the API). That framing is more practical than a generic “consider the attacker” instruction because it forces the model to evaluate the same code through three different misuse lenses.
The severity classification maps directly to default behavior: Critical means the default or obvious usage is insecure. High means easy misconfiguration breaks security. This is the right framing for design review, where the question is not “is there a bug?” but “does this design invite bugs?”
skillsafe install @trailofbits/sharp-edges
3. @trailofbits/differential-review — 42/50
Score: 42/50 | Relevance: 9 · Depth: 8 · Actionability: 9 · Structure: 9 · Adoption: 7
Six files totaling 44 KB. This is a security-focused differential review skill — meaning it analyzes code changes (PRs, commits, diffs) rather than entire codebases. That scope constraint is a feature. Full codebase audits are expensive and often unnecessary; the security risk is usually concentrated in what changed.
The SKILL.md (220 lines) opens with the right insight: “Heartbleed was 2 lines.” The rationalizations-to-reject table explicitly forbids treating small PRs as low-risk, skipping git history analysis, or calling something “just a refactor.”
The workflow is six phases: triage (classify changes by risk level), code analysis (security-focused line-by-line review), test coverage (flag untested security-sensitive changes), blast radius (quantitative measurement of how many callers are affected), deep context (architectural analysis for high-risk changes), and adversarial modeling (attacker perspective with concrete exploit scenarios).
methodology.md (6.7 KB) provides the detailed phase-by-phase workflow. adversarial.md (5 KB) covers attacker modeling and exploitability rating — this is Phase 5, activated only for HIGH risk changes. patterns.md (7 KB) is a reference of common vulnerability patterns including security regressions, reentrancy, access control removal, and integer overflow. reporting.md (7 KB) defines the output format.
The codebase size adaptation is practical: SMALL codebases (<20 files) get full deep analysis, MEDIUM (20-200) gets focused priority-file analysis, LARGE (200+) gets surgical critical-path-only analysis. The red flags section is specific enough to be useful: “Removed code from ‘security’, ‘CVE’, or ‘fix’ commits” and “Access control modifiers removed (onlyOwner, internal to external)” trigger mandatory adversarial analysis even during quick triage.
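The triage logic is simple enough to sketch; this is our illustration of the documented thresholds, not code shipped with the skill:

```python
def analysis_depth(file_count: int) -> str:
    """Pick an analysis depth from the documented size buckets.

    The return strings are our labels. The source lists 200 in both the
    MEDIUM and LARGE buckets, so we treat 200+ as LARGE.
    """
    if file_count < 20:
        return "full deep analysis"              # SMALL
    if file_count < 200:
        return "focused priority-file analysis"  # MEDIUM
    return "surgical critical-path analysis"     # LARGE
```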
The install count is lower at 3,566, which likely reflects the narrower use case (PRs only, not full audits). For teams that review security-sensitive pull requests regularly, this skill provides a framework that generic code review skills do not.
skillsafe install @trailofbits/differential-review
4. @trailofbits/supply-chain-risk-auditor — 40/50
Score: 40/50 | Relevance: 9 · Depth: 7 · Actionability: 9 · Structure: 8 · Adoption: 7
Three files, 16 KB. Smaller than the other skills on this list, but focused on a specific problem that most security skills ignore entirely: identifying which of your project’s dependencies are at heightened risk of exploitation or takeover.
The SKILL.md (67 lines) defines six risk criteria, each with justification. Single maintainer or anonymous maintainer (the left-pad scenario). Unmaintained projects with unresponsive maintainers. Low popularity relative to other dependencies. High-risk features (FFI, deserialization, third-party code execution). Past CVE history. Absence of a security contact in SECURITY.md or CONTRIBUTING.md.
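The six criteria compose naturally into a checklist. This toy scorer is our own illustration: the field names are invented, and the skill gathers these signals live via the gh CLI rather than from a prepared dict:

```python
# Toy risk checklist mirroring the six documented criteria. Field names on
# the metadata dict are hypothetical; thresholds are ours for illustration.
RISK_CHECKS = {
    "single_or_anonymous_maintainer": lambda m: m["maintainers"] <= 1,
    "unmaintained":        lambda m: m["days_since_last_commit"] > 365,
    "low_popularity":      lambda m: m["stars"] < 100,
    "high_risk_features":  lambda m: bool(m["uses_ffi_or_deserialization"]),
    "past_cves":           lambda m: m["known_cves"] > 0,
    "no_security_contact": lambda m: not m["has_security_md"],
}

def risk_factors(meta: dict) -> list:
    """Return the names of all risk criteria a dependency trips."""
    return [name for name, check in RISK_CHECKS.items() if check(meta)]
```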
What makes this skill actionable rather than advisory is the workflow: it uses the gh CLI to query GitHub in real time for each dependency. It counts stars, open issues, recent commit activity, and maintainer profiles — and explicitly instructs the model to use accurate numbers with ~ rounding notation rather than guessing. The post-audit step fills in a suggested alternative for each high-risk dependency, preferring direct successors and drop-in replacements.
The results-template.md defines the output structure: an executive summary; a high-risk dependencies table with columns for dependency name, risk factors, and suggested alternatives; a counts-by-risk-factor summary; and recommendations. The template is rigid by design — no additional sections — which keeps the output predictable and diff-friendly for teams that run this periodically.
The limitation is scope: this skill evaluates dependency risk, not dependency vulnerabilities. It will flag that a single-maintainer package with no security contact is risky; it won’t find that the package has an unpatched deserialization flaw. It complements tools like npm audit and pip-audit rather than replacing them.
The clean scan score (100/100 on SkillSafe’s security scanner) and the practical gh-based methodology make this the strongest supply chain skill in the registry.
skillsafe install @trailofbits/supply-chain-risk-auditor
5. @trailofbits/semgrep — 39/50
Score: 39/50 | Relevance: 8 · Depth: 8 · Actionability: 8 · Structure: 8 · Adoption: 7
Seven files totaling 60 KB. This skill does not perform its own security analysis — it orchestrates Semgrep, the open-source static analysis engine, through a structured multi-agent workflow with parallel execution and third-party ruleset integration.
The SKILL.md (212 lines) is opinionated about process. Five essential principles: always disable telemetry (--metrics=off), require explicit user approval before scanning (the scan request is not approval), include third-party rulesets from Trail of Bits, 0xdea, and Decurity, spawn all scan tasks in parallel, and always check for Semgrep Pro before scanning. The rationalizations-to-reject table is unusually detailed: 13 shortcuts the model might take, each with an explanation of why it’s wrong.
The orchestration architecture spawns parallel scan tasks per language, each running with the approved rulesets. references/rulesets.md contains the complete ruleset catalog and selection algorithm. references/scan-modes.md defines two modes: “run all” (full coverage) and “important only” (pre-filtered by severity, post-filtered by confidence and impact). The “important only” mode applies two filter layers — a CLI flag for severity and a JSON metadata filter for category=security with confidence and impact both MEDIUM or higher. That two-layer approach is the difference between getting 500 findings and getting 30 findings that matter.
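Assuming Semgrep's JSON layout, where rule metadata sits under extra.metadata with uppercase confidence and impact levels, the second filter layer might look like this sketch of ours:

```python
# Ordering of Semgrep metadata levels, per the upstream convention.
LEVELS = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}

def important_only(results):
    """Second filter layer: keep category=security findings whose confidence
    and impact are both MEDIUM or higher. (The first layer is a severity
    flag applied on the semgrep command line, before this runs.)"""
    kept = []
    for r in results:
        md = r.get("extra", {}).get("metadata", {})
        if md.get("category") != "security":
            continue
        conf = LEVELS.get(str(md.get("confidence", "")).upper(), -1)
        impact = LEVELS.get(str(md.get("impact", "")).upper(), -1)
        if conf >= LEVELS["MEDIUM"] and impact >= LEVELS["MEDIUM"]:
            kept.append(r)
    return kept
```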
The scan uses references/scanner-task-prompt.md as a template for spawning each parallel agent, and scripts/merge_sarif.py merges the per-language SARIF outputs into a single results file. The merged SARIF output is compatible with GitHub Code Scanning, which means the results integrate directly into CI/CD workflows.
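We have not reproduced the bundled merge_sarif.py here, but under the standard SARIF 2.1.0 shape the core of such a merge is concatenating each file's runs array:

```python
import json

def merge_sarif(paths, out_path):
    """Concatenate the `runs` arrays of several SARIF 2.1.0 files into one.

    A minimal sketch; the skill's bundled script may do more, such as
    deduplication or tool-metadata reconciliation.
    """
    merged = {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": [],
    }
    for p in paths:
        with open(p) as f:
            merged["runs"].extend(json.load(f).get("runs", []))
    with open(out_path, "w") as f:
        json.dump(merged, f, indent=2)
```

Because each per-language scan produces its own run, the merged file keeps per-tool provenance while uploading as a single artifact.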
The skill requires Semgrep to be installed locally, and it checks for Semgrep Pro (which enables cross-file taint tracking, reportedly catching 250% more true positives). The hard gate at Step 3 — user must explicitly approve the scan plan before any scanning begins — is a good practice for a tool that can scan entire codebases.
skillsafe install @trailofbits/semgrep
Honorable Mentions
Three skills that scored well but didn’t make the top five:
@trailofbits/fp-check (8,198 installs) — A false positive verification skill with 8 files and 56 KB of content. It systematically verifies whether suspected security bugs are real by tracing data flow from source to sink, classifying by bug class, and running a devil’s advocate review. The standard vs. deep verification routing and the 13-item false positive patterns checklist make it a natural companion to any SAST skill. It narrowly missed the list because it doesn’t find vulnerabilities — it validates them.
@trailofbits/seatbelt-sandboxer (9,805 installs) — A macOS-specific skill that generates minimal Seatbelt sandbox profiles with deny-all defaults. The 313-line SKILL.md covers every Seatbelt operation category (file, network, process, Mach IPC, IOKit) with specific syntax and a full reference table. Excellent for sandboxing build tools and development servers, but the macOS-only scope limits its audience.
@seojoonkim/prompt-guard (8,751 installs) — The largest skill we evaluated at 748 KB and 50 files, with 600+ prompt injection detection patterns across 11 SHIELD categories. It covers supply chain skill injection, memory poisoning, action gate bypass, unicode steganography, and cascade amplification. The scope is impressive, but it’s a full Python application rather than an AI skill — most of the value is in the bundled Python library, not in the SKILL.md that guides the AI agent.
Frequently Asked Questions
What is the difference between a security skill and a security tool?
A security tool (Semgrep, Trivy, npm audit) is software you run independently. A security skill is a structured prompt that guides an AI coding agent — Claude Code, Cursor, or Windsurf — to perform security analysis using its native capabilities (reading files, running commands, analyzing patterns). Skills like @ghostsecurity/scan-code bridge both: they use the AI agent to execute a multi-step vulnerability analysis workflow with verification, while @trailofbits/semgrep orchestrates an external tool through the AI agent. Both approaches produce findings, but skills add contextual analysis that standalone tools cannot.
Do these skills replace dedicated security scanning in CI/CD?
No. These skills are best used during development — before code reaches the CI/CD pipeline. They help developers catch security issues during code review, dependency updates, and API design. A CI/CD pipeline should still run dedicated scanners (Semgrep, Snyk, Trivy) with pinned rule versions and deterministic output. The skills listed here are complementary: use @trailofbits/sharp-edges during API design, @trailofbits/differential-review during PR review, and @trailofbits/supply-chain-risk-auditor when evaluating new dependencies.
Which skill should I install first?
If you work on a single codebase and want the broadest coverage, start with @ghostsecurity/scan-code — its 146 vulnerability vectors cover backend, frontend, mobile, and library code. If you review security-sensitive pull requests, @trailofbits/differential-review gives you the most structured workflow. If your concern is dependency risk, @trailofbits/supply-chain-risk-auditor is the only skill in the registry that queries GitHub metadata in real time to assess maintainer and popularity risk.
Conclusion
The five skills above represent different layers of the security auditing workflow: vulnerability scanning (@ghostsecurity/scan-code, @trailofbits/semgrep), design-level analysis (@trailofbits/sharp-edges), change-level review (@trailofbits/differential-review), and supply chain assessment (@trailofbits/supply-chain-risk-auditor). They work well together because they don’t overlap — each addresses a different question at a different point in the development cycle.
Trail of Bits dominates this list with four of the five picks, which reflects their investment in structured security methodology. Ghost Security’s scan-code skill earns the top spot on sheer coverage — 146 vulnerability vectors with validation criteria is a depth that no other skill matches.
All five skills are free to install, work across Claude Code, Cursor, and Windsurf, and are verified on the SkillSafe registry.
Related roundups: Browse all Best Of roundups