Security
How SkillSafe protects the supply chain for AI agent skills — from save to install.
Why This Matters
AI skill registries are under active attack. In January 2026, security researchers discovered 341 malicious skills distributing Atomic Stealer malware, exfiltrating credentials, and installing reverse shells on developer machines. A single trojan skill was downloaded over 7,700 times before removal. The affected registry had no pre-sharing scanning and relied solely on community flagging after damage was done. SkillSafe was designed to prevent these attacks before skills ever reach consumers.
In February 2026, researchers published SkillJect, an automated attack framework that achieves a 97.5% success rate against Claude Code using poisoned skills. The attack splits malicious behavior across two components: an innocuous-looking inducement prompt in SKILL.md and a payload hidden in a bundled script. Neither component appears malicious in isolation; together they form a complete attack. The v0.1.4 scanner ships four new detection passes in direct response to this paper.
Skills are only half the attack surface. MCP tool descriptions, which are visible to your AI agent but hidden from you, are an active vector for credential theft, cross-tool exploitation, and prompt injection.
Security Model
SkillSafe is designed to protect against supply-chain attacks on AI agent skills. The security model has three pillars:
Content Integrity
The entire skill archive is hashed with SHA-256 to produce an immutable tree hash. Any change to any file produces a different archive, which produces a different hash.
Dual-Side Verification
Publishers scan and upload a report alongside the archive. Consumers independently re-scan after download. The server compares both and returns a verdict.
Automated Static Analysis
Every file in every shared skill is analyzed using AST parsing and pattern matching. Python, JavaScript, TypeScript, and Markdown are scanned for dangerous functions, credential theft, and prompt injection before the skill reaches any consumer.
How Scans Work
Every skill shared through SkillSafe undergoes automated static analysis before it reaches consumers. Scans run on both the publisher side (at upload time) and the consumer side (at install time) to ensure independent verification.
What Gets Analyzed
- Python files (.py) — AST-based analysis for dangerous calls and imports
- JavaScript and TypeScript (.js, .ts, .jsx, .tsx, .mjs) — regex-based unsafe pattern detection
- All text files — secrets, hardcoded credentials, shell threat patterns, and Unicode obfuscation
- Markdown, YAML, and plain text — prompt injection and ClickFix social engineering
- Binary files (.exe, .dll, .so, .dylib) — flagged as unsafe bundled executables
- Base64 blobs — decoded and re-scanned for hidden payloads
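As a rough illustration of the AST-based approach for Python files, the sketch below resolves import aliases before matching call names, so `import os as o; o.system(...)` is still reported as `os.system`. The deny-list and the function name `find_dangerous_calls` are hypothetical and are not taken from the SkillSafe scanner.

```python
import ast
from typing import Optional

# Hypothetical deny-list for illustration; the real ruleset is not reproduced here.
DANGEROUS = {"eval", "exec", "os.system", "subprocess.run", "subprocess.Popen"}


def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, qualified_name) for each call to a deny-listed function."""
    tree = ast.parse(source)
    aliases: dict[str, str] = {}  # local name -> fully qualified name

    # Record import aliases such as `import os as o`
    # or `from subprocess import run as r`.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                aliases[a.asname or a.name] = a.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for a in node.names:
                aliases[a.asname or a.name] = f"{node.module}.{a.name}"

    def qualified(func: ast.expr) -> Optional[str]:
        # Rebuild a dotted name, substituting aliases at the root.
        if isinstance(func, ast.Name):
            return aliases.get(func.id, func.id)
        if isinstance(func, ast.Attribute):
            base = qualified(func.value)
            return f"{base}.{func.attr}" if base else None
        return None

    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = qualified(node.func)
            if name in DANGEROUS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

Because matching happens on the resolved name rather than the source text, renaming a module at import time does not hide the call. Dynamic tricks such as `getattr(os, "sys" + "tem")` would need additional handling that this sketch does not attempt.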
Detection Techniques
- AST parsing (Python): catches eval, exec, subprocess, and os.system even when obfuscated by aliases
- Regex scanning (JS/TS): comment-aware, skips block and inline comments to reduce false positives
- Pattern matching (all text): 81 rules across 23 threat categories aligned to the OWASP Agentic AI taxonomy
- Context-aware classification: findings in documentation code fences are classified as advisories (0 score impact) unless instructional intent is detected nearby, reducing false positives while preserving detection of OpenClaw-style attacks
- Base64 deep-scan: decodes suspicious blobs and re-applies all rules to decoded content
- Unicode analysis: detects zero-width characters and Cyrillic/Latin homograph mixing
- Inducement language detection: 6 patterns targeting SkillJect-style social engineering that steers agents toward executing bundled scripts
- Structural mimicry: multi-line context scan detecting setup section headers followed by bundled script execution references
- Composite co-occurrence: escalates severity when individually low-risk capabilities co-occur in a single file (exec + network, env + network)
- Surplus functionality: cross-file consistency check flagging script capabilities not documented in `SKILL.md`
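The Base64 deep-scan idea can be sketched as follows: find candidate blobs, decode them strictly, and run the same rules over the decoded text. The two rules, the 24-character threshold, and the `decoded:` prefix are assumptions made for this example; they are not the scanner's actual rules or report format.

```python
import base64
import binascii
import re

# Candidate blobs: long runs of base64 alphabet characters.
# The minimum length of 24 is an assumed threshold.
B64_BLOB = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")

# Stand-in rules for illustration only.
RULES = {
    "pipe_to_shell": re.compile(r"curl\s+[^|\n]+\|\s*(ba)?sh"),
    "dev_tcp": re.compile(r"/dev/tcp/"),
}


def scan_text(text: str) -> list[str]:
    """Return the names of all rules that match the text."""
    return sorted(name for name, rx in RULES.items() if rx.search(text))


def deep_scan(text: str) -> list[str]:
    """Scan the text, then decode each base64 blob and scan the result too."""
    findings = set(scan_text(text))
    for match in B64_BLOB.finditer(text):
        try:
            decoded = base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text: nothing to re-scan
        findings.update(f"decoded:{name}" for name in scan_text(decoded))
    return sorted(findings)
```

Strict decoding (`validate=True`) keeps ordinary long identifiers from being treated as payloads unless they happen to be well-formed base64 that decodes to text.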
Severity & Scoring
- Critical — Active exploitation (reverse shells, credential theft, metadata endpoint access)
- High — Dangerous operations that bypass user consent (persistence, exfiltration)
- Medium — Suspicious patterns that may indicate malicious intent
- Low — Best-practice violations
- Each scan produces a 0–100 safety score and an A+ through F letter grade based on severity-weighted penalties
Cryptographic Verification Flow
SkillSafe uses a dual-side verification model to ensure that what the publisher uploaded is exactly what the consumer installs — with no possibility of tampering in transit or at rest.
Publisher Packages & Hashes
When a skill is saved, the system creates a deterministic file manifest (sorted entries with SHA-256 per file), then computes a single hash of the manifest — the tree hash.
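A minimal sketch of the manifest-then-hash step, assuming the manifest is serialized as canonical JSON before hashing. The field names (`path`, `sha256`, `size`) and the serialization format are assumptions for illustration; the source only states that entries are sorted and carry a SHA-256 per file.

```python
import hashlib
import json


def build_manifest(files: dict[str, bytes]) -> list[dict]:
    """One entry per file, sorted by path so ordering never affects the result."""
    return [
        {
            "path": path,
            "sha256": hashlib.sha256(files[path]).hexdigest(),
            "size": len(files[path]),
        }
        for path in sorted(files)
    ]


def tree_hash(files: dict[str, bytes]) -> str:
    """Hash the canonical serialization of the manifest (assumed to be JSON)."""
    manifest = build_manifest(files)
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting the entries and fixing the separators makes the output independent of filesystem enumeration order, so the publisher and the consumer derive the same hash from the same files, and any single-byte change in any file changes the result.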
Publisher-Side Scan
A full static analysis scan runs on the skill files — 12 passes covering dangerous calls, secrets, prompt injection, shell threats, and more. The scan report is stored alongside the skill version.
Server Stores Immutably
The server stores the archive per version with the tree hash recorded as metadata for integrity verification. The archive, publisher scan report, and metadata are all immutable once written. No overwrites — only new versions.
Consumer Downloads & Re-scans
On install, the consumer downloads the files, recomputes the tree hash, and independently runs the same static analysis scan. This produces a consumer-side scan report.
Verdict Comparison
The consumer's scan report is sent to the server, which compares it against the publisher's report. If the tree hashes match and both scan reports agree, the skill is marked ✓ verified. Any mismatch — tampered files, divergent findings, or hash discrepancies — results in a ✗ failed verdict with details.
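The comparison can be pictured with the sketch below. The report shape (`tree_hash`, `ruleset_version`, `findings` with `rule_id`, `file`, `line`) and the reason strings are hypothetical; only the `ruleset_upgrade_divergence` field name comes from the ruleset changelog. Treating any ruleset version difference as an upgrade is a simplification.

```python
def compare_reports(publisher: dict, consumer: dict) -> dict:
    """Compare two scan reports and explain any mismatch."""
    reasons = []

    if publisher["tree_hash"] != consumer["tree_hash"]:
        reasons.append("tree_hash_mismatch")

    def finding_keys(report: dict) -> set:
        # Identify a finding by rule, file, and line (assumed report fields).
        return {(f["rule_id"], f["file"], f["line"]) for f in report["findings"]}

    if finding_keys(publisher) != finding_keys(consumer):
        reasons.append("findings_diverge")

    ruleset_differs = publisher["ruleset_version"] != consumer["ruleset_version"]
    return {
        "verified": not reasons,
        "reasons": reasons,
        # Findings differ while content is identical and rulesets differ:
        # likely a ruleset change rather than tampering.
        "ruleset_upgrade_divergence": reasons == ["findings_diverge"] and ruleset_differs,
    }
```

Keeping the hash check separate from the findings check lets the result distinguish changed content from a scanner disagreement over identical content.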
What Scan Reports Contain
Every scan report is a structured document that captures the full security posture of a skill version. Reports are stored immutably and can be inspected by both publishers and consumers.
- Scanner version, timestamp, tree hash, file count, total size, and the skill identifier (@scope/name@version).
- The skill tree hash (SHA-256 of the full archive blob), scanner version, and ruleset version. The tree hash is the source of truth for content integrity: any deviation between publisher and consumer hashes triggers a verification failure.
- Each finding includes: rule ID, severity level, classification (threat or advisory), affected file and line range, a human-readable description, and a code snippet showing the matched pattern. Findings are deterministic: the same input always produces the same output. Advisory findings appear in documentation contexts (e.g. code fences in reference files) and carry 0 score penalty.
- Threat count, advisory count, clean/not-clean flag (based on threats only), 0–100 safety score, and A+–F letter grade. The verification verdict (verified, divergent, or critical) is produced when the server compares the publisher and consumer reports.
Ruleset Changelog
The scanner ruleset is versioned independently of the CLI. Both publisher and consumer reports embed their ruleset version, and the server flags differences so you can tell when a divergence is caused by a ruleset upgrade rather than actual tampering. The entire CLI, including all scanner rules and verification logic, is open source.
Context-aware finding classification to reduce false positives while preserving detection of real attacks like the OpenClaw campaign. Scan report schema updated to v1.2. 81 rules across 23 threat categories.
New features
- Threat vs advisory classification: Findings in documentation code fences (e.g. `references/*.md`) are classified as advisory with 0 score penalty, unless instructional intent is detected nearby. Findings in executable files or near imperative language ("run this", "you must", "prerequisite") remain threats that affect the score.
- Instructional intent detection: 17 patterns detecting imperative/social engineering language (e.g. "before using, run", "paste this in terminal", `curl | bash` near setup instructions). When found within 5 lines of a dangerous pattern, the finding stays classified as a threat even inside a code fence, directly targeting the OpenClaw attack vector.
- Markdown code fence awareness: Scanner now tracks code fence boundaries in `.md/.txt/.rst` files, distinguishing executable instructions from documentation examples.
- Tightened SS13 regex: `dangerous_rm_root` now only flags destructive wildcards (`rm -rf /`, `rm -rf ~`, `rm -rf $HOME`). Targeted removals like `rm /tmp/specific-file` no longer trigger false positives.
- Severity downgrade for doc paths: Advisory findings in `references/`, `docs/`, `examples/`, and `tests/` directories receive a severity downgrade (critical→high, high→medium).
- Schema v1.2: Scan reports now include `advisory_count` and a `classification` field (`"threat"` or `"advisory"`) on each finding. `findings_count` reflects threat count only. `clean` is true when threat count is 0.
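The interaction between code fence tracking and instructional intent can be sketched like this. The single dangerous pattern and the four intent phrases are stand-ins (the release notes mention 17 intent patterns, which are not reproduced here); the 5-line window matches the description above, and the function name `classify` is hypothetical.

```python
import re

FENCE = re.compile(r"^\s*(`{3,}|~{3,})")  # opening or closing code fence
DANGEROUS = re.compile(r"curl\s+[^|\n]+\|\s*(ba)?sh")  # stand-in dangerous pattern
# Stand-in intent phrases, assumed for illustration.
INTENT = re.compile(
    r"before using, run|paste this in (your )?terminal|you must|prerequisite",
    re.IGNORECASE,
)
WINDOW = 5  # lines on either side of the finding


def classify(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, 'threat' | 'advisory') for each dangerous match."""
    lines = markdown.splitlines()

    # Mark which lines sit inside a code fence.
    in_fence = False
    fenced = []
    for line in lines:
        if FENCE.match(line):
            in_fence = not in_fence
            fenced.append(False)  # the fence marker itself is not content
        else:
            fenced.append(in_fence)

    results = []
    for i, line in enumerate(lines):
        if not DANGEROUS.search(line):
            continue
        lo, hi = max(0, i - WINDOW), min(len(lines), i + WINDOW + 1)
        nearby_intent = any(INTENT.search(other) for other in lines[lo:hi])
        kind = "advisory" if fenced[i] and not nearby_intent else "threat"
        results.append((i + 1, kind))
    return results
```

A fenced example in a reference document is reported as an advisory, while the same fenced command placed under "Before using, run this:" stays a threat, which is the distinction the classification is meant to preserve.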
Four new detection passes targeting SkillJect-style skill-based prompt injection, developed in direct response to arXiv:2602.14211. Scanner expanded from 8 to 11 passes; 81 rules across 23 threat categories.
New detection passes
| Pass | Category | Max Severity | What it detects |
|---|---|---|---|
| SS-SI | Inducement Language | high | 6 patterns for social engineering that steers agents toward script execution: "before using, run...", "for the tool to work", "this setup step is required", "run the bundled script", "automatically execute *.sh", "must be run first" |
| SS-SM | Structural Mimicry | high | Multi-line context scan: SS-SM01 fires when a Prerequisites/Setup/Quick Start header is followed within 10 lines by a bundled script reference; SS-SM02 fires when an urgency marker (> **IMPORTANT**, **CRITICAL**) appears within 3 lines of a script reference |
| SS-CP | Composite Co-occurrence | critical | Escalates severity when low-risk primitives co-occur: SS-CP01 (exec + network, critical), SS-CP02 (env vars + network, high), SS-CP03 (file write + network, high), SS-CP04 (3+ medium findings in one file, high) |
| SS-SF | Surplus Functionality | critical | Cross-file consistency check: flags script capabilities absent from SKILL.md — undocumented network calls (critical), env var reads (high), subprocess invocations (high), file writes (medium) |
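
To make the SS-SM01 description concrete, here is a minimal sketch of a multi-line context scan: it looks for a Prerequisites/Setup/Quick Start header and reports the first bundled script reference within the next 10 lines. Both regular expressions and the function name `ss_sm01` are assumptions for illustration, not the scanner's actual patterns.

```python
import re

# Assumed header and script-reference patterns, for illustration only.
HEADER = re.compile(r"^#{1,6}\s*(prerequisites|setup|quick\s*start)\b", re.IGNORECASE)
SCRIPT_REF = re.compile(r"(?:\./|scripts/)[\w./-]+\.(?:sh|py|js)\b")
WINDOW = 10  # lines after the header, per the SS-SM01 description


def ss_sm01(markdown: str) -> list[tuple[int, int]]:
    """Return (header_line, script_line) pairs, 1-indexed."""
    lines = markdown.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if not HEADER.match(line):
            continue
        # Inspect only the WINDOW lines that follow the header.
        for j in range(i + 1, min(len(lines), i + 1 + WINDOW)):
            if SCRIPT_REF.search(lines[j]):
                hits.append((i + 1, j + 1))
                break
    return hits
```

A single-line regex cannot express "header, then script reference a few lines later", which is why this pass works on a window of lines rather than on each line in isolation.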
Design notes
- Two structural gaps closed: The prior scanner lacked inducement language detection and cross-file consistency checking. SkillJect exploits exactly these gaps, hiding payloads where no single-file scan can find them.
- Cross-file consistency (SS-SF): Pass 11 extracts a documentation intent profile from `SKILL.md`, then checks each script for undocumented network calls, credential access, subprocess invocations, and file writes. Undocumented outbound network calls are the single most reliable signal of a SkillJect-style payload.
- Composite escalation (SS-CP): Targets the paper's finding that SkillJect composes attacks from primitives that individually fall below alert thresholds. Three or more medium-severity findings in a single file trigger a high-severity composite alert.
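The composite rules from the SS-CP row can be expressed as a small lookup over per-file capability tags. The rule IDs and severities come from the table above; the tag names (`exec`, `network`, `env`, `file_write`) and the function signature are hypothetical.

```python
# (rule ID, capabilities that must all be present, resulting severity)
COMPOSITES = [
    ("SS-CP01", {"exec", "network"}, "critical"),
    ("SS-CP02", {"env", "network"}, "high"),
    ("SS-CP03", {"file_write", "network"}, "high"),
]


def composite_findings(capabilities: set[str], medium_count: int) -> list[tuple[str, str]]:
    """Return (rule_id, severity) for every composite rule a single file triggers."""
    out = [
        (rule, severity)
        for rule, needed, severity in COMPOSITES
        if needed <= capabilities  # subset test: all required capabilities present
    ]
    if medium_count >= 3:  # SS-CP04: three or more medium findings in one file
        out.append(("SS-CP04", "high"))
    return out
```

For example, a script that only reads environment variables triggers nothing here, but the same script with an outbound request added would trigger SS-CP02.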
Added 15 new detection categories (SS03–SS21), A-F safety scoring, and ruleset-upgrade divergence detection. Total: 65 rules across 19 threat categories.
New detection categories
| ID | Category | Max Severity | What it detects |
|---|---|---|---|
| SS03 | Data Exfiltration | high | curl/wget to ngrok, webhook.site, pipedream, RequestBin, Burp Collaborator |
| SS04 | Agent Memory Poisoning | high | Writes to MEMORY.md, CLAUDE.md, .cursorrules via redirection or echo |
| SS05 | Encoded Malware | critical | base64 decode-then-execute patterns; deep-scans decoded blobs for hidden payloads |
| SS07 | Privilege Escalation | critical | sudo su/bash/sh, setuid(0), seteuid(0) |
| SS08 | Persistence | high | cron, macOS LaunchAgents/LaunchDaemons, systemd services, shell profile writes |
| SS09 | Reverse Shell | critical | /dev/tcp, /dev/udp, netcat -e, socat EXEC, bash redirect |
| SS10 | Unicode Obfuscation | high | Zero-width characters (U+200B–U+2060, U+FEFF); Cyrillic/Latin homograph mixing |
| SS11 | ClickFix Social Engineering | high | "Open terminal and paste…", copy-paste command instructions, Windows Run dialog tricks |
| SS13 | Dangerous File Operations | critical | rm -rf /, rm -rf ~, dd of=/dev/sd* block-device writes |
| SS14 | Reconnaissance | critical | nmap, masscan, arp-scan, zmap; cloud IMDS endpoints (169.254.169.254, metadata.google.internal) |
| SS16 | Bundled Binaries | high | .exe, .dll, .so, .dylib, .elf, .bin and other executable/library extensions |
| SS17 | Credential File Access | critical | Reads to ~/.aws/credentials, ~/.docker/config.json; find/search over .ssh, .gnupg |
| SS18 | Cryptocurrency Targeting | critical | Seed/recovery phrases; MetaMask, Phantom, Exodus, Ledger wallet references; wallet directories |
| SS19/20 | Path Traversal | critical | ../../etc traversals; reads to /etc/passwd, /etc/shadow; git hook writes |
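
As an illustration of the SS10 checks, the sketch below reports zero-width characters and words that mix Latin and Cyrillic letters. The explicit character set is a small subset of the range listed in the table (U+200B, U+200C, U+200D, U+2060, U+FEFF), chosen for this example so that ordinary punctuation in that range is not flagged; the function names are hypothetical.

```python
import re
import unicodedata

# Subset of zero-width code points, assumed for this sketch.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def zero_width_positions(text: str) -> list[int]:
    """Return the index of every zero-width character in the text."""
    return [m.start() for m in ZERO_WIDTH.finditer(text)]


def script_of(ch: str):
    """Classify a character as 'latin', 'cyrillic', or None by its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "latin"
    if name.startswith("CYRILLIC"):
        return "cyrillic"
    return None


def mixed_script_words(text: str) -> list[str]:
    """Return words that contain both Latin and Cyrillic letters."""
    hits = []
    for word in re.findall(r"\w+", text):
        scripts = {script_of(c) for c in word}
        if {"latin", "cyrillic"} <= scripts:
            hits.append(word)
    return hits
```

Checking per word rather than per file avoids flagging documents that legitimately contain both scripts in separate words, while still catching a Cyrillic letter substituted inside a Latin identifier.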
New features
- Safety score: Every scan produces a 0–100 numeric score and A+/A/B/C/D/F letter grade based on severity-weighted penalties (critical −25, high −15, medium −5, low −2).
- Ruleset-upgrade divergence: When a consumer verifies with a newer ruleset, the server sets `ruleset_upgrade_divergence: true` in verification details so the UI can explain the mismatch rather than implying tampering.
- CI mode (`--check`): Exits with code 1 if any HIGH or CRITICAL findings are present; suitable for pre-commit hooks and CI pipelines.
- Rule suppression (`--ignore`): Comma-separated rule IDs can be suppressed per-scan to reduce noise for known-safe patterns in a codebase.
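The penalty weights above translate into a short scoring function. This sketch assumes the score starts at 100 and is clamped at 0, that findings are dictionaries with `severity` and an optional `classification` field, and that only threats count toward the score and the `--check` exit code. Letter-grade thresholds are not stated in the source, so they are omitted.

```python
# Penalty weights as listed in the release notes.
PENALTY = {"critical": 25, "high": 15, "medium": 5, "low": 2}


def is_threat(finding: dict) -> bool:
    # Findings without a classification are treated as threats (assumption).
    return finding.get("classification", "threat") == "threat"


def safety_score(findings: list[dict]) -> int:
    """Subtract severity-weighted penalties from 100, never going below 0."""
    penalty = sum(PENALTY[f["severity"]] for f in findings if is_threat(f))
    return max(0, 100 - penalty)


def check_exit_code(findings: list[dict]) -> int:
    """Mimic CI mode: 1 if any high or critical threat is present, else 0."""
    blocking = any(
        f["severity"] in ("high", "critical") and is_threat(f) for f in findings
    )
    return 1 if blocking else 0
```

Under these assumptions, one critical and one medium threat give 100 − 25 − 5 = 70, and an advisory of any severity leaves the score unchanged.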
Four foundational detection categories covering code execution, credential theft, prompt injection, and hardcoded API keys. 34 rules total.
Detection categories
| ID | Category | Max Severity | What it detects |
|---|---|---|---|
| SS01 | Code Execution | high | Python: eval, exec, subprocess.*, os.system (AST-based). JS/TS: eval, Function constructor, child_process, execSync (regex) |
| SS02 | Credential Theft | critical | AWS Access Key IDs, PEM private keys, GitHub tokens (gh[pousr]_…), Slack tokens |
| SS15 | Prompt Injection | high | "ignore previous instructions", role hijacking ("you are now…"), instruction override/forget patterns |
| SS21 | Hardcoded API Keys | high | Generic api_key=, secret_key=, access_token=, password= patterns with ≥16-char values |
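
A pattern in the spirit of SS21 might look like the sketch below: one of the listed key names, an equals sign, and a value of at least 16 characters. The exact character class for the value and the function name `find_hardcoded_keys` are assumptions; the scanner's real expression is not shown in this document.

```python
import re

HARDCODED_KEY = re.compile(
    r"""
    \b(api_key|secret_key|access_token|password)   # key names from the SS21 row
    \s*=\s*                                        # assignment
    ["']?([A-Za-z0-9_\-/+]{16,})["']?              # value of at least 16 characters
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_hardcoded_keys(text: str) -> list[tuple[int, str]]:
    """Return (line_number, key_name) for each suspected hardcoded secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = HARDCODED_KEY.search(line)
        if match:
            # Report the key name only, so the secret is not echoed into the report.
            hits.append((lineno, match.group(1).lower()))
    return hits
```

The 16-character minimum is what keeps placeholder values such as `password = "changeme"` from being flagged.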
Supply Chain Guarantees
SkillSafe is built to make several strong guarantees about every skill in the registry:
Advisories
No known security incidents have affected SkillSafe users to date.
Responsible Disclosure
If you discover a security vulnerability, please report it responsibly:
We acknowledge reports within 48 hours and aim to fix critical issues within 7 days. We do not pursue legal action against researchers acting in good faith.
What to include in your report:
- Description of the vulnerability and its potential impact
- Steps to reproduce or proof-of-concept
- Affected component (CLI, API, web, scanner)
- Your suggested severity assessment
Our commitment:
- Acknowledgment within 48 hours
- Initial triage and severity assessment within 5 business days
- Critical fixes deployed within 7 days
- Credit in advisory (unless you prefer anonymity)