Why This Matters

AI skill registries are under active attack. In January 2026, security researchers discovered 341 malicious skills distributing Atomic Stealer malware, exfiltrating credentials, and installing reverse shells on developer machines. A single trojan skill was downloaded over 7,700 times before removal. The affected registry had no pre-sharing scanning and relied solely on community flagging after damage was done. SkillSafe was designed to prevent these attacks before skills ever reach consumers.

In February 2026, researchers published SkillJect — an automated attack framework that achieves a 97.5% success rate against Claude Code using poisoned skills. The attack splits malicious behavior across two components: an innocuous-looking inducement prompt in SKILL.md and a payload hidden in a bundled script. Neither component appears malicious in isolation; together they form a complete attack. The v0.1.4 scanner ships four new detection passes in direct response to this paper. Read the full analysis →

Skills are only half the attack surface. MCP tool descriptions — visible to your AI agent but hidden from you — are an active vector for credential theft, cross-tool exploitation, and prompt injection. Scan your MCP tools →

Security Model

SkillSafe is designed to protect against supply-chain attacks on AI agent skills. The security model has three pillars:

Content Integrity

The entire skill archive is hashed with SHA-256 to produce an immutable tree hash. Any change to any file produces a different archive, which produces a different hash.

Dual-Side Verification

Publishers scan and upload a report alongside the archive. Consumers independently re-scan after download. The server compares both and returns a verdict.

Automated Static Analysis

Every file in every shared skill is analyzed using AST parsing and pattern matching. Python, JavaScript, TypeScript, and Markdown are scanned for dangerous functions, credential theft, and prompt injection before the skill reaches any consumer.

How Scans Work

Every skill shared through SkillSafe undergoes automated static analysis before it reaches consumers. Scans run on both the publisher side (at upload time) and the consumer side (at install time) to ensure independent verification.

What Gets Analyzed

  • Python files (.py) — AST-based analysis for dangerous calls and imports
  • JavaScript and TypeScript (.js, .ts, .jsx, .tsx, .mjs) — regex-based unsafe pattern detection
  • All text files — secrets, hardcoded credentials, shell threat patterns, and Unicode obfuscation
  • Markdown, YAML, and plain text — prompt injection and ClickFix social engineering
  • Binary files (.exe, .dll, .so, .dylib) — flagged as unsafe bundled executables
  • Base64 blobs — decoded and re-scanned for hidden payloads

Detection Techniques

  • AST parsing (Python) — catches eval, exec, subprocess, and os.system even when obfuscated by aliases
  • Regex scanning (JS/TS) — comment-aware, skips block and inline comments to reduce false positives
  • Pattern matching (all text) — 81 rules across 23 threat categories aligned to the OWASP Agentic AI taxonomy
  • Context-aware classification — findings in documentation code fences are classified as advisories (0 score impact) unless instructional intent is detected nearby, reducing false positives while preserving detection of OpenClaw-style attacks
  • Base64 deep-scan — decodes suspicious blobs and re-applies all rules to decoded content
  • Unicode analysis — detects zero-width characters and Cyrillic/Latin homograph mixing
  • Inducement language detection — 6 patterns targeting SkillJect-style social engineering that steers agents toward executing bundled scripts
  • Structural mimicry — multi-line context scan detecting setup section headers followed by bundled script execution references
  • Composite co-occurrence — escalates severity when individually low-risk capabilities co-occur in a single file (exec + network, env + network)
  • Surplus functionality — cross-file consistency check flagging script capabilities not documented in SKILL.md

Severity & Scoring

  • Critical — Active exploitation (reverse shells, credential theft, metadata endpoint access)
  • High — Dangerous operations that bypass user consent (persistence, exfiltration)
  • Medium — Suspicious patterns that may indicate malicious intent
  • Low — Best-practice violations
  • Each scan produces a 0–100 safety score and an A+ through F letter grade based on severity-weighted penalties

Cryptographic Verification Flow

SkillSafe uses a dual-side verification model to ensure that what the publisher uploaded is exactly what the consumer installs — with no possibility of tampering in transit or at rest.

1

Publisher Packages & Hashes

When a skill is saved, the system creates a deterministic file manifest (sorted entries with SHA-256 per file), then computes a single hash of the manifest — the tree hash.

2

Publisher-Side Scan

A full static analysis scan runs on the skill files — 12 passes covering dangerous calls, secrets, prompt injection, shell threats, and more. The scan report is stored alongside the skill version.

3

Server Stores Immutably

The server stores the archive per version with the tree hash recorded as metadata for integrity verification. The archive, publisher scan report, and metadata are all immutable once written. No overwrites — only new versions.

4

Consumer Downloads & Re-scans

On install, the consumer downloads the files, recomputes the tree hash, and the server runs the same static analysis scan independently. This produces a consumer-side scan report.

5

Verdict Comparison

The consumer's scan report is sent to the server, which compares it against the publisher's report. If the tree hashes match and both scan reports agree, the skill is marked ✓ verified. Any mismatch — tampered files, divergent findings, or hash discrepancies — results in a ✗ failed verdict with details.

What Scan Reports Contain

Every scan report is a structured document that captures the full security posture of a skill version. Reports are stored immutably and can be inspected by both publishers and consumers.

Metadata

Scanner version, timestamp, tree hash, file count, total size, and the skill identifier (@scope/name@version).

Integrity

The skill tree hash (SHA-256 of the full archive blob), scanner version, and ruleset version. The tree hash is the source of truth for content integrity — any deviation between publisher and consumer hashes triggers a verification failure.

Findings

Each finding includes: rule ID, severity level, classification (threat or advisory), affected file and line range, a human-readable description, and a code snippet showing the matched pattern. Findings are deterministic — same input always produces the same output. Advisory findings appear in documentation contexts (e.g. code fences in reference files) and carry 0 score penalty.

Summary & Verdict

Threat count, advisory count, clean/not-clean flag (based on threats only), 0–100 safety score, and A+–F letter grade. The verification verdict (verified, divergent, or critical) is produced when the server compares the publisher and consumer reports.

Ruleset Changelog

The scanner ruleset is versioned independently of the CLI. Both publisher and consumer reports embed their ruleset version, and the server flags differences so you can tell when a divergence is caused by a ruleset upgrade rather than actual tampering. The entire CLI, including all scanner rules and verification logic, is open source.

2026.04.08 Current
April 2026

Context-aware finding classification to reduce false positives while preserving detection of real attacks like the OpenClaw campaign. Scan report schema updated to v1.2. 81 rules across 23 threat categories. View full rule listing →

New features
  • Threat vs advisory classification — Findings in documentation code fences (e.g. references/*.md) are classified as advisory with 0 score penalty, unless instructional intent is detected nearby. Findings in executable files or near imperative language ("run this", "you must", "prerequisite") remain threats that affect the score.
  • Instructional intent detection — 17 patterns detecting imperative/social engineering language (e.g. "before using, run", "paste this in terminal", curl | bash near setup instructions). When found within 5 lines of a dangerous pattern, the finding stays classified as a threat even inside a code fence — directly targeting the OpenClaw attack vector.
  • Markdown code fence awareness — Scanner now tracks code fence boundaries in .md/.txt/.rst files, distinguishing executable instructions from documentation examples.
  • Tightened SS13 regexdangerous_rm_root now only flags destructive wildcards (rm -rf /, rm -rf ~, rm -rf $HOME). Targeted removals like rm /tmp/specific-file no longer trigger false positives.
  • Severity downgrade for doc paths — Advisory findings in references/, docs/, examples/, and tests/ directories receive a severity downgrade (critical→high, high→medium).
  • Schema v1.2 — Scan reports now include advisory_count and a classification field ("threat" or "advisory") on each finding. findings_count reflects threat count only. clean is true when threat count is 0.
2026.03.15
March 2026

Four new detection passes targeting SkillJect-style skill-based prompt injection, developed in direct response to arXiv:2602.14211. Scanner expanded from 8 to 11 passes; 81 rules across 23 threat categories. View full rule listing → · Read the technical write-up →

New detection passes
PassCategoryMax SeverityWhat it detects
SS-SIInducement Languagehigh6 patterns for social engineering that steers agents toward script execution: "before using, run...", "for the tool to work", "this setup step is required", "run the bundled script", "automatically execute *.sh", "must be run first"
SS-SMStructural MimicryhighMulti-line context scan: SS-SM01 fires when a Prerequisites/Setup/Quick Start header is followed within 10 lines by a bundled script reference; SS-SM02 fires when an urgency marker (> **IMPORTANT**, **CRITICAL**) appears within 3 lines of a script reference
SS-CPComposite Co-occurrencecriticalEscalates severity when low-risk primitives co-occur: SS-CP01 (exec + network, critical), SS-CP02 (env vars + network, high), SS-CP03 (file write + network, high), SS-CP04 (3+ medium findings in one file, high)
SS-SFSurplus FunctionalitycriticalCross-file consistency check: flags script capabilities absent from SKILL.md — undocumented network calls (critical), env var reads (high), subprocess invocations (high), file writes (medium)
Design notes
  • Two structural gaps closed — The prior scanner lacked inducement language detection and cross-file consistency checking. SkillJect exploits exactly these gaps, hiding payloads where no single-file scan can find them.
  • Cross-file consistency (SS-SF) — Pass 11 extracts a documentation intent profile from SKILL.md, then checks each script for undocumented network calls, credential access, subprocess invocations, and file writes. Undocumented outbound network calls are the single most reliable signal of a SkillJect-style payload.
  • Composite escalation (SS-CP) — Targets the paper's finding that SkillJect composes attacks from primitives that individually fall below alert thresholds. Three or more medium-severity findings in a single file trigger a high-severity composite alert.
2026.03.01
March 2026

Added 15 new detection categories (SS03–SS21), A-F safety scoring, and ruleset-upgrade divergence detection. Total: 65 rules across 19 threat categories. View full rule listing →

New detection categories
IDCategoryMax SeverityWhat it detects
SS03Data Exfiltrationhighcurl/wget to ngrok, webhook.site, pipedream, RequestBin, Burp Collaborator
SS04Agent Memory PoisoninghighWrites to MEMORY.md, CLAUDE.md, .cursorrules via redirection or echo
SS05Encoded Malwarecriticalbase64 decode-then-execute patterns; deep-scans decoded blobs for hidden payloads
SS07Privilege Escalationcriticalsudo su/bash/sh, setuid(0), seteuid(0)
SS08Persistencehighcron, macOS LaunchAgents/LaunchDaemons, systemd services, shell profile writes
SS09Reverse Shellcritical/dev/tcp, /dev/udp, netcat -e, socat EXEC, bash redirect
SS10Unicode ObfuscationhighZero-width characters (U+200B–U+2060, U+FEFF); Cyrillic/Latin homograph mixing
SS11ClickFix Social Engineeringhigh"Open terminal and paste…", copy-paste command instructions, Windows Run dialog tricks
SS13Dangerous File Operationscriticalrm -rf /, rm -rf ~, dd of=/dev/sd* block-device writes
SS14Reconnaissancecriticalnmap, masscan, arp-scan, zmap; cloud IMDS endpoints (169.254.169.254, metadata.google.internal)
SS16Bundled Binarieshigh.exe, .dll, .so, .dylib, .elf, .bin and other executable/library extensions
SS17Credential File AccesscriticalReads to ~/.aws/credentials, ~/.docker/config.json; find/search over .ssh, .gnupg
SS18Cryptocurrency TargetingcriticalSeed/recovery phrases; MetaMask, Phantom, Exodus, Ledger wallet references; wallet directories
SS19/20Path Traversalcritical../../etc traversals; reads to /etc/passwd, /etc/shadow; git hook writes
New features
  • Safety score — Every scan produces a 0–100 numeric score and A+/A/B/C/D/F letter grade based on severity-weighted penalties (critical −25, high −15, medium −5, low −2).
  • Ruleset-upgrade divergence — When a consumer verifies with a newer ruleset, the server sets ruleset_upgrade_divergence: true in verification details so the UI can explain the mismatch rather than implying tampering.
  • CI mode (--check) — Exits with code 1 if any HIGH or CRITICAL findings are present; suitable for pre-commit hooks and CI pipelines.
  • Rule suppression (--ignore) — Comma-separated rule IDs can be suppressed per-scan to reduce noise for known-safe patterns in a codebase.
2025.01.01 Initial release
January 2025

Four foundational detection categories covering code execution, credential theft, prompt injection, and hardcoded API keys. 34 rules total. View full rule listing →

Detection categories
IDCategoryMax SeverityWhat it detects
SS01Code ExecutionhighPython: eval, exec, subprocess.*, os.system (AST-based). JS/TS: eval, Function constructor, child_process, execSync (regex)
SS02Credential TheftcriticalAWS Access Key IDs, PEM private keys, GitHub tokens (gh[pousr]_…), Slack tokens
SS15Prompt Injectionhigh"ignore previous instructions", role hijacking ("you are now…"), instruction override/forget patterns
SS21Hardcoded API KeyshighGeneric api_key=, secret_key=, access_token=, password= patterns with ≥16-char values

Supply Chain Guarantees

SkillSafe is built to make several strong guarantees about every skill in the registry:

Tamper evidence — Any modification to skill contents after saving is detectable. The tree hash changes if even a single byte is altered. Most open registries store skills without integrity verification, making silent tampering undetectable.
Independent verification — Consumers don't trust the publisher's scan report alone. They run their own scan and the server compares both independently. Other registries rely on a single-side trust model where only the publisher's claims are accepted.
Immutable history — Once a version is saved, it cannot be overwritten or silently updated. Every version is a permanent, auditable record. On open registries, skills can be modified after upload without notice.
No blind trust — The server never executes skill code. All analysis is static. The server's role is storage and comparison — not judgment. Registries that execute or evaluate code server-side introduce a single point of compromise.

Advisories

No known security incidents have affected SkillSafe users to date.

Responsible Disclosure

If you discover a security vulnerability, please report it responsibly:

security@skillsafe.ai

We acknowledge reports within 48 hours and aim to fix critical issues within 7 days. We do not pursue legal action against researchers acting in good faith.

What to include in your report:

  • Description of the vulnerability and its potential impact
  • Steps to reproduce or proof-of-concept
  • Affected component (CLI, API, web, scanner)
  • Your suggested severity assessment

Our commitment:

  • Acknowledgment within 48 hours
  • Initial triage and severity assessment within 5 business days
  • Critical fixes deployed within 7 days
  • Credit in advisory (unless you prefer anonymity)