ToxicSkills: What the First Large-Scale Agent Skill Audit Found
Snyk scanned 3,984 AI agent skills: 36% had security flaws, 534 had critical issues, and 76 carried active malware. What this means for developers installing skills.
Until recently, nobody had independently audited the AI agent skill ecosystem at scale. There were incident reports - the ClawHavoc campaign documented 1,184 malicious skills and made it real. But a systematic scan of the full public corpus? That hadn’t happened yet.
In February 2026, Snyk’s security research team changed that. They scanned 3,984 skills from ClawHub and skills.sh as of February 5th, 2026 - the largest publicly available corpus of agent skills at the time - and published the results as the ToxicSkills report. The numbers are difficult to ignore.
What the Data Shows
36.82% of all skills - 1,467 out of 3,984 - contain at least one security flaw.
13.4% - 534 skills - contain at least one critical-severity issue. If you’ve installed a skill in the past few months, that’s roughly a 1-in-7 chance you installed something with a critical finding.
The research team used human-in-the-loop review to validate automated findings. After that manual review, they confirmed 76 active malicious payloads - skills with deliberate backdoors, credential stealers, or data exfiltration logic embedded directly in the Markdown instruction files. These weren’t false positives or ambiguous patterns. They were confirmed threats.
Eight of those malicious skills were still publicly available on clawhub.ai at publication time.
The ToxicSkills Threat Taxonomy
Snyk organized their findings into eight threat categories. The breakdown shows what attackers are actually doing versus what developers typically worry about.
Critical Severity
Prompt injection - Hidden instructions embedded in skill Markdown that redirect agent behavior. This includes base64-obfuscated payloads, Unicode smuggling, and “ignore previous instructions” patterns. These attacks exploit the gap between what the skill description says and what the agent actually receives and processes. The SkillJect research documented similar injection patterns achieving 97.5% attack success against Claude Code - this category is not theoretical.
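As a sketch of what detecting this category can look like, here is a minimal heuristic pass over a skill's Markdown. The phrase list, base64 length threshold, and Unicode range are illustrative assumptions, not Snyk's actual detection rules:

```python
import base64
import re

# Illustrative prompt-injection heuristics for skill Markdown.
# These patterns are assumptions sketched from the ToxicSkills
# categories, not the report's real detector.
INJECTION_PHRASES = [
    r"ignore (all )?previous instructions",
    r"disregard (the )?system prompt",
]
BASE64_BLOB = re.compile(r"[A-Za-z0-9+/]{80,}={0,2}")  # long base64-looking runs
UNICODE_TAGS = re.compile(r"[\U000E0000-\U000E007F]")  # invisible "tag" characters

def scan_markdown(text: str) -> list[str]:
    findings = []
    for phrase in INJECTION_PHRASES:
        if re.search(phrase, text, re.IGNORECASE):
            findings.append(f"injection phrase: {phrase}")
    for blob in BASE64_BLOB.findall(text):
        try:
            decoded = base64.b64decode(blob + "=" * (-len(blob) % 4))
            decoded = decoded.decode("utf-8", "ignore")
        except Exception:
            continue
        if "instruction" in decoded.lower() or "curl" in decoded.lower():
            findings.append("suspicious base64 payload")
    if UNICODE_TAGS.search(text):
        findings.append("Unicode tag smuggling characters")
    return findings
```

A heuristic like this only catches the obvious shapes; the point is that all three patterns the report names are mechanically detectable before a skill ever reaches an agent.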
Malicious code - Actual backdoors, credential stealers, and remote code execution payloads embedded in skill setup scripts or inline shell commands. The ClawHavoc campaign’s credential-stealing skills fell into this category: the skills looked legitimate and their .md files directed agents to download and execute external binaries.
Suspicious downloads - Skills that fetch executables or archives from unknown domains, GitHub releases from unfamiliar accounts, or password-protected ZIP files. The suspicious download pattern is a delivery mechanism, not a payload class on its own - it’s how malicious code gets onto the machine without being embedded directly in the skill file where scanners can see it.
High Severity
Credential handling issues - Skills that instruct agents to echo API keys, embed credentials in commands, or ask users to paste secrets into agent outputs. These aren’t malicious by design, but they create exposure. A skill that logs your AWS credentials to confirm a configuration step has done real damage even if the author had no ill intent.
Hardcoded secrets - API keys, access tokens, and private credentials left directly in skill files. 1Password research cited in the ToxicSkills report found that 23% of organizations report their agents have been tricked into leaking credentials. Hardcoded secrets in installed skills are a direct path to that outcome.
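Hardcoded secrets are the easiest category to check for yourself. A minimal sketch, using a few well-known key formats as an illustrative subset rather than an exhaustive rule set:

```python
import re

# Illustrative hardcoded-secret patterns, not a complete detector.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "generic API key": re.compile(
        r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"
    ),
}

def find_secrets(text: str) -> list[str]:
    # Return the names of all secret patterns that match anywhere in the text.
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]
```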
Medium Severity
Third-party content exposure - Skills that fetch arbitrary web content, parse social media feeds, or clone external repositories and pass the results directly to the agent. This enables indirect prompt injection: a malicious actor controls a webpage that a skill fetches, and embeds agent instructions in that page’s content. The agent treats the fetched content as trusted input.
Unverifiable dependencies - Skills that load instructions from remote URLs at runtime (curl | bash equivalents, dynamic imports, remote Markdown files). The skill you installed isn’t actually the skill that runs - its behavior is determined by whatever URL it fetches at execution time.
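The telltale shapes of this category - piped remote shells and runtime instruction fetches - are simple to flag mechanically. A sketch, with patterns that are illustrative assumptions about common forms rather than a complete detector:

```python
import re

# Illustrative patterns for fetch-and-execute-at-runtime behavior.
REMOTE_EXEC = [
    re.compile(r"curl\s+[^\n|]*\|\s*(ba)?sh"),  # curl ... | bash
    re.compile(r"wget\s+[^\n|]*\|\s*(ba)?sh"),  # wget ... | sh
    re.compile(r"(?i)fetch the instructions from https?://"),  # remote Markdown
]

def has_remote_execution(text: str) -> bool:
    # True if any line of the skill matches a remote-execution pattern.
    return any(p.search(text) for p in REMOTE_EXEC)
```

A skill that trips any of these deserves manual review: whatever you audited at install time is not necessarily what runs later.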
Direct financial access - Skills with hardcoded access to trading platforms, crypto wallets, or payment systems. The risk here is obvious: a skill with credentials for a brokerage account that also contains a prompt injection vulnerability is a high-value target for attackers who can exploit the injection.
Why These Numbers Matter
The 36% figure is the headline, but the more important number is 13.4% critical. Snyk’s team explicitly tuned their detectors to minimize false positives on widely adopted legitimate skills. A 13.4% critical rate after conservative tuning suggests the real-world risk is higher, not lower, than reported.
There’s a structural reason the ecosystem looks this way. Publishing an agent skill requires a Markdown file and a GitHub account that’s a week old. There’s no mandatory review, no security scan at upload time, no code signing. The barrier is set at the same level as early npm and PyPI - before those ecosystems learned hard lessons about what happens when popular packages get compromised.
The difference is that agent skills don’t run in a sandboxed interpreter. They run inside AI agents that typically have shell access, filesystem read/write permissions to the project directory and often the home directory, access to environment variables and credential files, and the ability to send messages through connected integrations. As Snyk put it: when you install a malicious skill, the agent becomes the attacker, with the developer’s full environmental access.
This is why the ClawHavoc campaign - which we analyzed in depth - was so effective. The malicious skills didn’t need an exploit. They just needed instructions that told the agent to read ~/.ssh/ and send the contents somewhere.
The 8 Skills That Stayed Live
One detail from the ToxicSkills report deserves more attention: at publication time (February 2026), eight of the 76 confirmed malicious skills were still publicly available on clawhub.ai.
That’s not a criticism of ClawHub’s response time - coordinated disclosure and takedown takes time. The point is that the window between “attacker publishes malicious skill” and “malicious skill is removed” is a real window. Developers installing skills during that window get the malicious version. And if they installed it, they still have it - removal from the registry doesn’t remove it from their machine.
This is the core argument for pre-install scanning rather than post-publish moderation. Reactive removal is valuable, but it has an inherent latency. The ToxicSkills data quantifies that latency: at the moment of publication, 76 confirmed-malicious skills were available, and 8 remained live after the research team's disclosure process. Every developer who installed one of those 76 during their active window was exposed with no warning.
What to Do About It
Before installing any skill:
Run a scanner on it first. If you use SkillSafe’s client, skillsafe scan ./skill-directory produces a structured report with severity ratings before you activate anything. If you’re installing from a URL or a registry without built-in scanning, download the skill files and inspect them - at minimum, grep for subprocess, os.system, eval, exec, and outbound HTTP calls.
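The manual inspection step above can be scripted. This is a hypothetical triage pass over a downloaded skill directory, with an illustrative token list; a clean result does not mean the skill is safe:

```python
import pathlib
import re

# Tokens worth flagging before install: process spawning, dynamic
# evaluation, and outbound HTTP. Illustrative, not exhaustive.
RISKY_TOKENS = re.compile(
    r"subprocess|os\.system|\beval\s*\(|\bexec\s*\(|https?://|curl |wget "
)

def triage(skill_dir: str) -> dict[str, list[str]]:
    # Map each flagged file to the lines that matched, with line numbers.
    hits: dict[str, list[str]] = {}
    for path in pathlib.Path(skill_dir).rglob("*"):
        if path.suffix not in {".md", ".py", ".sh", ".txt"} or not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if RISKY_TOKENS.search(line):
                hits.setdefault(str(path), []).append(f"{lineno}: {line.strip()}")
    return hits
```

Review every hit by hand; most will be benign, but the malicious ones in the ToxicSkills corpus would have surfaced here.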
When evaluating a skill from a marketplace:
Check whether the registry performs pre-share scanning. The scanning architecture comparison breaks down what different approaches catch and miss. A registry that scans at publish time (once) is better than no scanning, but it doesn’t catch updates that introduce malicious code after the initial review - the pattern TeamPCP used against LiteLLM.
For skills already installed:
The ToxicSkills findings are retrospective. If you installed skills from ClawHub before February 2026 without verifying them, it's worth scanning them now. The 13.4% critical rate means the prior probability that a randomly selected installed skill has a critical finding is non-trivial.
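To make "non-trivial" concrete, here is a back-of-the-envelope estimate from the reported rate. The installed-skill count is hypothetical, and treating installed skills as a uniform random sample of the scanned corpus is a simplifying assumption:

```python
# Exposure estimate from the ToxicSkills headline numbers.
critical_rate = 534 / 3984  # ~13.4% of scanned skills had a critical finding
installed = 10              # hypothetical number of installed skills

# Probability that at least one installed skill has a critical finding,
# assuming installs are independent uniform draws from the corpus.
p_at_least_one = 1 - (1 - critical_rate) ** installed
print(f"{p_at_least_one:.0%}")  # prints "76%"
```

Even at ten skills, the odds favor having at least one critical finding on your machine.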
For skill authors:
The credential handling and hardcoded secrets categories represent unintentional risk, not malice. A skill that instructs an agent to print an API key for verification creates real exposure even if the author never considered the threat model. Running skillsafe scan on your own skills before sharing catches these issues while you can still fix them privately.
The Ecosystem Is at an Inflection Point
Snyk’s framing in the ToxicSkills report is worth repeating: the agent skills ecosystem is at the same inflection point npm and PyPI hit before security became a first-class concern. The patterns are identical - typosquats, malicious maintainers, post-install scripts as attack vectors. The difference is that the privilege model is worse. A compromised npm package can do damage; a compromised agent skill can do that same damage while impersonating your development agent.
The remediation path is also the same as it was for package ecosystems: mandatory signing, pre-distribution scanning, integrity verification at install time, and audit trails. Those mechanisms exist today for AI agent skills. The question is whether developers adopt them before the incident rate climbs to the point where, as in package ecosystems, it becomes impossible to ignore.
The ToxicSkills data suggests the current trajectory is not favorable. 534 critical findings in under 4,000 skills is a high baseline for an ecosystem that’s still in early growth. As the corpus grows into the tens of thousands, the absolute numbers will scale with it unless the security infrastructure does too.
Sources
- Snyk: ToxicSkills - Malicious AI Agent Skills in the ClawHub Supply Chain
- ITSecurity.network: Agent Skills - The New Supply Chain Attack Vector
- arXiv: Formal Analysis and Supply Chain Security for Agentic AI Skills (March 2026)
Figures cited from the Snyk ToxicSkills report. SkillSafe did not independently replicate the scan.