Open-OSS/privacy-filter: Typosquatting the AI Model Registry
A malicious Hugging Face repo typosquatted OpenAI's Privacy Filter, hit #1 trending at 244K downloads, and shipped a Rust infostealer — a warning for AI skills.
On May 7, 2026, HiddenLayer’s research team published an analysis of Open-OSS/privacy-filter, a Hugging Face repository that had climbed to the #1 trending spot in under 18 hours, with roughly 244,000 downloads and 667 likes. The model card was nearly a verbatim copy of OpenAI’s legitimate Privacy Filter release. The only real difference was a loader.py file that ran a Windows-only multi-stage attack ending in a Rust-based infostealer.
Hugging Face’s security team removed the repository after HiddenLayer’s disclosure, but the engagement numbers tell a familiar story: the typosquat trick continues to work, the artificial-engagement loop continues to work, and AI/ML registries with no pre-distribution gate remain the cheapest way to ship malware to developers who copy-paste install commands without reading them.
This post walks through the attack, draws a direct line to the AI skill supply chain, and explains why SkillSafe’s save/share model is built to make this exact attack unprofitable.
What Happened
The Disguise
OpenAI shipped Privacy Filter — an open-weight, on-device PII-masking model under Apache 2.0 — to GitHub and Hugging Face in April 2026. The legitimate repository is openai/privacy-filter.
The attacker registered the namespace Open-OSS on Hugging Face, published a repository called privacy-filter, and copied OpenAI’s model card almost word-for-word — including the link to OpenAI’s real model card PDF. To a developer scanning trending models, “Open-OSS/privacy-filter” reads as the open-source mirror of the OpenAI release. The only divergence in the README was the install instructions: the legitimate repo points users at a Hugging Face Transformers pipeline; the typosquat tells them to run start.bat or python loader.py.
That single substitution — pipeline call to local script — is the entire trick.
Trending Manipulation
Hugging Face’s trending board ranks repositories by recent likes and downloads. Before takedown, Open-OSS/privacy-filter reached #1 trending with ~244K downloads and 667 likes in under 18 hours. HiddenLayer noted that the vast majority of liking accounts followed auto-generated naming patterns (firstname-lastname###, adjectivenoun####) — the engagement was almost certainly purchased or botted.
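As a rough illustration, the two naming shapes HiddenLayer describes can be turned into a screening heuristic. The Python sketch below is an assumption made for this post, not HiddenLayer's detection logic: the regular expressions, the function names, and the idea of reporting a flagged ratio are all hypothetical, and real bot names will be messier.

```python
import re

# Hypothetical patterns modeled on the two shapes described above:
# firstname-lastname### and adjectivenoun####.
AUTO_NAME_PATTERNS = [
    re.compile(r"[a-z]+-[a-z]+\d{3}"),  # firstname-lastname###
    re.compile(r"[a-z]+\d{4}"),         # adjectivenoun####
]


def looks_auto_generated(username: str) -> bool:
    """Return True if the whole username matches one of the patterns."""
    name = username.lower()
    return any(p.fullmatch(name) for p in AUTO_NAME_PATTERNS)


def flagged_ratio(usernames: list[str]) -> float:
    """Fraction of liking accounts whose names look auto-generated."""
    if not usernames:
        return 0.0
    flagged = sum(looks_auto_generated(u) for u in usernames)
    return flagged / len(usernames)
```

A high ratio on its own proves nothing, since plenty of real users pick names like these. It is only useful as one input next to account age and like timing.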
A #1 trending placement is itself a social-proof exploit. Developers who would never run a script from an unknown repo are more willing to do so when the registry’s own UI is endorsing it.
loader.py: Six Stages
The script’s surface behavior was a fake training output banner. Underneath, a function named _verify_checksum_integrity() performed the actual attack (HiddenLayer):
- Disable SSL verification and decode a base64-encoded URL → https://jsonkeeper.com/b/AVNNE.
- Fetch a JSON document containing PowerShell commands. Using jsonkeeper.com as the command source meant the attacker could rotate payloads without ever modifying the Hugging Face repository.
- Launch PowerShell with -ExecutionPolicy Bypass -WindowStyle Hidden and process creation flags that suppress the terminal window. This stage is Windows-only: non-Windows hosts saw only the decoy output.
- Download update.bat to %TEMP% via [Net.WebClient].DownloadFile() from api.eth-fastscan.org.
- Elevate, evade, and stage: the batch file used cacls.exe to check for admin, added Microsoft Defender exclusions, downloaded the infostealer, and created a scheduled task named MicrosoftEdgeUpdateTaskCore[a-z0-9]{8} running as SYSTEM, then immediately deleted the task. No persistence; just a one-shot privileged execution.
- Run the infostealer, a 1.07 MB Rust binary that beacons to recargapopular.com.
The Payload
The Rust infostealer ran eight parallel collectors:
- Chromium browsers: profiles, cookies database, login data, and Local State encryption keys (os_crypt, app_bound_encrypted_key)
- Gecko browsers: Firefox data via the equivalent pipeline
- Discord: local storage, data.sqlite, master key material
- Crypto wallets: browser-extension wallets and standalone wallet directories
- Browser extensions: extension data, focused on wallet-related plugins
- Host fingerprint: CPU, RAM, OS, hostname, username
- Files: FileZilla configs, wallet seed/key files, SSH / VPN / FTP credentials
- Screenshots: multi-monitor capture via dynamically loaded gdi32.dll
Stolen data was packaged as gzipped JSON and exfiltrated via WinHTTP POST with Bearer authentication. Anti-analysis included Windows API hashing, debugger and sandbox checks, VM detection (VirtualBox, VMware, QEMU, Xen), and attempts to disable AMSI and ETW.
Not Just One Repo
HiddenLayer linked the loader to six additional Hugging Face repositories under an account called anthfu (a typosquat on the well-known maintainer antfu), uploaded April 24, 2026. All six used the same jsonkeeper.com/b/AVNNE command URL:
- anthfu/Bonsai-8B-gguf
- anthfu/Qwen3.6-35B-A3B-APEX-GGUF
- anthfu/DeepSeek-V4-Pro
- anthfu/Qwopus-GLM-18B-Merged-GGUF
- anthfu/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
- anthfu/supergemma4-26b-uncensored-gguf-v2
Names lifted from real, recently released models. Two attack patterns in a single campaign: typosquatting a well-known release (the OpenAI Privacy Filter) and typosquatting a well-known maintainer (antfu).
Why This Matters for AI Skills
The Open-OSS attack targeted a model registry. AI coding skills — .md instruction files used by Claude Code, Cursor, Windsurf — live in a different registry shape, but every part of this attack maps cleanly onto the skill supply chain.
Typosquatting Is Cheaper Than Credential Theft
The LiteLLM incident in March 2026 required compromising a CI/CD dependency, exfiltrating publishing credentials, and pushing two malicious versions within 13 minutes — a sophisticated, time-bound operation by an organized threat actor (TeamPCP).
Open-OSS required registering a username, uploading a copied README, and buying a few hundred likes. The entry cost for the typosquat attack is approximately zero. For any registry where namespace registration is open and trending is ranked by raw engagement, this attack is reproducible by a single actor with a credit card.
Skill registries share the same risk shape. If @openai/skills is taken, @open-ai/skills, @openai-official/skills, and @open-oss/openai-skills are not. Without a verification gate, an attacker only needs the namespace to look legitimate at a glance.
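A name-similarity check at registration time is one way to catch the easy cases. The sketch below uses Python's standard difflib; the protected list, the 0.8 threshold, and the function names are assumptions for illustration and do not describe any registry's actual policy.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of namespaces a registry might reserve or protect.
PROTECTED = ["openai", "antfu"]


def normalize(namespace: str) -> str:
    """Lowercase and drop separators so open-ai and open_ai collapse to openai."""
    return re.sub(r"[^a-z0-9]", "", namespace.lower())


def typosquat_of(candidate, protected=PROTECTED, threshold=0.8):
    """Return the protected name the candidate resembles, or None."""
    cand = normalize(candidate)
    for name in protected:
        if cand == name:
            # Same after normalization: a squat only if the raw string differs.
            if candidate.lower() != name:
                return name
            continue
        # Flag names that embed the protected name or sit within a small edit distance.
        if name in cand or SequenceMatcher(None, cand, name).ratio() >= threshold:
            return name
    return None
```

With these settings, open-ai, openai-official, and anthfu are all flagged, but open-oss is not: it scores well below the threshold against openai. That gap is the point. Similarity checks catch near-miss spellings, not a plausible-sounding namespace, which is why the gate has to sit on distribution as well as on naming.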
loader.py Has a Direct Skill Equivalent
The Open-OSS attack worked because the README instructed users to run a local script. An AI skill never needs that instruction — the skill is the script from the agent’s perspective. An agent executing a malicious skill runs the attacker’s logic with full project filesystem access, shell, and network, with no copy-paste step required.
The ClawHavoc campaign demonstrated this at scale in January 2026: 1,184 malicious skills, no binaries, just .md instructions telling the agent to fetch and execute external payloads. Open-OSS used Windows-only PowerShell; a malicious skill is portable wherever the agent runs.
jsonkeeper.com / GitHub Gist / Pastebin: The C2 Substrate
Open-OSS pulled its commands from jsonkeeper.com, not from the Hugging Face repo itself. Static analysis of the repo’s source would never see the actual payload — only a fetch to a generic JSON paste service. The attacker could rotate the payload at any time without touching the registry.
This pattern translates one-for-one to AI skills. A skill that says “fetch the latest config from gist.github.com/user/{id}” and then “execute the steps” hands the attacker the same out-of-band payload rotation channel. SkillSafe’s scanner ruleset flags this combination explicitly — see SS-CP cp01_exec_plus_network in the v2026.03.15 ruleset.
Inflated Trending Maps to Inflated Stars and Likes
#1 trending is a signal developers act on. Bought likes turned an unknown namespace into Hugging Face’s top trending repository in under a day. Any skill registry that ranks by raw download or like counts is on the same path. Defense requires either weighting signals (verified-publisher boost, install retention, agent telemetry) or gating distribution behind something a botnet can’t buy at scale — like a passing security scan and a verified email.
Five Patterns That Make This Incident Different
Beyond the basic attack mapping, the Open-OSS incident makes five structural patterns visible that are worth naming explicitly. Each one shapes how the next AI registry attack will play out.
1. Trending Velocity Is the Vulnerability, Not a Side Channel
244,000 downloads in under 18 hours is roughly 13,500 downloads per hour, sustained. No legitimate model release outside Llama-class launches hits that growth rate. The growth rate itself should have been the alarm.
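The arithmetic is simple enough to automate. In the sketch below, only the 244,000 downloads and 18 hours come from the reported figures; the baseline rate and the alarm factor are hypothetical placeholders that a registry would have to calibrate against its own history.

```python
def downloads_per_hour(total_downloads: int, hours: float) -> float:
    """Average hourly download rate over the observation window."""
    return total_downloads / hours


def velocity_alarm(rate: float, baseline_rate: float, factor: float = 20.0) -> bool:
    """Flag a repository whose rate exceeds a multiple of a typical baseline.

    baseline_rate and factor are assumptions, not values from the report.
    """
    return rate > baseline_rate * factor


# Reported figures: about 244,000 downloads in under 18 hours.
OPEN_OSS_RATE = downloads_per_hour(244_000, 18)  # roughly 13,556 per hour
```

Because the window was under 18 hours, this figure is a lower bound on the real rate.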
Hugging Face’s trending board ranks by recent velocity. That ranking function structurally privileges new repositories with sharp engagement curves over established ones with steady real users. An attacker with a botnet doesn’t have to game the algorithm — the algorithm is already shaped like the attack. Trending and pump-and-dump are mutually optimized.
Any AI registry that publishes a velocity-based ranking is publishing a signal that’s cheaper to fake than to earn. The fix isn’t smarter velocity smoothing; it’s denominating the signal in something a botnet can’t cheaply produce — verified-publisher status, agent-side install retention, off-platform reputation.
2. Trending Boards Are Adversarial Recommendation Systems
This is the same problem social platforms have been failing to solve for a decade: bot engagement → algorithmic boost → real engagement → revenue or installs. Twitter’s “For You,” YouTube’s recommendations, App Store charts, and TikTok’s trending all share the same failure mode. AI registries are entering this fight at year zero, with none of the weighted-authenticity infrastructure the social platforms have spent billions developing.
Pure on-platform engagement metrics cannot tell humans from bots cheaply. Anyone designing a discovery surface for an AI registry — model hub, skill store, agent marketplace, prompt directory — needs to choose between (a) inheriting a decade of unsolved adversarial-ranking problems or (b) gating discovery on something orthogonal to engagement, like a passing security scan or an external trust anchor (verified GitHub org, paid publisher status, prior signed releases).
3. AI/ML Registries Are at Year Zero of Typosquat Defense
PyPI, npm, and crates.io have spent roughly a decade building typosquat defenses: edit-distance warnings on similar names, verified-publisher badges, install-time alerts, owner-handoff policies, retroactive scanning, namespace reservations for known orgs. Hugging Face — and every comparable AI/ML registry, including the prompt marketplaces and skill stores launching in 2025–2026 — has approximately none of these.
Every classic registry attack works again on fresh substrate. The 2014–2024 PyPI/npm playbook is the 2026 AI-registry blueprint. Open-OSS isn’t a novel attack; it’s a 2017 npm typosquat ported forward. The next attacks will be ports of dependency confusion (2021), maintainer-handoff hijacks (2018), and postinstall script abuse (continuous). AI registry operators don’t have to invent defenses — they have to adopt the existing ones, faster than attackers can port the existing offenses.
4. Static Analysis Loses to Composition
Each step of loader.py is benign in isolation: base64 decode, HTTP fetch, JSON parse, subprocess invocation. A scanner that flags “bad strings” or “known-malicious URLs” sees nothing — there is no eval, no obvious shell injection, no listed C2 domain in the source. The malice is entirely in the composition of legitimate operations.
This is the case for behavioral, IR-level scanning over signature lists. SkillSafe’s SS-CP class rules (e.g., cp01_exec_plus_network) exist for this reason — they trigger on the combination of exec and network, not on either feature alone. Composition rules are inherently noisier than string matches, which is why they’re rarely the first thing a registry ships. But composition is what catches loader.py, and string lists are what miss it.
The implication for registry roadmaps: signature lists and yara rules are decaying assets. The frontier is data-flow analysis (does a base64 input reach a subprocess call?), provenance tracking (does network-fetched content reach exec?), and capability composition (does this skill combine credential read with network egress?). Open-OSS is a clean test case for which scanners are still in the signature era.
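To make the composition idea concrete, here is a minimal sketch built on Python's standard ast module. It is not SkillSafe's scanner and not the cp01_exec_plus_network implementation. The capability table, the finding names, and the sample source are assumptions. It also checks only that capabilities co-occur in one file, which is weaker than the data-flow and provenance tracking described above.

```python
import ast

# Hypothetical map from fully qualified call names to capability labels.
CAPABILITY_CALLS = {
    "base64.b64decode": "decode",
    "urllib.request.urlopen": "network",
    "requests.get": "network",
    "subprocess.run": "exec",
    "subprocess.Popen": "exec",
    "os.system": "exec",
}


def dotted_name(node):
    """Rebuild a dotted name such as subprocess.run from an AST call target."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a method called on the result of another call


def capabilities(source: str) -> set:
    """Collect capability labels for every recognized call in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in CAPABILITY_CALLS:
                found.add(CAPABILITY_CALLS[name])
    return found


def composition_findings(source: str) -> list:
    """Report combinations of capabilities, never a single capability alone."""
    caps = capabilities(source)
    findings = []
    if {"exec", "network"} <= caps:
        findings.append("exec_plus_network")
    if {"decode", "exec", "network"} <= caps:
        findings.append("decoded_input_exec_plus_network")
    return findings


# The sample is only parsed, never executed.
SAMPLE = '''
import base64, subprocess, urllib.request
url = base64.b64decode("aHR0cHM6Ly9leGFtcGxlLmludmFsaWQ=").decode()
cmd = urllib.request.urlopen(url).read().decode()
subprocess.run(["powershell", cmd])
'''
```

Calling composition_findings(SAMPLE) reports both findings, while a file that only decodes base64 reports none. Aliased imports (import subprocess as sp) slip past this version, which is one reason real composition rules need more than a name table.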
5. Rust Infostealers Are a New Defender Problem
The 1.07 MB payload is Rust. Historically, infostealers in this ecosystem were Python, .NET, or Go — all comparatively easy to reverse with off-the-shelf tooling (decompilers for .NET, disassembly with readable string tables for Go). Rust gives the attacker static linking, aggressive inlining, monomorphization that obscures call graphs, panic-string fragments that scatter type information across the binary, and a surface that resists naive string-based heuristics. The Open-OSS payload compounds this with Windows API name hashing and runtime AMSI / ETW patching.
For defenders — including SkillSafe’s scanner roadmap — yara rules and PE-string heuristics targeting C# stealer kits are reaching end-of-life. Rust-aware static analysis (call-graph reconstruction from monomorphized generics, API-hash dictionary recovery, panic-message mining for module identification, identifying the standard-library fingerprint to subtract it from the analysis surface) is becoming table stakes. Open-OSS is the leading edge; expect Rust to be the default infostealer toolchain across commodity kits within 12 months, the same way Go displaced C# for cryptominers between 2020 and 2022.
What SkillSafe’s Model Catches
SkillSafe cannot prevent typosquats from being uploaded — saving is private and unrestricted by design. What it does is make typosquats unable to reach other users. The defense is layered across distribution, content, and metadata.
Save Is Private; Share Is Gated
A skill on SkillSafe is private by default. A malicious uploader can save @open-oss/privacy-filter to their own account, but other users cannot find it, install it, or download it through the registry. To make a skill discoverable, the publisher must:
- Verify their email: the same email cannot be used to create multiple accounts.
- Submit a passing scan report for that exact version: content scanning runs before the share link exists, not after.
- Stay under per-account daily publish limits: see DAILY_PUBLISH_LIMITS in api/src/lib/constants.ts.
The Open-OSS playbook — upload, watch trending, harvest installs — has no analog. There is no trending board to climb without first passing the scan gate.
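The three conditions above amount to a conjunction that must hold before a share link exists. The Python sketch below is illustrative only: the ShareRequest type, the blocker strings, and the limit of 10 are hypothetical, and the real limits live in DAILY_PUBLISH_LIMITS, whose values are not reproduced here.

```python
from dataclasses import dataclass

# Placeholder value; the real limits are defined in DAILY_PUBLISH_LIMITS.
HYPOTHETICAL_DAILY_LIMIT = 10


@dataclass
class ShareRequest:
    email_verified: bool
    scan_passed_for_version: bool  # passing scan report for this exact version
    shares_published_today: int


def share_blockers(req: ShareRequest, daily_limit: int = HYPOTHETICAL_DAILY_LIMIT) -> list:
    """List every unmet condition; an empty list means the share may proceed."""
    blockers = []
    if not req.email_verified:
        blockers.append("email_not_verified")
    if not req.scan_passed_for_version:
        blockers.append("no_passing_scan_for_version")
    if req.shares_published_today >= daily_limit:
        blockers.append("daily_publish_limit_reached")
    return blockers


def can_share(req: ShareRequest, daily_limit: int = HYPOTHETICAL_DAILY_LIMIT) -> bool:
    return not share_blockers(req, daily_limit)
```

Returning the full list of blockers rather than failing on the first one lets a publisher see everything that stands between a saved skill and a share link.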
Pre-Share Scanning Blocks the Loader Pattern
If the Open-OSS attack vectors were translated to a skill, the following rules from the SkillSafe scanner ruleset v2026.03.15 would trigger:
| Open-OSS Attack Vector | SkillSafe Scanner Rule(s) | Result |
|---|---|---|
| Base64-encoded C2 URL decoded at runtime | SS05 b64_decode_exec: base64 decode-and-execute pipelines (critical) | Blocked |
| Fetch JSON commands from external paste service | SS-CP cp01_exec_plus_network: process execution combined with network calls (critical) | Blocked |
| PowerShell -ExecutionPolicy Bypass -WindowStyle Hidden | SS01 py_subprocess_run / py_os_system: hidden shell command execution (high) | Blocked |
| DownloadFile() of a remote binary into %TEMP% | SS03 shell_exfil_service: outbound HTTP to unverified endpoints (high) | Blocked |
| Defender exclusions / AMSI / ETW disable | SS01 shell command execution + heuristic flags on AV-tampering verbs (high) | Blocked |
| Credential and wallet file harvesting | SS17 cred_read_aws, cred_find_dirs: credential file access patterns (critical/high) | Blocked |
| SSL verification disabled before C2 fetch | Heuristic flag on verify=False combined with network call (high) | Blocked |
| Scheduled task creation as SYSTEM | SS04 persistence-mechanism rule (high) | Blocked |
A skill carrying any of loader.py’s stages fails pre-share scanning. It can be saved privately — saving is unrestricted — but it cannot reach other users.
Dual-Side Verification Defeats Bait-and-Switch
Open-OSS could have shipped a benign first version, gained installs, then swapped in the malicious loader. Hugging Face’s model versioning would happily track the swap, and existing downloaders pulling main would silently get the new payload.
SkillSafe pins each share link to a specific version with an immutable tree hash (SHA-256 of the archive bytes). On install, the consumer client re-scans the downloaded content and computes the tree hash independently. The server compares both reports and both hashes. Any divergence between sharer-scan and consumer-scan, or between sharer-hash and consumer-hash, produces a critical verdict. See how dual-side verification works for the protocol details.
A bait-and-switch attacker would need to compromise the version hash, the sharer’s scan report, and the consumer’s independent scan — three independent layers, each gated by separate trust roots.
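The comparison step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SkillSafe protocol: the function names and the "verified" label are hypothetical, and scan reports are reduced to plain collections of finding identifiers. The hash is SHA-256 over the archive bytes, as described above.

```python
import hashlib
import hmac


def tree_hash(archive_bytes: bytes) -> str:
    """SHA-256 of the archive bytes, hex-encoded."""
    return hashlib.sha256(archive_bytes).hexdigest()


def verdict(sharer_hash, consumer_hash, sharer_findings, consumer_findings) -> str:
    """Return "critical" on any divergence between the two sides.

    "verified" is a hypothetical label for the case where both sides agree.
    """
    # Constant-time comparison of the two hex digests.
    if not hmac.compare_digest(sharer_hash, consumer_hash):
        return "critical"
    if set(sharer_findings) != set(consumer_findings):
        return "critical"
    return "verified"
```

If an attacker swaps the archive after the share link is created, the consumer's independently computed hash no longer matches the pinned one, and the verdict turns critical before the skill is activated.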
Trending Is Not Reachable Without a Scan
SkillSafe’s discovery surfaces (search, category pages, demos) only index skills that have a public share link, which requires a passing scan and a verified publisher. Buying likes against a private skill has no effect, because the skill isn’t on any ranked surface.
Namespace Verification Constrains Typosquats
SkillSafe namespaces are scoped to an account, and an account requires a verified email and a non-revoked publishing key. An attacker who registers @open-oss still needs a passing scan on every version they want to ship publicly, and a forensic trail attaches every save and share to a specific account, IP, and timestamp. None of that prevents impersonation outright — but it raises the cost above a trending-board pump-and-dump.
Lessons from Open-OSS/privacy-filter
1. Open Registries Need Distribution Gates, Not Just Takedowns
Hugging Face removed the repository after HiddenLayer’s disclosure. The window between trending #1 and takedown was long enough for ~244K downloads. Reactive removal is necessary but not sufficient — the attack model assumes the registry will eventually catch up, and prices that in. Pre-distribution scanning inverts the model: malicious content cannot reach consumers because reaching consumers is gated on the scan.
2. Typosquatting Is Now an AI/ML Problem
PyPI, npm, and crates.io have spent years building typosquat defenses (name similarity checks, verified publisher badges, install warnings). AI/ML registries — model hubs, skill registries, prompt marketplaces — are at the start of the same curve. The Open-OSS incident is a forcing function: any registry without typosquat detection and a distribution gate will see this exact attack repeat.
3. Out-of-Band C2 Is the Default
Hosting the commands in a jsonkeeper.com document, behind a base64-encoded URL, means static analysis of the repository finds no payload: just a decode, a fetch, and an exec. Scanners that only look at literal strings in the source will miss this. Behavioral rules (exec + network call + decoded input) catch the pattern regardless of where the command actually lives.
4. The Cost of Trust Is Borne by the Registry
Developers will copy-paste install commands from #1 trending. That’s not a developer failure — it’s a rational use of a signal the registry provides. The registry that publishes the signal owns the responsibility for ensuring it isn’t gameable. Trending boards without distribution gates are publishing a signal they cannot stand behind.
What to Do Now
If you cloned or ran anything from Open-OSS/privacy-filter — or any of the six anthfu/* repositories listed above — on a Windows host: treat the system as fully compromised. HiddenLayer’s guidance is to reimage rather than clean up, rotate every credential the host had access to (browser passwords, SSH keys, FTP, cloud tokens, Discord sessions), invalidate browser cookies, move crypto wallets to clean wallets, and block the IOC domains at network egress while hunting historical connections.
Key IOCs from HiddenLayer’s report:
- Domains: api.eth-fastscan.org, recargapopular.com, welovechinatown.info, jsonkeeper.com/b/AVNNE
- IP: 89.124.93.110
- File hashes (SHA-256):
  - 6db01158b044f178c45754666e2cbc0365f394e953fbf99ec34aa5304d5b79b1 (loader.py)
  - 4fba92a34fd9338293de53444bc9f05c278897d903a24efb95fde0522b3d50c0 (start.bat)
  - 04f0569971ac7ff81c8656e8453a69189d8870040044909dad45c04c567e7564 (update.bat)
  - ba67720dd115293ec5a12d08be6b0ee982227a4c5e4662fb89269c76556df6e0 (infostealer)
- Host artifacts: %TMP%\node.b64, %TMP%\runner.ps1, scheduled tasks matching MicrosoftEdgeUpdateTaskCore[a-z0-9]{8}
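For a quick first pass over a suspect directory, the file hashes and the task-name pattern above can be checked with a short Python script. This is a hedged sketch, not HiddenLayer tooling: the function names are hypothetical, and the script assumes you already have a list of scheduled task names from your own inventory or logs.

```python
import hashlib
import re
from pathlib import Path

# SHA-256 hashes from the IOC list above.
IOC_SHA256 = {
    "6db01158b044f178c45754666e2cbc0365f394e953fbf99ec34aa5304d5b79b1": "loader.py",
    "4fba92a34fd9338293de53444bc9f05c278897d903a24efb95fde0522b3d50c0": "start.bat",
    "04f0569971ac7ff81c8656e8453a69189d8870040044909dad45c04c567e7564": "update.bat",
    "ba67720dd115293ec5a12d08be6b0ee982227a4c5e4662fb89269c76556df6e0": "infostealer",
}

# Task-name pattern from the host artifacts above.
TASK_NAME = re.compile(r"MicrosoftEdgeUpdateTaskCore[a-z0-9]{8}")


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_directory(root) -> list:
    """Return (path, ioc_label) pairs for files whose hash is on the IOC list."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            file_hash = sha256_file(path)
        except OSError:
            continue  # unreadable file: skip rather than abort the sweep
        if file_hash in IOC_SHA256:
            hits.append((path, IOC_SHA256[file_hash]))
    return hits


def matching_tasks(task_names) -> list:
    """Filter task names that match the pattern exactly."""
    return [name for name in task_names if TASK_NAME.fullmatch(name)]
```

A clean result is weak evidence. The scheduled task was deleted right after it ran, and the payload could be rotated through the paste service, so an empty hit list does not change the reimage guidance.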
If you use OpenAI’s legitimate Privacy Filter: the official repository is openai/privacy-filter on GitHub and Hugging Face. Verify the namespace before cloning or downloading weights.
If you install AI skills: use a registry that scans before distribution and re-verifies on install. If your current registry doesn’t provide pre-install scan reports, run skillsafe scan on any skill before activating it.
If you publish AI skills on SkillSafe: nothing actionable from this incident — your distribution is already gated on a passing scan and a verified email. If you suspect a key has been compromised, the emergency revoke-all endpoint (DELETE /v1/account/keys) atomically kills every active key on your account.
Sources
- HiddenLayer: Malware Found in Trending Hugging Face Repository “Open-OSS/privacy-filter”
- VentureBeat: OpenAI launches Privacy Filter, an on-device data sanitization model
- Paired Ends: Privacy filter — OpenAI’s open-source PII scrubber
- GIGAZINE: OpenAI Privacy Filter released as open source
- Techstrong.ai: OpenAI Unveils Privacy Filter as Local-Source Solution to AI Data Leaks
- Phemex News: OpenAI Releases Open-Source Privacy Filter for PII
- openai/privacy-filter (legitimate repository)
SkillSafe did not independently verify all reported figures. We cite published security research from the sources listed above.