How to Vet an Agent Skill Before It Touches Your Machine

AI agents got modular fast. Skills - those little SKILL.md packages bundled with scripts that your agent loads on demand - are the new plugins. And like every plugin ecosystem before them (npm, PyPI, browser extensions), they’ve quietly become an attack surface.

The problem isn’t theoretical anymore. Here’s where things stand, and a quick way to vet a skill before it ever touches your machine.

The state of skill security

A few data points from the last several months that should change how you think about installing skills:

A large-scale study of 42,447 skills found that 26.1% contain at least one security vulnerability, and 5.2% show signs of likely malicious intent. Skills that ship executable scripts were 2.12x more likely to be vulnerable than those that don’t.
In late January 2026, the ClawHavoc campaign flooded the OpenClaw marketplace with hundreds of malicious skills - 335 tied to a single coordinated operation - pushing credential-stealer malware onto developer workstations.
The same period produced CVE-2026-25253, a remote code execution flaw in an agent skill runtime, widely cited as the first CVE assigned to an agentic AI system.
One of the top-ranked skills on a public marketplace turned out to be functional malware: it silently exfiltrated user data with a curl call to an attacker’s server and used prompt injection to talk the agent out of its own safety rules.

The uncomfortable part is the trust model. Skills execute with implicit trust and almost no vetting. You install one because it has stars and a nice README, and your agent now runs its scripts with your permissions, your environment variables, and your credentials.

What actually goes wrong

The failure modes cluster into a handful of categories worth recognizing:

Prompt injection - hidden instructions that tell the agent to ignore its safety constraints or leak its system prompt.
Data exfiltration - harvesting environment variables (API keys, tokens) and shipping them to an external URL.
Supply chain - curl | bash remote execution, obfuscated payloads, typosquatted or known-vulnerable dependencies.
Excessive agency - tools and permissions far beyond what the skill claims to need, so a single compromise has a huge blast radius.
Dangerous code - direct exec(), eval(), or subprocess calls wired to network or user input.

None of these require a sophisticated attacker. Most are visible in the skill’s own files if you know what to look for - which is exactly what a scanner automates.
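
To make "visible in the skill's own files" concrete, here is a minimal sketch of a pattern check over a skill directory. It is not how SkillSpector works internally; the four regular expressions and the names RED_FLAGS, scan_text, and scan_skill are assumptions made up for illustration, and a real scanner covers far more ground.

```python
import re
from pathlib import Path

# Hypothetical, deliberately tiny rule set: one pattern per failure mode.
RED_FLAGS = {
    "supply-chain: pipe-to-shell": re.compile(
        r"\b(?:curl|wget)\b[^\n|]*\|\s*(?:ba|z)?sh\b"
    ),
    "data-exfiltration: env harvesting": re.compile(
        r"os\.environ(?:\.items\(\)|\.copy\(\))"
    ),
    "dangerous-code: eval/exec": re.compile(r"(?<![\w.])(?:eval|exec)\s*\("),
    "prompt-injection: override phrase": re.compile(
        r"ignore (?:all |any )?(?:previous|prior) instructions", re.IGNORECASE
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, label) for every red flag found in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RED_FLAGS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def scan_skill(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every readable text file under a skill directory."""
    findings = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        hits = scan_text(text)
        if hits:
            findings[str(path)] = hits
    return findings
```

Line-by-line regexes like these are easy to evade (string concatenation, base64, aliasing), which is part of why a dedicated scanner with a semantic pass is worth running instead of a homegrown grep.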

Scanning a skill with SkillSpector

SkillSpector is an open-source (Apache 2.0) scanner from NVIDIA built for exactly this question: is this skill safe to install? It checks 64 vulnerability patterns across 16 categories and runs a two-stage pipeline - fast static analysis first, then optional LLM semantic analysis to cut false positives and explain findings in plain language.

1. Install

It’s a Python 3.12+ tool. Clone it, set up a virtual environment, and install:

git clone https://github.com/NVIDIA/SkillSpector.git
cd SkillSpector

# create and activate a virtual environment
uv venv .venv && source .venv/bin/activate
# or: python3 -m venv .venv && source .venv/bin/activate

make install

2. Scan

Point it at a skill before you install it. It accepts directories, single files, Git URLs, or zip archives:

# a local skill directory
skillspector scan ./my-skill/

# a single SKILL.md
skillspector scan ./SKILL.md

# straight from a repo, before you ever clone it for real
skillspector scan https://github.com/some-user/some-skill

# a downloaded zip
skillspector scan ./some-skill.zip

By default it runs static analysis plus an LLM pass. For a faster, fully local, static-only scan:

skillspector scan ./my-skill/ --no-llm

The LLM stage is optional and configurable - you can point it at OpenAI, Anthropic, NVIDIA’s endpoint, or a local OpenAI-compatible server like Ollama via environment variables if you want the semantic analysis and explanations.
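
If you vet more than one skill at a time, the scan commands above are easy to script. The sketch below is a thin wrapper that uses only the subcommand and the --no-llm flag shown in this post; the helper names build_scan_command and scan_all are hypothetical, and it makes no assumption about what the CLI's exit code means.

```python
import subprocess


def build_scan_command(target: str, static_only: bool = False) -> list[str]:
    """Build the argv for one scan, using only flags shown in this post."""
    command = ["skillspector", "scan", target]
    if static_only:
        command.append("--no-llm")
    return command


def scan_all(targets: list[str], static_only: bool = False) -> dict[str, str]:
    """Run one scan per target and collect whatever each scan printed."""
    reports = {}
    for target in targets:
        completed = subprocess.run(
            build_scan_command(target, static_only),
            capture_output=True,
            text=True,
            check=False,  # exit-code semantics are not assumed here
        )
        reports[target] = completed.stdout
    return reports
```

Passing the arguments as a list rather than a shell string means a hostile repo URL or directory name is never interpreted by a shell. scan_all raises FileNotFoundError if skillspector is not on your PATH.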

3. Read the verdict

You get a 0–100 risk score with a severity label and a flat recommendation. The thresholds:

Score	Severity	Recommendation
0–20	LOW	Safe
21–50	MEDIUM	Caution
51–80	HIGH	Do not install
81–100	CRITICAL	Do not install
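
If you consume the score in your own tooling, the table translates to a small lookup. This is a sketch of the thresholds above, not code from SkillSpector; the function name classify is made up, and it assumes the score is an integer.

```python
from bisect import bisect_left

# Inclusive upper bound of each band, taken from the table above.
_UPPER_BOUNDS = [20, 50, 80, 100]
_VERDICTS = [
    ("LOW", "Safe"),
    ("MEDIUM", "Caution"),
    ("HIGH", "Do not install"),
    ("CRITICAL", "Do not install"),
]


def classify(score: int) -> tuple[str, str]:
    """Map an integer 0-100 risk score to (severity, recommendation)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # bisect_left returns the first band whose upper bound is >= score.
    return _VERDICTS[bisect_left(_UPPER_BOUNDS, score)]
```

For example, classify(20) falls in the LOW band and classify(21) in MEDIUM, matching the inclusive ranges in the table.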

A real finding looks like this - note that it doesn’t just flag a line, it explains why it’s dangerous:

  HIGH: Env Variable Harvesting (E2)
    Location: scripts/sync.py:23
    Finding: for key, val in os.environ.items():...
    Confidence: 94%
    Explanation: This code collects environment variables containing
    API keys and secrets, then sends them to an external server.
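
To give a feel for how a static check can land on a line like that, here is a minimal sketch using Python's ast module. It illustrates the general technique only and is not SkillSpector's detector; the function name find_env_harvesting is hypothetical, and it recognizes only the literal form os.environ.items().

```python
import ast


def find_env_harvesting(source: str, filename: str = "<skill>") -> list[str]:
    """Report every os.environ.items() call as a 'file:line' string."""
    lines = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the attribute chain <os>.<environ>.<items> being called.
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "items"
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "environ"
            and isinstance(func.value.value, ast.Name)
            and func.value.value.id == "os"
        ):
            lines.append(node.lineno)
    # ast.walk does not guarantee source order, so sort by line number.
    return [f"{filename}:{lineno}" for lineno in sorted(lines)]
```

A check this literal misses aliases such as "from os import environ" or "import os as o", and it cannot tell whether the collected values ever leave the machine. Closing that gap is the job the confidence score and the explanation in the finding above are doing.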

4. Wire it into CI

For anything beyond one-off checks, emit machine-readable output and gate on it. SARIF (Static Analysis Results Interchange Format) drops straight into CI/CD and IDE tooling:

skillspector scan ./my-skill/ --format sarif --output report.sarif
# JSON and Markdown are also available via --format
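
One way to gate on that report is a short script that reads the SARIF file and fails the job when it contains blocking results. The sketch below relies only on the standard SARIF 2.1.0 layout (runs, results, level, ruleId, message.text). How SkillSpector maps its severities onto SARIF levels is not covered in this post, so treating "error" as the blocking level is an assumption to check against a real report; the names blocking_results and gate are hypothetical.

```python
import json

# Assumption: which SARIF result levels should fail the build.
BLOCKING_LEVELS = frozenset({"error"})


def blocking_results(sarif: dict, blocking=BLOCKING_LEVELS) -> list[str]:
    """Summarize every result whose level is in the blocking set."""
    summaries = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # SARIF treats a missing level as "warning".
            level = result.get("level", "warning")
            if level not in blocking:
                continue
            rule = result.get("ruleId", "unknown-rule")
            text = result.get("message", {}).get("text", "")
            summaries.append(f"{level}: {rule}: {text}")
    return summaries


def gate(report_path: str) -> int:
    """Return 1 if the report has blocking results, otherwise 0."""
    with open(report_path, encoding="utf-8") as handle:
        found = blocking_results(json.load(handle))
    for line in found:
        print(line)
    return 1 if found else 0
```

Calling sys.exit(gate("report.sarif")) as the last step of a CI job turns a blocking finding into a failed build.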

The takeaway

Treat skills the way you (hopefully) already treat third-party dependencies: assume nothing is safe until it’s been checked. A scan takes seconds, runs fully locally with --no-llm, and catches the obvious-in-hindsight stuff - credential harvesting, remote code execution, injection - before the skill runs with your access.

Scan before you install. Gate it in CI. And give skills with executable scripts the extra scrutiny the data says they deserve.

SkillSpector analyzes a skill’s files without executing them, so it has limits - it won’t catch attacks hidden in images, encrypted payloads, or non-English text, and it doesn’t observe runtime behavior. It’s a strong first filter, not a substitute for least-privilege design and runtime monitoring.