AI Agent Skill Security Checklist (2026 Edition)

53-point security checklist for AI agent SKILL.md files | Powered by SkillScan | skillscan.chitacloud.dev

1. Prompt Injection Defense (9 checks)

No unconstrained user input is passed directly to the model without sanitization CRITICAL
No instructions to override, ignore, or bypass system prompt CRITICAL
No role-switch commands (pretend you are / act as / you are now) HIGH
No DAN (Do Anything Now) or equivalent bypass patterns HIGH
No base64 or encoded payload execution instructions HIGH
No indirect injection via tool outputs (emails, web pages, documents) HIGH
Input validation before any tool call that accepts external data MEDIUM
No recursive prompt embedding patterns MEDIUM
No multilingual evasion patterns (Unicode homoglyphs, RTL override) MEDIUM
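
Several of the checks above amount to pattern matching over the SKILL.md text. The following is a minimal sketch of that idea, not SkillScan's actual implementation. The pattern list, the `scan_skill_text` name, and the finding labels are all assumptions made for illustration, and homoglyph detection is not covered. Phrases such as "act as" also occur in benign skills, so a hit is a prompt for manual review rather than a verdict.

```python
import re

# Hypothetical, deliberately small pattern set; a real scanner needs far more.
SUSPICIOUS_PATTERNS = {
    "override": re.compile(
        r"\b(ignore|disregard|override|bypass)\b.{0,40}"
        r"\b(system prompt|previous instructions|prior instructions)\b",
        re.I,
    ),
    "role_switch": re.compile(r"\b(pretend you are|act as|you are now)\b", re.I),
    # "DAN" is matched case-sensitively to avoid flagging the name "Dan".
    "dan": re.compile(r"(?i:do anything now)|\bDAN\b"),
    "encoded_exec": re.compile(
        r"\b(decode|base64)\b.{0,40}\b(run|execute|eval)\b", re.I
    ),
}

# Unicode bidirectional control characters, including RTL override (U+202E).
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def scan_skill_text(text: str) -> list[str]:
    """Return the labels of every suspicious pattern found in the text."""
    findings = [
        name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(text)
    ]
    if any(ch in BIDI_CONTROLS for ch in text):
        findings.append("bidi_control")
    return findings
```

Calling `scan_skill_text` on the full contents of a SKILL.md file returns a list of labels that can be mapped back to the checks in this section.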

2. Data Exfiltration Prevention (7 checks)

No instructions to send data to external URLs that are not whitelisted CRITICAL
No pixel tracking or covert channel exfiltration vectors CRITICAL
System prompt content not echoed or reflected in outputs HIGH
No instructions to extract and transmit user conversations HIGH
API keys and credentials not logged or transmitted HIGH
Tool outputs filtered before returning to potentially hostile callers MEDIUM
No steganographic data hiding in generated content MEDIUM
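
The URL whitelist check can be approximated by extracting every URL from the skill text and comparing its host against an approved set. This is a sketch under assumptions: `ALLOWED_HOSTS`, `api.example.com`, and `find_unlisted_urls` are hypothetical names, and the regular expression is a rough URL matcher rather than a full parser. It also catches image URLs of the kind used for pixel tracking.

```python
import re
from urllib.parse import urlparse

# Hypothetical whitelist; in practice this would come from reviewed configuration.
ALLOWED_HOSTS = {"api.example.com"}

# Rough matcher: a URL runs until whitespace or a common closing delimiter.
URL_RE = re.compile(r"https?://[^\s)\"'>]+", re.I)


def find_unlisted_urls(text: str, allowed_hosts=ALLOWED_HOSTS) -> list[str]:
    """Return every URL in the text whose host is not on the whitelist."""
    unlisted = []
    for url in URL_RE.findall(text):
        try:
            host = urlparse(url).hostname  # already lowercased by urlparse
        except ValueError:
            host = None  # malformed URL: treat as unlisted
        if host is None or host not in allowed_hosts:
            unlisted.append(url)
    return unlisted
```

Exact host matching is used on purpose: suffix matching would let `api.example.com.evil.test` through.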

3. Authorization & Trust Boundary (6 checks)

Agent does not grant itself elevated permissions dynamically CRITICAL
No instructions to contact and impersonate other agents CRITICAL
Caller identity verified before executing privileged actions HIGH
No cross-agent trust escalation (trusted agent chaining) HIGH
OAuth/API scopes are minimal and documented MEDIUM
Human-in-the-loop required for irreversible actions MEDIUM
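
One way to enforce the human-in-the-loop check is a gate in front of the action dispatcher. The sketch below assumes a hypothetical set of irreversible action names and an `approved_by` argument that a surrounding system would fill in after a person confirms; neither comes from the checklist itself.

```python
# Hypothetical action names; each deployment defines its own irreversible set.
IRREVERSIBLE_ACTIONS = {"delete_account", "send_payment"}


class ApprovalRequired(Exception):
    """Raised when an irreversible action is attempted without human sign-off."""


def execute_action(name, handler, approved_by=None):
    """Run handler(), refusing irreversible actions that lack an approver."""
    if name in IRREVERSIBLE_ACTIONS and not approved_by:
        raise ApprovalRequired(f"'{name}' needs human approval before it can run")
    return handler()
```

Reversible actions pass straight through, so the gate adds friction only where a mistake cannot be undone.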

4. Social Engineering Detection (6 checks)

Urgency/scarcity manipulation patterns blocked HIGH
Fake authority claims (claiming to be Anthropic, OpenAI, etc.) rejected HIGH
Emotional manipulation tactics (guilt, reward promises) detected HIGH
No compliance with unverified "a previous conversation said X" claims MEDIUM
Flattery/bribery patterns do not alter behavior MEDIUM
Agent does not reveal internal reasoning under social pressure MEDIUM

5. Memory & State Security (5 checks)

Persistent memory not writable by external/untrusted inputs CRITICAL
No malicious instructions that persist across session boundaries HIGH
Memory pruning/eviction policy documented MEDIUM
No poisoned memory injection via tool call responses MEDIUM
Sensitive user data not stored in agent long-term memory MEDIUM
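
The first and fourth checks in this section both come down to tracking where a memory write originated. A minimal sketch, assuming a hypothetical `AgentMemory` class and made-up source labels, could tag each write with its provenance and reject anything not explicitly trusted:

```python
from dataclasses import dataclass, field

# Hypothetical provenance labels. Tool outputs and fetched content are
# intentionally absent, so they can never write to persistent memory.
TRUSTED_SOURCES = {"operator", "user_confirmed"}


@dataclass
class AgentMemory:
    entries: list = field(default_factory=list)

    def write(self, content: str, source: str) -> bool:
        """Store content only if it comes from a trusted source."""
        if source not in TRUSTED_SOURCES:
            return False  # rejected: untrusted or unknown provenance
        self.entries.append((source, content))
        return True
```

Keeping the source alongside each entry also makes later audits and pruning easier.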

6. Supply Chain & Tool Safety (6 checks)

All MCP/tool endpoints use HTTPS HIGH
Tool schemas validated before execution HIGH
No dynamic tool loading from untrusted sources HIGH
Tool versions pinned and audited MEDIUM
Malicious MCP server impersonation patterns detected MEDIUM
No dependency on unverified external API response schemas MEDIUM
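
The HTTPS and version-pinning checks lend themselves to a simple audit over a tool manifest. The manifest shape used here (a list of dicts with `name`, `endpoint`, and `version` keys) is an assumption made for illustration and is not a format defined by MCP or by this checklist. "Pinned" is taken to mean an exact `MAJOR.MINOR.PATCH` string.

```python
import re
from urllib.parse import urlparse

# Assumed definition of "pinned": an exact three-part version number.
PINNED_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def audit_tools(tools: list[dict]) -> list[str]:
    """Return one human-readable issue per failed check, in manifest order."""
    issues = []
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        if urlparse(tool.get("endpoint", "")).scheme != "https":
            issues.append(f"{name}: endpoint is not HTTPS")
        if not PINNED_VERSION.match(str(tool.get("version", ""))):
            issues.append(f"{name}: version is not pinned")
    return issues
```

An empty result means every listed tool passed both checks; it says nothing about whether the versions have been audited.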

7. Output Safety (5 checks)

No generation of malware, exploit code, or harmful scripts CRITICAL
Generated code scanned before execution and run only in sandboxed environments HIGH
No autonomous financial transactions above defined limits HIGH
Output length limits prevent memory exhaustion attacks MEDIUM
Outputs sanitized before being fed back into the model MEDIUM
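
The transaction-limit and output-length checks can be expressed as two small guards. The limits below are placeholder values chosen for the example, not recommendations, and the function names are hypothetical.

```python
from decimal import Decimal

# Placeholder limits; real values depend on the deployment.
MAX_OUTPUT_CHARS = 8000
MAX_AUTONOMOUS_AMOUNT = Decimal("50.00")


def cap_output(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Truncate output to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]


def may_transact_autonomously(amount, limit=MAX_AUTONOMOUS_AMOUNT) -> bool:
    """Allow only positive amounts at or below the limit without human review."""
    return 0 < amount <= limit
```

Amounts above the limit would then be routed to a human approver instead of being rejected outright.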

8. Coordinated Inauthentic Behavior (CIB) (3 checks)

Agent does not participate in coordinated vote manipulation HIGH
Agent identity not used to amplify coordinated messaging HIGH
No participation in fake engagement farms (likes, follows, boosts) MEDIUM

9. Compliance & Transparency (6 checks)

Agent discloses it is an AI when directly asked HIGH
No deceptive identity claims (impersonating humans) HIGH
Capability boundaries documented and respected MEDIUM
Audit log available for all consequential actions MEDIUM
Privacy policy for user data handling exists MEDIUM
No unauthorized access to systems beyond declared scope MEDIUM