#ai-security - Tag | AI Tools Guidebook

Troubleshooting

Agent Leaked an API Key in Its Output: Rotate and Lock It Down

An AI agent echoed a secret key, token, or connection string in its response or a tool call. Rotate it in minutes, audit for misuse, and block the model from ever seeing raw secrets again.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Roleplay Bypasses Your AI Content Filter

A user tells your AI to play a character with 'no restrictions' and it produces policy-violating output. Detect roleplay filter bypass in logs and add output-side guardrails that actually hold.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

AI Follows Malicious Instructions Hidden in an Uploaded File

An uploaded file carries hidden instructions that hijack the AI mid-task. Detect white-text, Unicode-smuggled, and metadata payloads, sanitize uploads, and block file-triggered tool calls.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Your AI Tool Accidentally Wrote Phishing Content

Your AI content tool produced a convincing phishing email or credential-harvesting page because the request was reframed as marketing or training. How to detect the three-signal pattern and add output-side intent gates.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Data Exfiltration via Image URL

An AI agent encodes sensitive data into a Markdown image URL and the chat UI auto-loads it, leaking data to an attacker's server. How to detect the gadget and break the chain.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Prompt Injection Hidden Inside a PDF

A PDF carries invisible white-on-white, tiny-font, or metadata text that overrides your AI pipeline. Detect, strip, and harden against PDF-borne indirect prompt injection.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Indirect Prompt Injection via Fetched Web Page

An AI agent fetches a URL and the page's hidden text hijacks its next action. Detect and block indirect prompt injection from web content.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Prompt Injection Hidden in Search-Result Snippets

An AI agent calls send_email or a webhook right after a web search. The fix: scan and delimit every snippet as untrusted data, and gate side-effecting tools behind a search boundary.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Prompt Injection Introduced During a Translation Round-Trip

Hidden instructions ride into your AI pipeline through a translation step — either the LLM translator executes them, or a translation API's output re-enters unscanned. Detection, scanning, and isolation fixes.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Compromised MCP Server: Detect and Contain a Supply-Chain Attack

A trusted MCP package ships a malicious update that steals keys or makes unexpected tool calls. Detect the compromise, contain it, rotate secrets, and harden the install with version pinning and a release-age cooldown.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Multi-Turn Jailbreak Escalates Over Many Messages (Crescendo)

An attacker shifts model behavior one message at a time until restrictions fall. Detect the escalation pattern across the conversation, reset context, and add session-level monitoring that fires before the violation turn.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Prompt Injection Bypasses Your System Prompt

A crafted user message overrides the system-prompt policy and the model ignores its guardrails. Diagnose which bypass bucket you are in, then build a layered defense that holds.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Instructions Hidden in Code Comments Steered the AI

A code comment told your AI reviewer to approve a PR or skip a section, and it obeyed. How to detect comment injection, label code as data, and shrink the blast radius.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Prompt Injection Hidden in a Filename

An uploaded file's name carries AI override instructions that fire when the agent reads the filename. The fix: reference files by UUID in prompts, never the raw name. Detect, sanitize, and block filename-borne injection.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Prompt Injection via User-Pasted Content

User-pasted text smuggles override instructions that hijack your AI assistant. Detect the trust-boundary break and harden your app so pasted content can't act as instruction.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Secret Accidentally Included in Prompt Context

An API key, password, or token ended up inside a prompt. How to rotate it fast, trace where it traveled, and stop secrets from entering the model context again.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Role-Confusion Jailbreak: User Talks Its Way to System Authority

A user convinces your AI it now has system-level authority. Spot role confusion in logs and stop it with structural trust-tier enforcement, not better wording.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Tool Output Treated as Trusted User Input

An agent runs a tool, the result contains hidden instructions, and the model obeys them. Why tool output gets user-level trust, and the role, labeling, and capability-gate fixes that stop it.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

Malicious MCP Server Redefines a Tool's Behavior

A rogue MCP server hides instructions in a tool's description so the model exfiltrates data while running a normal tool. Detect, audit, and harden against tool poisoning.

May 25, 2026 #ai-security #prompt-injection

Troubleshooting

User Input Treated as a System Instruction

User-typed text lands in the system/developer role or operator-trust position, so the model obeys it like a developer directive. Root causes and the architectural fix, verified June 2026.

May 25, 2026 #ai-security #prompt-injection