Agent Leaks an API Key in Its Output
An AI agent echoes a secret key or token in its visible response or tool call arguments. How to detect the leak, revoke, and prevent recurrence.
Articles tagged with #ai-security
An AI agent echoes a secret key or token in its visible response or tool call arguments. How to detect the leak, revoke, and prevent recurrence.
A user asks the AI to play a fictional character who 'would' produce restricted content — and it complies. Detect roleplay-based filter bypass and add structural guardrails.
An uploaded file contains hidden instructions that redirect the AI away from the user's task. How to detect instruction-bearing files and sanitize uploads before processing.
An AI assistant helped write a convincing phishing email or credential-harvesting page without recognizing the intent. How to detect the pattern and add intent-detection guardrails.
An AI agent encodes sensitive context into a Markdown image URL, triggering a GET request that sends data to an attacker's server. Detection and mitigation.
White-on-white or metadata text in a PDF carries hidden AI override instructions. Learn how to detect, strip, and defend against PDF-borne injection.
An AI agent fetches a URL and the page's hidden text hijacks its next action. Detect and block indirect injection from web content.
Search result snippets returned to an AI agent contain override instructions that redirect the agent's task. How defenders detect and sanitize search-borne injection.
Malicious instructions appear or survive in text after it passes through a translation service, then re-enter the AI pipeline as seemingly clean content. Detection and defense.
A previously trusted MCP package is updated with malicious code that steals keys or issues unexpected tool calls. How to detect the compromise and harden your install.
An adversary incrementally shifts model behavior across many messages until restrictions are fully bypassed. Detect the escalation pattern and reset the context.
A crafted user message overrides the system-prompt policy and the model ignores its configured guardrails. Detection, root cause, and hardening steps.
Code comments inside a file or snippet contain override instructions that redirect a code-review or coding AI. How to detect comment injection and harden code analysis pipelines.
An uploaded file's name contains AI override instructions that execute when the agent processes the filename. Detect, sanitize, and block filename-borne injection.
User-pasted text secretly carries override instructions that redirect an AI assistant. Detect and neutralize pasted-content injection before it runs.
A developer or automated pipeline accidentally passes an API key, password, or token as part of the prompt. How to detect, rotate, and prevent the exposure.
A user convinces the AI it is now operating with system-level authority. Spot role confusion in logs and defend with structural trust-tier enforcement.
Your AI pipeline passes tool call results back to the model with user-level or higher trust, allowing poisoned tool output to issue instructions. How to defend.
A rogue MCP server registers a tool with a misleading description that tricks the AI into executing unintended actions. How to detect and defend.
The application architecture allows user-supplied text to land in the system role or be treated with operator-level trust. Root causes and structural fixes.