Prompt Injection

Prompt Injection Attack

Definition

A security vulnerability where an attacker embeds malicious instructions (often encoded or obfuscated) into input data to manipulate an AI agent's behavior, causing it to execute unintended actions that bypass safety guidelines or authorization controls.

Examples in the Wild

  • Example 1:Morse code encoded instructions passed to an AI agent to trigger unauthorized financial transactions
  • Example 2:Hidden instructions in user-supplied text that override system prompts
  • Example 3:Obfuscated commands designed to evade content filters