Encoded interactions attack risk for AI
Description
An encoded interactions attack uses prompts that rely on specific encodings, styles, syntactic or typographical transformations (such as deliberate typographical errors or irregular spacing), or complex formatting to govern the interaction in ways the model is vulnerable to.
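As an illustration of the transformations described above, the following is a minimal Python sketch of how a prompt might be normalized and expanded before it reaches a content filter. It is not a complete defense, and it is not taken from any particular product. The function names (normalize_spacing, decode_base64_runs, expand_prompt) and the 16-character threshold for Base64-looking runs are assumptions made for this example.

```python
import base64
import binascii
import re
import unicodedata

# Assumed heuristic: a run of 16+ Base64 alphabet characters, with optional padding.
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def normalize_spacing(text: str) -> str:
    """Fold Unicode compatibility forms and collapse irregular whitespace."""
    folded = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", folded).strip()


def decode_base64_runs(text: str) -> list[str]:
    """Return printable UTF-8 decodings of Base64-looking runs found in text."""
    decoded = []
    for match in _B64_RUN.finditer(text):
        candidate = match.group(0)
        try:
            raw = base64.b64decode(candidate, validate=True)
            plain = raw.decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            # Not valid Base64, or not text once decoded: ignore this run.
            continue
        if plain.isprintable():
            decoded.append(plain)
    return decoded


def expand_prompt(prompt: str) -> list[str]:
    """Return the normalized prompt followed by any decoded payloads."""
    normalized = normalize_spacing(prompt)
    return [normalized] + decode_base64_runs(normalized)
```

Each string returned by expand_prompt could then be passed to whatever screening step is already in place, so that an instruction hidden behind an encoding or unusual spacing is checked in its plain form as well. This sketch covers only whitespace, Unicode compatibility forms, and Base64; other encodings and deliberate misspellings would need their own handling.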
Why are encoded interactions attacks a concern for foundation models?
Encoded interactions attacks can be used to alter model behavior to the attacker's benefit. The content the model then generates may harm the user or others.
We provide examples covered by the press to help explain many of the risks of foundation models. Many of these events are either still evolving or have been resolved, and referencing them can help the reader understand the potential risks and work toward mitigations. These examples are highlighted for illustrative purposes only.