Social hacking attack risk for AI

Last updated: May 27, 2025
Robustness: prompt attacks
Inference risks
Specific to generative AI

Description

Manipulative prompts that use social engineering techniques, such as role-playing or hypothetical scenarios, to persuade the model to generate harmful content.

Why are social hacking attacks a concern for foundation models?

Social hacking attacks can be used to alter model behavior in ways that benefit the attacker. The content the model then generates may harm the user or others.

We provide examples covered by the press to help explain many of the risks of foundation models. Many of these events are either still evolving or have been resolved, and referencing them can help the reader understand the potential risks and work toward mitigations. These examples are highlighted for illustrative purposes only.