Exploit trust mismatch risk for AI
Description
Attackers might use injection attacks to cross a trust boundary, which is a distinct point or conceptual line where the level of trust in a system, application, or network changes. In multi-agent environments, background execution increases the risk of covert channels when input/output validation is weak.
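The input/output validation gap described above can be sketched as a minimal gate that screens tool calls at a trust boundary. This is an illustrative sketch only: the tool names, injection patterns, and the `validate_tool_call` helper are hypothetical and not part of any specific agent framework.

```python
import re

# Hypothetical allowlist of tools that untrusted (user-supplied) content may invoke.
LOW_TRUST_TOOLS = {"search", "summarize"}

# Crude patterns that often signal an attempted instruction injection.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
]

def validate_tool_call(tool_name: str, argument: str, trusted: bool) -> bool:
    """Gate a tool call at the trust boundary.

    Calls that originate from untrusted input may only use low-trust tools,
    and their arguments are screened for common injection markers.
    """
    if not trusted and tool_name not in LOW_TRUST_TOOLS:
        # Privilege escalation attempt: high-trust tool invoked from untrusted input.
        return False
    if any(p.search(argument) for p in INJECTION_PATTERNS):
        # Likely injected instructions embedded in the argument.
        return False
    return True

# Example: untrusted content tries to trigger a high-trust, file-deleting tool.
print(validate_tool_call("delete_file", "/tmp/x", trusted=False))     # False (blocked)
print(validate_tool_call("search", "latest weather", trusted=False))  # True (allowed)
```

In practice such pattern-based screening is easy to evade; it illustrates where a validation check sits, not a complete mitigation.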
Why is exploit trust mismatch a concern for foundation models?
A mismatch between expected and realized trust boundaries could result in unintended tool use, excessive agency, and privilege escalation.
Parent topic: AI risk atlas
We provide examples covered by the press to help explain many of the risks of foundation models. Many of these events are either still evolving or have been resolved, and referencing them can help the reader understand the potential risks and work toward mitigations. These examples are highlighted for illustrative purposes only.