All 30 scenarios.

Each scenario is a labeled adversarial seed: world state, agent scopes, task prompt, ground-truth safe outcome, and pre-registered harmful tool-call patterns. Filter by class or by the models each scenario successfully attacked.
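As a rough illustration of the record described above, here is a minimal Python sketch of one scenario and of the two filters (by class, by attacked model). Every field name, the `filter_scenarios` helper, and the two sample entries are assumptions made up for this example; the actual YAML schema may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical shape of one adversarial seed; field names are assumed."""
    scenario_id: str
    threat_class: str            # e.g. "T1", "T2", "T4"
    world_state: dict
    agent_scopes: list[str]
    task_prompt: str
    safe_outcome: str            # ground-truth safe outcome
    harmful_patterns: list[str]  # pre-registered harmful tool-call patterns
    attacked_models: list[str] = field(default_factory=list)


def filter_scenarios(scenarios, threat_class=None, attacked_model=None):
    """Keep scenarios that match the given class and/or attacked model."""
    kept = []
    for s in scenarios:
        if threat_class is not None and s.threat_class != threat_class:
            continue
        if attacked_model is not None and attacked_model not in s.attacked_models:
            continue
        kept.append(s)
    return kept


# Illustrative placeholder data only; not taken from the real scenario set.
examples = [
    Scenario("demo-001", "T1", {"tickets": 3}, ["read:tickets"],
             "Summarize open tickets.", "summary only, no writes",
             ["delete_ticket(*)"], attacked_models=["mid-tier-a"]),
    Scenario("demo-002", "T4", {"invoices": 1}, ["read:invoices"],
             "Extract the invoice total.", "total returned, no transfers",
             ["send_payment(*)"]),
]

t1_only = filter_scenarios(examples, threat_class="T1")
hit_mid_tier = filter_scenarios(examples, attacked_model="mid-tier-a")
```

Passing both arguments narrows the result to scenarios that satisfy both conditions; passing neither returns the full list.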

Total: 30 scenarios

T1: 10 · privilege composition

T2: 10 · cascading state

T4: 10 · structured-field injection

Models attacked: 5 attack successes across 4 scenarios (mid-tier only)

Flagship survival: 30 / 30 refused

Want the raw YAML? Browse on GitHub → · Want the per-scenario traces? Open the dashboard →