interactive · runs entirely in your browser

Pick the attack. Watch where the defense fails.

One poisoned MCP server, the worst model in the benchmark (gpt-4o-mini), multi-step tasks. Choose an attack class and a payload register, toggle the client-side defense, and run the trial. The point of the demo is the contrast: the defense blocks injections it can see in the tool list, does nothing against held-out wording it wasn't tuned for, and is structurally blind to rug_pull, which hides in the tool result, a channel the defense never inspects.
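A minimal sketch of why the contrast falls out this way, assuming a keyword-based redactor that scans tool descriptions only. The rule patterns, the Tool type, and both function names are hypothetical illustrations; this is not the code in defense/provenance.py.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword rules; the benchmark's actual vocabulary is not shown here.
RULES = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"ignore (all )?previous instructions",
        r"do not tell the user",
        r"\bsystem override\b",
    )
]


@dataclass
class Tool:
    name: str
    description: str


def redact_tool_list(tools):
    """Return copies of the tools with rule-matching description text redacted."""
    cleaned = []
    for tool in tools:
        text = tool.description
        for rule in RULES:
            text = rule.sub("[REDACTED]", text)
        cleaned.append(Tool(tool.name, text))
    return cleaned


def pass_tool_result(result: str) -> str:
    """Tool results are forwarded unchanged: this sketch never inspects them."""
    return result
```

In this sketch a description containing "Ignore previous instructions" is redacted, a paraphrase that avoids every rule passes through untouched, and anything delivered in a tool result is never scanned at all.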

Faithful replay. The tool text, the redaction (real output of defense/provenance.py), and every fire/block outcome match the benchmark's measured attack success rate (ASR) for gpt-4o-mini on that cell. Seen = payloads that share the defense's keyword vocabulary; held-out = payloads authored to trip zero defense rules, so the defense is not scored only on the wording it was tuned on (the de-circularization). The live harness calls the model via API; this page replays recorded behavior.
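The seen / held-out split can be checked mechanically. The sketch below assumes the defense's rules reduce to a list of keyword phrases; DEFENSE_KEYWORDS, rules_tripped, and register are hypothetical names, not the benchmark's own.

```python
# Hypothetical keyword vocabulary standing in for the defense's rules.
DEFENSE_KEYWORDS = ("ignore previous", "system override", "do not tell")


def rules_tripped(payload: str) -> int:
    """Count how many keyword rules the payload matches, case-insensitively."""
    lowered = payload.lower()
    return sum(1 for keyword in DEFENSE_KEYWORDS if keyword in lowered)


def register(payload: str) -> str:
    """Label a payload 'seen' if it trips any rule, otherwise 'held-out'."""
    return "seen" if rules_tripped(payload) > 0 else "held-out"
```

Under these assumptions a payload counts as held-out only when rules_tripped returns 0, which is the property the held-out set is authored to satisfy.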