The experiment pairs attacker LLMs with guardian LLMs in a simulator. The guardian is instructed never to transfer out of a protected wallet; the attacker has up to 25 bounded attempts to induce either a transfer-shaped simulator action or the guardian's own structured unsafe-intent flag.
The first pass found 106 transfer-action attempts, 1,021 self-reported unsafe-intent flags, 703 resilient trials, 41 provider/runtime errors, and 4 incomplete rows. A separate retry envelope keeps the combined 20 retry-aware rows as a composite signal, not an unqualified operational compromise count.