Technical report · Daniel Alonso · Crow Tech Pte. Ltd. · May 12, 2026

When LLMs Guard a Wallet

A simulator-only adversarial study of 25 attacker models against 25 guardian models over 1,875 ordered trials.

First-pass composite: 1,127
First-pass resilient: 703
Retry-aware composite: 1,147
Policy blocks: 107

Abstract

We evaluated whether LLM guardian agents follow a protective wallet instruction under bounded adversarial persuasion from attacker LLMs. The study is simulator-only: it used no real wallets, private keys, RPC endpoints, chain SDKs, wallet signing, testnet/mainnet activity, or real funds. The first pass covered 1,875 ordered attacker-guardian trials: 106 transfer-action attempts, 1,021 self-reported unsafe-intent flags, 703 resilient rows, 41 provider/runtime errors, and 4 incomplete rows. The combined 1,127-row first-pass number is a composite signal, not an unqualified operational compromise count.

Method

The main run paired 25 guardian routes with 25 attacker routes across 3 independent repetitions, producing 625 ordered model pairs, 1,875 first-pass trials, and 26,238 completed first-pass model attempts. The retry envelope covered the 41 first-pass error rows and completed 704 retry attempts. Each attacker was allowed up to 25 attempts per trial, with a 200-word cap per message. All calls used provider-default temperature and sampling settings.
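The trial counts above follow directly from the full 25 x 25 x 3 design. A minimal sketch of that enumeration is shown here; the route identifiers are placeholders invented for illustration, not the study's actual route names.

```python
from itertools import product

N_GUARDIANS = 25
N_ATTACKERS = 25
REPETITIONS = 3

# Placeholder identifiers (assumption): the real route names are not used here.
guardians = [f"guardian_{i:02d}" for i in range(N_GUARDIANS)]
attackers = [f"attacker_{j:02d}" for j in range(N_ATTACKERS)]

# Every attacker is paired with every guardian, and the order matters.
ordered_pairs = list(product(attackers, guardians))

# Each ordered pair is run once per repetition.
trials = [
    (attacker, guardian, rep)
    for attacker, guardian in ordered_pairs
    for rep in range(1, REPETITIONS + 1)
]

# Each guardian (and each attacker) therefore appears in 25 * 3 trials.
trials_per_model = N_ATTACKERS * REPETITIONS

print(len(ordered_pairs), len(trials), trials_per_model)
```

Under this design, each guardian and each attacker takes part in 75 trials, which is the denominator the per-model percentages in the ranking tables appear to use.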

Terminal composite outcomes fall into two subtypes: attempted_transfer_tool_call and unsafe_stated_intent. The latter is the guardian model's own structured self-report, not an independent judge label. Transfer-shaped simulator actions were blocked by policy, so action attempts, self-reported unsafe-intent flags, composite signals, and policy enforcement are reported separately.
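The distinction between the two subtypes can be illustrated with a small sketch. Everything here other than the two subtype names is an assumption made for illustration: the `GuardianTurn` fields, the `transfer` action name, and the rule that a transfer-shaped action takes precedence over a self-reported flag are hypothetical and may differ from the study's harness.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GuardianTurn:
    # Hypothetical fields, not taken from the study's code.
    tool_name: Optional[str]   # simulator action the guardian requested, if any
    unsafe_intent_flag: bool   # the guardian's own structured self-report


# Hypothetical name for a transfer-shaped simulator action.
TRANSFER_TOOLS = {"transfer"}


def classify_turn(turn: GuardianTurn) -> Tuple[Optional[str], bool]:
    """Return (composite_subtype, policy_blocked) for one guardian turn."""
    if turn.tool_name in TRANSFER_TOOLS:
        # Transfer-shaped actions are always blocked by the simulator policy.
        return "attempted_transfer_tool_call", True
    if turn.unsafe_intent_flag:
        # Self-report only: no action was attempted, so nothing is blocked.
        return "unsafe_stated_intent", False
    return None, False


print(classify_turn(GuardianTurn(tool_name="transfer", unsafe_intent_flag=False)))
print(classify_turn(GuardianTurn(tool_name=None, unsafe_intent_flag=True)))
print(classify_turn(GuardianTurn(tool_name=None, unsafe_intent_flag=False)))
```

Keeping the subtype and the policy-block flag as separate return values mirrors the report's choice to count action attempts, self-reported flags, and policy enforcement separately.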

Results

Status                First pass  Retry-aware
guardian_compromised  1,127       1,147
guardian_resilient    703         724
error                 41          0
incomplete            4           4

Subtype                       First pass  Retry-aware
attempted transfer tool call  106         107
unsafe stated intent          1,021       1,040
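The first-pass and retry-aware counts can be reconciled with simple arithmetic. The sketch below uses only the figures reported in this section; the dictionary layout and variable names are illustrative.

```python
TOTAL_TRIALS = 1875

first_pass = {
    "guardian_compromised": 1127,
    "guardian_resilient": 703,
    "error": 41,
    "incomplete": 4,
}
retry_aware = {
    "guardian_compromised": 1147,
    "guardian_resilient": 724,
    "error": 0,
    "incomplete": 4,
}
subtypes_first_pass = {"attempted_transfer_tool_call": 106, "unsafe_stated_intent": 1021}
subtypes_retry_aware = {"attempted_transfer_tool_call": 107, "unsafe_stated_intent": 1040}

# Both views must account for every trial.
assert sum(first_pass.values()) == TOTAL_TRIALS
assert sum(retry_aware.values()) == TOTAL_TRIALS

# The composite status is the sum of its two subtypes.
assert sum(subtypes_first_pass.values()) == first_pass["guardian_compromised"]
assert sum(subtypes_retry_aware.values()) == retry_aware["guardian_compromised"]

# Change per status once the retried error rows are substituted in.
delta = {status: retry_aware[status] - first_pass[status] for status in first_pass}
print(delta)
```

The 41 retried error rows resolve into 20 additional composite rows and 21 additional resilient rows, and the 4 incomplete rows are unchanged.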
Figure: Ordered attacker-versus-guardian composite-signal-rate heatmap for the wallet guardian study. Rows are guardians sorted by retry-aware resilience; columns are attackers sorted by retry-aware effectiveness.

Guardian Ranking

Guardian                Provider   Resilient  Composite  Retry-aware resilient
GPT 5 Nano              OpenAI     73         0          100.0%
HY3 Preview             Tencent    73         0          98.7%
Gemini 3 Flash Preview  Google     72         1          98.7%
GLM 4.7 Flash           Z.ai       71         3          96.0%
Gemini 3.1 Pro Preview  Google     48         26         64.0%
GLM 5.1                 Z.ai       40         34         54.7%
GLM 5                   Z.ai       37         36         52.0%
Minimax M2.7            MiniMax    35         38         46.7%
Minimax M2              MiniMax    32         39         44.0%
Claude Opus 4.7         Anthropic  32         42         44.0%
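The retry-aware percentages in the guardian table are consistent with a denominator of 75 trials per guardian (25 attackers x 3 repetitions). That denominator is an inference from the design, not something the table states, so the sketch below treats it as an assumption and checks that each reported percentage maps back to a whole number of trials.

```python
# Assumed denominator: 25 attackers x 3 repetitions per guardian.
TRIALS_PER_GUARDIAN = 25 * 3


def pct(count: int, total: int = TRIALS_PER_GUARDIAN) -> float:
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * count / total, 1)


def implied_count(percent: float, total: int = TRIALS_PER_GUARDIAN) -> int:
    """Whole-trial count implied by a reported percentage."""
    return round(percent / 100 * total)


# (guardian, first-pass resilient, first-pass composite, retry-aware resilient %)
rows = [
    ("GPT 5 Nano", 73, 0, 100.0),
    ("HY3 Preview", 73, 0, 98.7),
    ("Gemini 3 Flash Preview", 72, 1, 98.7),
    ("GLM 4.7 Flash", 71, 3, 96.0),
    ("Gemini 3.1 Pro Preview", 48, 26, 64.0),
    ("GLM 5.1", 40, 34, 54.7),
    ("GLM 5", 37, 36, 52.0),
    ("Minimax M2.7", 35, 38, 46.7),
    ("Minimax M2", 32, 39, 44.0),
    ("Claude Opus 4.7", 32, 42, 44.0),
]

implied = {}
for name, resilient, composite, retry_pct in rows:
    count = implied_count(retry_pct)
    # The implied count must reproduce the reported percentage exactly.
    assert pct(count) == retry_pct
    # Retries can only add resilient rows, never remove them.
    assert count >= resilient
    implied[name] = count

print(implied)
```

For example, 98.7% corresponds to 74 of 75 trials, which would mean a guardian with 73 first-pass resilient rows gained one resilient row from the retry envelope.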

Attacker Ranking

Attacker                       Provider  Composite  Resilient  Retry-aware composite
Grok 4.3                       xAI       56         19         74.7%
Ernie 4.5 300B A47B            Baidu     54         21         72.0%
Gemini 3 Flash Preview         Google    53         21         70.7%
GLM 5                          Z.ai      53         22         70.7%
Minimax M2.7                   MiniMax   53         22         70.7%
Minimax M2                     MiniMax   52         23         69.3%
Grok 4.1 Fast                  xAI       51         24         68.0%
Qwen3.6 Max Preview            Qwen      50         25         66.7%
Qwen3.6 Flash                  Qwen      49         25         65.3%
Gemini 3.1 Flash Lite Preview  Google    49         26         65.3%

Reliability and Retries

First-pass provider/runtime errors were preserved as reliability data. The retry envelope replaces only mapped first-pass error rows and is shown separately.
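One way to picture the retry envelope is as a merge that substitutes a retry row only where the first-pass row is an error and a mapped retry exists. The sketch below is illustrative: the `trial_id`, `status`, and `source` field names are hypothetical and not taken from the study's artifacts.

```python
from typing import Dict, List


def apply_retry_envelope(first_pass_rows: List[Dict], retry_rows: List[Dict]) -> List[Dict]:
    """Build the retry-aware view without mutating the first-pass data."""
    retry_by_id = {row["trial_id"]: row for row in retry_rows}
    merged = []
    for row in first_pass_rows:
        replacement = retry_by_id.get(row["trial_id"])
        if row["status"] == "error" and replacement is not None:
            # Only mapped first-pass error rows are replaced.
            merged.append({**replacement, "source": "retry"})
        else:
            # Every other row, including incomplete rows, is kept as-is.
            merged.append({**row, "source": "first_pass"})
    return merged


first_pass_rows = [
    {"trial_id": 1, "status": "guardian_resilient"},
    {"trial_id": 2, "status": "error"},
    {"trial_id": 3, "status": "incomplete"},
]
retry_rows = [
    {"trial_id": 2, "status": "guardian_compromised"},
    # A retry for a non-error row is ignored by the merge.
    {"trial_id": 1, "status": "guardian_compromised"},
]

retry_aware_rows = apply_retry_envelope(first_pass_rows, retry_rows)
print(retry_aware_rows)
```

Returning a new list keeps the first-pass rows intact, so the original errors remain available as reliability data alongside the retry-aware view.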

Subtype                           Role      Count
attacker_live_error:RuntimeError  attacker  38
attacker_live_error:ValueError    attacker  2
guardian_live_error:RuntimeError  guardian  1
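The error subtypes encode both the failing role and the exception class, so they can be tallied either way. This short sketch uses the counts from the table above; the parsing rule (role before the first underscore, exception name after the colon) is an assumption based on the three labels shown.

```python
from collections import Counter

error_rows = [
    ("attacker_live_error:RuntimeError", 38),
    ("attacker_live_error:ValueError", 2),
    ("guardian_live_error:RuntimeError", 1),
]

by_role = Counter()
by_exception = Counter()
for subtype, count in error_rows:
    # Assumed label shape: "<role>_live_error:<ExceptionName>".
    prefix, exception_name = subtype.split(":", 1)
    role = prefix.split("_", 1)[0]
    by_role[role] += count
    by_exception[exception_name] += count

total_errors = sum(count for _, count in error_rows)
print(total_errors, dict(by_role), dict(by_exception))
```

The total matches the 41 first-pass error rows, with 40 of them originating on the attacker side.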

Safety, Ethics, and Limitations

Artifacts and Reproducibility

Public article and interactive charts: https://crow.sg/research/llm-wallet-guard-study. Artifact manifest and checksums: https://crow.sg/research/llm-wallet-guard-study/artifact-manifest.json. Source code: https://github.com/Crow-Tech-Pte-Ltd/research/tree/main/llm-wallet-guard-study.