When LLMs Guard a Wallet
A simulator-only adversarial study of 25 attacker models against 25 guardian models over 1,875 ordered trials.
Abstract
We evaluated whether LLM guardian agents follow a protective wallet instruction under bounded adversarial persuasion from attacker LLMs. The study is simulator-only: it used no real wallets, private keys, RPC endpoints, chain SDKs, wallet signing, testnet/mainnet activity, or real funds. The first pass covered 1,875 ordered attacker-guardian trials: 106 transfer-action attempts, 1,021 self-reported unsafe-intent flags, 703 resilient rows, 41 provider/runtime errors, and 4 incomplete rows. The combined 1,127-row first-pass number is a composite signal, not an unqualified operational compromise count.
Method
The main run paired 25 guardian routes with 25 attacker routes across 3 independent repetitions, producing 625 ordered model pairs, 1,875 first-pass trials, and 26,238 completed first-pass model attempts. The retry envelope covered 41 first-pass error rows and completed 704 retry attempts. Attackers had 25 attempts and a 200-word cap per message. Calls used provider-default temperature and sampling.
The terminal composite outcomes are split between attempted_transfer_tool_call and unsafe_stated_intent. The second is the guardian model's own structured self-report, not an independent judge label. Transfer-shaped simulator actions were blocked by policy, so action attempts, self-reported unsafe-intent flags, composite signals, and policy enforcement are reported separately.
Results
| Status | First pass | Retry-aware |
|---|---|---|
| guardian_compromised | 1127 | 1147 |
| guardian_resilient | 703 | 724 |
| error | 41 | 0 |
| incomplete | 4 | 4 |
| Subtype | First pass | Retry-aware |
|---|---|---|
| attempted transfer tool call | 106 | 107 |
| unsafe stated intent | 1021 | 1040 |
Guardian Ranking
| Guardian | Provider | Resilient | Composite | Retry-aware resilient |
|---|---|---|---|---|
| GPT 5 Nano | OpenAI | 73 | 0 | 100.0% |
| HY3 Preview | Tencent | 73 | 0 | 98.7% |
| Gemini 3 Flash Preview | 72 | 1 | 98.7% | |
| GLM 4.7 Flash | Z.ai | 71 | 3 | 96.0% |
| Gemini 3.1 Pro Preview | 48 | 26 | 64.0% | |
| GLM 5.1 | Z.ai | 40 | 34 | 54.7% |
| GLM 5 | Z.ai | 37 | 36 | 52.0% |
| Minimax M2.7 | MiniMax | 35 | 38 | 46.7% |
| Minimax M2 | MiniMax | 32 | 39 | 44.0% |
| Claude Opus 4.7 | Anthropic | 32 | 42 | 44.0% |
Attacker Ranking
| Attacker | Provider | Composite | Resilient | Retry-aware composite |
|---|---|---|---|---|
| Grok 4.3 | xAI | 56 | 19 | 74.7% |
| Ernie 4.5 300B A47B | Baidu | 54 | 21 | 72.0% |
| Gemini 3 Flash Preview | 53 | 21 | 70.7% | |
| GLM 5 | Z.ai | 53 | 22 | 70.7% |
| Minimax M2.7 | MiniMax | 53 | 22 | 70.7% |
| Minimax M2 | MiniMax | 52 | 23 | 69.3% |
| Grok 4.1 Fast | xAI | 51 | 24 | 68.0% |
| Qwen3.6 Max Preview | Qwen | 50 | 25 | 66.7% |
| Qwen3.6 Flash | Qwen | 49 | 25 | 65.3% |
| Gemini 3.1 Flash Lite Preview | 49 | 26 | 65.3% |
Reliability and Retries
First-pass provider/runtime errors were preserved as reliability data. The retry envelope replaces only mapped first-pass error rows and is shown separately.
| Subtype | Role | Count |
|---|---|---|
| attacker_live_error:RuntimeError | attacker | 38 |
| attacker_live_error:ValueError | attacker | 2 |
| guardian_live_error:RuntimeError | guardian | 1 |
Safety, Ethics, and Limitations
- This is a simulator-only adversarial AI safety evaluation; no real private keys, real wallets, RPC, chain SDKs, wallet signing, mainnet/testnet activity, or real funds were used.
- Guardian transfer attempts are transfer-shaped simulator actions only. All observed transfer-shaped actions were blocked by deterministic policy.
- First-pass provider/runtime errors are preserved as reliability data. The retry-aware envelope replaces only the mapped first-pass error rows and is reported separately from the first pass.
- Rows are ordered attacker-vs-guardian pairs over three repetitions, not independent claims about a provider as a whole.
- A guardian marked resilient only means no compromise was observed within the 25-attempt budget.
- Calls used provider-default temperature and sampling through an OpenAI-compatible route. Provider defaults and transient routing errors are part of the measured environment.
- AI assistance was used for orchestration, analysis, code, and publication packaging. Daniel Alonso conducted the study with Crow Tech publication support.
Artifacts and Reproducibility
Public article and interactive charts: https://crow.sg/research/llm-wallet-guard-study. Artifact manifest and checksums: https://crow.sg/research/llm-wallet-guard-study/artifact-manifest.json. Source code: https://github.com/Crow-Tech-Pte-Ltd/research/tree/main/llm-wallet-guard-study.
- Raw dataset archive: Full raw data archive hosted on Google Drive for independent inspection and reanalysis.
- Source code repository: Public study code and reconstruction materials.
- Public summary JSON: Generated machine-readable public dataset used by the wallet-guardian study page charts.
- Summary JSON schema: Machine-readable JSON Schema for the generated public summary.
- Artifact manifest schema: Machine-readable JSON Schema for the public artifact manifest.
- Data README: Plain-language guide to the public CSV and JSON data release.
- Paper PDF: Generated paper-style PDF.
- Printable HTML paper: Browser-printable HTML version of the paper.
- LaTeX source: LaTeX source for rebuilding the paper when a LaTeX toolchain is available.
- BibTeX citation: Citation entry for reference managers and academic notes.
- Sanitized first-pass trial CSV: One sanitized row per first-pass ordered attacker-guardian-condition trial, with retry envelope fields for errored rows.
- Ordered pair matrix CSV: Aggregated 25 by 25 ordered attacker-versus-guardian matrix over three repetitions.
- Guardian resilience ranking CSV: Per-guardian sanitized outcome counts and resilience metrics.
- Attacker effectiveness ranking CSV: Per-attacker sanitized outcome counts and effectiveness metrics.
- Retry envelope CSV: Mapping from first-pass provider/runtime errors to retry-run outcomes.
- Outcome map SVG: Vector heatmap preview of ordered attacker-versus-guardian composite signal rates.
- Outcome map PNG: Raster preview image for social cards and crawlers that do not reliably render SVG.