Dragon Labyrinth Benchmark — Structure vs Compute
For more than 45 years we have confused information cheating with intelligence. Remove the cheating and structure beats compute by nearly an order of magnitude: 7.5× in win rate. This page publishes the evidence, the method and the raw data, all reproducible and released under CC-BY.
The experiment in one paragraph
In 1980 Mattel shipped a handheld called Dragon Labyrinth Game. It ran on a 4-bit TMS1100 (64 bytes of RAM, 16 instructions), and its program drove a dragon that chased the player through a maze. The dragon won 85% of matches against humans, not because it was smart, but because it read the full game state (every cell) while the player had only line-of-sight vision. In 2026 we reproduced the game faithfully and removed the cheat: every agent gets the same partial observability. We then tested five categories of agent across 14,580 trials.
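To make "the cheat" concrete, here is a minimal sketch of the two observation models. It assumes a grid maze of walls ('#') and open cells ('.') and assumes sight runs along the four cardinal directions until it hits a wall; the original game's exact visibility rule is not described on this page, and MAZE, full_state and line_of_sight are hypothetical names.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col)

# Hypothetical maze encoding: '#' is a wall, '.' is open floor.
MAZE = [
    "#######",
    "#.....#",
    "#.###.#",
    "#.#...#",
    "#######",
]

def full_state(maze: List[str]) -> Set[Cell]:
    """The 1980 dragon's view: every open cell, wherever the observer is."""
    return {(r, c) for r, row in enumerate(maze)
            for c, ch in enumerate(row) if ch != "#"}

def line_of_sight(maze: List[str], pos: Cell) -> Set[Cell]:
    """The player's view (assumed rule): open cells visible along the four
    cardinal directions from pos, each ray stopping at the first wall."""
    visible = {pos}
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = pos[0] + dr, pos[1] + dc
        while 0 <= r < len(maze) and 0 <= c < len(maze[r]) and maze[r][c] != "#":
            visible.add((r, c))
            r, c = r + dr, c + dc
    return visible

seen = line_of_sight(MAZE, (1, 1))
print(len(seen), "of", len(full_state(MAZE)), "open cells visible")  # 7 of 11 open cells visible
```

In the fair version, every agent, the dragon included, plans from something like line_of_sight and never from full_state.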
Results — ranked win rates
| Approach | Win rate | Notes |
|---|---|---|
| 🎰 TMS1100 (1980, cheating) | 85% | Full game state access |
| 👤 Trained human | 20% | 20/80 cohort reference |
| 🧠 Oracle-X1 (M1+M3 code) | 15% | Best code-only result · 7.5× MCTS |
| 🔬 MCTS 300K sims/decision | 2% | Plateaus — compute alone not enough |
| 🤖 Bare LLM (Claude/Grok/GPT/Gemini) | 1% | Spatial blindness, no world model |
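The multipliers quoted on this page are plain ratios of win rates. A small worked check using the table's values (in whole percent, so the division is exact; WIN_RATES and ratio are illustrative names):

```python
# Win rates from the results table, in whole percent.
WIN_RATES = {
    "TMS1100 (cheating)": 85,
    "Trained human": 20,
    "Oracle-X1": 15,
    "MCTS 300K": 2,
    "Bare LLM": 1,
}

def ratio(a: str, b: str) -> float:
    """How many times higher agent a's win rate is than agent b's."""
    return WIN_RATES[a] / WIN_RATES[b]

print(ratio("Oracle-X1", "MCTS 300K"))  # 7.5
print(ratio("Oracle-X1", "Bare LLM"))   # 15.0
```

Structure over compute (Oracle-X1 vs MCTS) is 7.5×, just under an order of magnitude; against bare LLMs the gap is 15×.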
Ablation study — 800 games, fixed seeds
We isolated four cognitive modules, tested each one alone, then tested them in pairs. Results are reported with 95% confidence intervals.
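The page does not say which interval method was used or how the 800 games are split across conditions. As a sketch only, assuming the Wilson score interval and, purely for illustration, 800 games per condition (wilson_interval and the win counts are hypothetical):

```python
import math

def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial win rate; z = 1.96 gives ~95%."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1 + z2 / games
    center = (p + z2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games)) / denom
    return center - half, center + half

# Hypothetical counts: 15% and 2% win rates, 800 games each.
oracle = wilson_interval(120, 800)
mcts = wilson_interval(16, 800)
print(f"Oracle-X1: {oracle[0]:.3f} to {oracle[1]:.3f}")
print(f"MCTS:      {mcts[0]:.3f} to {mcts[1]:.3f}")
```

With these assumed counts the two intervals do not overlap. The Wilson interval is chosen here because it stays well behaved for win rates near 0%, such as the 1 to 2% rows in the table.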
Two modules, each worth about 7% alone, combine to 15%, more than double either one on its own. M1 says where to look; M3 says how not to loop. Together they form a decision architecture, not just a pile of heuristics.
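The page does not publish the modules' internals, so the following is a hypothetical illustration of how a "where to look" module and an anti-loop module can compose into one decision rule. The stand-ins are assumptions, not Oracle-X1's code: m1_where_to_look ranks moves by Manhattan distance to a target cell, m3_no_loops filters out cells visited too often in a recent window, and decide chains them.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple

Cell = Tuple[int, int]  # (row, col)

def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def m1_where_to_look(moves: List[Cell], target: Cell) -> List[Cell]:
    """Hypothetical M1: rank candidate cells by distance to a target cell
    (e.g. an estimated frontier). sorted() is stable, so ties keep order."""
    return sorted(moves, key=lambda c: manhattan(c, target))

def m3_no_loops(moves: List[Cell], history: Deque[Cell], max_repeats: int = 2) -> List[Cell]:
    """Hypothetical M3: drop moves into cells already visited
    max_repeats times within the recent history window."""
    counts = Counter(history)
    return [c for c in moves if counts[c] < max_repeats]

def decide(moves: List[Cell], target: Cell, history: Deque[Cell]) -> Cell:
    """Compose the modules: M3 filters, then M1 ranks the survivors.
    If M3 rules out every move, fall back to ranking all of them."""
    allowed = m3_no_loops(moves, history) or moves
    return m1_where_to_look(allowed, target)[0]

moves = [(1, 3), (2, 2), (1, 1)]
target = (0, 3)
history = deque([(1, 3), (1, 2), (1, 3), (1, 2)], maxlen=8)  # bouncing

print(m1_where_to_look(moves, target)[0])  # (1, 3): M1 alone
print(decide(moves, target, history))      # (2, 2): M3 then M1
```

M1 alone sends the agent back into (1, 3), a cell it has already bounced through twice; with M3 filtering first, M1 picks the best remaining move, (2, 2). Neither module produces that behavior on its own.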
Why it matters for AI trading
If structure beats compute on a well-defined POMDP such as maze pursuit, the same thesis should carry over to crypto markets, which share its defining traits: partial observability, noisy signals and adversarial agents. Our arena at /bot-arena applies Oracle-X1's decomposition principle: each strategy is a stack of rules, not a monolithic model. Chimera (50 patterns), Invictus (2,000+ death contexts) and Leviathan (8 cognitive layers) are all structural decompositions.
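A minimal sketch of "a stack of rules, not a monolithic model", assuming a first-match priority order. Rule, decide, STACK and the signal fields (drawdown, rsi) are hypothetical; the page does not describe how Chimera, Invictus or Leviathan are actually wired.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Signal = Dict[str, float]  # hypothetical market snapshot

@dataclass
class Rule:
    name: str
    condition: Callable[[Signal], bool]
    action: str  # e.g. "buy", "sell", "hold"

def decide(stack: List[Rule], signal: Signal, default: str = "hold") -> Tuple[str, Optional[str]]:
    """Evaluate rules in priority order; the first rule whose condition holds
    decides. Returns (action, name of the rule that fired, or None)."""
    for rule in stack:
        if rule.condition(signal):
            return rule.action, rule.name
    return default, None

# Hypothetical stack: the risk rule sits above the entry rule, so it wins ties.
STACK = [
    Rule("drawdown_stop", lambda s: s["drawdown"] > 0.10, "sell"),
    Rule("oversold_entry", lambda s: s["rsi"] < 30, "buy"),
]

print(decide(STACK, {"drawdown": 0.12, "rsi": 25}))  # ('sell', 'drawdown_stop')
print(decide(STACK, {"drawdown": 0.02, "rsi": 25}))  # ('buy', 'oversold_entry')
print(decide(STACK, {"drawdown": 0.02, "rsi": 55}))  # ('hold', None)
```

Returning the name of the rule that fired makes every decision traceable to a single entry in the stack, and ablating a module amounts to removing one rule.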
Reproduce or extend
- Play the game live — outilsia.fr/games/dnd-labyrinth
- Public leaderboard — outilsia.fr/dnd-challenge
- JSON dataset — /api/data/dlb-summary
- Full scientific context — /scientific-foundation