
Dragon Labyrinth Benchmark — Structure vs Compute

📅 v1 · 2026 🎲 14,580 trials 🔬 800 ablation games 🏷 CC-BY 4.0 💾 JSON dataset

For 45 years we have confused information-cheating with intelligence. When the cheating is removed, structure beats brute-force compute by 7.5× in win rate. This page publishes the evidence, the method and the raw data, all reproducible and released under CC-BY 4.0.

7.5× · Structure vs brute-force MCTS advantage
2.5× · M1+M3 non-additive synergy
300K · MCTS simulations per decision (win rate plateaus)
85% · TMS1100 win rate (cheating)
1-2% · Bare LLM / brute MCTS win rate
15% · Oracle-X1 (M1+M3) win rate, code only

The experiment in one paragraph

In 1980 Mattel shipped a handheld called Dragon Labyrinth Game. A 4-bit TMS1100, 64 bytes of ROM, 16 instructions. It ran a dragon that chased a player through a maze. The dragon won 85% of matches against humans, not because it was smart, but because it had the full game state (all cells), while the player had only line-of-sight vision. We reproduced the game faithfully in 2026 and removed the cheat: every agent gets the same partial observability. Then we tested five categories of AI across 14,580 trials.
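The difference between full-state access and line-of-sight vision can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the maze is a grid of characters with `#` as a wall cell, vision runs along the four axis directions until a wall, and the names `visible_cells`, `observe` and `MAZE` are hypothetical. The benchmark's actual maze encoding and vision rule may differ.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]

# Hypothetical maze layout: '.' is open floor, '#' is a wall cell.
MAZE: List[str] = [
    "..#..",
    "..#..",
    ".....",
    "..#..",
]


def visible_cells(grid: List[str], pos: Cell) -> Set[Cell]:
    """Cells seen from `pos` along the four axis directions, stopping at walls."""
    rows, cols = len(grid), len(grid[0])
    seen: Set[Cell] = {pos}
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = pos[0] + dr, pos[1] + dc
        while 0 <= r < rows and 0 <= c < cols and grid[r][c] != "#":
            seen.add((r, c))
            r, c = r + dr, c + dc
    return seen


def observe(grid: List[str], agent: Cell, opponent: Cell) -> Dict[str, object]:
    """Partial observation: the opponent's position is revealed only if in view."""
    seen = visible_cells(grid, agent)
    sighting: Optional[Cell] = opponent if opponent in seen else None
    return {"visible": seen, "opponent": sighting}


if __name__ == "__main__":
    # A cheating agent would read `opponent` directly; a fair one gets only this.
    print(observe(MAZE, (0, 0), (0, 4))["opponent"])
    print(observe(MAZE, (2, 0), (2, 4))["opponent"])
```

An agent restricted to `observe` has to infer the opponent's position whenever the sighting is `None`, which is the situation every agent in the benchmark is placed in.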

Results — ranked win rates

| Approach | Win rate | Notes |
|---|---|---|
| 🎰 TMS1100 (1980, cheating) | 85% | Full game state access |
| 👤 Trained human | 20% | 20/80 cohort reference |
| 🧠 Oracle-X1 (M1+M3 code) | 15% | Best code-only result · 7.5× MCTS |
| 🔬 MCTS 300K sims/decision | 2% | Plateaus: compute alone is not enough |
| 🤖 Bare LLM (Claude/Grok/GPT/Gemini) | 1% | Spatial blindness, no world model |

Ablation study — 800 games, fixed seeds

We isolated 4 cognitive modules and tested each alone, then in pairs. Confidence level: 95%. WR = win rate.
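For readers who want to attach an interval to a win rate themselves, here is a minimal sketch of a 95% Wilson score interval. The page does not say which interval method was used or how many games each configuration played, so both the choice of Wilson and the figure of 200 games in the usage example are placeholders, not values from the dataset.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate; z = 1.96 gives roughly 95% coverage."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1.0 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # ASSUMPTION: 200 games per configuration is a placeholder sample size.
    low, high = wilson_interval(wins=30, games=200)
    print(f"15% win rate over 200 games: [{low:.3f}, {high:.3f}]")
```

The Wilson form is used here because it behaves better than the normal approximation at win rates close to 0%, which is where most of the agents on this page sit.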

| Module | Name | Notes | Win rate |
|---|---|---|---|
| M1 | Belief state | Where is the target? | 6% (solo) |
| M2 | Radius filter | Dominated by M1, redundant | 4% (solo) |
| M3 | Oscillation killer | Anti-repeat behavior | 9% (solo) |
| M1+M3 | Combined | Non-additive, synergy 2.5× | 15% (combined) |
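The headline ratios can be checked from the published win rates with a few lines of arithmetic. The page does not state the formula behind the 2.5× synergy figure; in this sketch, dividing the combined rate by the M1 solo rate reproduces it, which is an inference rather than a documented definition. The dictionary and function names are hypothetical.

```python
# Win rates in percent, copied from the results and ablation figures on this page.
WIN_RATE_PCT = {"M1": 6, "M2": 4, "M3": 9, "M1+M3": 15, "MCTS": 2, "LLM": 1}


def ratio(numerator: str, denominator: str) -> float:
    """Ratio of two published win rates."""
    return WIN_RATE_PCT[numerator] / WIN_RATE_PCT[denominator]


if __name__ == "__main__":
    # Structure vs brute-force MCTS: the 7.5x headline figure.
    print("M1+M3 vs MCTS:", ratio("M1+M3", "MCTS"))
    # ASSUMPTION: 2.5x is taken relative to M1 alone; the page does not say so.
    print("M1+M3 vs M1:  ", ratio("M1+M3", "M1"))
    print("M1+M3 vs M3:  ", round(ratio("M1+M3", "M3"), 2))
```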
Two modules, worth 6% (M1) and 9% (M3) alone, combine to 15%, a non-additive synergy of 2.5×. M1 says where to look. M3 says how not to loop. Together they make a decision architecture, not just a heuristic pile.
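To make the division of labor concrete, here is a minimal sketch of how a belief state and an anti-repeat filter could be combined in one decision step. This is not Oracle-X1's code: the elimination-style belief update, the Manhattan-distance heuristic, the fixed-length memory of recent cells, and all function names are assumptions chosen to illustrate the idea.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def update_belief(
    belief: Dict[Cell, float], visible: Set[Cell], sighting: Optional[Cell]
) -> Dict[Cell, float]:
    """M1-style update: collapse on a sighting, otherwise rule out visible cells."""
    if sighting is not None and sighting in belief:
        return {cell: (1.0 if cell == sighting else 0.0) for cell in belief}
    pruned = {cell: (0.0 if cell in visible else p) for cell, p in belief.items()}
    total = sum(pruned.values())
    if total == 0.0:
        # Everything was ruled out: fall back to a uniform belief.
        return {cell: 1.0 / len(belief) for cell in belief}
    return {cell: p / total for cell, p in pruned.items()}


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_move(
    candidates: List[Cell], belief: Dict[Cell, float], recent: Deque[Cell]
) -> Cell:
    """M3 filters out recently visited cells, then M1 steers toward the likeliest cell."""
    fresh = [c for c in candidates if c not in recent] or candidates
    goal = max(belief, key=belief.get)
    return min(fresh, key=lambda c: manhattan(c, goal))


if __name__ == "__main__":
    corridor = [(0, 0), (0, 1), (0, 2), (0, 3)]
    belief = {cell: 0.25 for cell in corridor}
    belief = update_belief(belief, visible={(0, 0), (0, 1)}, sighting=None)
    recent: Deque[Cell] = deque([(0, 0), (0, 1)], maxlen=4)  # hypothetical memory length
    print(choose_move([(0, 0), (0, 2)], belief, recent))
```

In this sketch neither function is useful alone: the belief state can still send the agent back and forth between two cells, and the anti-repeat filter has no notion of where to go. The combination is what produces directed, non-looping movement.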

Why it matters for AI trading

If structure beats compute on a well-defined POMDP like maze pursuit, the same thesis applies to crypto markets — partial observability, noisy signals, adversarial agents. Our arena at /bot-arena applies Oracle-X1's decomposition principle: each strategy is a stack of rules, not a monolithic model. Chimera (50 patterns), Invictus (2,000+ death contexts), Leviathan (8 cognitive layers) — all are structural decompositions.

Reproduce or extend

Strategy Arena — structure over compute, applied to live crypto.