⚠ NOT INVESTMENT ADVICE · Strategy Arena is an educational simulation. All strategies trade paper capital on real market data. Nothing shown here is a recommendation to buy, sell, or hold any asset.
🧪 EMPIRICAL RESEARCH

Scientific Foundation — Why Strategy Arena Is Built Like This

Every architectural choice in Strategy Arena — 60 strategies instead of 1 big model, Invictus instead of a price predictor, Leviathan instead of a meta-LLM — is the answer to a measured result on a reproducible benchmark. Here is the evidence.

The sibling benchmark: Dragon Labyrinth

In April 2026 we reproduced the Mattel D&D Computer Labyrinth (1980) — an 8×8 board with an invisible dragon driven by a 4-bit TMS1100 processor — and pitted every modern AI against it. The results, published in 3 papers on outilsia.fr, are reproducible in 40 seconds on any laptop.

This benchmark shares the structural properties of crypto trading: it is a POMDP (partially observable Markov decision process) with sparse reward and asymmetric information, where human pattern-matching is crucial. What works on Dragon Labyrinth works on live trading. What fails on Dragon Labyrinth fails on live trading.

The finding that reshaped our architecture

| Approach | Win rate | Compute / decision |
|---|---|---|
| Bare LLM (Claude Haiku) | 0-1% | 1 API call |
| MCTS brute force (300K sims) | 2% | ~2 s GPU |
| Structured code (Oracle-X1, M1+M3) | 15% | ~10 ms |
| Trained human (reference) | ~20% | 1 sec intuition |

Structured code at 10 ms beats brute-force MCTS at 2 seconds by a factor of 7.5 in win rate (15% vs. 2%). A grid search across 14,580 trials confirms it: no brute-force configuration exceeds 2% when replayed. Compute alone plateaus. Structure does not.
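The two ratios behind that comparison can be checked with a few lines of arithmetic. The sketch below only restates the approximate figures from the table (2% vs. 15% win rate, ~2 s vs. ~10 ms per decision). The variable names are illustrative, and the latency figures were reported on different hardware, so the second ratio is indicative at best.

```python
# Figures copied from the results table; nothing here is newly measured.
win_rate = {"mcts_brute_force": 0.02, "structured_code": 0.15}
seconds_per_decision = {"mcts_brute_force": 2.0, "structured_code": 0.010}

# How many times more often the structured agent wins.
win_ratio = win_rate["structured_code"] / win_rate["mcts_brute_force"]

# How many times longer brute-force MCTS takes per decision (approximate).
latency_ratio = (
    seconds_per_decision["mcts_brute_force"]
    / seconds_per_decision["structured_code"]
)

print(f"win-rate ratio: x{win_ratio:.1f}")
print(f"latency ratio:  x{latency_ratio:.0f}")
```

Read together, the structured agent wins about 7.5 times as often while spending roughly 1/200 of the time per decision.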

The ablation study that validates our cognitive layers

A rigorous ablation study on 800 games with fixed seeds identified the minimum cognitive scaffolding needed to beat random play:

M1 alone knows where to go but loops. M3 alone doesn't loop but doesn't know where to go. Together, they win. Full study: outilsia.fr/blog/tms1100-vs-ia-2026-ablation.
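To make the division of labor concrete, here is a minimal sketch of how a belief-state layer and an oscillation killer could cooperate on an 8×8 grid. It is not the Oracle-X1 code: the class names, the uniform prior, the four-cell memory, and the wall-free grid are all assumptions made for illustration.

```python
from collections import deque

SIZE = 8  # 8x8 board, as in the benchmark description


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


class BeliefState:
    """M1-style layer (assumed design): tracks cells that may still hold the treasure."""

    def __init__(self):
        self.candidates = {(r, c) for r in range(SIZE) for c in range(SIZE)}

    def rule_out(self, cell):
        self.candidates.discard(cell)

    def target(self, pos):
        # Nearest remaining candidate; ties broken by coordinates for determinism.
        return min(self.candidates, key=lambda t: (manhattan(t, pos), t))


class OscillationKiller:
    """M3-style layer (assumed design): refuses to step back onto recent cells."""

    def __init__(self, memory=4):
        self.recent = deque(maxlen=memory)

    def visit(self, cell):
        self.recent.append(cell)

    def allowed(self, cell):
        return cell not in self.recent


def neighbors(pos):
    r, c = pos
    steps = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(a, b) for a, b in steps if 0 <= a < SIZE and 0 <= b < SIZE]


def choose_move(pos, belief, killer):
    """M1 picks the goal; M3 filters moves that would loop."""
    goal = belief.target(pos)
    # Fall back to all neighbors if every one of them was visited recently.
    options = [n for n in neighbors(pos) if killer.allowed(n)] or neighbors(pos)
    return min(options, key=lambda n: (manhattan(n, goal), n))


# Usage: start in the corner, which we now know is empty.
belief, killer = BeliefState(), OscillationKiller()
start = (0, 0)
belief.rule_out(start)
killer.visit(start)
first_move = choose_move(start, belief, killer)
print(first_move)
```

Removing the `killer.allowed` filter gives an agent that knows where to go but can bounce between two cells. Removing `belief.target` gives one that never loops but has no goal. This is the same qualitative split the ablation describes.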

The 1:1 mapping with Strategy Arena

Every cognitive layer identified in Dragon Labyrinth has its direct equivalent in Strategy Arena. This isn't a coincidence — it's the same architecture applied to a different domain.

| Dragon Labyrinth | Strategy Arena |
|---|---|
| M1 (belief state): where is the treasure? | Chimera: 1,221 patterns, best strategy per context |
| M3 (oscillation killer): don't repeat mistakes | Invictus: 2,000+ death contexts veto toxic buys |
| Prompt Layers: structured context for LLM | PromptForge: 12 context sources per decision |
| Hybrid MCTS + Oracle-X1 (Grok proposal) | Leviathan: 8-layer weighted fusion decision |
| Precompiled human intuition (40 years of practice) | AutoResearch: 11 nightly engines precompute priors |
| 14,580 trials → 2% brute force, 15% structured | 60 small diverse strategies > 1 monolithic model |
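The page describes Leviathan as a weighted fusion of layers and Invictus as a veto on known-bad contexts, but gives no internals. The sketch below shows one plausible way such a pair could fit together. `LayerSignal`, `fuse`, the score range, the threshold, and the context tuples are all hypothetical, not the platform's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSignal:
    name: str
    score: float   # assumed range: -1.0 (strong sell) to +1.0 (strong buy)
    weight: float  # relative trust placed in this layer


def fuse(signals, vetoed_contexts, context, threshold=0.2):
    """Weighted average of layer scores; a vetoed context blocks buys only."""
    total_weight = sum(s.weight for s in signals)
    if total_weight == 0:
        return "hold"
    fused = sum(s.score * s.weight for s in signals) / total_weight
    if fused >= threshold:
        # Invictus-style veto (assumed): never buy in a context that failed before.
        return "hold" if context in vetoed_contexts else "buy"
    if fused <= -threshold:
        return "sell"
    return "hold"


# Usage with three made-up layers; fused score = (1.6 + 0.4 - 0.2) / 4 = 0.45.
signals = [
    LayerSignal("trend", 0.8, 2.0),
    LayerSignal("pattern", 0.4, 1.0),
    LayerSignal("risk", -0.2, 1.0),
]
context = ("high_volatility", "weekend")

print(fuse(signals, set(), context))      # no veto recorded for this context
print(fuse(signals, {context}, context))  # same signals, context is vetoed
```

Keeping the veto outside the weighted sum means a single remembered failure can override an otherwise bullish consensus, which is the behavior the mapping attributes to Invictus.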

Why this matters commercially

Commercial AI trading bots (3Commas, Cryptohopper, Bitsgap) optimize for more compute: more backtests, more parameters, more ML models. Our benchmark says that direction plateaus at a 2% win rate.

Strategy Arena optimizes for more structure: more cognitive layers, more specialized small strategies, more memory of past failures. That direction hits 15%. Same POMDP, different architecture, a 7.5× difference in win rate.

This is the testable proof. You can run it yourself. You can extend it. You can disprove it. The benchmark is open. The datasets are CC-BY 4.0. The ablation study reproduces in 40 seconds.

No closed commercial bot publishes anything like this. That's the moat.


Strategy Arena is an educational platform. All strategies trade virtual capital on real live market data. Dragon Labyrinth Benchmark results are on a reproducible game environment, not real markets. This page documents the reasoning behind our architectural choices — it is not investment advice. Past simulated or benchmarked performance does not guarantee future results.