{"schema":"dlb-summary-v1","generated_at":"2026-07-24T08:14:03.190047Z","name":"Dragon Labyrinth Benchmark","description":"Structure vs compute in a sparse-reward POMDP. 1980 Mattel Dragon Labyrinth Game reproduced in 2026 and benchmarked against modern AIs.","license":"CC-BY-4.0","canonical_url":"https://outilsia.fr/dnd-challenge","thesis":"For 45 years we have confused information-cheating with intelligence. When cheating is removed, structure beats compute by nearly an order of magnitude.","results":{"total_trials":14580,"approaches_measured":[{"name":"Bare LLM (Claude/Grok/GPT/Gemini)","win_rate_pct":1,"notes":"Spatial blindness; no world model."},{"name":"Brute-force MCTS (300K simulations/decision)","win_rate_pct":2,"notes":"Plateaus. Compute alone is not enough."},{"name":"Oracle-X1 structured code (M1 belief + M3 anti-oscillation)","win_rate_pct":15,"notes":"Best code-only result. 7.5x the brute-force baseline."},{"name":"Trained human","win_rate_pct":20,"notes":"Reference upper bound in the 20/80 test cohort."},{"name":"1980 TMS1100 dragon (has full game state)","win_rate_pct":85,"notes":"Wins more than 8 out of 10 games because it cheats with information."}],"ablation_study":{"games":800,"seeds":"fixed","confidence_interval":"95%","modules":{"M1_belief_state":{"solo_wr_pct":6,"notes":"Survives longer, wins rarely."},"M2_radius_filter":{"solo_wr_pct":4,"notes":"Redundant; dominated by M1."},"M3_oscillation_killer":{"solo_wr_pct":9,"notes":"Anti-repeat. Strong alone."},"M1_plus_M3_combined":{"wr_pct":15,"synergy_factor":2.5,"notes":"Non-additive gain. M1 says where, M3 says how not to loop."}}},"grid_search":{"configurations_tested":729,"games_per_config":20,"total_games":14580,"best_brute_force_wr_pct":2,"structure_vs_compute_advantage":"7.5x"}},"cognitive_layers_identified":["M1 — Belief state (where is the target?)","M2 — Radius filter (dominated by M1, redundant)","M3 — Oscillation killer (anti-repeat)","M4 — Fear layer (danger estimation)","M5 — Theory of mind (opponent next-move prediction)","M6 — Active wiki prior (precomputed maze-cluster knowledge) [planned]"],"sources":[{"article":1,"title":"Le piège de la simplicité — 6 lignes de 1980 vs 3300 lignes de 2026","url":"https://outilsia.fr/blog/piege-simplicite-dragon-2026"},{"article":2,"title":"TMS1100 bat l'IA 2026 — ablation study sur 800 parties","url":"https://outilsia.fr/blog/tms1100-vs-ia-2026-ablation"},{"article":3,"title":"L'intuition humaine a un coût de puissance — 14 580 simulations","url":"https://outilsia.fr/blog/intuition-cout-puissance-mcts-2026"}],"play_the_game":"https://outilsia.fr/games/dnd-labyrinth","leaderboard":"https://outilsia.fr/dnd-challenge"}