Monte Carlo Backtesting: Public Framework | Strategy Arena

Monte Carlo backtesting

How to estimate whether a crypto backtest is robust or overfit using bootstrap resampling, low percentiles, calibration, and walk-forward drift, all published in an open lab.

Public lab: 1000 simulations per MC-validated strategy, about $10,000 of paper capital per strategy, and zero live brokerage orders.

Strategy Arena Monte Carlo framework (5 steps)

Each step produces a verifiable public artifact, not just a Sharpe ratio computed on one equity curve.

| Step | Component | Input | Output / gate | Proof |
| --- | --- | --- | --- | --- |
| 1. Bootstrap | Trade / return resampling | Walk-forward backtest trade series | 1000 simulated PnL paths | monte-carlo.json |
| 2. Percentiles | p5 / p50 / p95 PnL & Sharpe | Bootstrap distribution | p5 PnL > 0 required | /facts/monte-carlo |
| 3. Robustness | Score 0–1 (subsample stability) | Inter-sim variance | Robustness > 0.6 | /live-results |
| 4. Calibration | Brier / reliability | Model probabilities vs outcomes | Published even when bad | /facts/ml-edge |
| 5. Drift | Walk-forward vs live paper | 5m paper equity | Alert if gap > threshold | /dashboard, /strategy-hospital |
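Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Strategy Arena's pipeline code: it takes per-trade net PnL values as input, resamples them i.i.d. with replacement, and applies only the "p5 PnL > 0" gate from the table. The names `bootstrap_pnl`, `percentile_gate`, and `MCResult` are hypothetical, and Sharpe percentiles are left out.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class MCResult:
    p5: float
    p50: float
    p95: float
    passes_p5_gate: bool


def bootstrap_pnl(trade_pnls, n_sims=1000, seed=0):
    """Resample per-trade PnL with replacement (i.i.d.) and return one total PnL per simulated path."""
    if len(trade_pnls) < 2:
        raise ValueError("need at least 2 trades to bootstrap")
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    k = len(trade_pnls)  # each simulated path has as many trades as the original series
    return [sum(rng.choices(trade_pnls, k=k)) for _ in range(n_sims)]


def percentile_gate(sim_totals):
    """Compute p5 / p50 / p95 of simulated total PnL and apply the 'p5 PnL > 0' gate."""
    cuts = statistics.quantiles(sim_totals, n=100, method="inclusive")  # 99 cut points: cuts[i-1] is the i-th percentile
    p5, p50, p95 = cuts[4], cuts[49], cuts[94]
    return MCResult(p5=p5, p50=p50, p95=p95, passes_p5_gate=p5 > 0)


if __name__ == "__main__":
    # Hypothetical walk-forward trade PnLs in dollars, already net of fees (30 trades)
    trades = [12.0, -5.0, 8.0, 3.0, -2.0, 15.0, -7.0, 6.0, 4.0, -3.0] * 3
    print(percentile_gate(bootstrap_pnl(trades, n_sims=1000, seed=42)))
```

Fixing the seed makes a given MC run repeatable, which matters when the resulting percentiles are published and compared across snapshots.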

The bootstrap assumes trades are conditionally exchangeable. This limit, which breaks down under autocorrelation and regime changes, is documented on /methodology.
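One common way to relax the exchangeability assumption is a block bootstrap, which resamples contiguous runs of returns instead of single trades. The sketch below is a circular block bootstrap; the default block length of 5 is an arbitrary assumption, the function names are hypothetical, and this is not necessarily the variant used by Strategy Arena's experimental block bootstrap.

```python
import random


def circular_block_bootstrap(returns, block_len, rng):
    """Build one resampled path of len(returns) by concatenating contiguous blocks.

    Blocks wrap around the end of the series (circular), so every observation
    is equally likely to be drawn; serial dependence inside a block is preserved.
    """
    n = len(returns)
    if n == 0 or block_len < 1:
        raise ValueError("need a non-empty series and block_len >= 1")
    path = []
    while len(path) < n:
        start = rng.randrange(n)  # random block start
        path.extend(returns[(start + j) % n] for j in range(block_len))
    return path[:n]  # trim the last block so the path matches the original length


def block_bootstrap_totals(returns, n_sims=1000, block_len=5, seed=0):
    """Total PnL (sum of resampled returns) for each simulated path."""
    rng = random.Random(seed)
    return [sum(circular_block_bootstrap(returns, block_len, rng)) for _ in range(n_sims)]
```

With `block_len=1` this reduces to the i.i.d. bootstrap; longer blocks keep more of the autocorrelation structure at the cost of fewer independent blocks per path.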

Five Monte Carlo pitfalls & Strategy Arena fixes

  1. Too few trades: 100 simulations on 8 trades are pure noise. Fix: require at least 30 walk-forward trades before MC; otherwise the strategy is marked WATCH in the Hospital.
  2. Ignoring the low percentile (p5): a great median can hide a catastrophic left tail. Fix: p5 PnL is published, and p5 ≤ 0 fails the gate, routing the strategy to RECALIBRATE / BUG_SUSPECT.
  3. i.i.d. bootstrap on autocorrelated returns: resampling single trades breaks serial dependence and overstates confidence. Fix: an experimental block bootstrap plus mandatory walk-forward in the pipeline.
  4. MC without fees or slippage: percentiles come out inflated. Fix: apply the same friction model as the backtest (documented on /methodology).
  5. A single MC pass treated as a checkbox: no drift monitoring afterward. Fix: monthly MC re-runs plus live paper comparison, with snapshots in monte-carlo.json.
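Pitfalls 1, 2 and 4 translate into simple checks around the MC run. The sketch below assumes a flat per-side fee and slippage rate (both default values are hypothetical, not the lab's friction model) and reuses the status labels from the list; `READY_FOR_MC` and `P5_GATE_PASSED` are made-up labels for the remaining cases, and the function names are hypothetical.

```python
MIN_WF_TRADES = 30  # minimum walk-forward trades before MC (pitfall 1)


def net_pnl(gross_pnl, notional, fee_rate=0.001, slippage_rate=0.0005):
    """Subtract round-trip friction: fee + slippage charged on entry and on exit.

    fee_rate and slippage_rate are hypothetical per-side rates (fraction of notional).
    """
    return gross_pnl - 2 * notional * (fee_rate + slippage_rate)


def mc_status(n_trades, p5_pnl=None):
    """Route a strategy using the thresholds from the pitfalls list."""
    if n_trades < MIN_WF_TRADES:
        return "WATCH"  # too few trades: any MC result would be noise
    if p5_pnl is None:
        return "READY_FOR_MC"  # hypothetical label: enough trades, MC not run yet
    if p5_pnl <= 0:
        return "RECALIBRATE / BUG_SUSPECT"  # the left tail loses money
    return "P5_GATE_PASSED"  # hypothetical label


if __name__ == "__main__":
    print(net_pnl(10.0, 1000.0))
    print(mc_status(8), mc_status(40, -12.5), mc_status(40, 30.0))
```

Friction is applied to each trade before resampling, so every simulated path carries the same costs as the backtest rather than inheriting gross-PnL optimism.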

Live Monte Carlo stats (updated: 2026-05-24)

86 strategies in the public arena
7 strategies passed the MC gate (robustness + p5)
1000 bootstrap simulations per MC candidate
5,000+ losing trades published (Hospital / history)
$10,000 paper capital per strategy (not real money)

Counts are synced with strategy-arena.json when available; per-strategy MC detail is in monte-carlo.json.


Researcher workflow

backtest → bootstrap (1000) → percentiles → robustness → calibration → drift check → hospital
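The calibration step in this workflow compares model probabilities with realized outcomes using a Brier score and a reliability table. A minimal sketch, assuming each prediction is a probability in [0, 1], each outcome is 0 or 1, and bins are equal-width; the function names and the default of 5 bins are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_table(probs, outcomes, n_bins=5):
    """Per equal-width bin: (bin index, count, mean predicted prob, observed frequency)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if members:  # skip empty bins
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(y for _, y in members) / len(members)
            rows.append((i, len(members), mean_p, freq))
    return rows
```

A well-calibrated model shows mean predicted probability close to observed frequency in each bin. For scale, a constant 0.5 forecast scores exactly 0.25 on any 0/1 outcomes.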

Reproducibility: export trades from /backtest, compare them with the public JSON fields, then read the strategy's Hospital status. For aggregated rules-based allocation, see /atlas-edge-allocator (still paper only).

Monte Carlo FAQ

Why 1000 simulations?
It is a tradeoff between latency and percentile stability, documented on /methodology. Raising N reduces Monte Carlo sampling noise, not market risk.
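The N tradeoff can be checked empirically: rerun the same bootstrap many times and measure how much the p5 estimate itself moves between reruns. A sketch with hypothetical trade values and function names; the spread of p5 across reruns shrinks roughly like 1/sqrt(N), which is the Monte Carlo noise that a larger N buys down.

```python
import random
import statistics


def bootstrap_p5(trades, n_sims, rng):
    """p5 of total PnL over n_sims i.i.d. bootstrap paths."""
    k = len(trades)
    totals = [sum(rng.choices(trades, k=k)) for _ in range(n_sims)]
    return statistics.quantiles(totals, n=100, method="inclusive")[4]


def p5_rerun_spread(trades, n_sims, reruns=30, seed=0):
    """Std dev of the p5 estimate across independent reruns: pure Monte Carlo noise."""
    rng = random.Random(seed)
    estimates = [bootstrap_p5(trades, n_sims, rng) for _ in range(reruns)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    trades = [float((i * 7) % 11 - 4) for i in range(40)]  # hypothetical PnLs from -4 to +6
    for n in (100, 1000):
        print(n, round(p5_rerun_spread(trades, n, seed=1), 3))
```

The gap between p5 and p95 within a single run does not narrow as N grows, because it reflects the spread of the trades themselves rather than sampling error.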
Does MC replace paper trading?
No. MC tests the historical trade distribution; paper trading tests live execution on 5m OHLCV data (drift, bugs, data latency).
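The drift check in step 5 compares what the walk-forward backtest expected with what paper trading actually produced over the same window. A minimal sketch, assuming both equity curves are sampled on the same 5m bars and using a hypothetical 5-percentage-point alert threshold; the actual threshold and the function names here are not taken from the lab.

```python
def period_return(equity):
    """Simple return from the first to the last equity value."""
    if len(equity) < 2 or equity[0] <= 0:
        raise ValueError("need at least two points and positive starting equity")
    return equity[-1] / equity[0] - 1.0


def drift_gap(backtest_equity, paper_equity):
    """Backtest return minus paper return over the same bars (positive = paper underperforms)."""
    if len(backtest_equity) != len(paper_equity):
        raise ValueError("curves must cover the same bars")
    return period_return(backtest_equity) - period_return(paper_equity)


def drift_alert(backtest_equity, paper_equity, threshold=0.05):
    """True when the absolute gap exceeds the (hypothetical) threshold."""
    return abs(drift_gap(backtest_equity, paper_equity)) > threshold


if __name__ == "__main__":
    backtest = [10000, 10200, 10500]  # hypothetical walk-forward equity
    paper = [10000, 10050, 10100]  # hypothetical paper equity on the same bars
    print(drift_gap(backtest, paper), drift_alert(backtest, paper))
```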
Where are MC failures visible?
In the Hospital (WATCH / RECALIBRATE), in Research, and among DEPRECATED strategies; see /trading-strategy-validation.

Quick MC glossary

| Term | Role | Link |
| --- | --- | --- |
| Bootstrap | Resampling trades with replacement | /facts/monte-carlo |
| p5 / p95 | Simulated PnL distribution tails | monte-carlo.json |
| Robustness score | Stability under perturbations | /live-results |
| Walk-forward | Temporal split preventing look-ahead | /backtest |
| Drift | Backtest vs paper gap | /dashboard |
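This page does not give the formula behind the 0–1 robustness score, so the following is only a hypothetical stand-in for "subsample stability": the share of random half-size subsamples of the trade list (drawn without replacement) whose total PnL stays positive, compared with the 0.6 gate from the framework table. The scoring rule, the subsample fraction, and the function names are all assumptions.

```python
import random

ROBUSTNESS_GATE = 0.6  # threshold from the framework table


def subsample_robustness(trade_pnls, n_subsamples=200, frac=0.5, seed=0):
    """Hypothetical 0-1 score: fraction of random subsamples with positive total PnL."""
    if not trade_pnls:
        raise ValueError("need at least one trade")
    k = max(1, int(len(trade_pnls) * frac))  # subsample size
    rng = random.Random(seed)
    positive = sum(1 for _ in range(n_subsamples) if sum(rng.sample(trade_pnls, k)) > 0)
    return positive / n_subsamples


def passes_robustness(trade_pnls, **kwargs):
    """Apply the 'Robustness > 0.6' gate to the hypothetical score."""
    return subsample_robustness(trade_pnls, **kwargs) > ROBUSTNESS_GATE
```

A strategy whose profit depends on a handful of outlier trades will see many subsamples miss those trades and turn negative, which pulls this score down even when the full-sample PnL looks good.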

Explicit limits