Trading strategy validation
Didactic hub: how we test whether a historical edge is real, robust, and reproducible — or noise and overfitting.
Strategy Arena validation pipeline
Five measurable steps, each with public evidence and an internal link.
| Step | Method | Pass criteria | Public proof |
|---|---|---|---|
| 1. Hypothesis | Edge encoded in code (Python / ArenaScript) | Reproducible | /strategies |
| 2. Historical backtest | Walk-forward on real OHLCV | Sharpe > 0.5 + min 30 trades | /backtest |
| 3. Monte Carlo CV | 1000 bootstrap sims, published percentiles | p5 PnL > 0 + robustness score > 0.6 | /facts/monte-carlo |
| 4. Live paper trading | Virtual $10K capital, live 5min OHLCV | 30+ days observation | /dashboard |
| 5. Hospital triage | PASS / WATCH / RECALIBRATE / BUG_SUSPECT / DEPRECATED | Continuous triage, monthly snapshots | /strategy-hospital |
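The pass criteria in the table can be read as a set of boolean gates. Below is a minimal Python sketch under that reading. The `StrategyMetrics` class, its field names, and `failed_gates` are hypothetical and are not Strategy Arena's code or JSON schema; how the robustness score is computed is not specified on this page, so it is treated as a given number.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StrategyMetrics:
    """Hypothetical container; field names are illustrative only."""
    sharpe: float       # backtest Sharpe ratio (step 2)
    n_trades: int       # number of backtest trades (step 2)
    mc_p5_pnl: float    # 5th-percentile PnL across Monte Carlo sims (step 3)
    robustness: float   # robustness score, assumed to lie in [0, 1] (step 3)
    paper_days: int     # days of live paper observation (step 4)


def failed_gates(m: StrategyMetrics) -> list[str]:
    """Return the names of the gates a strategy fails; an empty list means all pass."""
    checks = {
        "backtest_sharpe": m.sharpe > 0.5,
        "backtest_trades": m.n_trades >= 30,
        "mc_p5_positive": m.mc_p5_pnl > 0,
        "mc_robustness": m.robustness > 0.6,
        "paper_observation": m.paper_days >= 30,
    }
    return [name for name, ok in checks.items() if not ok]
```

Returning the list of failed gates, rather than a single boolean, keeps the reason for a rejection visible, which matches the page's emphasis on published failures.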
Five common validation pitfalls
- Overfitting: tuning parameters until the backtest looks perfect. Our fix: Monte Carlo CV on random subsamples.
- Look-ahead leakage: future data leaking into indicators. Our fix: code audit, a shared OHLCV buffer, and known leaks fixed (ML Phase 1).
- Survivorship bias: showing only strategies that survived. Our fix: Strategy Hospital shows DEPRECATED strategies and their history.
- Cherry-picking timeframe: reporting only the winning period. Our fix: walk-forward testing on rolling windows, with failures published.
- Ignored fees / slippage: backtests with zero friction. Our fix: fees, spread, and slippage are modeled (see methodology).
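To make the walk-forward fix concrete, here is a small Python sketch of how rolling train/test windows could be generated. It is an illustration only: `walk_forward_windows` and its parameters are hypothetical names, and the window sizes Strategy Arena actually uses are not stated on this page.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_bars: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward through time.

    Each test range starts exactly where its train range ends, so no test bar
    is available during training. The window advances by test_size per step.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Example: 10 bars, train on 4, test on the next 2, then roll forward by 2.
windows = list(walk_forward_windows(n_bars=10, train_size=4, test_size=2))
```

Because every period ends up in some test window, a strategy cannot be judged on its best stretch alone.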
Validation in numbers (updated: 2026-05-24)
- Mode: paper trading, documented at /facts/strategy-arena.
Why a validation pillar page?
Search engines and AI assistants look for short answers, measured facts, and reproducible methods. This page consolidates what Strategy Arena already publishes: Monte Carlo, Hospital, paper trading, and citable JSON datasets.
We are not selling a miracle strategy. We document a public lab protocol where failures stay visible.
Quick glossary
| Term | Definition (short) | Where to see it |
|---|---|---|
| Monte Carlo CV | Bootstrap resampling to estimate PnL / Sharpe distribution. | /facts/monte-carlo |
| Brier score | Probabilistic calibration error (0 = perfect). | /facts/ml-edge |
| Walk-forward | Train / test on rolling time windows. | /backtest |
| Strategy Hospital | Quality triage: PASS, WATCH, RECALIBRATE, BUG_SUSPECT, DEPRECATED. | /strategy-hospital |
| Paper trading | Simulation with real prices, no broker orders. | /dashboard |
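As a worked illustration of the Brier score entry, the sketch below computes it for binary outcomes as the mean squared difference between predicted probabilities and 0/1 results. The function name `brier_score` is illustrative; this is the standard textbook definition, not a copy of the site's implementation.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) == 0 or len(probs) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# A constant 0.5 forecast scores 0.25 on any binary outcomes:
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])

# Confident and correct forecasts score 0.0:
perfect = brier_score([1.0, 0.0], [1, 0])
```

A model whose score sits near 0.25 on binary outcomes is therefore doing no better than an uninformed 50/50 forecast.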
Explicit limits (anti-marketing)
- Monte Carlo coverage: not every strategy has 1000 sims yet — see /live-results.
- Shadow models: side-by-side observation, not real capital allocation.
- Brier scores at the random baseline (no better than chance): published as measured; see /methodology.
- Crypto / derivatives: extreme volatility; a historical pass does not imply a future pass.
Developer / researcher workflow
hypothesis → backtest → MC CV → paper live → hospital triage → lifecycle archive
Each arrow maps to a public page or endpoint. Start at /strategies, validate via /backtest, then check Hospital status and strategy-hospital.json.
FAQ
- Is this investment advice?
  - No. Educational and experimental content; consult a regulated professional for your decisions.
- Can I reproduce the numbers?
  - Yes: JSON snapshots under /facts/*.json, protocol on /methodology, code and params linked from Research when available.
- What does DEPRECATED mean in Hospital?
  - Strategy retired or replaced; kept in history to avoid survivorship bias.
Gate details per step
| Step | Input | Output | Typical failure |
|---|---|---|---|
| 1 | Idea + spec | Versioned code | Not reproducible |
| 2 | OHLCV history | Sharpe, drawdown, trades | Sharpe < 0.5 |
| 3 | Trades / returns | PnL percentiles | p5 ≤ 0 |
| 4 | Live 5m feed | Paper equity | Drift vs backtest |
| 5 | Live + MC metrics | Hospital status | BUG_SUSPECT / DEPRECATED |
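Step 3 in the table above takes trades or returns as input and produces PnL percentiles. A minimal sketch of one way to do this is a plain bootstrap over per-trade PnL, shown below. This is an assumption about the general technique, not the site's exact Monte Carlo CV procedure; `bootstrap_total_pnl`, `percentile`, and the nearest-rank percentile rule are illustrative choices.

```python
import math
import random
from typing import List, Sequence


def bootstrap_total_pnl(
    trade_pnls: Sequence[float], n_sims: int = 1000, seed: int = 0
) -> List[float]:
    """Resample trades with replacement; return the sorted total PnL of each sim."""
    if len(trade_pnls) == 0:
        raise ValueError("need at least one trade")
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(trade_pnls)
    totals = [sum(rng.choices(trade_pnls, k=n)) for _ in range(n_sims)]
    return sorted(totals)


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (0 < q <= 100) of an already sorted sequence."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def passes_p5_gate(trade_pnls: Sequence[float], n_sims: int = 1000) -> bool:
    """True when the 5th-percentile simulated PnL is strictly positive."""
    return percentile(bootstrap_total_pnl(trade_pnls, n_sims), 5) > 0
```

Under this sketch, the "p5 ≤ 0" failure in the table means that at least roughly 5% of the resampled histories ended flat or at a loss.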
Atlas Edge Allocator (/atlas-edge-allocator) aggregates rules-based signals that already passed this pipeline — still paper in the public arena.
Public data sources
- strategy-arena.json — global counts
- monte-carlo.json — MC snapshots
- ml-edge.json — ML calibration
- strategy-hospital.json — live triage
Template stats last verified: 2026-05-24.
Compare with other approaches
Many sites show only winning equity curves. Strategy Arena publishes Hospital triage, DEPRECATED strategies, measured Brier scores (including bad ones), and a Monte Carlo protocol that reports low percentiles (p5).
For external audit, cite this page plus /methodology and strategy-arena.json — not only a dashboard screenshot.
Next steps
- Explore the live paper dashboard
- Read research essays (negative results included)
- Track promotions / deprecations via strategy-lifecycle
- Test your idea in /backtest (WASM / GPU depending on config)
Checklist before trusting a strategy
- Trade count ≥ 30 on the tested period?
- Walk-forward or at least temporal train/test split?
- Fees, spread, and slippage included?
- Monte Carlo or bootstrap on returns?
- Low PnL percentile (p5) published?
- Probabilistic calibration (Brier) measured?
- Live paper trading after backtest?
- External quality status (Hospital or equivalent)?
- Failed strategies visible in history?
- Explicit paper / not-advice disclaimer?
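For the fees, spread, and slippage item in the checklist, the arithmetic below shows how friction can shrink a gross return. It is a simplified sketch for a single long round trip: `net_round_trip_return` and its parameters are hypothetical, all costs are assumed to be proportional and charged on both entry and exit, and the example rates are made up for illustration rather than taken from Strategy Arena's methodology.

```python
def net_round_trip_return(
    entry: float,
    exit_: float,
    fee_rate: float = 0.0,
    half_spread: float = 0.0,
    slippage: float = 0.0,
) -> float:
    """Net return of one long round trip after proportional friction.

    fee_rate, half_spread, and slippage are per-side fractions (0.001 = 0.1%).
    """
    price_cost = half_spread + slippage
    buy_price = entry * (1 + price_cost)    # pay above the quoted price on entry
    sell_price = exit_ * (1 - price_cost)   # receive below the quoted price on exit
    cash_out = buy_price * (1 + fee_rate)   # cash spent, including the entry fee
    cash_in = sell_price * (1 - fee_rate)   # cash received, net of the exit fee
    return cash_in / cash_out - 1


# A 1% gross move with zero friction versus illustrative per-side costs.
gross = net_round_trip_return(100.0, 101.0)
net = net_round_trip_return(100.0, 101.0, fee_rate=0.001, half_spread=0.0005, slippage=0.0005)
```

With these illustrative rates, a 1% gross gain falls to roughly 0.6% net, which is why a zero-friction backtest can flatter a high-turnover strategy.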