Public research

Methodology & Transparency

Authority summary — Last verified: 2026-06-11
Domain	Evidence	Gate	Status
Live strategies	arena_state_*_v5.json	Public count	72
ML / Brier	brier_autopsy_*.json	anti-leak OOF	published
Monte Carlo CV	live_mc_results_snapshot.json	Sharpe_p5	7
Strategy Hospital	JSON	triage	PASS/WATCHLIST
Mode	paper trading	No real orders	paper-trading

How Strategy Arena's ML and statistical layers actually work. We measured every Brier. We fixed every leak. Here's the real architecture.

Anti-marketing: if a layer is analytics, we call it analytics. If it is rules-based, we call it rules-based. If it is ML, we publish the real measured Brier.
Atlas Edge Allocator · Live MC Results · ML Edge Report · Portfolio MC · Strategy Lifecycle · Edge Radar

New: Strategy Hospital publishes live strategy triage, and Strategy Lifecycle keeps the public history of those changes.

The 4 active monsters + 1 archived finding

Monster	Architecture	Real metric	Status
Invictus ML Ultimate	LightGBM with isotonic calibration, OOF validation and monotonic constraints	Brier OOS expected ~0.22 (calibrated)	Real ML; audited 8/10 by DeepSeek
Chimera Scanner + CNN	17 statistical patterns + PyTorch CNN, 108 OHLCV/pattern channels	Brier OOS 0.2512 (9,356 samples)	Hybrid: rules + real ML
Leviathan 9-Layer Ensemble	8 heuristic layers + 1 PyTorch MLP as Layer 9	Brier OOS 0.2589 (10,758 samples, post-leak-fix)	Hybrid: heuristics + real ML
Hydra ML V5 + LSTM	XGBoost ranking for PnL + PyTorch LSTM for direction	Brier OOS 0.2480 (51,718 samples)	Real dual ML
Maelstrom family	Contextual bandit + strategy embeddings (V1, Gated, Minimal)	Published negative finding: RF -0.26%, Hydra -2.25%, Ensemble +0.02% Brier	Scientific archive; no live promotion
Meta Intelligence v3	Strategy analytics: bootstrap CI, Bonferroni multi-compare, performance snapshots	No prediction (analytics engine)	Honest dashboard

A Brier score at or above 0.25 is no better than always predicting 50%, which scores exactly 0.25 (lower is better). On 5-minute crypto direction prediction, scores near 0.25 are the practical limit we measured. This is not standalone directional edge; it is a negative result we publish.

Brier 0.25: what it means

On 5-minute binary crypto direction, a Brier score near 0.25 is close to a balanced random baseline. We therefore do not treat ChimeraCNN, LeviathanNN, or HydraLSTM as standalone directional edge engines. They are diagnostic layers: regime context, calibration, signal filtering, secondary ranking, and condition detection for Monte Carlo-validated strategies.
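To make the 0.25 reference point concrete, here is a minimal sketch of the Brier score for a binary target. The function name and the sample data are illustrative assumptions, not taken from the Strategy Arena codebase.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical balanced sample: half the bars close up (1), half close down (0).
outcomes = [1, 0, 1, 0, 1, 0, 1, 0]

# Always predicting 50% gives (0.5 - y)^2 = 0.25 on every bar, whatever y is.
coin_flip = brier_score([0.5] * len(outcomes), outcomes)

# A confident forecaster that is right only half the time scores worse than 0.25.
overconfident = brier_score([0.9] * len(outcomes), outcomes)
```

Because the constant 50% forecast scores 0.25 regardless of the outcomes, a model has to land clearly below 0.25 on out-of-sample data before its probabilities carry directional information.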

Published negative result
5m directional models plateau around 0.25. That is a measured limit, not an ML victory claim.

Actual use
The real edge gate remains Monte Carlo CV: Sharpe_p5, fees, embargo, and live cell-by-cell tracking.

Empirically measured on 2026-05-17: 660 GPU configurations were tested across architectures, targets, timeframes and features. Five-minute direction remains at the random-walk floor (Brier 0.2474 vs baseline 0.2463). Volatility regime prediction shows measurable edge: FLOKI 15min Brier 0.1215 vs baseline 0.2500. Read the full report.
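One common way to read a "Brier vs baseline" pair is the Brier skill score, 1 - model / baseline. The sketch below applies it to the two published pairs; the skill score is a standard metric, but nothing here implies it is the statistic the report itself uses.

```python
def brier_skill_score(model_brier, baseline_brier):
    """Relative improvement over the baseline: > 0 beats it, <= 0 does not."""
    return 1.0 - model_brier / baseline_brier


# Published pairs from the paragraph above.
direction_5m = brier_skill_score(0.2474, 0.2463)    # 5-minute direction
floki_vol_15m = brier_skill_score(0.1215, 0.2500)   # FLOKI 15min volatility regime
```

Under this reading the 5-minute direction model is marginally worse than its baseline (a slightly negative skill score), while the FLOKI volatility-regime model roughly halves the baseline error.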

The methodology we use to validate strategies

Monte Carlo CV
30 random temporal splits, anchor between 20% and 70%.

Robustness gate
Sharpe_p5 > 0.5, where Sharpe_p5 is the 5th percentile of OOS Sharpe across the 30 splits.

Trade count
n_trades_mean > 20 per OOS window, with at least 10 valid splits.

Fees included: 0.20% round-trip.
Single-split validations are treated as weak until they survive MC CV.
Example: Wyckoff Evolved had OOS Sharpe 1.85 on a single split, then MC mean Sharpe 0.73 on PUMP, -0.04 on INJ, -0.36 on FLOKI. We rejected it.
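The gate described above can be sketched roughly as follows. The thresholds come from the text; everything else is an assumption made for illustration: the OOS window is taken to be every trade after the anchor, Sharpe is a per-trade mean/stdev (not annualized), a split is "valid" when its Sharpe is defined, and the embargo step is omitted. All function names are hypothetical.

```python
import random
import statistics

FEE_ROUND_TRIP = 0.0020        # 0.20% round-trip, from the text
N_SPLITS = 30                  # random temporal splits
ANCHOR_RANGE = (0.20, 0.70)    # anchor position as a fraction of history
MIN_SHARPE_P5 = 0.5
MIN_TRADES_MEAN = 20
MIN_VALID_SPLITS = 10


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def sharpe(returns):
    """Per-trade Sharpe (assumption: not annualized); None if undefined."""
    if len(returns) < 2:
        return None
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else None


def mc_cv_gate(gross_trade_returns, seed=0):
    """Apply the Sharpe_p5 and trade-count gates to time-ordered trade returns."""
    rng = random.Random(seed)
    net = [r - FEE_ROUND_TRIP for r in gross_trade_returns]
    sharpes, trade_counts = [], []
    for _ in range(N_SPLITS):
        anchor = rng.uniform(*ANCHOR_RANGE)
        oos = net[int(len(net) * anchor):]   # assumption: OOS = all trades after anchor
        s = sharpe(oos)
        if s is not None:                    # assumption: valid split = defined Sharpe
            sharpes.append(s)
            trade_counts.append(len(oos))
    if len(sharpes) < MIN_VALID_SPLITS:
        return {"passed": False, "reason": "too few valid splits"}
    sharpe_p5 = percentile(sharpes, 5)
    n_trades_mean = statistics.mean(trade_counts)
    return {
        "passed": sharpe_p5 > MIN_SHARPE_P5 and n_trades_mean > MIN_TRADES_MEAN,
        "sharpe_p5": sharpe_p5,
        "n_trades_mean": n_trades_mean,
        "valid_splits": len(sharpes),
    }
```

The point of gating on the 5th percentile rather than the mean is that a strategy must hold up on nearly all split positions, which is why a single-split Sharpe of 1.85 can still be rejected.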

Strategies validated by Monte Carlo

Strategy	Validated assets	Best Sharpe_p5	Rejected on
Smart Money Evolved	BTC, ETH, SOL, BNB	1.22 (BTC)	-
Mean Rev Pro Evolved	NEAR, SNX, CHZ, TIA	1.189 (SNX)	TRB
Capitulation Rebound Evolved	BTC, SOL, BNB, NEAR, SNX, CHZ, TIA	1.526 (SNX)	-
Deep Freeze Evolved	SNX, CHZ	0.884 (CHZ)	BTC, ETH, SOL, BNB, NEAR, TIA, AVAX
Sly Fox Evolved	BNB	0.599	8 others
Deep Shadow Evolved	BTC	0.851	8 others
Wyckoff Evolved	none	-	PUMP, INJ, COMP, FLOKI
Darvas	none	-	BTC, ETH, SOL, BNB, TRB

MC validations are now tracked live, cell by cell, to measure drift between theoretical Sharpe_p5 and real performance.
View live Monte Carlo results

Data leaks we fixed

Metals note, 2026-05-17. Internal audit found an inconsistent live Gold/Silver feed: duplicate ticks, PAXG fallback and synthetic 82.5 ratio. The live feed was migrated to Yahoo Finance futures (GC=F, SI=F). Metals MC validations remain caveated until post-fix revalidation; the Smart Money SILVER cell used Yahoo SI=F historical parquet, but live tracking is suspended during revalidation.

chimera_ml.py

Target leakage: avg_pnl was both feature and label source. Deleted on 2026-05-15.

leviathan_data_merger.py

3 look-ahead bugs: future news, regime using current bar, future one-hot. Fixed on 2026-05-15.

Honest consequence: Leviathan NN's Brier moved from 0.244 with leakage to 0.2589 without leakage. We publish the real number.
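The "regime using current bar" class of bug can be illustrated with a toy feature. This is a sketch under stated assumptions, not the code from leviathan_data_merger.py: the function name, the rolling-mean regime rule and the sample prices are all hypothetical.

```python
def trend_regime(closes, window, leak_current_bar=False):
    """1 if the latest visible close is above its rolling mean, else 0.

    Feature row t is meant to predict bar t, so it may only read closes[:t].
    leak_current_bar=True reproduces the bug by also reading closes[t].
    Rows without a full window of history are None.
    """
    end_offset = 1 if leak_current_bar else 0
    features = []
    for t in range(len(closes)):
        end = t + end_offset           # exclusive end of the visible history
        start = end - window
        if start < 0:
            features.append(None)
            continue
        visible = closes[start:end]
        features.append(1 if visible[-1] > sum(visible) / window else 0)
    return features


closes = [100, 101, 102, 99, 98, 103]   # hypothetical prices

safe = trend_regime(closes, window=3)
leaky = trend_regime(closes, window=3, leak_current_bar=True)

# Direction label for bar t: did it close above the previous bar?
up = [None] + [1 if closes[t] > closes[t - 1] else 0 for t in range(1, len(closes))]
```

On this toy series the leaky feature matches the direction label on every bar where it is defined, because it already contains the bar it is supposed to predict. The safe version is the same signal shifted back by one bar, and its apparent accuracy drops accordingly, which is the same pattern as a Brier score getting worse after a leak fix.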

Why some "AI strategies" are not real ML

Leviathan 9-Layer Ensemble Brain used to be 8 heuristic layers and storytelling. After the graft, it is a 9-Layer Ensemble: 8 heuristics + 1 PyTorch MLP.
The old Chimera pattern total was an exaggerated count from a live-accumulated brain JSON. We now display 50 peer-reviewed patterns, filtered with Benjamini-Hochberg FDR at alpha 0.05.
ML Arena V3 used to be isolated from the main monsters. The models were migrated in-place into backend/: chimera_cnn.py, leviathan_nn.py, hydra_lstm.py.
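The FDR filter mentioned for the Chimera patterns is, in its standard form, the Benjamini-Hochberg step-up procedure. Below is a minimal sketch of that procedure; the function name and the p-values are illustrative assumptions, and the actual filtering code may differ.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k / m * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five candidate patterns.
kept = benjamini_hochberg([0.041, 0.001, 0.216, 0.008, 0.039], alpha=0.05)
```

Unlike a Bonferroni correction, which compares every p-value to alpha / m, this procedure controls the expected share of false discoveries among the patterns that are kept, so it is less conservative when many patterns are tested at once.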

What we are not claiming

We are not claiming to reliably predict crypto direction.

We are not claiming Brier < 0.20 on 5-minute direction. A score that low would be suspicious for this target.

We are not claiming Sharpe ratios above the 1-3 range without long validation.

We are not claiming a single magic unified AI brain.

What we claim: a transparent lab that measures everything, publicly fixes leaks, and refuses to publish as "edge" what does not survive strict Monte Carlo CV validation.

Newsjacker editorial process

Newsjacker connects one financial, crypto or AI news source to one precise Strategy Arena finding each day. It is assisted editorial infrastructure, not a marketing-content machine: original source link required, internal link to a measured finding required, caveat required, and owner review queue by default.

Articles rely on backtesting, paper trading, Monte Carlo CV or calibration reports; never on a promise of live profit.

Forbidden claims: unproven superlatives, guarantees, "10x/100x", crypto hype and AI storytelling without numbers.

View the public timeline: anti-2CV Newsjacker. Drafts stay in review until they pass automatic gates.

Distributed Research Network

Strategy Arena is moving toward a quantitative citizen-science layer: local personal backtests, public hardware benchmarks, then cooperative themed raids. The important constraint: every raid must start from a pre-registered hypothesis, with quorum, replication and publication of positive and negative results.

V1
Local personal backtest. Client-side compute, optional save when logged in.

V2
Standardized GPU contest: speed, stability, reproducibility, public leaderboard.

V3
Research raids: 10-50 GPUs validating one hypothesis and producing an open paper.

Roadmap only: V2/V3 phases are not active today. The public page exists to collect feedback and contributors before activation.
Strategy Arena Research Network

Known limits

Brier ~0.25 on 5m crypto: practical ceiling, not standalone directional edge.
Monte Carlo CV does not guarantee future live performance.
Metals feeds: MC re-validation still pending after the Yahoo GC=F/SI=F migration.
Heterogeneous AI layers (heuristics + ML) explicitly labeled.
/facts/*.json snapshots = VPS state at generation, not HFT feed.

This page validates claims on:

/facts/strategy-arena

/facts/monte-carlo

/facts/ml-edge

/facts/strategy-hospital

Open Research Dataset

Strategy Arena publishes an anonymized public dataset of AI, ML, GPU, futures, and classic strategy paper-trading events for independent research.

Download the public dataset
