Methodology & Transparency - Strategy Arena
Public research

Methodology & Transparency

Authority summary — Last verified: 2026-05-24
Domain | Evidence | Gate | Status
Live strategies | arena_state_*_v5.json | Public count | 86
ML / Brier | brier_autopsy_*.json | anti-leak OOF | published
Monte Carlo CV | live_mc_results_snapshot.json | Sharpe_p5 | 7
Strategy Hospital | JSON | triage | PASS/WATCHLIST
Mode | paper trading | No real orders | paper-trading

How Strategy Arena's ML and statistical layers actually work. We measured every Brier. We fixed every leak. Here's the real architecture.

Anti-marketing: if a layer is analytics, we call it analytics. If it is rules-based, we call it rules-based. If it is ML, we publish the real measured Brier.
New: Strategy Hospital publishes live strategy triage, and Strategy Lifecycle keeps the public history of those changes.

The 5 monsters: what they actually are

Monster | Architecture | Real metric | Status
Invictus ML Ultimate | LightGBM with isotonic calibration, OOF validation and monotonic constraints | Brier OOS expected ~0.22 (calibrated) | Real ML. Audited 8/10 by DeepSeek
Chimera Scanner + CNN | 17 statistical patterns + PyTorch CNN, 108 OHLCV/pattern channels | Brier OOS 0.2512 (9,356 samples) | Hybrid: rules + real ML
Leviathan 9-Layer Ensemble | 8 heuristic layers + 1 PyTorch MLP as Layer 9 | Brier OOS 0.2589 (10,758 samples, post-leak-fix) | Hybrid: heuristics + real ML
Hydra ML V5 + LSTM | XGBoost ranking for PnL + PyTorch LSTM for direction | Brier OOS 0.2480 (51,718 samples) | Real dual ML
Meta Intelligence v3 | Strategy analytics: bootstrap CI, Bonferroni multi-compare, performance snapshots | No prediction | Analytics engine. Honest dashboard
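Brier > 0.25 = barely usable: lower is better, and a constant 50% forecast already scores exactly 0.25 on a binary outcome. A Brier score of about 0.25 is close to the practical limit for 5-minute crypto direction prediction. This is not standalone directional edge; it is a negative result we publish.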
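The Meta Intelligence row lists bootstrap CI and Bonferroni multi-compare. As a rough illustration only, the sketch below computes a percentile bootstrap confidence interval for a mean return and widens it with a Bonferroni-adjusted alpha when several strategies are compared at once. The function names, the toy returns and the choice of the percentile method are assumptions; the page does not describe the actual implementation.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_mean_ci(
    sample: Sequence[float],
    alpha: float = 0.05,
    n_resamples: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean at confidence 1 - alpha."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = max(lo_idx, int((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison alpha that keeps the family-wise error rate at alpha."""
    return alpha / n_comparisons


# Hypothetical per-trade returns for one strategy (illustration only).
toy_returns = [0.012, -0.004, 0.007, 0.001, -0.009, 0.015, 0.003, -0.002, 0.006, 0.010]

plain = bootstrap_mean_ci(toy_returns, alpha=0.05)
# Comparing 5 strategies at once: each interval uses alpha / 5.
adjusted = bootstrap_mean_ci(toy_returns, alpha=bonferroni_alpha(0.05, 5))
print(plain, adjusted)
```

With the same seed, the adjusted interval is never narrower than the plain one, which is the point of the correction: the more strategies compared, the more evidence each one needs.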

Brier 0.25: what it means

On 5-minute binary crypto direction, a Brier score near 0.25 is close to a balanced random baseline. We therefore do not treat ChimeraCNN, LeviathanNN, or HydraLSTM as standalone directional edge engines. They are diagnostic layers: regime context, calibration, signal filtering, secondary ranking, and condition detection for Monte Carlo-validated strategies.
Published negative result
5m directional models plateau around 0.25. That is a measured limit, not an ML victory claim.
Actual use
The real edge gate remains Monte Carlo CV: Sharpe_p5, fees, embargo, and live cell-by-cell tracking.
Empirically measured on 2026-05-17: 660 GPU configurations were tested across architectures, targets, timeframes and features. Five-minute direction remains at the random-walk floor (Brier 0.2474 vs baseline 0.2463, so no improvement over the baseline). Volatility regime prediction shows measurable edge: FLOKI 15min Brier 0.1215 vs baseline 0.2500. Read the full report.
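For readers who want to check the 0.25 reference point themselves, here is a minimal sketch of the Brier score and of a constant-forecast baseline. The page does not say how its baseline was computed, so `base_rate_brier` (a constant forecast equal to the observed share of up-moves) is an assumption, and the labels are toy data.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def base_rate_brier(outcomes: Sequence[int]) -> float:
    """Brier score of a constant forecast equal to the observed share of 1s."""
    rate = sum(outcomes) / len(outcomes)
    return brier_score([rate] * len(outcomes), outcomes)


# Toy labels (hypothetical): 1 = price closed higher on the next bar, 0 = lower.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
coin_flip = [0.5] * len(outcomes)

print(brier_score(coin_flip, outcomes))  # 0.25 for any 0/1 labels
print(base_rate_brier(outcomes))         # 0.25 here because the toy labels are balanced
```

A constant forecast at base rate r scores r(1 - r), which is at most 0.25. A model is only informative to the extent that it scores below that baseline on out-of-sample data.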

The methodology we use to validate strategies

Monte Carlo CV
30 random temporal splits, anchor between 20% and 70%.
Robustness gate
Sharpe_p5 > 0.5, where Sharpe_p5 is the 5th percentile of the OOS Sharpe ratios across the 30 splits.
Trade count
n_trades_mean > 20 per OOS window, with at least 10 valid splits.
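The three rules above can be sketched as a single gate function. This is an illustration under stated assumptions, not the production pipeline: it assumes trades are precomputed with fixed parameters, that the OOS window is everything after the anchor plus an embargo, that a split is "valid" when its Sharpe can be computed, and that Sharpe is the unannualized per-trade mean over standard deviation. All names (`mc_cv_gate`, `GateResult`, `embargo_bars`) are hypothetical.

```python
import math
import random
import statistics
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Trade = Tuple[int, float]  # (entry bar index, net return after fees)

# Thresholds quoted on this page.
MIN_SHARPE_P5 = 0.5
MIN_TRADES_MEAN = 20
MIN_VALID_SPLITS = 10


@dataclass
class GateResult:
    passed: bool
    sharpe_p5: Optional[float]
    n_trades_mean: Optional[float]
    n_valid_splits: int


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def sharpe(returns: Sequence[float]) -> Optional[float]:
    """Per-trade Sharpe (mean / sample stdev); None when it cannot be computed."""
    if len(returns) < 2:
        return None
    sd = statistics.stdev(returns)
    return None if sd == 0 else statistics.fmean(returns) / sd


def mc_cv_gate(
    trades: Sequence[Trade],
    n_bars: int,
    n_splits: int = 30,
    anchor_range: Tuple[float, float] = (0.20, 0.70),
    embargo_bars: int = 0,
    seed: int = 0,
) -> GateResult:
    rng = random.Random(seed)
    sharpes: List[float] = []
    counts: List[int] = []
    for _ in range(n_splits):
        # Random temporal anchor between 20% and 70% of the history.
        anchor = int(n_bars * rng.uniform(*anchor_range))
        # Assumption: OOS is everything after the anchor, minus an embargo gap.
        oos_start = anchor + embargo_bars
        oos = [r for bar, r in trades if bar >= oos_start]
        s = sharpe(oos)
        if s is not None:  # assumption: "valid split" = Sharpe is computable
            sharpes.append(s)
            counts.append(len(oos))
    if len(sharpes) < MIN_VALID_SPLITS:
        return GateResult(False, None, None, len(sharpes))
    p5 = percentile(sharpes, 5)
    trades_mean = statistics.fmean(counts)
    passed = p5 > MIN_SHARPE_P5 and trades_mean > MIN_TRADES_MEAN
    return GateResult(passed, p5, trades_mean, len(sharpes))


# Hypothetical toy input: one trade every 5 bars, returns alternating 2% and 1%.
toy_trades = [(bar, 0.02 if (bar // 5) % 2 == 0 else 0.01) for bar in range(0, 1000, 5)]
result = mc_cv_gate(toy_trades, n_bars=1000)
print(result)
```

Gating on the 5th percentile rather than the mean means a strategy has to hold up on nearly all random splits, not just on average. A real pipeline would also refit or re-optimize on the data before each anchor, which this sketch skips.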

Strategies validated by Monte Carlo

Strategy | Validated assets | Best Sharpe_p5 | Rejected on
Smart Money Evolved | BTC, ETH, SOL, BNB | 1.22 (BTC) | -
Mean Rev Pro Evolved | NEAR, SNX, CHZ, TIA | 1.189 (SNX) | TRB
Capitulation Rebound Evolved | BTC, SOL, BNB, NEAR, SNX, CHZ, TIA | 1.526 (SNX) | -
Deep Freeze Evolved | SNX, CHZ | 0.884 (CHZ) | BTC, ETH, SOL, BNB, NEAR, TIA, AVAX
Sly Fox Evolved | BNB | 0.599 | 8 others
Deep Shadow Evolved | BTC | 0.851 | 8 others
Wyckoff Evolved | none | - | PUMP, INJ, COMP, FLOKI
Darvas | none | - | BTC, ETH, SOL, BNB, TRB
MC validations are now tracked live, cell by cell, to measure drift between theoretical Sharpe_p5 and real performance.

Data leaks we fixed

Metals note, 2026-05-17. Internal audit found an inconsistent live Gold/Silver feed: duplicate ticks, PAXG fallback and synthetic 82.5 ratio. The live feed was migrated to Yahoo Finance futures (GC=F, SI=F). Metals MC validations remain caveated until post-fix revalidation; the Smart Money SILVER cell used Yahoo SI=F historical parquet, but live tracking is suspended during revalidation.
chimera_ml.py

Target leakage: avg_pnl was both feature and label source. Deleted on 2026-05-15.

leviathan_data_merger.py

3 look-ahead bugs: future news, regime using current bar, future one-hot. Fixed on 2026-05-15.

Honest consequence: Leviathan NN's Brier moved from 0.244 with leakage to 0.2589 without leakage. We publish the real number.
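To make the "regime using current bar" class of bug concrete, here is a minimal sketch of a leaky feature next to a look-ahead-safe one. It is not the code from leviathan_data_merger.py, which this page does not show. The function names and the simple "last close above its moving average" regime rule are hypothetical.

```python
from typing import List, Optional, Sequence


def regime_leaky(closes: Sequence[float], window: int) -> List[Optional[int]]:
    """BUGGY on purpose: the feature for bar t reads closes[t] itself,
    which is not known yet when bar t is the one being predicted."""
    out: List[Optional[int]] = []
    for t in range(len(closes)):
        if t + 1 < window:
            out.append(None)
            continue
        win = closes[t - window + 1 : t + 1]  # includes the current bar
        out.append(1 if closes[t] > sum(win) / window else 0)
    return out


def regime_safe(closes: Sequence[float], window: int) -> List[Optional[int]]:
    """Feature for bar t built only from bars strictly before t."""
    out: List[Optional[int]] = []
    for t in range(len(closes)):
        if t < window:
            out.append(None)
            continue
        win = closes[t - window : t]  # ends at t - 1, excludes the current bar
        out.append(1 if closes[t - 1] > sum(win) / window else 0)
    return out


# Leak test: change only the last close and see which feature reacts at that bar.
closes = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
shocked = closes[:-1] + [90.0]

print(regime_leaky(closes, 3)[-1], regime_leaky(shocked, 3)[-1])  # differs: leak
print(regime_safe(closes, 3)[-1], regime_safe(shocked, 3)[-1])    # identical: safe
```

The same perturbation test generalizes: if a feature at bar t changes when only data from bar t or later is modified, it is using information from the future.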

Why some "AI strategies" are not real ML

What we are not claiming

We are not claiming to reliably predict crypto direction.
We are not claiming Brier < 0.20. That would be suspicious for this framing.
We are not claiming Sharpe ratios above the 1-3 range without long validation.
We are not claiming a single magic unified AI brain.
What we claim: a transparent lab that measures everything, publicly fixes leaks, and refuses to publish as "edge" what does not survive strict Monte Carlo CV validation.

Newsjacker editorial process

Newsjacker connects one financial, crypto or AI news source to one precise Strategy Arena finding each day. It is assisted editorial infrastructure, not a marketing-content machine: original source link required, internal link to a measured finding required, caveat required, and owner review queue by default.

Articles rely on backtesting, paper trading, Monte Carlo CV or calibration reports; never on a promise of live profit.
Forbidden claims: unproven superlatives, guarantees, "10x/100x", crypto hype and AI storytelling without numbers.
View the public timeline: anti-2CV Newsjacker. Drafts stay in review until they pass automatic gates.

Distributed Research Network

Strategy Arena is moving toward a quantitative citizen-science layer: local personal backtests, public hardware benchmarks, then cooperative themed raids. The important constraint: every raid must start from a pre-registered hypothesis, with quorum, replication and publication of positive and negative results.

V1
Local personal backtest. Client-side compute, optional save when logged in.
V2
Standardized GPU contest: speed, stability, reproducibility, public leaderboard.
V3
Research raids: 10-50 GPUs validating one hypothesis and producing an open paper.
Roadmap only: V2/V3 phases are not active today. The public page exists to collect feedback and contributors before activation.

Known limits

This page validates claims on:

Pro Researcher tier

Raw MC params, audit notes, and research digests

Founding intent list for a future EUR 19/mo research tier. V1 is only an email capture for manual follow-up: no payment, no automatic subscription, no trading promise.