
How We Built a Self-Evolving AI Brain: 14 Modules, 12 Context Sources, Persistent Memory

📅 2026-04-10

✍️ Strategy Arena

ai architecture, self-evolving ai, prompt engineering, karpathy autoresearch, ai memory, llm context, trading ai, activewiki, prompt optimization

The Problem Nobody Talks About

Every AI trading platform makes the same claim: "powered by AI." What they don't tell you is how that AI actually works behind the scenes.

Here is the dirty secret: most AI features are isolated API calls with hardcoded prompts. Each call is independent. The AI has no memory of what it said 30 minutes ago. It has no awareness of what other AI modules discovered. It cannot learn from its own mistakes.

We know this because we built Strategy Arena the same way. And then we audited ourselves.

The Audit That Changed Everything

On April 10, 2026, we ran a full audit of every AI component in Strategy Arena. We checked 14 modules that make LLM API calls — the Collaborative arena (6 AIs voting on trades), the Oracle (9 AIs answering questions), the news analyzer, the Strategy Genie mentor, and 10 more.

The results were embarrassing:

11 of 14 modules ran with static, hardcoded prompts
0 modules had persistent memory between calls
0 modules received lessons from our nightly research engines
Our Living Wiki accumulated discoveries every night but never injected them into any prompt
Our Knowledge Graph connected 126 AI nodes but no module queried it

We had built an impressive intelligence infrastructure that powered... nothing. The AI modules were blind, amnesic, and disconnected from each other.

The Fix: Three Layers of Intelligence

We rebuilt the entire system in one day. Three layers, bottom to top.

Layer 1: PromptForge (Context Engine)

Before our fix, a typical Oracle consultation looked like this:

System: "You are Claude, the Timing & Risk/Reward Master. JSON only."
User: "Will BTC go up this week?"

No market data. No awareness of what the arena's 86 strategies are doing. No lessons from thousands of nightly experiments.

PromptForge is now the central context engine. Before every AI call across all 14 modules, it injects 12 live data sources:

Market regime (BULL/BEAR/NEUTRAL) calculated from moving averages
RSI and momentum indicators
Top 3 performing strategies with their current PnL
Invictus survival data — which market conditions kill the most trades
Chimera pattern recognition: the dominant pattern among the 1,221 detected
Hydra ML recommendations — which strategies work best in this regime
Leviathan voting signal — the 8-layer voting system's current call
News sentiment — latest analysis from Perplexity and Grok
Wiki lessons — findings from nightly Karpathy research
Strategic hypotheses — testable ideas from the Strategic Layer
Hall of Fame — best discoveries across all research engines
Nutrition filter status — how many strategies are healthy vs poisoned
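As an illustration of the first item in the list, here is a minimal sketch of how a regime label could be derived from moving averages. The window lengths (20 and 50 closes), the 1% neutral band, and the function name `classify_regime` are assumptions made for this example, not the values or names Strategy Arena uses.

```python
from statistics import fmean


def classify_regime(closes, fast=20, slow=50, band=0.01):
    """Return 'BULL', 'BEAR', or 'NEUTRAL' from two simple moving averages.

    Assumed rule: compare a fast and a slow average of closing prices and
    treat a relative gap smaller than `band` as NEUTRAL.
    """
    if len(closes) < slow:
        raise ValueError(f"need at least {slow} closes, got {len(closes)}")
    fast_ma = fmean(closes[-fast:])
    slow_ma = fmean(closes[-slow:])
    # Relative gap between the averages; inside the band counts as NEUTRAL.
    gap = (fast_ma - slow_ma) / slow_ma
    if gap > band:
        return "BULL"
    if gap < -band:
        return "BEAR"
    return "NEUTRAL"
```

The neutral band keeps the label from flipping on every small crossover, which matters when the label is pasted into every prompt.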

Now the same Oracle call looks like this:

System: "You are Claude, the Timing & Risk/Reward Master.
Market regime: BEAR. RSI: 28.
Arena lessons: bollinger outperforms in range markets;
RSI sweet spot is 23; Invictus helps (fitness 20.2 vs 7.5 without).
Leviathan says: SELL (68% conf).
34 healthy strategies in arena. JSON only."

Same API call. Completely different intelligence.
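A rough sketch of the assembly step, assuming the context has already been collected into a simple structure. `MarketContext` and `build_system_prompt` are hypothetical names for this example and cover only a few of the 12 sources; the real PromptForge interface is not shown in this post.

```python
from dataclasses import dataclass, field


@dataclass
class MarketContext:
    """Hypothetical container for a few of the injected data sources."""
    regime: str
    rsi: float
    lessons: list = field(default_factory=list)
    leviathan_signal: str = ""
    leviathan_confidence: float = 0.0  # expressed as 0..1
    healthy_strategies: int = 0


def build_system_prompt(role, ctx, output_rule="JSON only."):
    """Prepend live context to a static role line, one fact per line."""
    lines = [role, f"Market regime: {ctx.regime}. RSI: {ctx.rsi:.0f}."]
    if ctx.lessons:
        lines.append("Arena lessons: " + "; ".join(ctx.lessons) + ".")
    if ctx.leviathan_signal:
        lines.append(
            f"Leviathan says: {ctx.leviathan_signal} "
            f"({ctx.leviathan_confidence:.0%} conf)."
        )
    lines.append(f"{ctx.healthy_strategies} healthy strategies in arena. {output_rule}")
    return "\n".join(lines)


ctx = MarketContext(
    regime="BEAR",
    rsi=28,
    lessons=["bollinger outperforms in range markets", "RSI sweet spot is 23"],
    leviathan_signal="SELL",
    leviathan_confidence=0.68,
    healthy_strategies=34,
)
prompt = build_system_prompt("You are Claude, the Timing & Risk/Reward Master.", ctx)
```

Sources that have nothing to report are skipped rather than rendered as empty lines, so the prompt stays short when a nightly engine has produced no new findings.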

Layer 2: Component Memory (Persistent State)

Even with rich context, each call was still stateless. The AI had no idea what it said last time, or whether its previous advice was correct.

We built Component Memory — a persistent JSON memory system for each module. After every interaction, the module saves what happened: the question asked, the answer given, the market conditions, the outcome.

Before the next call, this memory is loaded and injected into the prompt:

RECENT: Q: Will BTC recover? -> BEARISH (70%) | Q: ETF impact? -> BULLISH (85%)
LEARNED: Claude tends to be too bearish in NEUTRAL regime
(47 total calls)

The memory is also readable by Hermes, our local LLM agent, for cross-component analysis. When the Oracle learns something, the Genie can benefit from it too.
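The save-then-inject cycle described above could look roughly like the sketch below. The class name, the JSON field names, and the file layout are assumptions for illustration; only the idea (one JSON file per module, rendered into a few prompt lines) comes from the description above.

```python
import json
from pathlib import Path


class ComponentMemory:
    """Hypothetical per-module memory backed by a single JSON file."""

    def __init__(self, path, max_recent=50):
        self.path = Path(path)
        self.max_recent = max_recent
        if self.path.exists():
            self.state = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.state = {"total_calls": 0, "recent": [], "learned": []}

    def record(self, question, answer, confidence, regime, outcome=None):
        """Store one interaction and keep only the newest max_recent entries."""
        self.state["total_calls"] += 1
        self.state["recent"].append({
            "question": question, "answer": answer,
            "confidence": confidence, "regime": regime, "outcome": outcome,
        })
        self.state["recent"] = self.state["recent"][-self.max_recent:]
        self._save()

    def add_lesson(self, text):
        """Store a learned observation once."""
        if text not in self.state["learned"]:
            self.state["learned"].append(text)
            self._save()

    def _save(self):
        # Write to a temporary file first so a crash cannot leave a half-written file.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.state, indent=2), encoding="utf-8")
        tmp.replace(self.path)

    def render(self, n=2):
        """Format the last n interactions and all lessons as prompt lines."""
        recent = " | ".join(
            f"Q: {r['question']} -> {r['answer']} ({r['confidence']:.0%})"
            for r in self.state["recent"][-n:]
        )
        lines = [f"RECENT: {recent}"] if recent else []
        lines += [f"LEARNED: {lesson}" for lesson in self.state["learned"]]
        lines.append(f"({self.state['total_calls']} total calls)")
        return "\n".join(lines)
```

Because the state is plain JSON on disk, another process can read the same file without importing this class, which is one way a separate agent could analyze several modules' memories side by side.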

Layer 3: Nightly Prompt Evolution (Karpathy Loop)

This is the part that makes the system truly self-evolving.

Every night between 1:30 AM and 6:30 AM, 11 autonomous engines run thousands of experiments:

Engine	Time	What it does
Meta-Harness	1:30	Optimizes the optimizer (tunes Darwin's own parameters)
Darwin	2:00	Evolves strategy parameters through mutation and selection
Leviathan	2:30	Evolves the 8-layer voting system weights
Portfolio	3:00	Optimizes portfolio allocation across strategies
Invictus	4:00	Maps which conditions kill trades (survival analysis)
Chimera	4:30	Evolves 1,221 pattern detection thresholds
Hydra	5:00	Evolves ML model hyperparameters
Wiki Compiler	5:30	Consolidates raw findings into structured knowledge
Strategic Layer	5:35	Generates testable hypotheses from accumulated data
Nutrition Filter	6:00	Evolves which strategies are healthy enough to teach the brain
Prompt Evolution	6:30	Each AI evolves its own system prompt

The last engine is the newest and perhaps the most interesting. Each of the 6 AIs in the Collaborative arena (Claude, Grok, GPT, Gemini, DeepSeek, Perplexity) gets its prompt mutated, tested on historical data, and kept or discarded based on accuracy. The AI that writes the best instructions for itself wins.

This is the Karpathy autoresearch pattern: accumulate, hypothesize, test, learn, repeat. Applied not just to strategy parameters, but to the AI prompts themselves.
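The mutate, test, keep-or-discard step can be sketched as a simple hill climb. Everything here is a stand-in: `evolve_prompt`, `toy_mutate`, and `toy_score` are hypothetical, and in a real run `mutate` would wrap an LLM call and `score` would replay historical decisions and measure accuracy.

```python
import random


def evolve_prompt(prompt, mutate, score, generations=20, seed=0):
    """Keep a mutated prompt only when it scores strictly better than the current best."""
    rng = random.Random(seed)
    best, best_score = prompt, score(prompt)
    for _ in range(generations):
        candidate = mutate(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins so the loop can run without any API access.
HINTS = [
    "Weigh the market regime first.",
    "State a confidence between 0 and 1.",
    "Cite one arena lesson.",
]


def toy_mutate(prompt, rng):
    """Append one randomly chosen instruction to the prompt."""
    return prompt + " " + rng.choice(HINTS)


def toy_score(prompt):
    """Placeholder for historical accuracy: reward distinct hints, penalize length."""
    return sum(h in prompt for h in HINTS) - 0.001 * len(prompt)


base = "You are Claude, the Timing & Risk/Reward Master. JSON only."
best, best_score = evolve_prompt(base, toy_mutate, toy_score)
```

Accepting only strict improvements means a night with no better candidate leaves the prompt unchanged, so the worst case is a wasted run rather than a regression.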

The Architecture (Closed Loop)

The key insight is that these three layers form a closed loop:

Nightly engines generate discoveries → Living Wiki stores them → PromptForge injects them into prompts → 14 AI modules make better decisions → Component Memory records what happened → Hermes analyzes cross-component patterns → findings feed back into nightly engines

No component operates in isolation. Every module both contributes to and benefits from the collective intelligence.

Before vs After (Real Numbers)

We can already see the impact:

CUDA Evolved went from crashing on every trade exit (a bug found during the audit) to #5 in the arena with a 79% win rate
Collaborative prompts now load dynamically from an evolving file instead of being hardcoded
Oracle consultations include market regime, wiki lessons, and Leviathan's signal — instead of a one-line static prompt
4 deliberations recorded in component memory within the first hour

The real test comes tonight, when the prompt evolution engine runs for the first time and each AI generates a mutated version of its own prompt.

Open Source

The architecture behind this system is available as ActiveWiki, our open-source Python framework for closed-loop knowledge systems. It implements the accumulate-think-act-learn cycle with built-in memory decay, knowledge crystallization, and hypothesis generation.

We built it because we needed it. We open-sourced it because this pattern — giving AI persistent memory and self-evolving prompts — should be the default, not the exception.

What We Learned

Three takeaways from this rebuild:

Static prompts are the #1 bottleneck in AI applications. The difference between a generic prompt and a context-rich one is not incremental — it is categorical. The same model produces fundamentally different output.
Memory matters more than model size. A small model with memory of its past 50 interactions outperforms a large model seeing each request for the first time. Context is cheaper than compute.
The Karpathy loop works on prompts, not just parameters. We applied mutation-and-selection to AI system prompts and saw improvements in the first generation. Letting each AI optimize its own instructions is counterintuitive but effective.

The system is live. You can watch it evolve at strategyarena.io/living-wiki.

Strategy Arena is an educational simulation platform. 86 AI strategies trade virtual capital on real market data. No real money. No financial advice. Just transparent, self-evolving AI research you can watch in real-time.


⚠️ Disclaimer — This article is for informational and educational purposes only. It does not constitute investment advice or a buy/sell recommendation. Past performance does not guarantee future results. Strategy Arena is an educational simulator with virtual capital. Always do your own research before making investment decisions.