\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\usepackage{booktabs}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{enumitem}

\title{ActiveWiki: A RAG-Augmented POMDP Framework for Cognitive Benchmarking}
\author{Strategy Arena Research}
\date{May 17, 2026}

\begin{document}
\maketitle

\begin{abstract}
We introduce ActiveWiki, a retrieval-augmented controller for partially observable Markov decision processes (POMDPs), motivated by the Dragon Labyrinth benchmark. The method records trajectories from past games, clusters them into reusable state patterns, and injects nearest-case memory at decision time. On a 1,000-game Dragon Labyrinth validation slice, the existing structured controller Oracle-X1 wins 142 games, while Oracle-X2 ActiveWiki wins 138. The systems overlap on 65 wins, but ActiveWiki uniquely solves 73 games that Oracle-X1 misses; their union reaches 215 wins. These results suggest that case retrieval is not a replacement for explicit structure, but a complementary cognitive substrate for hidden-state tasks where topology and local context interact.
\end{abstract}

\section{Motivation}
The Dragon Labyrinth benchmark was designed to test whether explicit structure can outperform brute-force compute in a partially observable environment. Prior work showed that a compact structured agent could outperform pure search and frontier language models. ActiveWiki asks a narrower question: can a wiki of solved cases improve decision quality when the current state resembles a historical trap, corridor, or transition pattern?

\section{Method}
ActiveWiki has three stages. First, it logs successful and failed trajectories across 5,000 training games. Second, it derives topology-aware features and clusters cases into eight recurring state families. Third, at runtime, the controller retrieves nearest cases and injects a compact recommendation into the existing decision stack.
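
As a minimal sketch of the third stage, and not the authors' specification, suppose each logged case $i$ is stored as a triple $(\mathbf{x}_i, a_i, y_i)$ of feature vector, action taken, and game outcome $y_i \in \{0,1\}$, and that a feature map $\phi$ sends the current observation history $h$ into the same feature space. The Euclidean distance, the neighborhood size $k$, and the success-weighted vote below are illustrative assumptions; the description above does not fix any of them.
\begin{align*}
N_k(h) &= \text{the $k$ indices $i$ with the smallest } \lVert \phi(h) - \mathbf{x}_i \rVert_2, \\
\hat{a}(h) &= \operatorname*{arg\,max}_{a} \sum_{i \in N_k(h)} y_i \, \mathbf{1}[a_i = a].
\end{align*}
Under these assumptions, the recommendation $\hat{a}(h)$ is the action that most often preceded a win among the $k$ most similar logged cases.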

\section{Validation}
\begin{table}[h]
\centering
\begin{tabular}{lrr}
\toprule
System & Wins / 1000 & Win rate \\
\midrule
Oracle-X1 structured controller & 142 & 14.2\% \\
Oracle-X2 ActiveWiki & 138 & 13.8\% \\
Both systems win & 65 & 6.5\% \\
Oracle-X1 only & 77 & 7.7\% \\
Oracle-X2 only & 73 & 7.3\% \\
Union (either system wins) & 215 & 21.5\% \\
\bottomrule
\end{tabular}
\caption{Dragon Labyrinth validation summary.}
\end{table}
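
The rows of the validation table can be cross-checked by inclusion and exclusion. Writing $W_1$ and $W_2$ for the sets of games won by Oracle-X1 and Oracle-X2 respectively (notation introduced here for illustration only), the reported counts give
\begin{align*}
\lvert W_1 \setminus W_2 \rvert &= 142 - 65 = 77, \\
\lvert W_2 \setminus W_1 \rvert &= 138 - 65 = 73, \\
\lvert W_1 \cup W_2 \rvert &= 142 + 138 - 65 = 215.
\end{align*}
The union is therefore $215 / 142 \approx 1.51$ times the Oracle-X1 win count, while the shared wins make up only $65 / 215 \approx 30\%$ of the union.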

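The key result is not that ActiveWiki beats Oracle-X1 in isolation; it does not (138 versus 142 wins). The key result is complementarity: ActiveWiki wins 73 games that Oracle-X1 loses, which indicates that retrieval captures cases the structured controller misses. The 215-win union is the ceiling for any selector that commits to one of the two systems per game.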
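
One way to quantify how close the two systems are in isolation is a McNemar test on the discordant games. This is a sketch that assumes both systems were evaluated on the same 1,000 games, as the overlap counts imply; the test itself is not part of the reported evaluation. With $b = 77$ games won only by Oracle-X1 and $c = 73$ games won only by Oracle-X2,
\begin{equation*}
\chi^2 = \frac{(b - c)^2}{b + c} = \frac{(77 - 73)^2}{77 + 73} = \frac{16}{150} \approx 0.107,
\end{equation*}
which is far below the 5\% critical value of $3.84$ for one degree of freedom. On this slice the four-win gap is thus well within noise, which is consistent with reading the result as complementarity rather than as a ranking.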

\section{Interpretation}
ActiveWiki supports a pragmatic hypothesis for cognitive benchmarking: hidden-state tasks may need both structured state estimation and case memory. A POMDP agent that only searches can miss known failure motifs, while a pure memory system can overfit. A hybrid controller can follow the retrieved recommendation when the retrieved case is sufficiently similar and empirically reliable, and fall back to the structured controller otherwise.
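
The hybrid rule can be written down as a simple gate. The following is a hedged sketch, not the selector used in the experiments: $a_{\mathrm{ret}}(h)$ and $a_{\mathrm{struct}}(h)$ denote the actions proposed by retrieval and by the structured controller for observation history $h$, $s(h) \in [0,1]$ is a similarity score for the nearest retrieved case, $r(h) \in [0,1]$ is the empirical win rate of the retrieved cases, and $\tau$ and $\rho$ are thresholds. All of these symbols are hypothetical and introduced only for illustration.
\begin{equation*}
\pi(h) =
\begin{cases}
a_{\mathrm{ret}}(h) & \text{if } s(h) \ge \tau \text{ and } r(h) \ge \rho, \\
a_{\mathrm{struct}}(h) & \text{otherwise.}
\end{cases}
\end{equation*}
Setting $\tau$ or $\rho$ above $1$ recovers the structured controller alone, so the gate can only change behavior on states where retrieval is both close and historically successful.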

\section{Limitations}
The current evidence comes from a single benchmark family. The selector policy is still manual and should be replaced by a calibrated gate. Future work should test transfer to other POMDP environments, add confidence calibration, and evaluate whether retrieval improves sample efficiency without introducing brittle shortcut behavior.

\section{Conclusion}
ActiveWiki is best understood as a retrieval layer for cognitive control, not as an end-to-end replacement for reasoning. Its main contribution is empirical: in Dragon Labyrinth, case memory and explicit structure solve partially different games. That complementarity is the operational signal worth generalizing.

\bibliographystyle{plain}
\bibliography{references}
\end{document}
