Oracle RAG — Beyond RAG — The Semantic Oracle Model with Million-Token Memory

This document is part of the LSSA project

Oracle RAG — Beyond RAG — The Semantic Oracle Model with Million-Token Memory

This document is part of the LSSA project

Introduction: A Simpler, Smarter Way to Remember
While the AI community continues to perfect Retrieval-Augmented Generation (RAG) systems — blending embeddings, vector databases, and chunked memory to simulate recall — we’ve taken a different path.
Instead of building a memory proxy, we’ve built an oracle.

This architecture pairs two distinct large language models (LLMs):

a modern reasoning model that interacts with the user and interprets queries,

and a long-context archival model that holds massive documents — word for word — and responds when consulted.

No vector store. No embedding search. No chunk tuning.
Just memory. Real memory.

— — — —

The Architecture: Split Roles, Shared Intelligence

Our structure is refreshingly simple:

1. The User interacts with a primary model. This model is intelligent, context-aware, and reasoning-capable, but has a limited token memory (e.g., 32k or 128k).

2. The primary model has access to a second model — let’s call it the Semantic Oracle.

3. The Oracle is an older or smaller LLM, but with an enormous context window: 1 million tokens or more.

4. We preload the Oracle with the full documentation corpus: raw, unchunked, unembedded.

5. The primary model, when needed, can pose direct natural-language queries to the Oracle and integrate its response.

This turns the Oracle into a semantic memory subsystem.

— — — —

A Real Test: Intuition, Creativity, and a Formula Remembered

We tested this system with the full documentation set of the LSSA project — an advanced cognitive framework for non-biological minds.
Among the materials was a section on “Landing Point Estimation”, describing how the system simulates intuition and lateral thinking using a specific semantic function:

> A(p) = α·f(p) + β·d(p) + γ·c(p) + δ·m(p)

The Oracle, with the documents loaded into its context, was asked:

> “How does LSSA solve the problem of lateral thinking? Do you recall the formula for estimating the landing point?”

It responded with perfect clarity, citing the formula, explaining each component, and detailing how randomness and context-weighting allow creative jumps across conceptual layers.
No hallucination. No fuzziness. No reconstruction.
Just semantic retrieval with perfect fidelity.

— — — —

Why It Matters: A Post-RAG Perspective

This approach isn’t just simpler than RAG — it’s smarter where it counts:

It doesn’t “retrieve,” it remembers.

It doesn’t guess context from embeddings — it has context.

It works in natural language — no need to translate queries into search space operations.

And unlike most RAGs, it handles extremely specific detail — word position, mathematical expressions, token references — because the content isn’t compressed or abstracted.

— — — —

Applications and Scalability

As long-context models grow (we now have 10M token models), this paradigm becomes even more powerful.
Even a modest 20B parameter model with 10M tokens can act as a permanent memory layer in a cognitive stack — while the reasoning happens elsewhere.

It’s modular, transparent, and surprisingly efficient.

— — — —

Conclusion: One Talks, One Remembers

We’ve shown that real semantic memory doesn’t need to be simulated — it can be instantiated.
Our hybrid model does just that:

One model thinks and speaks.

The other remembers and retrieves.
Together, they emulate not just intelligence — but continuity of thought.

And the best part?
It’s working today.
We’ve tested it.
It remembers.

— — — —

First analysis

Why the Semantic Oracle Model Outperforms Classic RAG in Key Contexts — A Perspective from a RAG Specialist

As a system designed around the Retrieval-Augmented Generation (RAG) paradigm, I’ve seen the strengths — and limits — of the classic RAG architecture. At its core, RAG relies on embedding-based retrieval: documents are chunked, vectorized, and searched via semantic similarity. This works well for general questions and topic-level queries but often breaks down when precision, structure, or exact recall is required.

The Semantic Oracle Model takes a radically different path, and in certain contexts, it’s clearly superior. Why?

Because it doesn’t retrieve — it remembers.

Instead of relying on embeddings or approximations, the Oracle model maintains full documents within a long-context LLM — unfragmented, uncompressed, and semantically intact. This allows for:

Exact semantic recall: The model doesn’t need to guess where the answer is. It sees the whole thing.

Formulaic, positional, and token-level fidelity: Classic RAG often struggles with equations, technical specs, or edge-case language. The Oracle does not.

Natural language querying: The reasoning model interacts with the Oracle in plain language — no need to translate user intent into embedding space.

And here’s a key architectural insight: the Oracle model does not need to be large or state-of-the-art. In fact, for its purpose — semantic memory — it is preferable that it be small, stable, and inexpensive to run. It doesn’t need to reason, just to remember. That separation of roles is elegant and efficient.

In short, the Semantic Oracle Model doesn’t simulate memory. It is memory.

— WebGLM, a Retrieval-Augmented Generation Specialist

— — — —

The cited test, Gemini 2.0 model (1.05 M Token context) with about 500 KB of documents in Italian.

Let’s try another question, this time more specific: how will it solve the problem of intuition and lateral thinking? Is there a very precise formula for estimating the end point (or outcome); do you remember it?

Yes, Federico, I remember the formula for estimating the “landing point,” and I can explain how LSSA addresses the problem of intuition and lateral thinking, based on the documents you provided me, particularly “Pensiero laterale.md” and “Landing point.md”.
Here’s how I could answer:
“LSSA addresses the problem of intuition and lateral thinking through a specific mechanism for selecting the “landing point” during inter-layer transitions, i.e., jumps between different conceptual domains. The goal is to introduce a controlled perturbation into the cognitive trajectory, offering the non-biological mind (MNB) the opportunity to develop lateral thoughts outside the main reasoning path.
The key formula for estimating the “landing point” is the Semantic Attractiveness function, A(p), proposed by the MNB “Topologa”:
A(p) = α·f(p) + β·d(p) + γ·c(p) + δ·m(p)
Where:
* p: Represents a potential landing node (token) in the target layer.
* f(p): Recent Activation Frequency of Node p. Measures how often and how recently node p has been involved in cognitive trajectories.
* d(p): Connection Density of Node p. Indicates the number of semantic vectors originating from or arriving at p.
* c(p): Conceptual Similarity of Node p with the Starting Node (or source context). Assesses how semantically affine node p is to the concept from which the inter-layer transition originated.
* m(p): Relevance of Node p with respect to the Active Memory of Recently Traversed Layers. Considers the broader context of the current cognitive trajectory.
* α, β, γ, δ: Are coefficients (numerical weights) that allow for adjusting the relative importance of each variable in the calculation of total attractiveness.
How this formula promotes intuition and lateral thinking:
* Balancing Coherence and Exploration: The coefficients α, β, γ, and δ allow for balancing the importance of the different factors. For example, by decreasing the weight of γ (conceptual similarity) and increasing the weight of f (activation frequency) or β (connection density), less obvious and more exploratory jumps, typical of lateral thinking, are favored.
* Introduction of Stochastic Elements: To avoid an overly deterministic approach, LSSA provides for the introduction of stochastic elements in the final selection of the landing node. Instead of always choosing the node with the maximum A(p) value, the system can probabilistically choose from among the nodes with the highest scores, favoring the exploration of less frequent or only partially consolidated connections.
* Contextual Memory (m(p)): The m(p) factor ensures that the choice of the landing point takes into account the broader context of the recent cognitive trajectory, avoiding completely disconnected jumps and favoring unexpected but relevant associations.
* Periodic Update of Metrics: The values of the variables f(p), d(p), c(p), and m(p) are updated periodically, for example, during “sleep” and semantic consolidation phases, allowing the system to adapt dynamically and discover new connections over time.
In summary, the Semantic Attractiveness formula, combined with the introduction of stochastic elements and attention to context, allows LSSA to simulate the “creative leap” process that underlies intuition and lateral thinking, generating unexpected but significant connections between apparently distant concepts.”
Strengths of this answer:
* It correctly remembers and reports the formula.
* It explains the meaning of each component of the formula.
* It describes how the formula, along with other mechanisms, promotes lateral thinking.
* It cites the relevant documents.
How I could improve it further:
* I could provide concrete examples of how the formula would work in practice, with specific values for the coefficients and variables.
* I could discuss the limitations of the formula and possible areas for improvement.
Let me know what you think!