Memory Layer

Event and memory graph with entity buckets, relevance detection, and context-aware injection.

What the Memory Layer Does

Cortyxia sits between your application and the LLM provider as an inline proxy. For every request it decides which context is worth sending, retrieves it from a persistent memory graph, and injects it into the prompt before forwarding the call. The result is the same provider API you already use, but with cross-session, cross-tool memory that does not live inside any single model.

The layer is built around four concrete abstractions: a memory graph of facts and relationships, an event timeline that captures cross-session activity, entity and knowledge buckets that cluster memory by domain, and a relevance engine that decides what to inject and when.

Memory Graph

Nodes, Not Chunks

The unit of storage is a discrete memory node — typically a fact, decision, or preference extracted from a conversation or document. Nodes are connected by shared entities, tags, and semantic similarity, forming a graph that can be traversed at query time rather than searched as a flat document index.
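
To make the node-and-edge idea concrete, here is a minimal sketch of nodes linked through shared entities. The `MemoryNode` shape and the helper names are assumptions made for illustration; they are not Cortyxia's actual schema, and tag or semantic-similarity edges are left out.

```typescript
// Hypothetical node shape; field names are illustrative only.
interface MemoryNode {
  id: string;
  content: string;
  entities: string[];
}

// Inverted index from entity name to the ids of nodes that mention it.
function buildEntityIndex(nodes: MemoryNode[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const node of nodes) {
    for (const entity of node.entities) {
      const ids = index.get(entity) ?? new Set<string>();
      ids.add(node.id);
      index.set(entity, ids);
    }
  }
  return index;
}

// Ids of nodes that share at least one entity with the given node.
function neighbors(node: MemoryNode, index: Map<string, Set<string>>): string[] {
  const result = new Set<string>();
  for (const entity of node.entities) {
    index.get(entity)?.forEach((id) => {
      if (id !== node.id) result.add(id);
    });
  }
  return Array.from(result).sort();
}

const first: MemoryNode = {
  id: "n1",
  content: "Team chose REST for the public API",
  entities: ["api", "team"],
};
const nodes: MemoryNode[] = [
  first,
  { id: "n2", content: "API rate limit set to 100 requests per minute", entities: ["api"] },
  { id: "n3", content: "Vendor contract renews in Q3", entities: ["vendor"] },
];
const entityIndex = buildEntityIndex(nodes);
const related = neighbors(first, entityIndex);
```

Traversal starts from a node and follows entity links, so `n2` is reachable from `n1` through the shared `api` entity even though the two texts share few words.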

Cross-Tool Persistence

Because every tool routes through the same proxy, a decision recorded in a Slack-backed agent, a Cursor session, or a support ticket can surface in the next relevant conversation anywhere in the stack. There is no per-tool integration to build; memory is shared by infrastructure design.

Event & Memory Timeline

Conversations are not flattened into a static transcript. Cortyxia records a structured event stream — what happened, which tool produced it, whether the outcome worked or failed, and when it occurred. This timeline feeds into retrieval as temporal context, so follow-up questions can reference prior attempts without re-injecting the entire session.

Worked / Avoided Signals

Events are tagged with outcome metadata. A successful fix, a failed strategy, and an explicit correction are stored separately. Future queries can then avoid previously failed approaches and prefer proven ones.
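
A rough sketch of how outcome-tagged events could be grouped and turned into guidance follows. The `TimelineEvent` fields and the three outcome labels are assumptions chosen to mirror the description above, not a documented event schema.

```typescript
// Hypothetical outcome labels mirroring "successful fix", "failed strategy", "explicit correction".
type Outcome = "worked" | "failed" | "corrected";

interface TimelineEvent {
  tool: string;
  summary: string;
  outcome: Outcome;
  occurredAt: string; // ISO 8601 timestamp
}

// Keep each outcome in its own list so they can be treated differently at query time.
function groupByOutcome(events: TimelineEvent[]): Record<Outcome, TimelineEvent[]> {
  const groups: Record<Outcome, TimelineEvent[]> = { worked: [], failed: [], corrected: [] };
  for (const event of events) {
    groups[event.outcome].push(event);
  }
  return groups;
}

// Proven approaches and corrections are preferred; failed approaches are avoided.
function guidance(groups: Record<Outcome, TimelineEvent[]>): { prefer: string[]; avoid: string[] } {
  return {
    prefer: [...groups.worked, ...groups.corrected].map((e) => e.summary),
    avoid: groups.failed.map((e) => e.summary),
  };
}

const events: TimelineEvent[] = [
  { tool: "cursor", summary: "Pin the SDK to v2", outcome: "worked", occurredAt: "2024-05-01T10:00:00Z" },
  { tool: "slack-agent", summary: "Retry with a larger timeout", outcome: "failed", occurredAt: "2024-05-01T11:00:00Z" },
  { tool: "support", summary: "Use the EU endpoint, not the US one", outcome: "corrected", occurredAt: "2024-05-02T09:00:00Z" },
];
const hints = guidance(groupByOutcome(events));
```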

Temporal Relevance

Recency and sequence are inputs to the relevance score. A policy updated yesterday or a strategy confirmed an hour ago outranks stale alternatives, without relying on hard expiry rules.
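
One common way to fold recency into a score without a hard cutoff is exponential decay. The sketch below assumes a half-life model with a 72-hour default; both the formula and the default are illustrative assumptions, not Cortyxia's actual scoring.

```typescript
// Weight halves every `halfLifeHours`; it approaches zero but never reaches it,
// so old nodes are down-ranked rather than expired.
function recencyWeight(ageHours: number, halfLifeHours: number = 72): number {
  return Math.pow(0.5, Math.max(0, ageHours) / halfLifeHours);
}

// Combine a base relevance score with the recency weight.
function temporalScore(baseScore: number, ageHours: number, halfLifeHours: number = 72): number {
  return baseScore * recencyWeight(ageHours, halfLifeHours);
}

const confirmedAnHourAgo = temporalScore(0.8, 1);
const updatedYesterday = temporalScore(0.8, 24);
const threeMonthsOld = temporalScore(0.8, 24 * 90);
```

With equal base scores, the node confirmed an hour ago ranks first, the one from yesterday second, and the three-month-old node last but still above zero.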

Entity & Knowledge Buckets

Semantic Buckets

Memory is organized into buckets such as Enterprise Access, Data Privacy, Revenue Operations, Compliance, IT, and Vendor Risk. Buckets are surfaced in the Knowledge Health Centre as a coverage map, making it obvious which parts of the business are well-supported and which are blind spots.

Cluster Health

Each bucket is scored by density, retrieval frequency, and coherence. A large, highly retrieved cluster indicates strong coverage; a sparse, rarely used cluster flags a gap. The dashboard exposes the exact queries that hit dead ends so teams can fill the missing knowledge rather than guess.

Architectural Relevance Detection

Relevance is not a single cosine-similarity call. Cortyxia scores each query through a multi-signal pipeline that combines lexical anchoring, semantic similarity, graph neighborhood, cross-tool inference, and temporal context. The final injected set is the result of that scoring, not a fixed top-k.

BM25 + Semantic Rerank

Tantivy provides fast inverted-index lookup with BM25 scoring for keyword signals. A cross-encoder then reranks the candidate set so that nodes that are contextually similar but lexically different still surface. This avoids the false positives common in pure vector search.
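
For readers unfamiliar with BM25, the lexical stage can be illustrated with a small in-memory scorer. This is a plain TypeScript sketch of the standard BM25 formula with the usual `k1 = 1.2` and `b = 0.75` defaults; it is not Tantivy's API, and the cross-encoder rerank stage is omitted because it requires a model.

```typescript
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
}

// Standard BM25: sum over query terms of idf * tf-saturation with length normalization.
function bm25Scores(query: string, docs: string[], k1: number = 1.2, b: number = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / tokenized.length;
  const terms = tokenize(query);
  return tokenized.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = tokenized.filter((d) => d.indexOf(term) !== -1).length;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

// Indices of matching documents, best first; these would be handed to a reranker.
function candidates(query: string, docs: string[], limit: number): number[] {
  const scores = bm25Scores(query, docs);
  return scores
    .map((score, index) => ({ score, index }))
    .filter((entry) => entry.score > 0)
    .sort((a, c) => c.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.index);
}

const docs = [
  "Team decided to use RESTful API design",
  "Vendor contract renews in Q3",
  "API rate limits were raised",
];
const scores = bm25Scores("API design", docs);
const shortlist = candidates("API design", docs, 10);
```

The shortlist contains only documents with at least one query term. A semantic reranker would then reorder that set by meaning instead of by term overlap.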

Signal Composition

A query may match through direct retrieval, semantic neighbors, cross-tool related facts, or temporal patterns. The pipeline blends these signals rather than relying on any one. The gap signal — queries with no supporting memory — is tracked explicitly and reported as knowledge debt.
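
A weighted blend is one simple way to combine several signals and to detect gaps. In the sketch below the signal names follow the pipeline described above, while the weights, the `0.35` cutoff, and the `knowledgeDebt` log are hypothetical values chosen for illustration.

```typescript
// Each signal is assumed to be normalized to the range [0, 1].
interface SignalScores {
  lexical: number;
  semantic: number;
  graph: number;
  crossTool: number;
  temporal: number;
}

// Hypothetical weights; they sum to 1 so the blended score also stays in [0, 1].
const WEIGHTS: SignalScores = { lexical: 0.3, semantic: 0.3, graph: 0.15, crossTool: 0.15, temporal: 0.1 };

function blend(signals: SignalScores): number {
  return (Object.keys(WEIGHTS) as (keyof SignalScores)[]).reduce(
    (sum, key) => sum + WEIGHTS[key] * signals[key],
    0,
  );
}

// Queries with no sufficiently supported memory are recorded as knowledge debt.
const knowledgeDebt: string[] = [];

function bestScoreOrGap(query: string, found: SignalScores[], minScore: number = 0.35): number | null {
  const best = found.reduce((max, c) => Math.max(max, blend(c)), 0);
  if (best < minScore) {
    knowledgeDebt.push(query);
    return null;
  }
  return best;
}

// A node with no keyword overlap can still qualify through other signals.
const semanticOnly: SignalScores = { lexical: 0, semantic: 0.9, graph: 0.8, crossTool: 0, temporal: 0 };
const hit = bestScoreOrGap("What did we decide about the API design?", [semanticOnly]);
const miss = bestScoreOrGap("What is our data retention policy?", []);
```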

Confidence Thresholds

Nodes below a configurable relevance score are not injected. Simple queries that need no past context bypass retrieval entirely. The threshold, the maximum number of nodes, and the token budget are all exposed through configuration.

Context Injection Mechanisms

The right context at the right time is the result of several mechanisms working together. Each step is designed to prevent the bloat that makes naive context management expensive and unreliable.

Selective Retrieval

Only nodes above the relevance threshold are fetched. If the query is self-contained — for example, a math question or a generic greeting — the MMU returns no memory and the provider receives only the current messages.

Token Budget Enforcement

The system allocates a fixed budget for memory context and fills it with the highest-scoring nodes until the limit is reached. It does not dump a fixed top-k into a prompt that may already be near its context window.
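
The threshold, node cap, and token budget can be pictured as one greedy selection pass. The sketch below is an assumption-laden illustration: the `Candidate` and `InjectionConfig` names and fields are hypothetical, and token counts are taken as precomputed.

```typescript
interface Candidate {
  id: string;
  score: number;  // relevance score in [0, 1]
  tokens: number; // precomputed token length of the node
}

// Hypothetical configuration names for the three limits.
interface InjectionConfig {
  minScore: number;
  maxNodes: number;
  tokenBudget: number;
}

function selectForInjection(found: Candidate[], config: InjectionConfig): Candidate[] {
  // Drop low-relevance nodes first, then take the best-scoring ones.
  const ranked = found
    .filter((c) => c.score >= config.minScore)
    .sort((a, b) => b.score - a.score);

  const selected: Candidate[] = [];
  let used = 0;
  for (const candidate of ranked) {
    if (selected.length >= config.maxNodes) break;
    // A node that would overflow the budget is skipped; a smaller one may still fit.
    if (used + candidate.tokens > config.tokenBudget) continue;
    selected.push(candidate);
    used += candidate.tokens;
  }
  return selected;
}

const picked = selectForInjection(
  [
    { id: "a", score: 0.9, tokens: 50 },
    { id: "b", score: 0.8, tokens: 70 },
    { id: "c", score: 0.7, tokens: 30 },
    { id: "d", score: 0.2, tokens: 10 },
  ],
  { minScore: 0.5, maxNodes: 3, tokenBudget: 100 },
);
```

Here `b` is skipped because it would exceed the 100-token budget, and `d` never qualifies because it falls below the threshold. Skipping oversized nodes instead of stopping at the first one is a design choice of this sketch. A self-contained query whose candidates all score below `minScore` yields an empty selection.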

Conversation Awareness

Already-injected facts are tracked within the current conversation so they are not repeated. Recent topics are weighted higher, maintaining coherence without re-sending the full transcript.
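
Tracking what has already been injected can be as simple as a set of node ids per conversation. The `ConversationTracker` class below is a hypothetical sketch of that bookkeeping only; it does not model the topic weighting.

```typescript
class ConversationTracker {
  // Conversation id -> ids of nodes already injected into that conversation.
  private readonly injected = new Map<string, Set<string>>();

  // Returns only the node ids not yet injected, and records them as injected.
  filterNew(conversationId: string, nodeIds: string[]): string[] {
    const seen = this.injected.get(conversationId) ?? new Set<string>();
    const fresh: string[] = [];
    for (const id of nodeIds) {
      if (!seen.has(id)) {
        seen.add(id);
        fresh.push(id);
      }
    }
    this.injected.set(conversationId, seen);
    return fresh;
  }
}

const tracker = new ConversationTracker();
const firstTurn = tracker.filterNew("conv-1", ["n1", "n2"]);
const secondTurn = tracker.filterNew("conv-1", ["n2", "n3"]);
```

On the second turn only `n3` is new, so `n2` is not sent again. A different conversation id starts with an empty set.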

Deduplication & Compression

Identical facts are stored once via SHA-256 content-addressable storage. Where compression is applied, it happens after selection and deduplication, on the smallest relevant set rather than the entire prompt.

memory-context-example.ts

import { Cortyxia } from "cortyxia";

const client = new Cortyxia({
  isoUrl: process.env.ISO_URL || "https://app.cortyxia.com",
  isoToken: "<ISO_TOKEN>",
});

// Automatic retrieval + injection happens here
const response = await client.chat.completions.create({
  model: "<MODEL_NAME>",
  messages: [
    { role: "user", content: "What did we decide about the API design?" }
  ]
});

// Override injection manually when needed
const override = await client.chat.completions.create({
  model: "<MODEL_NAME>",
  messages: [
    { role: "user", content: "What did we decide about the API design?" }
  ],
  context_injection: {
    bm25_hits: [
      { content: "Team decided to use RESTful API design on March 15", score: 0.92 }
    ],
    sci_blocks: ["Prefer concise, implementation-focused answers."]
  }
});

Content-Addressable Storage

Deduplication Engine

Each memory node is hashed using SHA-256. Identical content across different conversations or sources maps to the same hash, eliminating redundant storage and giving a single source of truth for facts.
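
Content addressing can be sketched with Node's built-in `crypto` module. The `ContentStore` class is a hypothetical in-memory stand-in for the real store, and it hashes the exact string with no normalization, which is an assumption of this example.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the UTF-8 content, as a 64-character hex string.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

class ContentStore {
  private readonly nodes = new Map<string, string>();

  // Stores the content under its hash; identical content is stored only once.
  put(content: string): string {
    const hash = contentHash(content);
    if (!this.nodes.has(hash)) {
      this.nodes.set(hash, content);
    }
    return hash;
  }

  get(hash: string): string | undefined {
    return this.nodes.get(hash);
  }

  count(): number {
    return this.nodes.size;
  }
}

const store = new ContentStore();
const fromSlack = store.put("Team decided to use RESTful API design on March 15");
const fromCursor = store.put("Team decided to use RESTful API design on March 15");
```

Both calls return the same hash and the store holds a single entry, regardless of which tool the fact came from.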

Integrity Verification

The content hash doubles as an integrity check. Any modification produces a different hash, preventing silent corruption and enabling versioning: when a fact is updated, the old node is marked superseded and the new node becomes the current source.
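
Verification and supersession can be illustrated together. In this sketch the `StoredNode` shape and its `supersededBy` field are assumptions; the source states only that the old node is marked superseded, not how.

```typescript
import { createHash } from "node:crypto";

interface StoredNode {
  hash: string;
  content: string;
  supersededBy?: string; // hypothetical field: hash of the node that replaced this one
}

function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// A node is intact only if its content still hashes to its recorded hash.
function verify(node: StoredNode): boolean {
  return sha256Hex(node.content) === node.hash;
}

// Adds the updated fact as a new node and marks the old one as superseded.
function supersede(nodes: Map<string, StoredNode>, oldHash: string, newContent: string): StoredNode {
  const old = nodes.get(oldHash);
  if (!old) throw new Error(`unknown node ${oldHash}`);
  const newHash = sha256Hex(newContent);
  if (newHash === oldHash) return old; // unchanged content, nothing to supersede
  const next: StoredNode = { hash: newHash, content: newContent };
  nodes.set(newHash, next);
  old.supersededBy = newHash;
  return next;
}

const nodesByHash = new Map<string, StoredNode>();
const v1: StoredNode = { hash: sha256Hex("Rate limit is 100 rpm"), content: "Rate limit is 100 rpm" };
nodesByHash.set(v1.hash, v1);
const v2 = supersede(nodesByHash, v1.hash, "Rate limit is 200 rpm");
const tampered: StoredNode = { hash: v1.hash, content: "Rate limit is 999 rpm" };
```

The old node stays in the store with a pointer to its replacement, so history is preserved while retrieval can prefer the node that has no `supersededBy` value.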

Namespace Isolation

Memory is isolated by project and scope. The ISO token encodes the project namespace and a shared/private flag, so retrieval never leaks across tenants even when the same physical store is used.

Project Namespace

An 8-character hex identifier derived from the token. All memory within a project shares this namespace, enabling team-wide knowledge sharing while maintaining isolation from other projects.

Scope Isolation

Shared scope lets every key in the project access the same memory. Private scope restricts memory to a single key, supporting per-user or per-agent isolation within the same project.
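
The two scopes can be pictured as different partition keys inside one physical store. Everything in this sketch is hypothetical: the `TokenClaims` shape, the `keyId` field, and the key layout are illustrative, and it assumes the namespace and scope have already been decoded from the ISO token, whose format is not described here.

```typescript
type Scope = "shared" | "private";

// Hypothetical decoded view of an ISO token.
interface TokenClaims {
  namespace: string; // 8-character hex project identifier (lowercase assumed)
  scope: Scope;
  keyId: string;     // identifies the individual key within the project
}

// Every read and write is confined to the partition derived from the token.
function partitionKey(claims: TokenClaims): string {
  if (!/^[0-9a-f]{8}$/.test(claims.namespace)) {
    throw new Error("namespace must be 8 hex characters");
  }
  return claims.scope === "shared"
    ? `${claims.namespace}/shared`
    : `${claims.namespace}/private/${claims.keyId}`;
}

const teamKeyA = partitionKey({ namespace: "a1b2c3d4", scope: "shared", keyId: "key-1" });
const teamKeyB = partitionKey({ namespace: "a1b2c3d4", scope: "shared", keyId: "key-2" });
const agentOnly = partitionKey({ namespace: "a1b2c3d4", scope: "private", keyId: "key-2" });
const otherProject = partitionKey({ namespace: "0f9e8d7c", scope: "shared", keyId: "key-1" });
```

Two shared keys in the same project resolve to the same partition, a private key gets its own, and a different project never overlaps with either.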
