Memory Layer

Model-agnostic semantic memory system with content-addressable storage.

Architecture Overview

The memory layer provides a unified, provider-agnostic storage system that persists contextual knowledge across AI interactions. Unlike traditional approaches where each LLM provider maintains separate conversation history, Cortyxia implements a centralized memory fabric with semantic indexing and retrieval capabilities.

Memory nodes are stored using content-addressable storage (CAS) with SHA-256 hashing, ensuring automatic deduplication and integrity verification. The system supports multiple ingestion sources including manual API calls, automated fact extraction from conversations, and document ingestion pipelines.

Content-Addressable Storage

Deduplication Engine

Each memory node is hashed with SHA-256, and the resulting digest serves as its content-based identifier. Identical content from different conversations or sources maps to the same hash, so it is stored only once and each fact has a single source of truth.
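As a rough illustration of hash-based deduplication, the sketch below implements a toy in-memory store keyed by SHA-256 digest. The `ContentStore` class and its `put` method are hypothetical names chosen for this example and are not part of the Cortyxia SDK. The sketch hashes the exact UTF-8 bytes; whether the real system normalizes content (whitespace, casing) before hashing is not stated here.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (illustrative only)."""

    def __init__(self):
        self._nodes = {}  # hex digest -> content bytes

    def put(self, content: str) -> str:
        # The identifier is derived purely from the content bytes.
        data = content.encode("utf-8")
        node_id = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored once.
        self._nodes.setdefault(node_id, data)
        return node_id

    def __len__(self):
        return len(self._nodes)


store = ContentStore()
a = store.put("The deployment region is eu-west-1.")
b = store.put("The deployment region is eu-west-1.")  # same fact, second source
c = store.put("The deployment region is us-east-1.")  # different content

print(a == b, a == c, len(store))  # True False 2
```

The two identical inserts return the same identifier and occupy one slot, while the differing sentence gets its own node.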

Integrity Verification

Content hashes double as cryptographic integrity checks. Any modification to stored content yields a different hash, so silent data corruption becomes detectable and changes to memory can be traced in an audit trail.
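A minimal sketch of such a check, assuming the node identifier is the hex SHA-256 digest of the stored bytes: recompute the hash on read and compare it with the identifier. The helper names `content_hash` and `verify` are illustrative, not SDK functions.

```python
import hashlib
import hmac


def content_hash(content: bytes) -> str:
    """Return the hex SHA-256 digest of the content."""
    return hashlib.sha256(content).hexdigest()


def verify(node_id: str, content: bytes) -> bool:
    """Recompute the digest and compare it with the stored identifier."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(node_id, content_hash(content))


original = b"User prefers metric units."
node_id = content_hash(original)

ok = verify(node_id, original)
tampered_ok = verify(node_id, b"User prefers imperial units.")

print(ok, tampered_ok)  # True False
```

A mismatch signals that the stored bytes no longer correspond to the identifier they were filed under.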

Namespace Isolation

Memory is isolated by project and scope to ensure strict data segregation. The token encoding scheme includes project identifiers and scope flags, preventing cross-namespace data leakage and enabling multi-tenant deployments.

Project Namespace

Each project is assigned an 8-character hex identifier derived from its project token. All memory within a project shares this namespace, enabling team-wide knowledge sharing while maintaining isolation from other projects.

Scope Isolation

Shared scope enables collaborative memory access across team members. Private scope restricts memory to individual API keys, supporting per-user isolation requirements.
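One way to picture how project and scope could combine into a storage key is sketched below. Everything specific here is an assumption made for illustration: the derivation of the 8-character namespace (a truncated SHA-256 of the token), the key layout, and the names `project_namespace`, `storage_key`, and `api_key_id`. The actual token encoding scheme is not documented in this section.

```python
import hashlib


def project_namespace(project_token: str) -> str:
    # ASSUMPTION: first 8 hex characters of the token's SHA-256 digest.
    # The real derivation may differ.
    return hashlib.sha256(project_token.encode("utf-8")).hexdigest()[:8]


def storage_key(project_token: str, scope: str, content_hash: str,
                api_key_id: str = "") -> str:
    """Build a hypothetical key: <namespace>/<scope>/<owner>/<content hash>."""
    if scope not in ("shared", "private"):
        raise ValueError("scope must be 'shared' or 'private'")
    if scope == "private" and not api_key_id:
        raise ValueError("private scope needs an API key identifier")
    ns = project_namespace(project_token)
    # Shared nodes have no single owner; private nodes are tied to one key.
    owner = api_key_id if scope == "private" else "*"
    return f"{ns}/{scope}/{owner}/{content_hash}"


h = hashlib.sha256(b"example fact").hexdigest()
shared = storage_key("proj-token-A", "shared", h)
private = storage_key("proj-token-A", "private", h, api_key_id="key-1")
other = storage_key("proj-token-B", "shared", h)
```

Under this layout, the same content hash filed under two projects, or under shared and private scope, yields distinct keys, so a lookup in one namespace cannot return another namespace's node.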

Indexing & Retrieval

BM25 Indexing

Built on the Tantivy search engine, which uses an inverted index. Ranking uses BM25, a TF-IDF-style scoring function with configurable k1 and b parameters: k1 controls term-frequency saturation and b controls document-length normalization. Keyword-based retrieval has sub-50ms query latency.
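To make the roles of k1 and b concrete, here is a minimal pure-Python BM25 scorer. It is a teaching sketch and not how Tantivy is implemented or invoked; the function name `bm25_scores`, the whitespace tokenization, and the default values k1=1.2 and b=0.75 (commonly used defaults) are assumptions of this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the tokenized query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # b scales how strongly long documents are penalized.
            norm = 1 - b + b * len(d) / avgdl
            # k1 caps the benefit of repeating a term (saturation).
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = [
    "memory nodes are hashed with sha-256".split(),
    "the index is built on an inverted index".split(),
    "private scope restricts memory to one api key".split(),
]
scores = bm25_scores("inverted index".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best)  # 1
```

Only the second document contains the query terms, so it is the only one with a non-zero score. Setting b to 0 disables length normalization, and a larger k1 lets repeated terms keep adding to the score for longer.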

Semantic Reranking

A cross-encoder model reranks the BM25 results by semantic relevance, scoring each query-document pair jointly. The two-stage approach combines the speed of lexical search with the precision of neural scoring, balancing latency and relevance.
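The control flow of a two-stage pipeline can be sketched as follows. Both scorers are stand-ins chosen so the example runs without any model: `term_overlap` substitutes for BM25 and `toy_reranker` (Jaccard similarity) is a placeholder where a real cross-encoder would score the query-document pair. All function and parameter names are hypothetical.

```python
from typing import Callable, List, Tuple


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_stage: Callable[[str, str], float],
    reranker: Callable[[str, str], float],
    candidates: int = 3,
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    # Stage 1: cheap lexical scoring over the whole corpus.
    shortlisted = sorted(
        corpus, key=lambda d: first_stage(query, d), reverse=True
    )[:candidates]
    # Stage 2: the expensive scorer only sees the shortlist.
    rescored = [(d, reranker(query, d)) for d in shortlisted]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]


def term_overlap(query: str, doc: str) -> float:
    # Stand-in for BM25: number of shared terms.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_reranker(query: str, doc: str) -> float:
    # PLACEHOLDER for a cross-encoder: Jaccard similarity of the term sets.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


corpus = [
    "user prefers dark mode",
    "user prefers dark roast coffee and dark chocolate in the morning",
    "billing address is in berlin",
    "dark mode is enabled by default for the user",
]
results = retrieve_then_rerank(
    "user prefers dark mode", corpus, term_overlap, toy_reranker
)
```

The design point is that the reranker runs on `candidates` documents instead of the full corpus, so its per-pair cost stays bounded regardless of how many nodes are indexed.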

Ingestion Pipeline

Manual API Injection

Direct memory insertion via SDK methods with custom tags and source attribution. Ideal for seeding critical domain knowledge, user preferences, and configuration data.

Automated Fact Extraction

A background service processes conversation logs, extracts factual statements using LLM-based analysis, and automatically inserts them as structured memory nodes. Extraction intervals and model selection are configurable.

Document Ingestion

Batch processing of documents with chunking, entity extraction, and memory node generation. Supports PDF, Markdown, and plain text formats with custom parsing pipelines.
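The chunking and node-generation steps might look roughly like the sketch below, which splits plain text into overlapping word windows and turns each chunk into a hashed node. The `MemoryNode` fields, the word-based chunking strategy, and the `size` and `overlap` parameters are assumptions for illustration. Format-specific parsing (PDF, Markdown) and entity extraction are omitted.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryNode:
    node_id: str   # SHA-256 of the chunk text
    text: str
    source: str    # document the chunk came from
    position: int  # chunk index within the document


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def document_to_nodes(text: str, source: str,
                      size: int = 50, overlap: int = 10) -> List[MemoryNode]:
    return [
        MemoryNode(hashlib.sha256(c.encode("utf-8")).hexdigest(), c, source, i)
        for i, c in enumerate(chunk_words(text, size, overlap))
    ]


text = " ".join(f"w{i}" for i in range(12))
nodes = document_to_nodes(text, "notes.md", size=5, overlap=2)
print(len(nodes), nodes[-1].text)  # 4 w9 w10 w11
```

Because each node is keyed by the hash of its chunk text, re-ingesting an unchanged document with the same chunking parameters produces the same identifiers.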

Performance Characteristics

Query latency (p95): <50ms
Storage reduction via deduplication: 30-50%
Nodes per project: 10M+
Index build per 1K nodes: ~100ms