Engineering Deep Dive

Building the Operating System for Agentic Memory

A technical walkthrough of ekkOS — an 11-layer cognitive architecture that gives AI agents persistent memory, continuous learning, and cross-session intelligence.

Architected by Seann MacDougall
Lead Systems Architect · February 2026
996+
Commits
Solo architect
372
DB Tables
Memory, traces, evals
185
API Functions
TypeScript SDK
77%
Cost Reduction
Token optimization
Section 1

The 11-Layer Cognitive Architecture

Most AI memory solutions are a single key-value store. ekkOS implements a layered architecture inspired by human cognitive science — each layer serves a distinct purpose in the memory lifecycle.

Request Flow

User Input → Context Rehydration → Pattern Retrieval → Model Router → LLM Execution → Memory Write → Response
1. Working
2. Episodic
3. Semantic
4. Patterns
5. Procedural
6. Collective
7. Meta
8. Codebase
9. Directives
10. Conflict
11. Secrets

Each layer is an independent, addressable memory system, exposed via the MCP protocol. Agents can search, write, and reason across all 11 layers in a single tool call. The architecture supports both personal (per-user) and collective (cross-user) memory scopes.

Section 2

Memory & State Management

372 database tables handle memory, traces, embeddings, and evaluations. Here's how the core data model works.

Core Data Model

types/memory.ts
// Pattern — a proven solution extracted
// from agent conversations
interface Pattern {
  id: string;
  title: string;
  problem: string;
  solution: string;
  confidence: number;    // 0.0 - 1.0
  success_rate: number;  // tracked via outcomes
  applied_count: number;
  tags: string[];
  embedding: number[];   // 1536-dim vector
  works_when: string[];
  anti_patterns: string[];
  created_at: Date;
  last_applied: Date;
}

// Active Forgetting — patterns decay
// if unused, get quarantined if failing
interface DecayConfig {
  decay_rate: number;            // 0.95 per period
  min_confidence: number;        // floor at 0.1
  quarantine_threshold: number;  // quarantine below 0.3
  promotion_threshold: number;   // promote above 0.8
}
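The decay mechanics implied by DecayConfig can be sketched as a pure function over a pattern's state. This is a minimal illustration under assumed semantics: decayStep and PatternState are names invented here, and the interface is repeated so the sketch is self-contained.

```typescript
interface DecayConfig {
  decay_rate: number;            // e.g. 0.95 per period
  min_confidence: number;        // e.g. floor at 0.1
  quarantine_threshold: number;  // e.g. 0.3
  promotion_threshold: number;   // e.g. 0.8
}

type PatternState = "active" | "quarantined" | "promoted";

// One decay pass: unused patterns lose confidence,
// failing ones are quarantined, winners are promoted.
function decayStep(
  confidence: number,
  successRate: number,
  cfg: DecayConfig
): { confidence: number; state: PatternState } {
  // Multiplicative decay with a hard floor.
  const decayed = Math.max(cfg.min_confidence, confidence * cfg.decay_rate);
  if (successRate < cfg.quarantine_threshold) {
    return { confidence: decayed, state: "quarantined" };
  }
  if (decayed >= cfg.promotion_threshold) {
    return { confidence: decayed, state: "promoted" };
  }
  return { confidence: decayed, state: "active" };
}
```

The floor at min_confidence keeps a rarely-used but valid pattern retrievable rather than letting it decay to zero.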

How It Connects

Conversations → generateEpisodes
Episodes → extractPatterns
Patterns → trackOutcomes
Outcomes → adjustConfidence
Confidence → rankRetrieval
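The adjustConfidence step can be sketched as an exponential moving average over outcomes. The learning rate and formula here are illustrative assumptions, not the production weighting:

```typescript
// Confidence is nudged toward 1 on success and toward 0
// on failure. A small learning rate means one outcome
// cannot swing a well-established pattern.
function adjustConfidence(
  current: number,
  success: boolean,
  learningRate = 0.1
): number {
  const target = success ? 1 : 0;
  return current + learningRate * (target - current);
}
```

Because retrieval ranks by confidence, repeatedly successful patterns surface first while failing ones sink toward the quarantine threshold.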
types/context.ts
// Semantic Rehydration — vector search
// replaces positional context lookup
interface ContextFrame {
  session_id: string;
  active_patterns: Pattern[];
  rehydrated_turns: Turn[];
  decay_rate: number;
  eviction_threshold: number;
}

// Search across evicted context using
// embedding similarity, not position
async function rehydrate(
  query: string,
  session: string
): Promise<ContextFrame> {
  const embedding = await embed(query);
  return searchEvictedContent(
    embedding, session
  );
}
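A minimal sketch of what searchEvictedContent could do: rank all evicted turns by cosine similarity against the query embedding. The in-memory array here stands in for the real R2-backed storage, and the names are assumptions:

```typescript
interface EvictedTurn {
  session_id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank ALL evicted turns for a session by similarity,
// not by position: the core idea of semantic rehydration.
function searchEvicted(
  queryEmbedding: number[],
  session: string,
  store: EvictedTurn[],
  limit = 5
): EvictedTurn[] {
  return store
    .filter((t) => t.session_id === session)
    .map((t) => ({ t, score: cosine(queryEmbedding, t.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((x) => x.t);
}
```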
Section 3

Cost & Performance Optimization

LLM API costs scale fast. ekkOS implements intelligent routing and cache optimization to cut token usage by 77% while maintaining quality.

The Problem

Claude Code makes 10-20+ API calls per user prompt (2-3 per tool round-trip). Standard proxy pipelines inject context at varying positions, breaking Anthropic's prompt cache on every call. Result: 10-20x cost multiplication from cache misses.

The Solution

Designed a "Cache-Preserving Passthrough" algorithm that detects tool round-trips and skips non-essential pipeline stages. Memory injections use Redis-cached content (stable prefix = cache hit). Emergency eviction only fires at 95% vs. the normal 75% threshold.

Key insight: Anthropic's prompt cache gives a 90% discount on cache hits. The cache works on the exact prefix of the messages array. Any content change at any position breaks the cache.

Token Usage Per Session (Relative)

Standard API (no cache): $1.20 / call
ekkOS Optimized: $0.28 / call
Reduction: 77%

Eviction Cost

~$0.015
per eviction (Gemini 2.5 Flash)
80x ROI
vs. cache miss cost
Section 4

MCP Protocol Integration

Anthropic's Model Context Protocol (MCP) is the standard for connecting AI agents to external tools. ekkOS exposes all 11 memory layers as MCP tools.

Agent ↔ Memory Connection

IDE / Agent
Cursor · Claude Code · VS Code · Windsurf
Model Context Protocol (MCP)
185 RPC functions · Stdio transport · JSON-RPC 2.0
ekkOS Memory Infrastructure
Search · Forge · Recall · Directives · Plans · Secrets · Context
Storage
PostgreSQL · Redis · Neo4j · Cloudflare R2
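On the wire, each function is invoked as a JSON-RPC 2.0 tools/call request, the standard MCP tool-invocation method. The tool name and argument shape below are illustrative assumptions:

```typescript
// Illustrative MCP tools/call request (JSON-RPC 2.0).
// "ekkos_search" and its arguments are assumed names.
const request = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "ekkos_search",
    arguments: {
      query: "supabase auth setup",
      sources: ["patterns", "episodic"],
    },
  },
};
```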

Cross-Platform Memory

Teach Claude something in VS Code — Cursor already knows it. All agents share the same memory via MCP, regardless of which IDE or tool you use.

mcp-config.json
{
  "mcpServers": {
    "ekkos-memory": {
      "type": "sse",
      "url": "https://mcp.ekkos.dev/sse",
      "env": {
        "EKKOS_USER_ID": "your-user-id"
      }
    }
  }
}

// One config. Every IDE connected.
// Memory persists across sessions,
// tools, and even team members.
agent-example.ts
// Agent automatically searches memory
// before answering any question
const results = await ekkos.search({
  query: "supabase auth setup",
  sources: ["patterns", "episodic"]
});

// Found 3 patterns from past sessions
// Confidence: 0.94, 0.87, 0.72
// Agent applies the highest-confidence
// solution without re-deriving it.
Section 5

Developer Experience & SDK

185 API functions with TypeScript types. 71 automated tests. Clean interfaces designed for other developers to build on.

Search & Learn in Two Calls

sdk-usage.ts
import { EkkosClient } from '@ekkos/sdk';

const ekkos = new EkkosClient({
  transport: 'sse',
  endpoint: 'https://mcp.ekkos.dev'
});

// Search memory before answering
const context = await ekkos.search({
  query: 'deployment error on Railway',
  sources: ['patterns', 'episodic'],
  limit: 5
});

// Apply a pattern and track outcome
const app = await ekkos.track({
  pattern_id: context.patterns[0].id,
  context: { task: 'fix deployment' }
});

// Record success for reinforcement
await ekkos.outcome({
  application_id: app.id,
  success: true
});
// Pattern confidence: 0.82 → 0.86

Forge Solutions Automatically

forge-pattern.ts
// When a bug is fixed, capture it
await ekkos.forge({
  title: 'Railway PM2 restart loop',
  problem:
    'PM2 workers restart endlessly ' +
    'when memory exceeds 512MB limit',
  solution:
    'Set max_memory_restart to 450MB ' +
    'with graceful shutdown handler',
  tags: ['railway', 'pm2', 'memory'],
  works_when: [
    'Node.js worker on Railway',
    'PM2 cluster mode'
  ],
  anti_patterns: [
    'Increasing memory limit only ' +
    'delays the crash'
  ]
});

// Next time any agent hits this issue,
// the solution surfaces automatically.
185
RPC Functions
71
Test Cases
100%
TypeScript
Section 6

Custom Algorithms

Four purpose-built algorithms for AI memory management. Each solves a specific failure mode discovered during production use.

Cache-Preserving Passthrough

Problem: Prompt cache misses cost 10-20x on every tool round-trip due to content injection at varying positions.

Solution: Detect tool round-trips, skip non-essential pipeline stages, use Redis-cached stable prefix for cache hits.

Convergence Evaluator (Delta-Prometheus)

Problem: No way to measure if the system was actually improving. Agents hallucinated on complex tasks with no consistency check.

Solution: Custom scoring formula that tracks pattern success rates, retrieval relevance, and error frequency over time to quantify improvement.
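The document does not give the Delta-Prometheus formula, but a weighted score over the three signals it names could look like this. The weights and names are purely illustrative, not the production values:

```typescript
interface ConvergenceSignals {
  patternSuccessRate: number; // 0..1
  retrievalRelevance: number; // 0..1, e.g. mean similarity of retrieved memories
  errorFrequency: number;     // 0..1, normalized errors per task
}

// Higher is better; errors subtract. Illustrative weights.
function convergenceScore(s: ConvergenceSignals): number {
  return (
    0.5 * s.patternSuccessRate +
    0.3 * s.retrievalRelevance -
    0.2 * s.errorFrequency
  );
}

// Improvement over time = delta between evaluation windows.
function convergenceDelta(
  prev: ConvergenceSignals,
  curr: ConvergenceSignals
): number {
  return convergenceScore(curr) - convergenceScore(prev);
}
```

A positive delta across windows is evidence the system is actually converging rather than oscillating between different answers.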

Active Forgetting Engine

Problem: Stale patterns accumulate forever. Bad solutions never get removed. Memory becomes noisy over time.

Solution: A biologically inspired lifecycle: quarantine failing patterns (<30% success), merge duplicates (>92% similarity), decay unused patterns, and promote winners to the collective layer.

Semantic Rehydration

Problem: After context eviction, positional lookup (last 5 turns) misses relevant history that appeared earlier in conversation.

Solution: Replace positional lookup with vector similarity search across ALL evicted content. Always-on, not triggered by keywords.

Section 7

Failures & What I Learned

Junior developers show only success. Senior engineers show how they handled failure. Here are three production issues and the systems I built to fix them.

1

Agents Were Hallucinating on Complex Tasks

Task completion: 40%

Without memory, agents re-derived solutions from scratch every session. They'd get different (often wrong) answers each time. No feedback loop meant bad patterns persisted.

Fix: Built the Convergence Evaluator to track answer consistency. Combined with pattern extraction and outcome tracking, task completion rose from 40% to 86.7%.

2

Serverless Cold Starts Wiped In-Memory State

Pattern tracking: broken

The pattern application store used an in-memory Map<string, Data>. Every Vercel cold start cleared it. Timeout-based auto-outcomes were unreliable in serverless — functions terminate before timers fire.

Fix: Migrated to a Redis-backed store with 5-minute TTL and in-memory fallback. Replaced setTimeout with lazy auto-outcome processing — on each new Track call, stale verified applications >30s old get processed first.
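The lazy-processing pattern can be sketched like this. A Map stands in for the Redis store, the 30-second threshold comes from the text, and everything else is assumed:

```typescript
interface Application {
  id: string;
  pattern_id: string;
  started_at: number; // epoch ms
  verified: boolean;
}

const STALE_MS = 30_000; // verified and >30s old ⇒ auto-process

// Instead of setTimeout (unreliable in serverless, since the
// function may be frozen before the timer fires), each new
// track() call first sweeps stale verified applications.
function trackWithLazySweep(
  store: Map<string, Application>,
  next: Application,
  now: number,
  onAutoOutcome: (app: Application) => void
): void {
  for (const [id, app] of store) {
    if (app.verified && now - app.started_at > STALE_MS) {
      onAutoOutcome(app); // record the implicit outcome
      store.delete(id);
    }
  }
  store.set(next.id, next);
}
```

The trade-off: outcomes are processed at the next call rather than on a fixed schedule, which is acceptable because a new Track call is exactly when stale state would otherwise cause incorrect results.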

3

Context Eviction Caused Amnesia

Agents forgot mid-conversation

When context windows filled up, the eviction system removed older messages. But the rehydration system only looked at the last 5 turns — missing critical context from earlier in the conversation. Agents would "forget" decisions made 20 minutes ago.

Fix: Replaced positional lookup with semantic vector search across ALL evicted content. Now always-on (not keyword-triggered). Used Google AI embeddings (free tier) to keep costs at zero. Evicted content is stored losslessly in Cloudflare R2.

Key Achievements

High-Volume Engineering

996+ commits as sole architect. 372 database tables covering memory, traces, evaluations, and analytics. 10+ GitHub Actions CI/CD pipelines for automated testing and deployment.

Performance Improvement

Increased AI task completion rates from 40% to 86.7% through automated pattern extraction and outcome tracking. Continuous improvement via the Golden Loop feedback mechanism.

Algorithm Design

Developed 4 custom algorithms: Cache-Preserving Passthrough, Delta-Prometheus Convergence Evaluator, Active Forgetting Engine, and Semantic Rehydration — each solving a specific production failure mode.

Data Processing

Created a business logic extraction pipeline processing 2,500+ patterns with 92.8% success rate. Automated learning pipelines analyze agent conversations to construct knowledge graphs.

Live Demo

Talk to the Memory System

This isn't a mock. Gemini 2.0 Flash with live MCP tool access to the real ekkOS production database. Ask it anything — watch the raw tool calls execute.

ekkOS Live Demo — Gemini 2.0 Flash + MCP Tools
Ask the Memory System

This chatbox runs Gemini with live access to ekkOS MCP tools. Every tool call queries the real production database.

Powered by Gemini 2.0 Flash + ekkOS MCP

Tech Stack

Languages

TypeScript
Node.js
Python
SQL

AI / Agents

Claude
GPT
Gemini
MCP Protocol

Infrastructure

Vercel
Railway
GitHub Actions
Cloudflare R2

Data

PostgreSQL
Redis / Upstash
Neo4j
Vector Embeddings

Engineered by Seann MacDougall

Lead Systems Architect

Seann is a Lead AI Engineer specializing in cognitive architectures and agentic memory systems. He designed and built ekkOS from the first commit to the 11th layer — 996+ commits, 372 database tables, and 4 custom algorithms, all as a solo architect. His focus is building production-grade infrastructure that makes AI agents genuinely smarter over time.

Whitby, Ontario · 289-927-0983