A technical walkthrough of ekkOS — an 11-layer cognitive architecture that gives AI agents persistent memory, continuous learning, and cross-session intelligence.
Most AI memory solutions are a single key-value store. ekkOS implements a layered architecture inspired by human cognitive science — each layer serves a distinct purpose in the memory lifecycle.
Each layer is an independent, addressable memory system. Agents search, write, and reason across all 11 layers via a single MCP tool call.
The architecture supports both personal (per-user) and collective (cross-user) memory scopes.
372 database tables handle memory, traces, embeddings, and evaluations. Here's how the core data model works.
// Pattern — a proven solution extracted
// from agent conversations
interface Pattern {
  id: string;
  title: string;
  problem: string;
  solution: string;
  confidence: number;    // 0.0 - 1.0
  success_rate: number;  // tracked via outcomes
  applied_count: number;
  tags: string[];
  embedding: number[];   // 1536-dim vector
  works_when: string[];
  anti_patterns: string[];
  created_at: Date;
  last_applied: Date;
}
// Active Forgetting — patterns decay
// if unused, get quarantined if failing
interface DecayConfig {
  decay_rate: number;           // 0.95 per period
  min_confidence: number;       // floor at 0.1
  quarantine_threshold: number; // quarantine below 0.3
  promotion_threshold: number;  // promote above 0.8
}

// Semantic Rehydration — vector search
// replaces positional context lookup
interface ContextFrame {
  session_id: string;
  active_patterns: Pattern[];
  rehydrated_turns: Turn[];
  decay_rate: number;
  eviction_threshold: number;
}
// Search across evicted context using
// embedding similarity, not position
async function rehydrate(
  query: string,
  session: string
): Promise<ContextFrame> {
  const embedding = await embed(query);
  return searchEvictedContent(
    embedding, session
  );
}

LLM API costs scale fast. ekkOS implements intelligent routing and cache optimization to cut token usage by 77% while maintaining quality.
Claude Code makes 10-20+ API calls per user prompt (2-3 per tool round-trip). Standard proxy pipelines inject context at varying positions, breaking Anthropic's prompt cache on every call. Result: 10-20x cost multiplication from cache misses.
Designed a "Cache-Preserving Passthrough" algorithm that detects tool round-trips and skips non-essential pipeline stages. Memory injections use Redis-cached content (stable prefix = cache hit). Emergency eviction only fires at 95% vs. the normal 75% threshold.
Key insight: Anthropic's prompt cache gives a 90% discount on cache hits. The cache works on the exact prefix of the messages array. Any content change at any position breaks the cache.
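The cache-preserving idea above can be sketched in a few lines. This is a minimal illustration, not the ekkOS implementation — the function and type names are hypothetical:

```typescript
type Message = { role: string; content: string };

// Keep injected memory in one stable block at the very front of the
// messages array, so the serialized prefix is byte-identical across
// round-trips and the prompt cache keeps hitting.
function injectStablePrefix(
  cachedMemory: string, // Redis-cached, unchanged between round-trips
  conversation: Message[]
): Message[] {
  return [
    { role: 'system', content: cachedMemory }, // stable prefix
    ...conversation                            // only the tail grows
  ];
}

// A cache hit requires the exact same leading messages.
function sharesCachePrefix(a: Message[], b: Message[], n: number): boolean {
  return JSON.stringify(a.slice(0, n)) === JSON.stringify(b.slice(0, n));
}
```

Because only the tail of the array changes between tool round-trips, the leading messages stay byte-identical and qualify for the cached-prefix discount.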
Anthropic's Model Context Protocol (MCP) is the standard for connecting AI agents to external tools. ekkOS exposes all 11 memory layers as MCP tools.
Teach Claude something in VS Code — Cursor already knows it. All agents share the same memory via MCP, regardless of which IDE or tool you use.
{
  "mcpServers": {
    "ekkos-memory": {
      "type": "sse",
      "url": "https://mcp.ekkos.dev/sse",
      "env": {
        "EKKOS_USER_ID": "your-user-id"
      }
    }
  }
}
// One config. Every IDE connected.
// Memory persists across sessions,
// tools, and even team members.

// Agent automatically searches memory
// before answering any question
const results = await ekkos.search({
  query: "supabase auth setup",
  sources: ["patterns", "episodic"]
});
// Found 3 patterns from past sessions
// Confidence: 0.94, 0.87, 0.72
// Agent applies the highest-confidence
// solution without re-deriving it.

185 API functions with TypeScript types. 71 automated tests. Clean interfaces designed for other developers to build on.
import { EkkosClient } from '@ekkos/sdk';

const ekkos = new EkkosClient({
  transport: 'sse',
  endpoint: 'https://mcp.ekkos.dev'
});

// Search memory before answering
const context = await ekkos.search({
  query: 'deployment error on Railway',
  sources: ['patterns', 'episodic'],
  limit: 5
});

// Apply a pattern and track outcome
const app = await ekkos.track({
  pattern_id: context.patterns[0].id,
  context: { task: 'fix deployment' }
});

// Record success for reinforcement
await ekkos.outcome({
  application_id: app.id,
  success: true
});
// Pattern confidence: 0.82 → 0.86

// When a bug is fixed, capture it
await ekkos.forge({
  title: 'Railway PM2 restart loop',
  problem:
    'PM2 workers restart endlessly ' +
    'when memory exceeds 512MB limit',
  solution:
    'Set max_memory_restart to 450MB ' +
    'with graceful shutdown handler',
  tags: ['railway', 'pm2', 'memory'],
  works_when: [
    'Node.js worker on Railway',
    'PM2 cluster mode'
  ],
  anti_patterns: [
    'Increasing memory limit only ' +
    'delays the crash'
  ]
});
// Next time any agent hits this issue,
// the solution surfaces automatically.

Four purpose-built algorithms for AI memory management. Each solves a specific failure mode discovered during production use.
Problem: Prompt cache misses cost 10-20x on every tool round-trip due to content injection at varying positions.
Solution: Detect tool round-trips, skip non-essential pipeline stages, use Redis-cached stable prefix for cache hits.
Problem: No way to measure if the system was actually improving. Agents hallucinated on complex tasks with no consistency check.
Solution: Custom scoring formula that tracks pattern success rates, retrieval relevance, and error frequency over time to quantify improvement.
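The scoring idea can be sketched as a weighted combination of the three signals named above. The weights and names here are illustrative assumptions — the actual Delta-Prometheus formula is not published in this document:

```typescript
// Illustrative weights only, not the production formula.
interface EvalSignals {
  patternSuccessRate: number; // 0..1, from outcome tracking
  retrievalRelevance: number; // 0..1, mean similarity of retrieved patterns
  errorFrequency: number;     // 0..1, share of tasks that errored
}

function convergenceScore(s: EvalSignals): number {
  const raw =
    0.5 * s.patternSuccessRate +
    0.3 * s.retrievalRelevance +
    0.2 * (1 - s.errorFrequency);
  return Math.round(raw * 1000) / 1000; // higher is better
}
```

Tracking this score over time turns "is the system improving?" into a trend line rather than a guess.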
Problem: Stale patterns accumulate forever. Bad solutions never get removed. Memory becomes noisy over time.
Solution: Bio-inspired memory hygiene: quarantine failing patterns (<30% success), merge near-duplicates (>92% similarity), decay unused patterns over time, and promote consistent winners to the collective scope.
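Using the thresholds quoted above (0.95 decay per period, 0.1 confidence floor, quarantine below 30% success, promote at 0.8), the per-pattern update can be sketched like this — names are illustrative, not the production API:

```typescript
type PatternState = 'active' | 'quarantined' | 'promoted';

interface LivePattern {
  confidence: number;   // 0..1
  successRate: number;  // 0..1, from tracked outcomes
  periodsUnused: number;
}

function applyForgetting(p: LivePattern): { confidence: number; state: PatternState } {
  // Unused patterns decay multiplicatively (0.95 per period), floored at 0.1.
  const decayed = Math.max(0.1, p.confidence * Math.pow(0.95, p.periodsUnused));
  if (p.successRate < 0.3) return { confidence: decayed, state: 'quarantined' };
  if (decayed >= 0.8) return { confidence: decayed, state: 'promoted' };
  return { confidence: decayed, state: 'active' };
}
```

Duplicate merging (the >92% similarity check) would run as a separate pass over pattern embeddings and is omitted here.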
Problem: After context eviction, positional lookup (last 5 turns) misses relevant history that appeared earlier in conversation.
Solution: Replace positional lookup with vector similarity search across ALL evicted content. Always-on, not triggered by keywords.
Junior developers show only success. Senior engineers show how they handled failure. Here are three production issues and the systems I built to fix them.
Task completion: 40%
Without memory, agents re-derived solutions from scratch every session. They'd get different (often wrong) answers each time. No feedback loop meant bad patterns persisted.
Fix: Built the Convergence Evaluator to track answer consistency. Combined with pattern extraction and outcome tracking, task completion rose from 40% to 86.7%.
Pattern tracking: broken
The pattern application store used an in-memory Map<string, Data>. Every Vercel cold start cleared it. Timeout-based auto-outcomes were unreliable in serverless — functions terminate before timers fire.
Fix: Migrated to a Redis-backed store with 5-minute TTL and in-memory fallback. Replaced setTimeout with lazy auto-outcome processing — on each new Track call, stale verified applications >30s old get processed first.
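The lazy sweep described above can be sketched as follows. The store shape and names are illustrative assumptions, not the production code:

```typescript
interface TrackedApplication {
  id: string;
  createdAt: number; // epoch ms
  resolved: boolean;
}

const STALE_MS = 30_000; // applications older than 30s get auto-processed

// Called at the start of every new Track request: stale entries are
// swept first, so no background timer is ever needed — a good fit for
// serverless, where functions terminate before setTimeout fires.
function sweepStale(
  apps: TrackedApplication[],
  now: number,
  autoResolve: (a: TrackedApplication) => void
): void {
  for (const a of apps) {
    if (!a.resolved && now - a.createdAt > STALE_MS) {
      autoResolve(a);
      a.resolved = true;
    }
  }
}
```

Piggybacking cleanup on incoming requests trades a small per-call cost for correctness in an environment where timers cannot be trusted.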
Agents forgot mid-conversation
When context windows filled up, the eviction system removed older messages. But the rehydration system only looked at the last 5 turns — missing critical context from earlier in the conversation. Agents would "forget" decisions made 20 minutes ago.
Fix: Replaced positional lookup with semantic vector search across ALL evicted content. Now always-on (not keyword-triggered). Used Google AI embeddings (free tier) to keep costs at zero. Evicted content is stored losslessly in Cloudflare R2.
996+ commits as sole architect. 372 database tables covering memory, traces, evaluations, and analytics. 10+ GitHub Actions CI/CD pipelines for automated testing and deployment.
Increased AI task completion rates from 40% to 86.7% through automated pattern extraction and outcome tracking. Continuous improvement via the Golden Loop feedback mechanism.
Developed 4 custom algorithms: Cache-Preserving Passthrough, Delta-Prometheus Convergence Evaluator, Active Forgetting Engine, and Semantic Rehydration — each solving a specific production failure mode.
Created a business logic extraction pipeline processing 2,500+ patterns with 92.8% success rate. Automated learning pipelines analyze agent conversations to construct knowledge graphs.
This isn't a mock: it's Gemini 2.0 Flash with live MCP tool access to the real ekkOS production database. Ask it anything and watch the raw tool calls execute.

Lead Systems Architect
Seann is a Lead AI Engineer specializing in cognitive architectures and agentic memory systems. He designed and built ekkOS from the first commit to the 11th layer — 996+ commits, 372 database tables, and 4 custom algorithms, all as a solo architect. His focus is building production-grade infrastructure that makes AI agents genuinely smarter over time.
Whitby, Ontario · 289-927-0983