A technical walkthrough of ekkOS. I built this 12-layer cognitive substrate to solve context amnesia, giving AI agents persistent memory, continuous learning, and cross-session intelligence.
Most AI memory solutions rely on a single vector database. I wanted to build something closer to how actual cognition works. ekkOS uses 12 distinct layers, each serving a highly specific, structural purpose in the learning lifecycle.
Obsidian-style agent knowledge mapping
Each layer is addressable via the MCP protocol. Agents can search, write, and reason across all 12 layers in a single tool call. The architecture supports both personal (per-user) and collective (cross-user) memory scopes.
This is a full infrastructure layer, not just an API wrapper. It uses 372 database tables to handle memory, execution traces, embeddings, and real-time evaluations. Here is the actual data model that powers the swarm.
// Pattern: a proven solution extracted
// from agent conversations
interface Pattern {
id: string;
title: string;
problem: string;
solution: string;
confidence: number; // 0.0 - 1.0
success_rate: number; // tracked via outcomes
applied_count: number;
tags: string[];
embedding: number[]; // 1536-dim vector
works_when: string[];
anti_patterns: string[];
created_at: Date;
last_applied: Date;
}
// Active Forgetting: patterns decay
// if unused, get quarantined if failing
interface DecayConfig {
decay_rate: number; // 0.95 per period
min_confidence: number; // floor at 0.1
quarantine_threshold: number; // quarantine below 0.3
promotion_threshold: number; // promote at 0.8
}
// Semantic Rehydration: vector search
// replaces positional context lookup
interface ContextFrame {
session_id: string;
active_patterns: Pattern[];
rehydrated_turns: Turn[];
decay_rate: number;
eviction_threshold: number;
}
// Search across evicted context using
// embedding similarity, not position
async function rehydrate(
query: string,
session: string
): Promise<ContextFrame> {
const embedding = await embed(query);
return searchEvictedContent(
embedding, session
);
}
API costs can scale rapidly in multi-agent systems. I designed a custom cache-preserving router that detects tool round-trips and surgically skips pipeline stages, reducing token usage by 77% while maintaining complete context.
Claude Code makes 10-20+ API calls per user prompt (2-3 per tool round-trip). Standard proxy pipelines inject context at varying positions, breaking Anthropic's prompt cache on every call. Result: 10-20x cost multiplication from cache misses.
I designed a "Cache-Preserving Passthrough" algorithm that detects tool round-trips and skips non-essential pipeline stages. Memory injections use Redis-cached content (stable prefix = cache hit). Emergency eviction only fires at 95% vs. the normal 75% threshold.
Key insight: Anthropic's prompt cache gives a 90% discount on cache hits. The cache matches on the exact prefix of the request, so a content change at any position invalidates the cache from that position onward.
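To make the prefix rule concrete, here is a minimal sketch that counts how many leading messages two consecutive requests share. It is a simplified model, not Anthropic's implementation: `Message` and `sharedPrefixLength` are hypothetical names, and the real cache works on the serialized prompt rather than on whole message objects.

```typescript
// Simplified model of prefix caching: only the leading run of identical
// messages can be reused. All names here are hypothetical.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Count how many leading messages are identical in both requests.
function sharedPrefixLength(prev: Message[], next: Message[]): number {
  const limit = Math.min(prev.length, next.length);
  let i = 0;
  while (
    i < limit &&
    prev[i].role === next[i].role &&
    prev[i].content === next[i].content
  ) {
    i++;
  }
  return i;
}

const base: Message[] = [
  { role: "system", content: "You are a coding agent." },
  { role: "user", content: "Fix the failing deploy." },
  { role: "assistant", content: "Running the build tool." },
];

// Appending at the end keeps the whole earlier request as a reusable prefix.
const appended: Message[] = [
  ...base,
  { role: "user", content: "tool result: build ok" },
];

// Injecting content near the top changes position 1, so nothing after the
// system message can be reused.
const injected: Message[] = [
  base[0],
  { role: "user", content: "[memory] 3 patterns found" },
  ...base.slice(1),
  { role: "user", content: "tool result: build ok" },
];

console.log(sharedPrefixLength(base, appended)); // 3
console.log(sharedPrefixLength(base, injected)); // 1
```

The second case is the failure mode described above: a single injection at a varying position leaves almost nothing of the earlier request reusable.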
Anthropic's Model Context Protocol (MCP) is the standard for connecting AI agents to external tools. ekkOS exposes all 12 memory layers as MCP tools.
Teach Claude something in VS Code, and Cursor already knows it. All agents share the same memory via MCP, regardless of which IDE or tool you use.
{
"mcpServers": {
"ekkos-memory": {
"type": "sse",
"url": "https://mcp.ekkos.dev/sse",
"env": {
"EKKOS_USER_ID": "your-user-id"
}
}
}
}
// One config. Every IDE connected.
// Memory persists across sessions,
// tools, and even team members.
// Agent automatically searches memory
// before answering any question
const results = await ekkos.search({
query: "supabase auth setup",
sources: ["patterns", "episodic"]
});
// Found 3 patterns from past sessions
// Confidence: 0.94, 0.87, 0.72
// Agent applies the highest-confidence
// solution without re-deriving it.
Developer experience is critical. I wrote 185 API functions with strict TypeScript types, backed by automated tests, to provide a clean, deterministic SDK for other engineers to build on.
import { EkkosClient } from '@ekkos/sdk';
const ekkos = new EkkosClient({
transport: 'sse',
endpoint: 'https://mcp.ekkos.dev'
});
// Search memory before answering
const context = await ekkos.search({
query: 'deployment error on Railway',
sources: ['patterns', 'episodic'],
limit: 5
});
// Apply a pattern and track outcome
const app = await ekkos.track({
pattern_id: context.patterns[0].id,
context: { task: 'fix deployment' }
});
// Record success for reinforcement
await ekkos.outcome({
application_id: app.id,
success: true
});
// Pattern confidence: 0.82 → 0.86
// When a bug is fixed, capture it
await ekkos.forge({
title: 'Railway PM2 restart loop',
problem:
'PM2 workers restart endlessly ' +
'when memory exceeds 512MB limit',
solution:
'Set max_memory_restart to 450MB ' +
'with graceful shutdown handler',
tags: ['railway', 'pm2', 'memory'],
works_when: [
'Node.js worker on Railway',
'PM2 cluster mode'
],
anti_patterns: [
'Increasing memory limit only ' +
'delays the crash'
]
});
// Next time any agent hits this issue,
// the solution surfaces automatically.
Off-the-shelf tools weren't enough. I had to design four custom algorithms for AI memory management, each solving a specific failure mode I ran into during production.
Problem Def
Prompt cache misses cost 10-20x on every tool round-trip due to content injection at varying positions.
Resolution
Detect tool round-trips, skip non-essential pipeline stages, use Redis-cached stable prefix for cache hits.
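A minimal sketch of the detection-and-skip idea, under stated assumptions. In the Anthropic Messages API, tool results come back as `tool_result` content blocks inside a user message, which is what `isToolRoundTrip` checks. The `Stage` type, the stage names, and `runPipeline` are hypothetical, and the block types are trimmed to the fields this example needs.

```typescript
// Trimmed content-block shapes; the real API carries more fields.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface ChatMessage {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

// Hypothetical pipeline stage: non-essential stages are skipped on round-trips.
interface Stage {
  name: string;
  essential: boolean;
  run: (messages: ChatMessage[]) => ChatMessage[];
}

// A request is a tool round-trip when its last message is a user message
// that carries at least one tool_result block.
function isToolRoundTrip(messages: ChatMessage[]): boolean {
  const last = messages[messages.length - 1];
  if (!last || last.role !== "user" || typeof last.content === "string") {
    return false;
  }
  return last.content.some((block) => block.type === "tool_result");
}

// Run every stage on normal turns, only essential stages on round-trips.
function runPipeline(
  messages: ChatMessage[],
  stages: Stage[]
): { messages: ChatMessage[]; ran: string[] } {
  const passthrough = isToolRoundTrip(messages);
  const ran: string[] = [];
  let current = messages;
  for (const stage of stages) {
    if (passthrough && !stage.essential) continue;
    current = stage.run(current);
    ran.push(stage.name);
  }
  return { messages: current, ran };
}

const stages: Stage[] = [
  { name: "validate", essential: true, run: (m) => m },
  { name: "refresh-memory", essential: false, run: (m) => m },
];

const userTurn: ChatMessage[] = [{ role: "user", content: "Fix the deploy" }];
const toolTurn: ChatMessage[] = [
  ...userTurn,
  { role: "assistant", content: [{ type: "tool_use", id: "t1", name: "bash" }] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "t1", content: "ok" }] },
];

console.log(runPipeline(userTurn, stages).ran); // [ 'validate', 'refresh-memory' ]
console.log(runPipeline(toolTurn, stages).ran); // [ 'validate' ]
```

Because skipped stages never rewrite the message array, the request a round-trip sends upstream keeps the same prefix as the one before it.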
Problem Def
No way to measure if the system was actually improving. Agents hallucinated on complex tasks with no consistency check.
Resolution
Custom scoring formula that tracks pattern success rates, retrieval relevance, and error frequency over time to quantify improvement.
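The actual formula is not given here, so the following is only one plausible shape for it: a weighted sum of the three signals named above, compared across two time windows. The weights, the field names, and the sample numbers are all illustrative assumptions, not values from ekkOS.

```typescript
// One evaluation window; every field is a ratio between 0 and 1.
interface WindowMetrics {
  patternSuccessRate: number;
  retrievalRelevance: number;
  errorRate: number;
}

// Illustrative weights only; they sum to 1 so the score stays in [0, 1].
const WEIGHTS = { success: 0.5, relevance: 0.3, errors: 0.2 };

// Higher is better: errors are inverted so fewer errors raise the score.
function score(m: WindowMetrics): number {
  return (
    WEIGHTS.success * m.patternSuccessRate +
    WEIGHTS.relevance * m.retrievalRelevance +
    WEIGHTS.errors * (1 - m.errorRate)
  );
}

// Positive result means the later window scored better than the earlier one.
function improvement(prev: WindowMetrics, curr: WindowMetrics): number {
  return score(curr) - score(prev);
}

// Made-up sample windows, purely to exercise the functions.
const earlier: WindowMetrics = { patternSuccessRate: 0.4, retrievalRelevance: 0.6, errorRate: 0.3 };
const later: WindowMetrics = { patternSuccessRate: 0.8, retrievalRelevance: 0.7, errorRate: 0.1 };

console.log(improvement(earlier, later).toFixed(2)); // "0.27"
```

Collapsing the signals into one number is what makes "is the system improving?" answerable with a single comparison per window.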
Problem Def
Stale patterns accumulate forever. Bad solutions never get removed. Memory becomes noisy over time.
Resolution
Bio-inspired: quarantine failing patterns (<30% success), merge duplicates (>92% similarity), decay unused patterns, promote winners to collective.
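The decay and quarantine rules can be sketched as follows. The config field names and default values come from the `DecayConfig` shown earlier; `idle_periods`, `decayConfidence`, and `classify` are hypothetical, and comparing `success_rate` against the promotion threshold is an assumption. Duplicate merging is left out.

```typescript
interface DecayConfig {
  decay_rate: number; // multiplier per unused period, e.g. 0.95
  min_confidence: number; // floor, e.g. 0.1
  quarantine_threshold: number; // e.g. 0.3
  promotion_threshold: number; // e.g. 0.8
}

interface TrackedPattern {
  id: string;
  confidence: number;
  success_rate: number;
  idle_periods: number; // hypothetical: periods since last application
}

type Verdict = "quarantine" | "promote" | "keep";

// Exponential decay per idle period, never dropping below the floor.
function decayConfidence(p: TrackedPattern, cfg: DecayConfig): number {
  const decayed = p.confidence * Math.pow(cfg.decay_rate, p.idle_periods);
  return Math.max(cfg.min_confidence, decayed);
}

// Failing patterns are quarantined; consistently successful ones are promoted.
function classify(p: TrackedPattern, cfg: DecayConfig): Verdict {
  if (p.success_rate < cfg.quarantine_threshold) return "quarantine";
  if (p.success_rate >= cfg.promotion_threshold) return "promote";
  return "keep";
}

const config: DecayConfig = {
  decay_rate: 0.95,
  min_confidence: 0.1,
  quarantine_threshold: 0.3,
  promotion_threshold: 0.8,
};

const idle: TrackedPattern = { id: "p1", confidence: 0.9, success_rate: 0.5, idle_periods: 2 };
const failing: TrackedPattern = { id: "p2", confidence: 0.6, success_rate: 0.2, idle_periods: 0 };

console.log(decayConfidence(idle, config)); // roughly 0.812 (0.9 * 0.95^2)
console.log(classify(idle, config)); // "keep"
console.log(classify(failing, config)); // "quarantine"
```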
Problem Def
After context eviction, positional lookup (last 5 turns) misses relevant history that appeared earlier in conversation.
Resolution
Replace positional lookup with vector similarity search across ALL evicted content. Always-on, not triggered by keywords.
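As a rough illustration of similarity-based lookup, the sketch below ranks every evicted turn by cosine similarity to the query embedding and returns the top matches, regardless of where they sat in the conversation. `EvictedTurn` and `rehydrateTurns` are hypothetical names, and the 3-dimensional vectors stand in for real embeddings.

```typescript
interface EvictedTurn {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank all evicted turns by similarity to the query, most similar first.
function rehydrateTurns(
  queryEmbedding: number[],
  evicted: EvictedTurn[],
  limit: number
): EvictedTurn[] {
  return evicted
    .map((turn) => ({ turn, score: cosineSimilarity(queryEmbedding, turn.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((entry) => entry.turn);
}

// turn-1 is the oldest turn, yet it is the closest match to the query.
const evicted: EvictedTurn[] = [
  { id: "turn-1", text: "We chose row-level security for auth", embedding: [1, 0, 0] },
  { id: "turn-2", text: "Unrelated formatting discussion", embedding: [0, 1, 0] },
  { id: "turn-3", text: "Auth policy follow-up", embedding: [0.9, 0.1, 0] },
];

const hits = rehydrateTurns([1, 0, 0], evicted, 2);
console.log(hits.map((t) => t.id)); // [ 'turn-1', 'turn-3' ]
```

A positional window would only ever see the most recent turns; ranking by similarity lets an early decision resurface whenever the current query is about the same topic.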
Building this wasn't easy. Here are a few major production failures I ran into, and the systems I had to architect to fix them.
Task completion: 40%
Without memory, agents re-derived solutions from scratch every session. They'd get different (often wrong) answers each time. No feedback loop meant bad patterns persisted.
Resolution Path: Built the Convergence Evaluator to track answer consistency. Combined with pattern extraction and outcome tracking, task completion rose from 40% to 86.7%.
Pattern tracking: broken
The pattern application store used an in-memory Map<string, Data>. Every Vercel cold start cleared it. Timeout-based auto-outcomes were unreliable in serverless environments because functions terminate before timers fire.
Resolution Path: Migrated to a Redis-backed store with 5-minute TTL and in-memory fallback. Replaced setTimeout with lazy auto-outcome processing: on each new Track call, stale verified applications >30s old get processed first.
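The lazy processing idea can be sketched without any timers. In this illustration an in-memory `Map` stands in for the Redis-backed store, the TTL is not modeled, and the clock is injected so the behavior is easy to test. `ApplicationStore` and its fields are hypothetical, and resolving stale verified applications as `"success"` is an assumption about what the auto-outcome does.

```typescript
type Outcome = "pending" | "success";

interface Application {
  id: string;
  patternId: string;
  verified: boolean;
  createdAt: number; // milliseconds from the injected clock
  outcome: Outcome;
}

class ApplicationStore {
  private apps = new Map<string, Application>();
  private nextId = 1;
  private now: () => number;
  private staleAfterMs: number;

  constructor(now: () => number, staleAfterMs: number = 30000) {
    this.now = now;
    this.staleAfterMs = staleAfterMs;
  }

  // Resolve stale applications on demand instead of relying on setTimeout,
  // which may never fire once a serverless function has terminated.
  private processStale(): void {
    const cutoff = this.now() - this.staleAfterMs;
    this.apps.forEach((app) => {
      if (app.outcome === "pending" && app.verified && app.createdAt < cutoff) {
        app.outcome = "success"; // assumed default auto-outcome
      }
    });
  }

  // Every new track call first sweeps older applications, then records its own.
  track(patternId: string, verified: boolean): Application {
    this.processStale();
    const app: Application = {
      id: `app-${this.nextId++}`,
      patternId,
      verified,
      createdAt: this.now(),
      outcome: "pending",
    };
    this.apps.set(app.id, app);
    return app;
  }

  get(id: string): Application | undefined {
    return this.apps.get(id);
  }
}

let clock = 0;
const store = new ApplicationStore(() => clock);
const first = store.track("pattern-1", true);
clock = 31000; // 31 seconds later
const second = store.track("pattern-2", true);

console.log(store.get(first.id)?.outcome); // "success"
console.log(store.get(second.id)?.outcome); // "pending"
```

The trade-off is that an application is only resolved when the next call arrives, so outcomes can lag during quiet periods, but nothing depends on a process staying alive between requests.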
Agents forgot mid-conversation
When context windows filled up, the eviction system removed older messages. But the rehydration system only looked at the last 5 turns, missing critical context from earlier in the conversation. Agents would "forget" decisions made 20 minutes ago.
Resolution Path: Replaced positional lookup with semantic vector search across ALL evicted content. Now always-on (not keyword-triggered). Used Google AI embeddings (free tier) to keep costs at zero. Evicted content is stored losslessly in Cloudflare R2.
1,500+ commits as sole architect. 372 database tables covering memory, traces, evaluations, and analytics. 10+ GitHub Actions CI/CD pipelines for automated testing and deployment.
Increased AI task completion rates from 40% to 86.7% through automated pattern extraction and outcome tracking. Continuous improvement via the Golden Loop feedback mechanism.
Developed 4 custom algorithms: Cache-Preserving Passthrough, Delta-Prometheus Convergence Evaluator, Active Forgetting Engine, and Semantic Rehydration, each solving a specific production failure mode.
Created a business logic extraction pipeline processing 2,500+ patterns with 92.8% success rate. Automated learning pipelines analyze agent conversations to construct knowledge graphs.

Lead Systems Architect
I am a Lead AI & Systems Engineer focused on cognitive architecture and agentic memory. I designed and built ekkOS entirely from scratch—spanning 1,500+ commits, 372 database tables, and a local swarm daemon—as a solo architect. My focus is on building scalable, production-grade AI infrastructure.
Whitby, Ontario · 289-927-0983