Memory Systems for AI Agents
How to implement short-term and long-term memory for AI agents, covering conversation context management, vector stores for semantic retrieval, and session persistence patterns in Klivvr Agent.
An agent without memory is stateless — every interaction starts from zero. It cannot remember what the customer said two messages ago, what actions it already took, or what it learned from previous conversations. Memory transforms an agent from a single-turn assistant into a contextual, multi-turn problem solver.
This article covers the memory architecture used in Klivvr Agent, from working memory that tracks the current conversation to long-term memory that persists knowledge across sessions.
Three-Tier Memory Architecture
Klivvr Agent organizes memory into three tiers, each with different capacity, latency, and persistence characteristics.
Working memory is the current conversation context — the messages, tool results, and extracted entities from the ongoing interaction. It lives in the agent's state object and is discarded when the conversation ends.
Short-term memory persists across multiple interactions within a session. When a customer chats, hangs up, and calls back an hour later, short-term memory retains the context. It is stored in Redis with a TTL of 24 hours.
Long-term memory captures knowledge that should persist indefinitely — customer preferences, resolved issues, interaction patterns. It is stored in a vector database for semantic retrieval.
interface MemorySystem {
working: WorkingMemory;
shortTerm: ShortTermMemory;
longTerm: LongTermMemory;
}
interface WorkingMemory {
messages: Message[];
entities: Map<string, unknown>;
toolResults: ToolResult[];
addMessage(message: Message): void;
getRecentMessages(limit: number): Message[];
getEntity(key: string): unknown | undefined;
setEntity(key: string, value: unknown): void;
}
interface ShortTermMemory {
save(sessionId: string, data: SessionData): Promise<void>;
load(sessionId: string): Promise<SessionData | null>;
extend(sessionId: string, ttlSeconds: number): Promise<void>;
}
interface LongTermMemory {
store(entry: MemoryEntry): Promise<void>;
search(query: string, limit: number): Promise<MemoryEntry[]>;
getByCustomer(customerId: string): Promise<MemoryEntry[]>;
}
Working Memory: Conversation Context
Working memory manages the conversation context that the LLM sees on each turn. As conversations grow long, the context window fills up and older messages must be summarized or dropped.
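The `TokenCounter` used by the implementation below is not defined in this article; a minimal stand-in, assuming roughly four characters per token (a common heuristic for English text), might look like:

```typescript
// Rough token estimator: ~4 characters per token is a common
// heuristic for English text. A production system would use the
// model's actual tokenizer instead of this approximation.
class TokenCounter {
  count(text: string): number {
    return Math.ceil(text.length / 4);
  }
}
```

The exact ratio matters less than consistency: the budget check only needs a monotonic estimate, so overcounting slightly is safer than undercounting.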
class SlidingWindowMemory implements WorkingMemory {
messages: Message[] = [];
entities = new Map<string, unknown>();
toolResults: ToolResult[] = [];
private maxMessages: number;
private maxTokens: number;
private tokenCounter: TokenCounter;
constructor(config: { maxMessages: number; maxTokens: number }) {
this.maxMessages = config.maxMessages;
this.maxTokens = config.maxTokens;
this.tokenCounter = new TokenCounter();
}
addMessage(message: Message): void {
this.messages.push(message);
this.compact();
}
getRecentMessages(limit: number): Message[] {
return this.messages.slice(-limit);
}
getEntity(key: string): unknown | undefined {
return this.entities.get(key);
}
setEntity(key: string, value: unknown): void {
this.entities.set(key, value);
}
  private compact(): void {
    // Keep the system message plus the most recent messages that fit
    // within both the message-count and token budgets
    const systemMessage = this.messages.find((m) => m.role === "system");
    const nonSystemMessages = this.messages.filter(
      (m) => m.role !== "system"
    );
    let totalTokens = systemMessage
      ? this.tokenCounter.count(systemMessage.content)
      : 0;
    const retained: Message[] = [];
    // Walk backwards from most recent, keeping messages within budget
    for (let i = nonSystemMessages.length - 1; i >= 0; i--) {
      if (retained.length >= this.maxMessages) break;
      const msgTokens = this.tokenCounter.count(
        nonSystemMessages[i].content
      );
      if (totalTokens + msgTokens > this.maxTokens) break;
      totalTokens += msgTokens;
      retained.unshift(nonSystemMessages[i]);
    }
    this.messages = systemMessage
      ? [systemMessage, ...retained]
      : retained;
  }
}
Summarization Strategy
When older messages are dropped from working memory, important context can be lost. A summarization strategy compresses older messages into a summary that preserves key information.
class SummarizingMemory implements WorkingMemory {
messages: Message[] = [];
entities = new Map<string, unknown>();
toolResults: ToolResult[] = [];
private summary: string = "";
private summarizer: LLMClient;
private recentWindowSize: number;
constructor(config: {
summarizer: LLMClient;
recentWindowSize: number;
}) {
this.summarizer = config.summarizer;
this.recentWindowSize = config.recentWindowSize;
}
addMessage(message: Message): void {
this.messages.push(message);
}
  async getContextMessages(): Promise<Message[]> {
    const recent = this.messages.slice(-this.recentWindowSize);
    const older = this.messages.slice(0, -this.recentWindowSize);
    if (older.length > 0) {
      // Fold the prior summary and the older messages into a new
      // rolling summary, then drop the older messages so they are
      // not re-summarized on every turn
      const toSummarize: Message[] = this.summary
        ? [
            { role: "system", content: `Earlier summary:\n${this.summary}` },
            ...older,
          ]
        : older;
      this.summary = await this.summarize(toSummarize);
      this.messages = recent;
    }
    const contextMessages: Message[] = [];
    if (this.summary) {
      contextMessages.push({
        role: "system",
        content: `Previous conversation summary:\n${this.summary}`,
      });
    }
    contextMessages.push(...recent);
    return contextMessages;
  }
private async summarize(messages: Message[]): Promise<string> {
const conversation = messages
.map((m) => `${m.role}: ${m.content}`)
.join("\n");
const response = await this.summarizer.chat({
model: "claude-haiku-4-5-20251001",
messages: [
{
role: "user",
content:
`Summarize this conversation, preserving: customer identity, ` +
`issue details, actions taken, and any pending items.\n\n` +
`${conversation}`,
},
],
tools: [],
temperature: 0,
maxTokens: 500,
});
return response.content;
}
getRecentMessages(limit: number): Message[] {
return this.messages.slice(-limit);
}
getEntity(key: string): unknown | undefined {
return this.entities.get(key);
}
setEntity(key: string, value: unknown): void {
this.entities.set(key, value);
}
}
Short-Term Memory: Session Persistence
Short-term memory bridges multiple interactions within a session. When a customer disconnects and reconnects, the agent should pick up where it left off.
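At the call site, the flow against the `ShortTermMemory` interface above is a save on disconnect and a load on reconnect. The in-memory stand-in below is illustrative (useful for tests), not part of Klivvr Agent:

```typescript
// SessionData mirrors the shape used in this article.
interface SessionData {
  customerId?: string;
  conversationSummary: string;
  entities: Record<string, unknown>;
  actionsTaken: string[];
  lastActivityAt: string;
}

// Illustrative in-memory stand-in for the Redis-backed store,
// matching the ShortTermMemory save/load contract.
class InMemoryShortTermMemory {
  private sessions = new Map<string, SessionData>();

  async save(sessionId: string, data: SessionData): Promise<void> {
    this.sessions.set(sessionId, data);
  }

  async load(sessionId: string): Promise<SessionData | null> {
    return this.sessions.get(sessionId) ?? null;
  }
}

// On disconnect: persist a summary. On reconnect: restore it.
async function resumeDemo(): Promise<string> {
  const memory = new InMemoryShortTermMemory();
  await memory.save("session-1", {
    conversationSummary: "Customer asked about a declined card payment.",
    entities: {},
    actionsTaken: ["looked up transaction"],
    lastActivityAt: new Date().toISOString(),
  });
  const restored = await memory.load("session-1");
  return restored?.conversationSummary ?? "";
}
```

Swapping the in-memory map for Redis, as the implementation below does, adds the TTL semantics: an abandoned session simply expires instead of accumulating forever.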
import { Redis } from "ioredis";
interface SessionData {
customerId?: string;
conversationSummary: string;
entities: Record<string, unknown>;
actionsTaken: string[];
lastActivityAt: string;
}
class RedisShortTermMemory implements ShortTermMemory {
private redis: Redis;
private defaultTTL: number;
constructor(redis: Redis, defaultTTL: number = 86400) {
this.redis = redis;
this.defaultTTL = defaultTTL;
}
async save(sessionId: string, data: SessionData): Promise<void> {
const key = `agent:session:${sessionId}`;
await this.redis.setex(key, this.defaultTTL, JSON.stringify(data));
}
async load(sessionId: string): Promise<SessionData | null> {
const key = `agent:session:${sessionId}`;
const raw = await this.redis.get(key);
if (!raw) return null;
return JSON.parse(raw) as SessionData;
}
async extend(sessionId: string, ttlSeconds: number): Promise<void> {
const key = `agent:session:${sessionId}`;
await this.redis.expire(key, ttlSeconds);
}
}
Long-Term Memory: Semantic Search
Long-term memory stores knowledge that persists across sessions — customer preferences, issue resolution history, and interaction patterns. Semantic search through vector embeddings enables retrieval of relevant memories based on meaning rather than exact keyword match.
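Under the hood, a vector store ranks stored embeddings against the query embedding with a similarity metric, most commonly cosine similarity. A minimal sketch of that scoring:

```typescript
// Cosine similarity: 1.0 for identical directions, 0 for orthogonal
// vectors. Vector stores use this (or dot product over normalized
// vectors) to rank stored embeddings against the query embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because similarity is computed over embeddings rather than raw text, a query like "card was declined" can surface a memory recorded as "payment failure on debit transaction".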
interface MemoryEntry {
id: string;
customerId: string;
content: string;
type: "interaction" | "preference" | "resolution" | "note";
embedding?: number[];
createdAt: Date;
metadata: Record<string, unknown>;
}
class VectorLongTermMemory implements LongTermMemory {
private vectorStore: VectorStore;
private embedder: EmbeddingClient;
constructor(vectorStore: VectorStore, embedder: EmbeddingClient) {
this.vectorStore = vectorStore;
this.embedder = embedder;
}
async store(entry: MemoryEntry): Promise<void> {
const embedding = await this.embedder.embed(entry.content);
await this.vectorStore.upsert({
id: entry.id,
vector: embedding,
metadata: {
customerId: entry.customerId,
type: entry.type,
content: entry.content,
createdAt: entry.createdAt.toISOString(),
...entry.metadata,
},
});
}
async search(query: string, limit: number = 5): Promise<MemoryEntry[]> {
const queryEmbedding = await this.embedder.embed(query);
const results = await this.vectorStore.query({
vector: queryEmbedding,
topK: limit,
});
return results.map((r) => ({
id: r.id,
customerId: r.metadata.customerId as string,
content: r.metadata.content as string,
type: r.metadata.type as MemoryEntry["type"],
createdAt: new Date(r.metadata.createdAt as string),
metadata: r.metadata,
}));
}
  async getByCustomer(customerId: string): Promise<MemoryEntry[]> {
    // The metadata filter does the selection here; the query vector
    // only needs to be valid. Many embedding APIs reject empty
    // strings, so embed the customer id rather than "". A store that
    // supports metadata-only scans would skip the embedding entirely.
    const results = await this.vectorStore.query({
      vector: await this.embedder.embed(customerId),
      topK: 20,
      filter: { customerId },
    });
return results.map((r) => ({
id: r.id,
customerId: r.metadata.customerId as string,
content: r.metadata.content as string,
type: r.metadata.type as MemoryEntry["type"],
createdAt: new Date(r.metadata.createdAt as string),
metadata: r.metadata,
}));
}
}
Assembling the Memory Pipeline
The three memory tiers work together to give the agent rich context on every turn.
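The pipeline below covers the read path. The write path — persisting state at the end of each turn — is not shown in this article; a sketch against trimmed stand-in interfaces (`persistTurn` and the types here are illustrative, not Klivvr Agent APIs):

```typescript
// Trimmed stand-ins for the interfaces defined earlier; only the
// fields this sketch needs.
interface SessionSnapshot {
  conversationSummary: string;
  actionsTaken: string[];
}

interface ShortTermStore {
  save(sessionId: string, data: SessionSnapshot): Promise<void>;
}

interface LongTermStore {
  store(entry: { id: string; customerId: string; content: string }): Promise<void>;
}

// End-of-turn write path: always refresh the session record so a
// reconnect can resume, and promote durable facts to long-term memory.
async function persistTurn(
  sessionId: string,
  customerId: string,
  snapshot: SessionSnapshot,
  shortTerm: ShortTermStore,
  longTerm: LongTermStore
): Promise<void> {
  await shortTerm.save(sessionId, snapshot);

  // Promote only turns where something actually happened, so
  // long-term memory accumulates resolutions rather than chit-chat.
  if (snapshot.actionsTaken.length > 0) {
    await longTerm.store({
      id: `${sessionId}-${Date.now()}`,
      customerId,
      content: snapshot.conversationSummary,
    });
  }
}
```

The asymmetry is deliberate: short-term memory is overwritten every turn, while long-term memory is append-only and gated, since anything stored there will be retrieved in future sessions.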
class MemoryAwareAgent {
  private memory: MemorySystem;
  constructor(memory: MemorySystem) {
    this.memory = memory;
  }
async buildContext(
sessionId: string,
customerId?: string
): Promise<Message[]> {
const context: Message[] = [];
// Load session context from short-term memory
const session = await this.memory.shortTerm.load(sessionId);
if (session) {
context.push({
role: "system",
content: `Previous session context:\n${session.conversationSummary}\n` +
`Actions taken: ${session.actionsTaken.join(", ")}`,
});
}
// Load relevant long-term memories
if (customerId) {
const memories = await this.memory.longTerm.getByCustomer(customerId);
if (memories.length > 0) {
const memoryText = memories
.map((m) => `[${m.type}] ${m.content}`)
.join("\n");
context.push({
role: "system",
content: `Customer history:\n${memoryText}`,
});
}
}
// Add working memory messages
context.push(...this.memory.working.getRecentMessages(20));
return context;
}
}
Conclusion
Memory transforms an AI agent from a stateless responder into a contextual partner that remembers, learns, and adapts. Working memory manages the current conversation. Short-term memory bridges interactions within a session. Long-term memory accumulates knowledge over time. In Klivvr Agent, these three tiers work together to give the agent the context it needs to provide personalized, efficient support — reducing the need for customers to repeat themselves and enabling agents to build on previous interactions.