Memory Systems for AI Agents
How to implement short-term and long-term memory for AI agents, covering conversation context management, vector stores for semantic retrieval, and session persistence patterns in Klivvr Agent.
An agent without memory is stateless — every interaction starts from zero. It cannot remember what the customer said two messages ago, what actions it already took, or what it learned from previous conversations. Memory transforms an agent from a single-turn assistant into a contextual, multi-turn problem solver.
This article covers the memory architecture used in Klivvr Agent, from working memory that tracks the current conversation to long-term memory that persists knowledge across sessions.
Three-Tier Memory Architecture
Klivvr Agent organizes memory into three tiers, each with different capacity, latency, and persistence characteristics.
Working memory is the current conversation context — the messages, tool results, and extracted entities from the ongoing interaction. It lives in the agent's state object and is discarded when the conversation ends.
Short-term memory persists across multiple interactions within a session. When a customer chats, hangs up, and calls back an hour later, short-term memory retains the context. It is stored in Redis with a TTL of 24 hours.
Long-term memory captures knowledge that should persist indefinitely — customer preferences, resolved issues, interaction patterns. It is stored in a vector database for semantic retrieval.
interface MemorySystem {
working: WorkingMemory;
shortTerm: ShortTermMemory;
longTerm: LongTermMemory;
}
interface WorkingMemory {
messages: Message[];
entities: Map<string, unknown>;
toolResults: ToolResult[];
addMessage(message: Message): void;
getRecentMessages(limit: number): Message[];
getEntity(key: string): unknown | undefined;
setEntity(key: string, value: unknown): void;
}
interface ShortTermMemory {
save(sessionId: string, data: SessionData): Promise<void>;
load(sessionId: string): Promise<SessionData | null>;
extend(sessionId: string, ttlSeconds: number): Promise<void>;
}
interface LongTermMemory {
store(entry: MemoryEntry): Promise<void>;
search(query: string, limit: number): Promise<MemoryEntry[]>;
getByCustomer(customerId: string): Promise<MemoryEntry[]>;
}
Working Memory: Conversation Context
Working memory manages the conversation context that the LLM sees on each turn. As conversations grow long, the context window fills up and older messages must be summarized or dropped.
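The `TokenCounter` used by the implementation below is not defined in this article; a minimal stand-in, assuming roughly four characters per token (a common heuristic for English text), might look like:

```typescript
// Rough token estimator: ~4 characters per token is a common
// heuristic for English text. A production system would use the
// model's actual tokenizer instead of this approximation.
class TokenCounter {
  count(text: string): number {
    return Math.ceil(text.length / 4);
  }
}
```

The exact ratio matters less than consistency: the budget check only needs a monotonic estimate, so overcounting slightly is safer than undercounting.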
class SlidingWindowMemory implements WorkingMemory {
messages: Message[] = [];
entities = new Map<string, unknown>();
toolResults: ToolResult[] = [];
private maxMessages: number;
private maxTokens: number;
private tokenCounter: TokenCounter;
constructor(config: { maxMessages: number; maxTokens: number }) {
this.maxMessages = config.maxMessages;
this.maxTokens = config.maxTokens;
this.tokenCounter = new TokenCounter();
}
addMessage(message: Message): void {
this.messages.push(message);
this.compact();
}
getRecentMessages(limit: number): Message[] {
return this.messages.slice(-limit);
}
getEntity(key: string): unknown | undefined {
return this.entities.get(key);
}
setEntity(key: string, value: unknown): void {
this.entities.set(key, value);
}
  private compact(): void {
    // Keep the system message plus the most recent messages that fit
    // within both the message-count and token budgets
    const systemMessage = this.messages.find((m) => m.role === "system");
    const nonSystemMessages = this.messages.filter(
      (m) => m.role !== "system"
    );
    let totalTokens = systemMessage
      ? this.tokenCounter.count(systemMessage.content)
      : 0;
    const retained: Message[] = [];
    // Walk backwards from most recent, keeping messages within budget
    for (let i = nonSystemMessages.length - 1; i >= 0; i--) {
      if (retained.length >= this.maxMessages) break;
      const msgTokens = this.tokenCounter.count(
        nonSystemMessages[i].content
      );
      if (totalTokens + msgTokens > this.maxTokens) break;
      totalTokens += msgTokens;
      retained.unshift(nonSystemMessages[i]);
    }
    this.messages = systemMessage
      ? [systemMessage, ...retained]
      : retained;
  }
}
Summarization Strategy
When older messages are dropped from working memory, important context can be lost. A summarization strategy compresses older messages into a summary that preserves key information.
class SummarizingMemory implements WorkingMemory {
messages: Message[] = [];
entities = new Map<string, unknown>();
toolResults: ToolResult[] = [];
private summary: string = "";
private summarizer: LLMClient;
private recentWindowSize: number;
constructor(config: {
summarizer: LLMClient;
recentWindowSize: number;
}) {
this.summarizer = config.summarizer;
this.recentWindowSize = config.recentWindowSize;
}
addMessage(message: Message): void {
this.messages.push(message);
}
  async getContextMessages(): Promise<Message[]> {
    const recent = this.messages.slice(-this.recentWindowSize);
    const older = this.messages.slice(0, -this.recentWindowSize);
    if (older.length > 0) {
      // Fold the prior summary and the older messages into a new
      // rolling summary, then drop the older messages so they are
      // not re-summarized on every turn
      const toSummarize: Message[] = this.summary
        ? [
            { role: "system", content: `Earlier summary:\n${this.summary}` },
            ...older,
          ]
        : older;
      this.summary = await this.summarize(toSummarize);
      this.messages = recent;
    }
    const contextMessages: Message[] = [];
    if (this.summary) {
      contextMessages.push({
        role: "system",
        content: `Previous conversation summary:\n${this.summary}`,
      });
    }
    contextMessages.push(...recent);
    return contextMessages;
  }
private async summarize(messages: Message[]): Promise<string> {
const conversation = messages
.map((m) => `${m.role}: ${m.content}`)
.join("\n");
const response = await this.summarizer.chat({
model: "claude-haiku-4-5-20251001",
messages: [
{
role: "user",
content:
`Summarize this conversation, preserving: customer identity, ` +
`issue details, actions taken, and any pending items.\n\n` +
`${conversation}`,
},
],
tools: [],
temperature: 0,
maxTokens: 500,
});
return response.content;
}
getRecentMessages(limit: number): Message[] {
return this.messages.slice(-limit);
}
getEntity(key: string): unknown | undefined {
return this.entities.get(key);
}
setEntity(key: string, value: unknown): void {
this.entities.set(key, value);
}
}
Short-Term Memory: Session Persistence
Short-term memory bridges multiple interactions within a session. When a customer disconnects and reconnects, the agent should pick up where it left off.
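At the call site, the flow against the `ShortTermMemory` interface above is a save on disconnect and a load on reconnect. The in-memory stand-in below is illustrative (useful for tests), not part of Klivvr Agent:

```typescript
// SessionData mirrors the shape used in this article.
interface SessionData {
  customerId?: string;
  conversationSummary: string;
  entities: Record<string, unknown>;
  actionsTaken: string[];
  lastActivityAt: string;
}

// Illustrative in-memory stand-in for the Redis-backed store,
// matching the ShortTermMemory save/load contract.
class InMemoryShortTermMemory {
  private sessions = new Map<string, SessionData>();

  async save(sessionId: string, data: SessionData): Promise<void> {
    this.sessions.set(sessionId, data);
  }

  async load(sessionId: string): Promise<SessionData | null> {
    return this.sessions.get(sessionId) ?? null;
  }
}

// On disconnect: persist a summary. On reconnect: restore it.
async function resumeDemo(): Promise<string> {
  const memory = new InMemoryShortTermMemory();
  await memory.save("session-1", {
    conversationSummary: "Customer asked about a declined card payment.",
    entities: {},
    actionsTaken: ["looked up transaction"],
    lastActivityAt: new Date().toISOString(),
  });
  const restored = await memory.load("session-1");
  return restored?.conversationSummary ?? "";
}
```

Swapping the in-memory map for Redis, as the implementation below does, adds the TTL semantics: an abandoned session simply expires instead of accumulating forever.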
import { Redis } from "ioredis";
interface SessionData {
customerId?: string;
conversationSummary: string;
entities: Record<string, unknown>;
actionsTaken: string[];
lastActivityAt: string;
}
class RedisShortTermMemory implements ShortTermMemory {
private redis: Redis;
private defaultTTL: number;
constructor(redis: Redis, defaultTTL: number = 86400) {
this.redis = redis;
this.defaultTTL = defaultTTL;
}
async save(sessionId: string, data: SessionData): Promise<void> {
const key = `agent:session:${sessionId}`;
await this.redis.setex(key, this.defaultTTL, JSON.stringify(data));
}
async load(sessionId: string): Promise<SessionData | null> {
const key = `agent:session:${sessionId}`;
const raw = await this.redis.get(key);
if (!raw) return null;
return JSON.parse(raw) as SessionData;
}
async extend(sessionId: string, ttlSeconds: number): Promise<void> {
const key = `agent:session:${sessionId}`;
await this.redis.expire(key, ttlSeconds);
}
}
Long-Term Memory: Semantic Search
Long-term memory stores knowledge that persists across sessions — customer preferences, issue resolution history, and interaction patterns. Semantic search through vector embeddings enables retrieval of relevant memories based on meaning rather than exact keyword match.
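Under the hood, a vector store ranks stored embeddings against the query embedding with a similarity metric, most commonly cosine similarity. A minimal sketch of that scoring:

```typescript
// Cosine similarity: 1.0 for identical directions, 0 for orthogonal
// vectors. Vector stores use this (or dot product over normalized
// vectors) to rank stored embeddings against the query embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because similarity is computed over embeddings rather than raw text, a query like "card was declined" can surface a memory recorded as "payment failure on debit transaction".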
interface MemoryEntry {
id: string;
customerId: string;
content: string;
type: "interaction" | "preference" | "resolution" | "note";
embedding?: number[];
createdAt: Date;
metadata: Record<string, unknown>;
}
class VectorLongTermMemory implements LongTermMemory {
private vectorStore: VectorStore;
private embedder: EmbeddingClient;
constructor(vectorStore: VectorStore, embedder: EmbeddingClient) {
this.vectorStore = vectorStore;
this.embedder = embedder;
}
async store(entry: MemoryEntry): Promise<void> {
const embedding = await this.embedder.embed(entry.content);
await this.vectorStore.upsert({
id: entry.id,
vector: embedding,
metadata: {
customerId: entry.customerId,
type: entry.type,
content: entry.content,
createdAt: entry.createdAt.toISOString(),
...entry.metadata,
},
});
}
async search(query: string, limit: number = 5): Promise<MemoryEntry[]> {
const queryEmbedding = await this.embedder.embed(query);
const results = await this.vectorStore.query({
vector: queryEmbedding,
topK: limit,
});
return results.map((r) => ({
id: r.id,
customerId: r.metadata.customerId as string,
content: r.metadata.content as string,
type: r.metadata.type as MemoryEntry["type"],
createdAt: new Date(r.metadata.createdAt as string),
metadata: r.metadata,
}));
}
  async getByCustomer(customerId: string): Promise<MemoryEntry[]> {
    // The metadata filter does the selection here; the query vector
    // only needs to be valid. Many embedding APIs reject empty
    // strings, so embed the customer id rather than "". A store that
    // supports metadata-only scans would skip the embedding entirely.
    const results = await this.vectorStore.query({
      vector: await this.embedder.embed(customerId),
      topK: 20,
      filter: { customerId },
    });
return results.map((r) => ({
id: r.id,
customerId: r.metadata.customerId as string,
content: r.metadata.content as string,
type: r.metadata.type as MemoryEntry["type"],
createdAt: new Date(r.metadata.createdAt as string),
metadata: r.metadata,
}));
}
}
Assembling the Memory Pipeline
The three memory tiers work together to give the agent rich context on every turn.
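The pipeline below covers the read path. The write path — persisting state at the end of each turn — is not shown in this article; a sketch against trimmed stand-in interfaces (`persistTurn` and the types here are illustrative, not Klivvr Agent APIs):

```typescript
// Trimmed stand-ins for the interfaces defined earlier; only the
// fields this sketch needs.
interface SessionSnapshot {
  conversationSummary: string;
  actionsTaken: string[];
}

interface ShortTermStore {
  save(sessionId: string, data: SessionSnapshot): Promise<void>;
}

interface LongTermStore {
  store(entry: { id: string; customerId: string; content: string }): Promise<void>;
}

// End-of-turn write path: always refresh the session record so a
// reconnect can resume, and promote durable facts to long-term memory.
async function persistTurn(
  sessionId: string,
  customerId: string,
  snapshot: SessionSnapshot,
  shortTerm: ShortTermStore,
  longTerm: LongTermStore
): Promise<void> {
  await shortTerm.save(sessionId, snapshot);

  // Promote only turns where something actually happened, so
  // long-term memory accumulates resolutions rather than chit-chat.
  if (snapshot.actionsTaken.length > 0) {
    await longTerm.store({
      id: `${sessionId}-${Date.now()}`,
      customerId,
      content: snapshot.conversationSummary,
    });
  }
}
```

The asymmetry is deliberate: short-term memory is overwritten every turn, while long-term memory is append-only and gated, since anything stored there will be retrieved in future sessions.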
class MemoryAwareAgent {
  private memory: MemorySystem;
  constructor(memory: MemorySystem) {
    this.memory = memory;
  }
async buildContext(
sessionId: string,
customerId?: string
): Promise<Message[]> {
const context: Message[] = [];
// Load session context from short-term memory
const session = await this.memory.shortTerm.load(sessionId);
if (session) {
context.push({
role: "system",
content: `Previous session context:\n${session.conversationSummary}\n` +
`Actions taken: ${session.actionsTaken.join(", ")}`,
});
}
// Load relevant long-term memories
if (customerId) {
const memories = await this.memory.longTerm.getByCustomer(customerId);
if (memories.length > 0) {
const memoryText = memories
.map((m) => `[${m.type}] ${m.content}`)
.join("\n");
context.push({
role: "system",
content: `Customer history:\n${memoryText}`,
});
}
}
// Add working memory messages
context.push(...this.memory.working.getRecentMessages(20));
return context;
}
}
Conclusion
Memory transforms an AI agent from a stateless responder into a contextual partner that remembers, learns, and adapts. Working memory manages the current conversation. Short-term memory bridges interactions within a session. Long-term memory accumulates knowledge over time. In Klivvr Agent, these three tiers work together to give the agent the context it needs to provide personalized, efficient support — reducing the need for customers to repeat themselves and enabling agents to build on previous interactions.