Memory and Embedding Design
Overview
This document outlines the design decisions for memory management, embedding models, and storage architecture for the light-genai-4j agent system.
For agent skills and tool execution, see Agent Skill Design.
1. Embedding Model Selection
1.1 Chosen Model: BGE-small-en-v1.5 (Quantized)
Artifact: bge-small-en-v15-q (migrated from langchain4j)
Specifications:
- Dimensions: 384
- Pooling Mode: CLS (uses [CLS] token representation)
- Model Size: ~17MB (quantized version)
- Language: English (primary), supports basic multilingual capability
- Source: BAAI (Beijing Academy of Artificial Intelligence)
1.2 Why BGE-small-en-v1.5?
| Use Case | Requirement | How BGE Fits |
|---|---|---|
| Short-term Memory | Fast retrieval of recent context | Optimized for semantic similarity search |
| Long-term Memory | Finding relevant past conversations | State-of-the-art for asymmetric retrieval (query → document) |
| Knowledge Base | RAG (Retrieval-Augmented Generation) | Best-in-class for information retrieval |
1.3 Comparison with Alternatives
| Model | Dimensions | Best For | Why Not Chosen |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | General similarity | Not optimized for retrieval tasks |
| E5-small-v2 | 384 | Asymmetric search | Query/passage prefixes more complex |
| BGE-small-en | 384 | English retrieval | v1.5 has better performance |
| BGE-small-zh-v1.5 | 512 | Chinese | Different dimensions (see Section 4) |
1.4 Usage Pattern
// For queries (retrieving memories/knowledge)
String queryPrefix = "Represent this sentence for searching relevant passages:";
String query = queryPrefix + " " + userMessage;
// For storing (memories, knowledge documents)
String document = conversationText; // No prefix needed
2. Vector Database: PostgreSQL with pgvector
2.1 Schema Design
The schema adopts a scope-based memory model (similar to mem0), organizing memory by lifetime and visibility rather than abstract types.
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Session Memory (Short-term)
-- Stores in-flight conversation context. Auto-expires via TTL.
CREATE TABLE session_memories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID NOT NULL,
agent_id UUID NOT NULL,
user_id UUID, -- NULL allowed for anonymous sessions or background system tasks
content TEXT NOT NULL,
embedding VECTOR(384),
importance_score FLOAT DEFAULT 1.0,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP DEFAULT NOW() + INTERVAL '1 hour', -- TTL
metadata JSONB -- Rich filtering (e.g., {"topic": "debug", "turn": 5})
);
-- User Memory (Long-term)
-- Stores persistent facts/preferences about a user. Manual or inferred.
CREATE TABLE user_memories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
agent_id UUID NOT NULL,
user_id UUID NOT NULL, -- Must be tied to a specific user
content TEXT NOT NULL, -- e.g., "User prefers Java over Python"
embedding VECTOR(384),
memory_type VARCHAR(50), -- 'fact', 'preference', 'summary'
importance_score FLOAT DEFAULT 1.0,
access_count INTEGER DEFAULT 0,
created_at TIMESTAMP DEFAULT NOW(),
last_accessed TIMESTAMP DEFAULT NOW(),
metadata JSONB -- e.g., {"confidence": 0.9, "source": "conversation_123"}
);
-- Agent Memory (Private/Operational)
-- Stores agent-specific learning, state, or persistent persona data.
-- Scope: Private to the agent, typically across multiple users or sessions.
CREATE TABLE agent_memories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
agent_id UUID NOT NULL,
-- NO user_id: This is agent-centric knowledge
content TEXT NOT NULL,
embedding VECTOR(384),
memory_type VARCHAR(50), -- 'learning', 'state', 'persona', 'scratchpad'
created_at TIMESTAMP DEFAULT NOW(),
metadata JSONB
);
-- Organizational Memory (Knowledge Base)
-- Stores global, shared knowledge available to all agents/users.
CREATE TABLE organizational_memories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
agent_id UUID NOT NULL,
source VARCHAR(255), -- Document source/name
content TEXT NOT NULL,
embedding VECTOR(384),
chunk_index INTEGER, -- For large documents split into chunks
document_id UUID, -- Reference to parent document
created_at TIMESTAMP DEFAULT NOW(),
metadata JSONB -- e.g., {"department": "HR", "version": "1.0"}
);
-- Indexes for similarity search and metadata filtering
CREATE INDEX idx_session_memory_embedding ON session_memories
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX idx_session_metadata ON session_memories USING GIN (metadata);
CREATE INDEX idx_user_memory_embedding ON user_memories
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX idx_user_metadata ON user_memories USING GIN (metadata);
CREATE INDEX idx_agent_memory_embedding ON agent_memories
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX idx_agent_metadata ON agent_memories USING GIN (metadata);
CREATE INDEX idx_org_memory_embedding ON organizational_memories
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX idx_org_metadata ON organizational_memories USING GIN (metadata);
2.2 Memory Lifecycle
The system follows a promotion strategy:
User Input
│
▼
┌─────────────────────────────────────┐
│ Query Embedding (with prefix) │
└─────────────────────────────────────┘
│
├──► Search Session Memory (Context)
│ └── Recent turns, immediate task context
│
├──► Search User Memory (Personalization)
│ └── User preferences, past decisions, facts
│
├──► Search Agent Memory (Self-Knowledge)
│ └── Agent persona, learned behaviors, operational state
│
└──► Search Organizational Memory (Knowledge)
└── Policies, docs, FAQs
│
▼
Consolidated Context → LLM
│
▼
Store Response ──────► Session Memory
│ │
│ ▼
│ (Optional: Extraction/Inference)
│ Run LLM to extract facts from conversation
│ │
│ ┌──────────────┴──────────────┐
│ ▼ ▼
│ User Memory Agent Memory
│ (User Facts) (Agent Learnings)
│
▼
(Session ends)
Session Memory Expires
2.3 Memory Retrieval Strategy
public class MemoryService {
public List<Memory> retrieveRelevantMemories(String query, UUID sessionId, UUID userId, UUID agentId) {
// 1. Embed the query with prefix
String prefixedQuery = "Represent this sentence for searching relevant passages: " + query;
Embedding queryEmbedding = embeddingModel.embed(prefixedQuery);
// 2. Search Session Memory (Short-term context)
List<Memory> sessionContext = searchSession(queryEmbedding, sessionId, limit = 10);
// 3. Search User Memory (Personalization)
List<Memory> userFacts = searchUser(queryEmbedding, userId, limit = 5);
// 4. Search Agent Memory (Agent Self-Knowledge)
List<Memory> agentFacts = searchAgent(queryEmbedding, agentId, limit = 3);
// 5. Search Organizational Memory (Knowledge Base)
List<Memory> orgKnowledge = searchOrg(queryEmbedding, agentId, limit = 5);
// 6. Combine and Rerank
return combineAndRank(sessionContext, userFacts, agentFacts, orgKnowledge);
}
private List<Memory> searchSession(Embedding query, UUID sessionId, int limit) {
String sql = """
SELECT id, content, embedding <=> ? AS distance
FROM session_memories
WHERE session_id = ? AND expires_at > NOW()
ORDER BY embedding <=> ?
LIMIT ?
""";
// Execute query...
}
}
3. Scaling & Advanced Patterns
3.1 HNSW Indexing (Hierarchical Navigable Small World)
As the organizational memory grows (millions of records), standard ivfflat indexes may become too slow or inaccurate. We recommend using HNSW indexes for scalable vector search.
Implementation in pgvector:
-- Replace standard index with HNSW
CREATE INDEX idx_org_memory_hnsw ON organizational_memories
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
m: Max connections per layer (higher = better recall, larger index).ef_construction: Size of dynamic list during build (higher = better quality, slower build).
3.2 GraphRAG (Future Roadmap)
To support multi-hop reasoning (e.g., “How does Project X relate to Policy Y?”), standard vector search is insufficient. GraphRAG combines vector search with knowledge graph traversal.
Relational Graph Schema (PostgreSQL): Instead of a separate Graph DB, we can model entities and relationships directly in SQL.
-- Extracted Entities (Nodes)
CREATE TABLE entities (
id UUID PRIMARY KEY,
name VARCHAR(255),
type VARCHAR(50), -- 'Person', 'Project', 'Technology'
description TEXT,
embedding VECTOR(384) -- For hybrid search
);
-- Relationships (Edges)
CREATE TABLE relationships (
source_id UUID REFERENCES entities(id),
target_id UUID REFERENCES entities(id),
relation_type VARCHAR(50), -- 'CREATED_BY', 'DEPENDS_ON'
description TEXT,
weight FLOAT DEFAULT 1.0,
PRIMARY KEY (source_id, target_id, relation_type)
);
-- Linking Text Chunks to Knowledge Graph
CREATE TABLE memory_entities (
memory_id UUID REFERENCES organizational_memories(id),
entity_id UUID REFERENCES entities(id),
PRIMARY KEY (memory_id, entity_id)
);
3.3 Evaluation of Recursive Language Models (RLM)
We evaluated Recursive Language Models (RLM) as a potential solution for large-scale memory.
Conclusion: NOT ADOPTED as Storage Architecture.
RLM is an inference strategy (processing huge inputs by letting an LLM recursively chunk and summarize text via code execution), not a storage solution.
- Pros: Can “reason” over 10M+ tokens without context rot.
- Cons: Extremely slow (minutes), high cost, and requires complex code execution sandboxing.
Recommendation: Use PostgreSQL/pgvector (with HNSW) as the primary storage. Implement RLM-like logic only as a specific “Deep Research” Skill (e.g., analyze_large_document) if the agent needs to process massive files on-demand, but do not use it for general memory retrieval.
4. Internationalization: Chinese Support
3.1 The Dimension Problem
- BGE-small-en-v1.5: 384 dimensions
- BGE-small-zh-v1.5: 512 dimensions
Cannot mix in same vector column!
3.2 Solutions
Option A: Separate Tables (Recommended)
-- English memories
CREATE TABLE short_term_memories_en (
-- ... same schema ...
embedding VECTOR(384)
);
-- Chinese memories
CREATE TABLE short_term_memories_zh (
-- ... same schema ...
embedding VECTOR(512)
);
-- Query both and merge results in application layer
Option B: Unified Multilingual Model
When adding Chinese support, switch to a multilingual model:
| Model | Dimensions | Languages | Trade-off |
|---|---|---|---|
| BGE-small-en-v1.5 | 384 | En + basic multilingual | Use for both initially |
| BGE-base-en-v1.5 | 768 | Better multilingual | Higher dimensions |
| Multilingual-E5 | 384 | 100+ languages | May require re-embedding |
Option C: Re-embed Everything
When ready for Chinese:
- Choose new multilingual model
- Re-embed all existing memories
- Single table with new dimensions
3.3 Recommendation
Phase 1 (English only):
- Use
BGE-small-en-v1.5-q(384d) - Single table structure
Phase 2 (Add Chinese):
- Option: Use
BGE-small-en-v1.5for both (decent Chinese support) - Or create separate
_zhtables withBGE-small-zh-v1.5(512d) - Query service merges results from both tables
4. Implementation Guidelines
4.1 Dependencies
<!-- pom.xml -->
<dependency>
<groupId>com.networknt</groupId>
<artifactId>bge-small-en-v15-q</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>42.7.1</version>
</dependency>
4.2 Configuration
# application.yml
memory:
embedding:
model: bge-small-en-v1.5-q
dimensions: 384
query-prefix: "Represent this sentence for searching relevant passages:"
storage:
type: postgresql
url: ${PGVECTOR_URL}
username: ${PGVECTOR_USER}
password: ${PGVECTOR_PASSWORD}
short-term:
ttl-minutes: 60
max-memories: 100
long-term:
min-importance: 0.5
max-results: 10
knowledge:
chunk-size: 512
overlap: 50
4.3 Key Design Principles
- Separate Concerns: Vector DB for semantic search, SQL for structured data
- Query Prefixing: Always use BGE’s recommended prefix for queries
- Lifecycle Management: Short-term expires, long-term persists, both consolidate
- Dimension Consistency: Plan for Chinese support from day one