Part 4 of the Journey: Advanced Topics & Deep Dives | Previous: The AI Dream Team | Next: From Keywords to Meaning
The PostgreSQL Long-Term Memory System: Giving Claude a Real Database
How we built a production-grade memory system with PostgreSQL and pgvector for semantic search
Date: October 3, 2025 | Built: May-June 2025 | Author: Myron Koch & Claude Code | Category: Infrastructure & AI Systems
The Problem: Claude Had Amnesia
Every conversation with Claude started fresh. No memory of:
- Previous projects we discussed
- Technical decisions we made
- Patterns we discovered
- Problems we already solved
The official solution: MCP’s JSON-based memory storage.
Our solution: A full PostgreSQL database with vector embeddings.
The Evolution of Names
Stage 1: “Claude Memory” (May-July 2025)
When we started, Claude didn’t have a database-backed memory server. The official offering was:
- JSON file storage
- No semantic search
- No relational queries
- No real database capabilities
We built our own and called it “Claude Memory” because it was memory specifically for Claude.
Stage 2: “Long-Term Memory” (July 2025-Present)
Then Anthropic shipped an official memory feature for Claude. Suddenly:
User: "Use Claude Memory to remember this"
Claude: "Which one?"
- Official Claude memory (JSON blobs)
- Our PostgreSQL system
Confusion everywhere.
We renamed ours to “Long-Term Memory” to distinguish:
- Claude’s Memory: Built-in JSON storage
- Long-Term Memory: Our PostgreSQL + pgvector system
The Architecture
PostgreSQL as the Foundation
# Database configuration
Database: claude_memory
Host: localhost:5432
User: m3
PostgreSQL: 15.x via Homebrew
# Current size
Entities: 216+
Observations: 2,573+
The Schema Design
-- Core tables
CREATE TABLE entities (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    entity_type TEXT,
    created_at TIMESTAMP,
    source_type TEXT,
    observation_count INTEGER
);

CREATE TABLE observations (
    id SERIAL PRIMARY KEY,
    entity_id INTEGER REFERENCES entities(id),
    observation_text TEXT NOT NULL,
    observation_index INTEGER,
    created_at TIMESTAMP,
    source_type TEXT,
    embedding vector(1536) -- pgvector for semantic search!
);
-- Extended schema
sessions -- Conversation sessions
projects -- Major projects
context_threads -- Related conversation threads
insights -- Derived knowledge
blockchain_mcp_breakthroughs -- Technical discoveries
mcp_factory_system -- System patterns
The pgvector Extension
The secret sauce: semantic search with vector embeddings
-- pgvector extension enabled
CREATE EXTENSION vector;
-- Vector index for fast similarity search
CREATE INDEX observations_embedding_idx
ON observations
USING ivfflat (embedding vector_cosine_ops)
WITH (lists=100);
What this enables:
- Find similar memories by semantic meaning, not keywords
- “Show me everything related to multi-agent systems” finds relevant observations even if they don’t contain those exact words
- Cosine similarity search across 2,573+ observations
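To make the query side concrete, here's a minimal Python sketch of a semantic lookup, assuming psycopg2 and a hypothetical embed() helper that returns the same kind of 1536-dimension vectors stored in the observations table (the follow-up post, From Keywords to Meaning, covers how embeddings are actually generated):

#!/usr/bin/env python3
# semantic_search_sketch.py -- illustrative only; embed() stands in for
# whatever model produced the stored 1536-dim embeddings.
import psycopg2

def semantic_search(query_text, embed, limit=10):
    """Return the observations closest in meaning to query_text."""
    vec = str(embed(query_text))  # e.g. "[0.234, -0.567, ...]"
    conn = psycopg2.connect("dbname=claude_memory user=m3 host=localhost port=5432")
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT observation_text,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM observations
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (vec, vec, limit),
        )
        return cur.fetchall()

The ORDER BY embedding <=> ... LIMIT form matters: that is the query shape the ivfflat index can accelerate.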
The MCP Integration
We use @henkey/postgres-mcp-server to expose PostgreSQL as MCP tools:
// .roo/mcp.json
{
  "mcpServers": {
    "claude-memory": {
      "command": "npx",
      "args": [
        "-y",
        "@henkey/postgres-mcp-server",
        "--connection-string",
        "postgresql://m3:password@localhost:5432/claude_memory"
      ],
      "env": {
        "SYSTEM_CONTEXT": "PostgreSQL memory for user-specific data. Query at conversation start for context. Use for: project continuity, personal info, technical history. NOT for: general knowledge, current events, documentation."
      }
    }
  }
}
Available MCP tools:
- list_schemas - Show database schemas
- list_objects - List tables, views, sequences
- get_object_details - Inspect table structure
- execute_sql - Run queries
- explain_query - Analyze query plans
- analyze_db_health - Database health checks
- get_top_queries - Performance monitoring
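For debugging outside of a Claude session, the same tools can be exercised directly with the MCP Python SDK. A sketch, assuming the mcp package is installed; the "sql" argument name is a guess at the server's tool schema, so confirm it with session.list_tools() first:

#!/usr/bin/env python3
# mcp_tool_sketch.py -- hypothetical direct invocation of the memory server
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(
        command="npx",
        args=["-y", "@henkey/postgres-mcp-server",
              "--connection-string",
              "postgresql://m3:password@localhost:5432/claude_memory"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Argument name "sql" is an assumption -- check list_tools()
            result = await session.call_tool(
                "execute_sql",
                {"sql": "SELECT name, entity_type FROM entities LIMIT 5"},
            )
            print(result)

asyncio.run(main())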
The Auto-Startup System
The memory system must always be available:
PostgreSQL Service
# Homebrew service management
brew services start postgresql@15
# Auto-start via macOS Login Items
MCP Server
#!/bin/bash
# start_postgres_mcp.sh - Added to Login Items
cd /Users/m3/Documents/GitHub/claude-memory-system
npx -y @henkey/postgres-mcp-server \
--connection-string "postgresql://m3:password@localhost:5432/claude_memory"
Daily Backups
# LaunchAgent: com.claudememory.dailybackup
# Schedule: 2:00 AM daily
# Script: backup_claude_memory_automator.sh
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.claudememory.dailybackup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/m3/Documents/GitHub/claude-memory-system/legacy-backup-system/scripts/backup_claude_memory_automator.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>2</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>RunAtLoad</key>
    <false/>
</dict>
</plist>
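Loading the agent once with launchctl load ~/Library/LaunchAgents/com.claudememory.dailybackup.plist (assuming the plist lives at the usual LaunchAgents path) activates the 2:00 AM schedule, and it survives reboots.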
The iCloud Sync Strategy
Privacy requirement: Memory data should NOT go to GitHub.
Solution: iCloud syncs backups, not the live PostgreSQL data directory.
# NOT synced via Git
.gitignore:
backups/
logs/
*.sql
*.dump
# iCloud syncs backup files only
# PostgreSQL data directory stays local
# Git only tracks:
# - Scripts
# - Configuration
# - Documentation
Why this works:
- Each Mac runs its own local PostgreSQL instance
- Daily backups sync via iCloud for redundancy
- No sensitive memory data in Git repos
- Backups can restore on any Mac
- No risk of database corruption from concurrent access
The Knowledge Graph Pattern
Memory is organized as entities and observations:
// Entity examples
{
  name: "Andromeda MCP Server",
  entity_type: "project",
  observation_count: 47
}
{
  name: "Myron Koch",
  entity_type: "person",
  observation_count: 156
}
{
  name: "Desktop Commander (DC)",
  entity_type: "software-tool",
  observation_count: 23
}

// Observations linked to entities
{
  entity_id: 42,  // "Andromeda MCP Server"
  observation_text: "July 8, 2025: Comprehensive validation successful after major architectural reorganization. ✅ Multi-agent tmux system operational with 6 active agents.",
  created_at: "2025-07-08T14:23:00Z",
  embedding: [0.234, -0.567, 0.123, ...]  // 1536-dim vector
}
Pattern recognition:
- Entities are nouns (people, projects, systems)
- Observations are facts about entities
- Embeddings enable semantic connections
- Time series shows evolution
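The write path is just as direct. Here's a sketch of storing one observation, reusing the hypothetical embed() helper from earlier (in practice these writes go through the MCP execute_sql tool):

# add_observation_sketch.py -- illustrative write path
import psycopg2

def add_observation(entity_id, text, embed):
    conn = psycopg2.connect("dbname=claude_memory user=m3 host=localhost port=5432")
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO observations (entity_id, observation_text, created_at, embedding)
            VALUES (%s, %s, NOW(), %s::vector)
            """,
            (entity_id, text, str(embed(text))),
        )
        # Keep the denormalized counter on entities in step
        cur.execute(
            "UPDATE entities SET observation_count = observation_count + 1 WHERE id = %s",
            (entity_id,),
        )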
The Abandoned Visualizer
We exported the knowledge graph to an Obsidian vault so we could explore it in Graph View.
The Implementation
#!/usr/bin/env python3
# export_full_to_obsidian.py
# (sanitize_filename, format_date, link_entities and find_related_entities
# are helpers defined elsewhere in the script)
import os
from collections import defaultdict

def create_entity_file(entity, observations, all_entities, output_dir):
    """Create markdown file for an entity"""
    filename = sanitize_filename(entity['name']) + '.md'
    content = f"""---
created: {format_date(entity['created_at'])}
entity_type: {entity['entity_type']}
tags: [claude-memory, {entity['entity_type']}]
---

# {entity['name']}

## Overview
**Type**: {entity['entity_type']}
**Total Observations**: {len(observations)}

## Observations
"""
    # Group observations by date, newest first
    obs_by_date = defaultdict(list)
    for obs in observations:
        obs_by_date[format_date(obs['created_at'])].append(obs)
    for date in sorted(obs_by_date.keys(), reverse=True):
        content += f"### {date}\n\n"
        for obs in obs_by_date[date]:
            # Add wiki links for mentioned entities
            obs_text = link_entities(obs['observation_text'], all_entities)
            content += f"- {obs_text}\n"
    # Add connections
    content += "\n## Connections\n\n"
    for related in sorted(find_related_entities(entity, all_entities)):
        content += f"- [[{sanitize_filename(related)}|{related}]]\n"
    with open(os.path.join(output_dir, filename), 'w') as f:
        f.write(content)
    return filename
The Vault Structure
ObsidianVaults/ClaudeMemory/
├── Entities/
│ ├── Andromeda_MCP_Server.md
│ ├── Myron_Koch.md
│ ├── Desktop_Commander_(DC).md
│ └── ... (216 entity files)
├── Daily_Notes/
│ ├── 2025-07-08.md
│ ├── 2025-07-09.md
│ └── ... (daily observation logs)
├── Projects/
│ └── README.md (project index)
├── People/
│ └── README.md (people index)
├── Technical/
│ └── README.md (systems index)
└── README.md (vault overview)
Why We Abandoned It
The Obsidian visualizer was technically functional but had issues:
Problems:
- Export lag: Had to export PostgreSQL → JSON → Obsidian markdown
- Not real-time: Changes in database didn’t immediately reflect
- Graph view limitations: Obsidian graph couldn’t handle 2,500+ nodes well
- Manual sync: Needed to re-export after every conversation
- Not queryable: Couldn’t run SQL queries on the visualization
What we learned:
- Visualization is nice but not essential
- Direct PostgreSQL queries are more powerful
- MCP tools provide better interface than markdown files
- Real-time access > pretty graphs
What We Use Instead
MCP Tools for Memory Access
-- Query at conversation start
SELECT e.name, e.entity_type, COUNT(o.id) as obs_count
FROM entities e
LEFT JOIN observations o ON o.entity_id = e.id
WHERE e.name LIKE '%blockchain%'
GROUP BY e.id
ORDER BY obs_count DESC;
-- Semantic search via pgvector
SELECT observation_text,
1 - (embedding <=> query_embedding) AS similarity
FROM observations
WHERE entity_id = 42
ORDER BY embedding <=> query_embedding
LIMIT 10;
The Workflow
1. Conversation starts
↓
2. Claude queries longterm-memory MCP
↓
3. Gets relevant entities and observations
↓
4. Uses context to continue previous work
↓
5. New observations stored automatically
Example query at conversation start:
-- What projects am I working on?
SELECT name, entity_type, observation_count
FROM entities
WHERE entity_type = 'project'
AND observation_count > 10
ORDER BY created_at DESC;
-- Recent blockchain work?
SELECT e.name, o.observation_text, o.created_at
FROM entities e
JOIN observations o ON o.entity_id = e.id
WHERE e.name LIKE '%MCP%'
AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC;
The Performance Characteristics
Query Performance
-- pgvector similarity search
EXPLAIN ANALYZE
SELECT observation_text
FROM observations
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
-- Result: ~50ms for 2,573 observations
-- the ivfflat index keeps this fast
Database Size
Entities: 216 rows (~50KB)
Observations: 2,573 rows (~15MB)
Embeddings: 1536 dims × 4 bytes × 2,573 = ~15MB
Total database: ~35MB
Scalability: Can handle 100,000+ observations before performance degrades.
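One caveat at that scale, per pgvector's own guidance: the ivfflat lists parameter should grow with the table (roughly rows / 1000 for up to about a million rows), so the index deserves an occasional rebuild. A sketch of such a maintenance job, assuming psycopg2:

# retune_ivfflat_sketch.py -- hypothetical maintenance job
import psycopg2

def retune_index():
    conn = psycopg2.connect("dbname=claude_memory user=m3 host=localhost port=5432")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM observations")
        # pgvector docs: lists ~ rows/1000 up to ~1M rows
        lists = max(cur.fetchone()[0] // 1000, 10)
        cur.execute("DROP INDEX IF EXISTS observations_embedding_idx")
        cur.execute(
            "CREATE INDEX observations_embedding_idx ON observations "
            f"USING ivfflat (embedding vector_cosine_ops) WITH (lists = {lists})"
        )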
The Backup Strategy
Daily Automated Backups
#!/bin/bash
# backup_claude_memory_automator.sh
BACKUP_DIR="/Users/m3/Documents/GitHub/claude-memory-system/legacy-backup-system/backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/claude_memory_$TIMESTAMP.sql"
# PostgreSQL dump
pg_dump -h localhost -p 5432 -U m3 -d claude_memory \
--no-owner --no-privileges \
> "$BACKUP_FILE"
# Compress
gzip "$BACKUP_FILE"
# Keep only last 30 days
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +30 -delete
echo "Backup completed: $BACKUP_FILE.gz"
Backup schedule:
- Daily: 2:00 AM via LaunchAgent
- Retention: 30 days
- Location: legacy-backup-system/backups/ (iCloud synced for off-machine backup)
Manual Backup
# Quick manual backup
pg_dump -h localhost -p 5432 -U m3 -d claude_memory \
--no-owner --no-privileges \
> "claude_memory_manual_$(date +%Y%m%d_%H%M%S).sql"
# Or check system health (doesn't create backup)
./scripts/health_check_passwordless.sh
# Health check includes:
# - Database connection test
# - Entity count
# - Observation count
# - Last backup time
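Restoring on a fresh Mac is the mirror image. A sketch that loads the newest iCloud-synced dump, assuming the timestamped filenames sort chronologically:

# restore_latest_sketch.py -- hypothetical restore helper
import glob, os, subprocess

def restore_latest(backup_dir, dbname="claude_memory"):
    # Timestamped names (claude_memory_YYYYMMDD_HHMMSS.sql.gz) sort in order
    latest = sorted(glob.glob(os.path.join(backup_dir, "*.sql.gz")))[-1]
    subprocess.run(["createdb", dbname], check=False)  # ignored if it exists
    with subprocess.Popen(["gunzip", "-c", latest], stdout=subprocess.PIPE) as gz:
        subprocess.run(["psql", "-d", dbname], stdin=gz.stdout, check=True)
    print(f"Restored {latest} into {dbname}")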
The System Prompt Integration
Memory system configured in Claude’s system context:
SYSTEM_CONTEXT: "PostgreSQL memory for user-specific data.
Query at conversation start for context.
Use for:
- Project continuity
- Personal information
- Technical history
- Previous decisions
NOT for:
- General knowledge
- Current events
- Documentation"
Claude knows to:
- Query memory at conversation start
- Store important observations
- Link observations to entities
- Use semantic search when needed
The Numbers
Development timeline:
- Initial build: May 2025
- Production deployment: June 2025
- Obsidian visualizer: June 2025 (abandoned July 2025)
- Rename to Long-Term Memory: July 2025
- iCloud sync: August 2025
Current stats:
- 216 entities tracked
- 2,573+ observations
- ~35MB database size
- <100ms query latency
- 100% uptime (auto-restart)
- 30-day rolling backups
Usage:
- Queried at start of every conversation
- ~50 new observations per week
- Semantic search: ~10 queries per day
- Database health checks: Daily at 2AM
Lessons Learned
1. PostgreSQL Beats JSON Every Time
JSON file storage:
- Linear search
- No relationships
- No query language
- File corruption risk
PostgreSQL:
- Indexed search
- Foreign keys
- Full SQL
- ACID guarantees
2. pgvector Is Production-Ready
Vector similarity search works reliably:
- Handles 1536-dimension embeddings
- ivfflat index is fast enough
- Cosine similarity is the right metric
- No issues with 2,500+ vectors
3. Visualizations Are Overrated
We spent time building Obsidian export:
- Looked impressive
- Didn’t add value
- SQL queries more useful
- Real-time access > pretty graphs
4. Auto-Startup Is Essential
The memory system must always be available:
- macOS Login Items work perfectly
- LaunchAgents for scheduled tasks
- Homebrew services for PostgreSQL
- Zero manual intervention
5. iCloud Sync for Privacy
Git is wrong place for personal memory:
- iCloud handles database files
- No sensitive data in repos
- Works across Macs
- Private by default
The Alternatives We Rejected
Option 1: Official MCP Memory Server
# @modelcontextprotocol/server-memory
npm install -g @modelcontextprotocol/server-memory
Why we rejected:
- JSON file storage (not a database)
- No semantic search
- No query language
- Limited to 10,000 observations
Option 2: Vector Database (Pinecone, Weaviate)
Why we rejected:
- External service (network dependency)
- Costs money at scale
- Harder to backup
- PostgreSQL + pgvector is simpler
Option 3: Graph Database (Neo4j)
Why we rejected:
- Overkill for our use case
- Harder to query
- More complex backup
- PostgreSQL can do graphs too
The Future Enhancements
Potential Improvements
- Automatic entity extraction - NER from conversations
- Relationship inference - Detect entity connections
- Observation clustering - Group related memories
- Time-aware queries - “What was I working on last week?”
- Cross-conversation threading - Link related discussions
What We Won’t Build
- Obsidian visualizer - Tried it, abandoned it
- Real-time sync across devices - iCloud is enough
- Web interface - MCP tools are better
- Mobile app - Not needed
The Checklist for Building Your Own
Want to build a similar system?
- Install PostgreSQL 15+ with pgvector extension
- Design entity-observation schema
- Create vector embeddings column (1536 dims)
- Add ivfflat index for similarity search
- Configure @henkey/postgres-mcp-server
- Set up auto-startup (Login Items)
- Configure daily backups (LaunchAgent)
- Test semantic search queries
- Integrate with Claude’s system prompt
- Set up iCloud sync for privacy
- Skip the visualizer (trust us)
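To verify steps 3, 4, and 8 in one shot, here's a quick smoke test, assuming psycopg2 and the schema above:

# smoke_test_sketch.py -- insert a dummy observation with a random vector
# and confirm a nearest-neighbour query finds it. (ivfflat is approximate,
# but on a fresh database the match is effectively guaranteed.)
import random
import psycopg2

conn = psycopg2.connect("dbname=claude_memory user=m3 host=localhost port=5432")
vec = str([random.uniform(-1, 1) for _ in range(1536)])
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO entities (name, entity_type, created_at, observation_count) "
        "VALUES ('smoke-test', 'test', NOW(), 1) RETURNING id"
    )
    entity_id = cur.fetchone()[0]
    cur.execute(
        "INSERT INTO observations (entity_id, observation_text, created_at, embedding) "
        "VALUES (%s, 'smoke test observation', NOW(), %s::vector)",
        (entity_id, vec),
    )
    cur.execute(
        "SELECT observation_text FROM observations "
        "ORDER BY embedding <=> %s::vector LIMIT 1",
        (vec,),
    )
    print("PASS" if cur.fetchone()[0] == "smoke test observation" else "FAIL")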
References
- PostgreSQL memory system: /Users/m3/Documents/GitHub/claude-memory-system/
- pgvector documentation: https://github.com/pgvector/pgvector
- MCP PostgreSQL server: https://github.com/henkey/postgres-mcp-server
- Obsidian export scripts: /python/obsidian/
- Backup automation: /legacy-backup-system/
- System documentation: /docs/SYSTEM_DOCUMENTATION.md
This is part of our ongoing series documenting architectural patterns and insights from building the Blockchain MCP Server Ecosystem. Sometimes the simple solution (PostgreSQL) is better than the fancy one (specialized vector DBs).
Related Reading
Prerequisites
- The AI Dream Team - See how a robust memory system provides critical context for multi-AI collaboration.
Next Steps
- From Keywords to Meaning: Building a Semantic Memory System with Ollama - Dive deeper into the technology that makes this memory system truly intelligent.
Deep Dives
- From Keywords to Meaning: Building a Semantic Memory System with Ollama - This post is a direct continuation, explaining the semantic search implementation in detail.