
The PostgreSQL Long-Term Memory System: Giving Claude a Real Database

How we built a production-grade memory system with PostgreSQL and pgvector for semantic search

Part 4 of the Journey: Advanced Topics & Deep Dives | Previous: The AI Dream Team | Next: From Keywords to Meaning

Date: October 3, 2025 | Built: May-June 2025 | Author: Myron Koch & Claude Code | Category: Infrastructure & AI Systems

The Problem: Claude Had Amnesia

Every conversation with Claude started fresh: no memory of previous projects, past decisions, or the technical history behind them.

The official solution: MCP’s JSON-based memory storage.

Our solution: A full PostgreSQL database with vector embeddings.

The Evolution of Names

Stage 1: “Claude Memory” (May-July 2025)

When we started, Claude didn't have a database-backed memory server. The official offering was the JSON-file-based @modelcontextprotocol/server-memory: flat blobs with no relationships and no query language.

We built our own and called it “Claude Memory” because it was memory specifically for Claude.

Stage 2: “Long-Term Memory” (July 2025-Present)

Then Anthropic introduced Claude's own memory feature. Suddenly:

User: "Use Claude Memory to remember this"
Claude: "Which one?"
  - Official Claude memory (JSON blobs)
  - Our PostgreSQL system

Confusion everywhere.

We renamed ours to “Long-Term Memory” to distinguish the PostgreSQL system from the official JSON-based feature.

The Architecture

PostgreSQL as the Foundation

# Database configuration
Database: claude_memory
Host: localhost:5432
User: m3
PostgreSQL: 15.x via Homebrew

# Current size
Entities: 216+
Observations: 2,573+
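
Those counts come straight from the two core tables; a quick check from a psql session against claude_memory looks like this:

-- How big is the memory right now?
SELECT
  (SELECT COUNT(*) FROM entities)     AS entity_count,
  (SELECT COUNT(*) FROM observations) AS observation_count;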

The Schema Design

-- Core tables
CREATE TABLE entities (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  entity_type TEXT,
  created_at TIMESTAMP,
  source_type TEXT,
  observation_count INTEGER
);

CREATE TABLE observations (
  id SERIAL PRIMARY KEY,
  entity_id INTEGER REFERENCES entities(id),
  observation_text TEXT NOT NULL,
  observation_index INTEGER,
  created_at TIMESTAMP,
  source_type TEXT,
  embedding vector(1536)  -- pgvector for semantic search!
);

-- Extended schema
sessions         -- Conversation sessions
projects         -- Major projects
context_threads  -- Related conversation threads
insights         -- Derived knowledge
blockchain_mcp_breakthroughs  -- Technical discoveries
mcp_factory_system           -- System patterns
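
To make the entity/observation relationship concrete, here is a hypothetical sketch of how a single new memory lands in this schema. In practice the MCP tools issue these statements, and the embedding column is filled in by a separate embedding step; the values below are illustrative only.

-- Illustrative example: a new project entity plus its first observation
INSERT INTO entities (name, entity_type, created_at, source_type, observation_count)
VALUES ('Example Project', 'project', NOW(), 'conversation', 0);

INSERT INTO observations (entity_id, observation_text, observation_index, created_at, source_type)
SELECT id, 'Kicked off the project and wired up the first MCP tool.', 1, NOW(), 'conversation'
FROM entities
WHERE name = 'Example Project';

-- Keep the denormalized counter in sync
UPDATE entities
SET observation_count = observation_count + 1
WHERE name = 'Example Project';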

The pgvector Extension

The secret sauce: semantic search with vector embeddings

-- pgvector extension enabled
CREATE EXTENSION vector;

-- Vector index for fast similarity search
CREATE INDEX observations_embedding_idx
ON observations
USING ivfflat (embedding vector_cosine_ops)
WITH (lists=100);

What this enables: approximate nearest-neighbor search over every observation, so “find memories similar to this question” becomes a single indexed ORDER BY on cosine distance.
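
One query-time detail worth knowing (our setup just uses the defaults): ivfflat trades recall for speed by probing a limited number of lists per query, and pgvector exposes that as a session setting.

-- Probe more lists per query for better recall (pgvector default is 1)
SET ivfflat.probes = 10;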

The MCP Integration

We use @henkey/postgres-mcp-server to expose PostgreSQL as MCP tools:

// .roo/mcp.json
{
  "mcpServers": {
    "claude-memory": {
      "command": "npx",
      "args": [
        "-y",
        "@henkey/postgres-mcp-server",
        "--connection-string",
        "postgresql://m3:password@localhost:5432/claude_memory"
      ],
      "env": {
        "SYSTEM_CONTEXT": "PostgreSQL memory for user-specific data. Query at conversation start for context. Use for: project continuity, personal info, technical history. NOT for: general knowledge, current events, documentation."
      }
    }
  }
}

Available MCP tools: the server exposes the claude_memory database through generic SQL tooling, so memory reads and writes are just queries.

The Auto-Startup System

The memory system has to be available before any conversation starts, so every layer launches automatically:

PostgreSQL Service

# Homebrew service management
brew services start postgresql@15

# Auto-start via macOS Login Items

MCP Server

#!/bin/bash
# start_postgres_mcp.sh - Added to Login Items

cd /Users/m3/Documents/GitHub/claude-memory-system
npx -y @henkey/postgres-mcp-server \
  --connection-string "postgresql://m3:password@localhost:5432/claude_memory"

Daily Backups

# LaunchAgent: com.claudememory.dailybackup
# Schedule: 2:00 AM daily
# Script: backup_claude_memory_automator.sh

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.claudememory.dailybackup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/m3/Documents/GitHub/claude-memory-system/legacy-backup-system/scripts/backup_claude_memory_automator.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>2</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>RunAtLoad</key>
    <false/>
</dict>
</plist>

The iCloud Sync Strategy

Privacy requirement: Memory data should NOT go to GitHub.

Solution: iCloud syncs backups, not the live PostgreSQL data directory.

# NOT synced via Git
.gitignore:
backups/
logs/
*.sql
*.dump

# iCloud syncs backup files only
# PostgreSQL data directory stays local
# Git only tracks:
# - Scripts
# - Configuration
# - Documentation

Why this works: the live PostgreSQL data directory never leaves the machine, the nightly SQL dumps ride along in iCloud for off-device redundancy, and the public Git repo only ever sees scripts, configuration, and documentation.

The Knowledge Graph Pattern

Memory is organized as entities and observations:

// Entity examples
{
  name: "Andromeda MCP Server",
  entity_type: "project",
  observation_count: 47
}

{
  name: "Myron Koch",
  entity_type: "person",
  observation_count: 156
}

{
  name: "Desktop Commander (DC)",
  entity_type: "software-tool",
  observation_count: 23
}

// Observations linked to entities
{
  entity_id: 42,  // "Andromeda MCP Server"
  observation_text: "July 8, 2025: Comprehensive validation successful after major architectural reorganization. ✅ Multi-agent tmux system operational with 6 active agents.",
  created_at: "2025-07-08T14:23:00Z",
  embedding: [0.234, -0.567, 0.123, ...]  // 1536-dim vector
}

Pattern recognition falls out of this structure: because every observation is timestamped, linked to an entity, and embedded, questions like “which projects have been most active lately?” become ordinary queries, as in the sketch below.
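
A sketch of that kind of pattern query against the schema above (the 30-day window is arbitrary):

-- Which entities have been most active recently?
SELECT e.name, e.entity_type, COUNT(o.id) AS recent_observations
FROM entities e
JOIN observations o ON o.entity_id = e.id
WHERE o.created_at > NOW() - INTERVAL '30 days'
GROUP BY e.id, e.name, e.entity_type
ORDER BY recent_observations DESC
LIMIT 10;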

The Abandoned Visualizer

We built an Obsidian vault export so that Graph View could render the knowledge graph visually.

The Implementation

#!/usr/bin/env python3
# export_full_to_obsidian.py

from collections import defaultdict
from pathlib import Path

# sanitize_filename and format_date are helpers defined elsewhere in the script;
# add_wiki_links and find_related_entities stand in for the linking helpers it uses.

def create_entity_file(entity, observations, all_entities, output_dir):
    """Create a markdown file for an entity, with observations grouped by date."""
    filename = sanitize_filename(entity['name']) + '.md'

    content = f"""---
created: {format_date(entity['created_at'])}
entity_type: {entity['entity_type']}
tags: [claude-memory, {entity['entity_type']}]
---

# {entity['name']}

## Overview
**Type**: {entity['entity_type']}
**Total Observations**: {len(observations)}

## Observations
"""

    # Group observations by date
    obs_by_date = defaultdict(list)
    for obs in observations:
        obs_by_date[format_date(obs['created_at'])].append(obs)

    for date in sorted(obs_by_date.keys(), reverse=True):
        content += f"### {date}\n\n"
        for obs in obs_by_date[date]:
            # Add wiki links for mentioned entities
            obs_text = add_wiki_links(obs['observation_text'], all_entities)
            content += f"- {obs_text}\n"
        content += "\n"

    # Add connections to related entities as Obsidian wiki links
    related_entities = find_related_entities(observations, all_entities)
    content += "## Connections\n\n"
    for related in sorted(related_entities):
        content += f"- [[{sanitize_filename(related)}|{related}]]\n"

    Path(output_dir, filename).write_text(content)
    return filename

The Vault Structure

ObsidianVaults/ClaudeMemory/
├── Entities/
│   ├── Andromeda_MCP_Server.md
│   ├── Myron_Koch.md
│   ├── Desktop_Commander_(DC).md
│   └── ... (216 entity files)
├── Daily_Notes/
│   ├── 2025-07-08.md
│   ├── 2025-07-09.md
│   └── ... (daily observation logs)
├── Projects/
│   └── README.md (project index)
├── People/
│   └── README.md (people index)
├── Technical/
│   └── README.md (systems index)
└── README.md (vault overview)

Why We Abandoned It

The Obsidian visualizer was technically functional, but the problems piled up:

  1. Export lag: every update meant exporting PostgreSQL → JSON → Obsidian markdown
  2. Not real-time: changes in the database weren't reflected until the next export
  3. Graph view limitations: Obsidian's graph couldn't handle 2,500+ nodes well
  4. Manual sync: it needed a re-export after every conversation
  5. Not queryable: you couldn't run SQL against the visualization

What we learned: the database is the single source of truth, and any visualization layered on top of it is a second copy that starts going stale the moment it's generated.

What We Use Instead

MCP Tools for Memory Access

-- Query at conversation start
SELECT e.name, e.entity_type, COUNT(o.id) as obs_count
FROM entities e
LEFT JOIN observations o ON o.entity_id = e.id
WHERE e.name LIKE '%blockchain%'
GROUP BY e.id
ORDER BY obs_count DESC;

-- Semantic search via pgvector
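-- (query_embedding stands in for the 1536-dim embedding of the search text, supplied as a query parameter)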
SELECT observation_text,
       1 - (embedding <=> query_embedding) AS similarity
FROM observations
WHERE entity_id = 42
ORDER BY embedding <=> query_embedding
LIMIT 10;

The Workflow

1. Conversation starts

2. Claude queries longterm-memory MCP

3. Gets relevant entities and observations

4. Uses context to continue previous work

5. New observations stored automatically

Example query at conversation start:

-- What projects am I working on?
SELECT name, entity_type, observation_count
FROM entities
WHERE entity_type = 'project'
  AND observation_count > 10
ORDER BY created_at DESC;

-- Recent blockchain work?
SELECT e.name, o.observation_text, o.created_at
FROM entities e
JOIN observations o ON o.entity_id = e.id
WHERE e.name LIKE '%MCP%'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC;

The Performance Characteristics

Query Performance

-- pgvector similarity search
EXPLAIN ANALYZE
SELECT observation_text
FROM observations
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- Result: ~50ms for 2,573 observations
-- ivfflat index making this fast

Database Size

Entities: 216 rows (~50KB)
Observations: 2,573 rows (~15MB)
Embeddings: 1536 dims × 4 bytes × 2,573 = ~15MB
Total database: ~35MB

Scalability: Can handle 100,000+ observations before performance degrades.
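
Those figures can be double-checked with PostgreSQL's built-in size functions; a quick sketch:

-- Actual on-disk sizes
SELECT pg_size_pretty(pg_database_size('claude_memory'))      AS database_size,
       pg_size_pretty(pg_total_relation_size('observations')) AS observations_size,
       pg_size_pretty(pg_total_relation_size('entities'))     AS entities_size;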

The Backup Strategy

Daily Automated Backups

#!/bin/bash
# backup_claude_memory_automator.sh

BACKUP_DIR="/Users/m3/Documents/GitHub/claude-memory-system/legacy-backup-system/backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/claude_memory_$TIMESTAMP.sql"

# PostgreSQL dump
pg_dump -h localhost -p 5432 -U m3 -d claude_memory \
  --no-owner --no-privileges \
  > "$BACKUP_FILE"

# Compress
gzip "$BACKUP_FILE"

# Keep only last 30 days
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +30 -delete

echo "Backup completed: $BACKUP_FILE.gz"

Backup schedule: 2:00 AM daily via the LaunchAgent, compressed with gzip, with a rolling 30-day retention window.

Manual Backup

# Quick manual backup
pg_dump -h localhost -p 5432 -U m3 -d claude_memory \
  --no-owner --no-privileges \
  > "claude_memory_manual_$(date +%Y%m%d_%H%M%S).sql"

# Or check system health (doesn't create backup)
./scripts/health_check_passwordless.sh

# Health check includes:
# - Database connection test
# - Entity count
# - Observation count
# - Last backup time

The System Prompt Integration

Memory system configured in Claude’s system context:

SYSTEM_CONTEXT: "PostgreSQL memory for user-specific data.
Query at conversation start for context.

Use for:
- Project continuity
- Personal information
- Technical history
- Previous decisions

NOT for:
- General knowledge
- Current events
- Documentation"

Claude knows to:

  1. Query memory at conversation start
  2. Store important observations
  3. Link observations to entities
  4. Use semantic search when needed

The Numbers

Development timeline:
- Built May-June 2025 as "Claude Memory"; renamed to "Long-Term Memory" in July 2025

Current stats:
- 216+ entities, 2,573+ observations, ~35MB on disk, ~50ms similarity searches

Usage:
- Queried at the start of every conversation; new observations stored automatically as work happens

Lessons Learned

1. PostgreSQL Beats JSON Every Time

JSON file storage:
- Linear search
- No relationships
- No query language
- File corruption risk

PostgreSQL:
- Indexed search
- Foreign keys
- Full SQL
- ACID guarantees

2. pgvector Is Production-Ready

Vector similarity search works reliably: ~50ms over 2,573 observations with a plain ivfflat index, no dedicated vector database required.

3. Visualizations Are Overrated

We spent real time building the Obsidian export, and the SQL interface still turned out to be the better way to explore the data.

4. Auto-Startup Is Essential

A memory store that isn't running when the conversation starts is useless, which is why PostgreSQL, the MCP server, and the backups all launch automatically.

5. iCloud Sync for Privacy

Git is the wrong place for personal memory: the repository tracks only scripts, configuration, and documentation, while the data itself stays local with iCloud-synced backups.

The Alternatives We Rejected

Option 1: Official MCP Memory Server

# @modelcontextprotocol/server-memory
npm install -g @modelcontextprotocol/server-memory

Why we rejected it: flat JSON storage, linear scans, no relationships, and no query language.

Option 2: Vector Database (Pinecone, Weaviate)

Why we rejected them: a separate hosted service is overkill for a ~35MB dataset, and pgvector gives us the same similarity search inside the database we already run.

Option 3: Graph Database (Neo4j)

Why we rejected it: another database to operate and learn, when foreign keys between entities and observations already express the graph we need.

The Future Enhancements

Potential Improvements

  1. Automatic entity extraction - NER from conversations
  2. Relationship inference - Detect entity connections
  3. Observation clustering - Group related memories
  4. Time-aware queries - “What was I working on last week?”
  5. Cross-conversation threading - Link related discussions

What We Won’t Build

  1. Obsidian visualizer - Tried it, abandoned it
  2. Real-time sync across devices - iCloud is enough
  3. Web interface - MCP tools are better
  4. Mobile app - Not needed

The Checklist for Building Your Own

Want to build a similar system? The short version (a bootstrap sketch follows below):

- Install PostgreSQL 15+ and the pgvector extension
- Create the entities and observations tables, with a vector(1536) column for embeddings
- Expose the database to Claude through a PostgreSQL MCP server (we use @henkey/postgres-mcp-server)
- Give Claude a SYSTEM_CONTEXT so it knows when to query memory and when not to
- Automate startup and daily pg_dump backups, and keep the data out of Git
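
A minimal bootstrap sketch for the first two steps (names and dimensions match our setup; adjust as needed):

-- One-time bootstrap from psql, as a user with CREATE DATABASE rights
CREATE DATABASE claude_memory;

-- Then, connected to claude_memory:
CREATE EXTENSION IF NOT EXISTS vector;
-- ...followed by the CREATE TABLE statements and ivfflat index from the schema section above.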

This is part of our ongoing series documenting architectural patterns and insights from building the Blockchain MCP Server Ecosystem. Sometimes the simple solution (PostgreSQL) is better than the fancy one (specialized vector DBs).

