
The PostgreSQL Long-Term Memory System: Giving Claude a Real Database

How we built a production-grade memory system with PostgreSQL and pgvector for semantic search

Part 4 of the Journey: Advanced Topics & Deep Dives | Previous: The AI Dream Team | Next: From Keywords to Meaning

Date: October 3, 2025 | Built: May-June 2025 | Author: Myron Koch & Claude Code | Category: Infrastructure & AI Systems

The Problem: Claude Had Amnesia

Every conversation with Claude started fresh: no memory of previous projects, past decisions, or the technical history behind them.

The official solution: MCP’s JSON-based memory storage.

Our solution: A full PostgreSQL database with vector embeddings.

The Evolution of Names

Stage 1: “Claude Memory” (May-July 2025)

When we started, Claude didn't have a database-backed memory server. The official offering was the JSON-file-based @modelcontextprotocol/server-memory: flat blobs with no relationships and no query language.

We built our own and called it “Claude Memory” because it was memory specifically for Claude.

Stage 2: “Long-Term Memory” (July 2025-Present)

Then Anthropic introduced Claude's own memory feature. Suddenly:

User: "Use Claude Memory to remember this"
Claude: "Which one?"
  - Official Claude memory (JSON blobs)
  - Our PostgreSQL system

Confusion everywhere.

We renamed ours to “Long-Term Memory” to distinguish the PostgreSQL system from the official JSON-based feature.

The Architecture

PostgreSQL as the Foundation

# Database configuration
Database: claude_memory
Host: localhost:5432
User: m3
PostgreSQL: 15.x via Homebrew

# Current size
Entities: 216+
Observations: 2,573+
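
Those counts come straight from the two core tables; a quick check from a psql session against claude_memory looks like this:

-- How big is the memory right now?
SELECT
  (SELECT COUNT(*) FROM entities)     AS entity_count,
  (SELECT COUNT(*) FROM observations) AS observation_count;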

The Schema Design

-- Core tables
CREATE TABLE entities (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  entity_type TEXT,
  created_at TIMESTAMP,
  source_type TEXT,
  observation_count INTEGER
);

CREATE TABLE observations (
  id SERIAL PRIMARY KEY,
  entity_id INTEGER REFERENCES entities(id),
  observation_text TEXT NOT NULL,
  observation_index INTEGER,
  created_at TIMESTAMP,
  source_type TEXT,
  embedding vector(1536)  -- pgvector for semantic search!
);

-- Extended schema
sessions         -- Conversation sessions
projects         -- Major projects
context_threads  -- Related conversation threads
insights         -- Derived knowledge
blockchain_mcp_breakthroughs  -- Technical discoveries
mcp_factory_system           -- System patterns
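
To make the entity/observation relationship concrete, here is a hypothetical sketch of how a single new memory lands in this schema. In practice the MCP tools issue these statements, and the embedding column is filled in by a separate embedding step; the values below are illustrative only.

-- Illustrative example: a new project entity plus its first observation
INSERT INTO entities (name, entity_type, created_at, source_type, observation_count)
VALUES ('Example Project', 'project', NOW(), 'conversation', 0);

INSERT INTO observations (entity_id, observation_text, observation_index, created_at, source_type)
SELECT id, 'Kicked off the project and wired up the first MCP tool.', 1, NOW(), 'conversation'
FROM entities
WHERE name = 'Example Project';

-- Keep the denormalized counter in sync
UPDATE entities
SET observation_count = observation_count + 1
WHERE name = 'Example Project';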

The pgvector Extension

The secret sauce: semantic search with vector embeddings

-- pgvector extension enabled
CREATE EXTENSION vector;

-- Vector index for fast similarity search
CREATE INDEX observations_embedding_idx
ON observations
USING ivfflat (embedding vector_cosine_ops)
WITH (lists=100);

What this enables: approximate nearest-neighbor search over every observation, so “find memories similar to this question” becomes a single indexed ORDER BY on cosine distance.
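
One query-time detail worth knowing (our setup just uses the defaults): ivfflat trades recall for speed by probing a limited number of lists per query, and pgvector exposes that as a session setting.

-- Probe more lists per query for better recall (pgvector default is 1)
SET ivfflat.probes = 10;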

The MCP Integration

We use @henkey/postgres-mcp-server to expose PostgreSQL as MCP tools:

// .roo/mcp.json
{
  "mcpServers": {
    "claude-memory": {
      "command": "npx",
      "args": [
        "-y",
        "@henkey/postgres-mcp-server",
        "--connection-string",
        "postgresql://m3:password@localhost:5432/claude_memory"
      ],
      "env": {
        "SYSTEM_CONTEXT": "PostgreSQL memory for user-specific data. Query at conversation start for context. Use for: project continuity, personal info, technical history. NOT for: general knowledge, current events, documentation."
      }
    }
  }
}

Available MCP tools: the server exposes the claude_memory database through generic SQL tooling, so memory reads and writes are just queries.

The Auto-Startup System

The memory system has to be available before any conversation starts, so every layer launches automatically:

PostgreSQL Service

# Homebrew service management
brew services start postgresql@15

# Auto-start via macOS Login Items

MCP Server

#!/bin/bash
# start_postgres_mcp.sh - Added to Login Items

cd /Users/m3/Documents/GitHub/claude-memory-system
npx -y @henkey/postgres-mcp-server \
  --connection-string "postgresql://m3:password@localhost:5432/claude_memory"

Daily Backups

# LaunchAgent: com.claudememory.dailybackup
# Schedule: 2:00 AM daily
# Script: backup_claude_memory_automator.sh

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.claudememory.dailybackup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/m3/Documents/GitHub/claude-memory-system/legacy-backup-system/scripts/backup_claude_memory_automator.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>2</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>RunAtLoad</key>
    <false/>
</dict>
</plist>

The iCloud Sync Strategy

Privacy requirement: Memory data should NOT go to GitHub.

Solution: iCloud syncs backups, not the live PostgreSQL data directory.

# NOT synced via Git
.gitignore:
backups/
logs/
*.sql
*.dump

# iCloud syncs backup files only
# PostgreSQL data directory stays local
# Git only tracks:
# - Scripts
# - Configuration
# - Documentation

Why this works: the live PostgreSQL data directory never leaves the machine, the nightly SQL dumps ride along in iCloud for off-device redundancy, and the public Git repo only ever sees scripts, configuration, and documentation.

The Knowledge Graph Pattern

Memory is organized as entities and observations:

// Entity examples
{
  name: "Andromeda MCP Server",
  entity_type: "project",
  observation_count: 47
}

{
  name: "Myron Koch",
  entity_type: "person",
  observation_count: 156
}

{
  name: "Desktop Commander (DC)",
  entity_type: "software-tool",
  observation_count: 23
}

// Observations linked to entities
{
  entity_id: 42,  // "Andromeda MCP Server"
  observation_text: "July 8, 2025: Comprehensive validation successful after major architectural reorganization. ✅ Multi-agent tmux system operational with 6 active agents.",
  created_at: "2025-07-08T14:23:00Z",
  embedding: [0.234, -0.567, 0.123, ...]  // 1536-dim vector
}

Pattern recognition falls out of this structure: because every observation is timestamped, linked to an entity, and embedded, questions like “which projects have been most active lately?” become ordinary queries, as in the sketch below.
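
A sketch of that kind of pattern query against the schema above (the 30-day window is arbitrary):

-- Which entities have been most active recently?
SELECT e.name, e.entity_type, COUNT(o.id) AS recent_observations
FROM entities e
JOIN observations o ON o.entity_id = e.id
WHERE o.created_at > NOW() - INTERVAL '30 days'
GROUP BY e.id, e.name, e.entity_type
ORDER BY recent_observations DESC
LIMIT 10;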

The Abandoned Visualizer

We built an Obsidian vault export so that Graph View could render the knowledge graph visually.

The Implementation

#!/usr/bin/env python3
# export_full_to_obsidian.py

from collections import defaultdict
from pathlib import Path

# sanitize_filename and format_date are helpers defined elsewhere in the script;
# add_wiki_links and find_related_entities stand in for the linking helpers it uses.

def create_entity_file(entity, observations, all_entities, output_dir):
    """Create a markdown file for an entity, with observations grouped by date."""
    filename = sanitize_filename(entity['name']) + '.md'

    content = f"""---
created: {format_date(entity['created_at'])}
entity_type: {entity['entity_type']}
tags: [claude-memory, {entity['entity_type']}]
---

# {entity['name']}

## Overview
**Type**: {entity['entity_type']}
**Total Observations**: {len(observations)}

## Observations
"""

    # Group observations by date
    obs_by_date = defaultdict(list)
    for obs in observations:
        obs_by_date[format_date(obs['created_at'])].append(obs)

    for date in sorted(obs_by_date.keys(), reverse=True):
        content += f"### {date}\n\n"
        for obs in obs_by_date[date]:
            # Add wiki links for mentioned entities
            obs_text = add_wiki_links(obs['observation_text'], all_entities)
            content += f"- {obs_text}\n"
        content += "\n"

    # Add connections to related entities as Obsidian wiki links
    related_entities = find_related_entities(observations, all_entities)
    content += "## Connections\n\n"
    for related in sorted(related_entities):
        content += f"- [[{sanitize_filename(related)}|{related}]]\n"

    Path(output_dir, filename).write_text(content)
    return filename

The Vault Structure

ObsidianVaults/ClaudeMemory/
├── Entities/
│   ├── Andromeda_MCP_Server.md
│   ├── Myron_Koch.md
│   ├── Desktop_Commander_(DC).md
│   └── ... (216 entity files)
├── Daily_Notes/
│   ├── 2025-07-08.md
│   ├── 2025-07-09.md
│   └── ... (daily observation logs)
├── Projects/
│   └── README.md (project index)
├── People/
│   └── README.md (people index)
├── Technical/
│   └── README.md (systems index)
└── README.md (vault overview)

Why We Abandoned It

The Obsidian visualizer was technically functional, but the problems piled up:

  1. Export lag: every update meant exporting PostgreSQL → JSON → Obsidian markdown
  2. Not real-time: changes in the database weren't reflected until the next export
  3. Graph view limitations: Obsidian's graph couldn't handle 2,500+ nodes well
  4. Manual sync: it needed a re-export after every conversation
  5. Not queryable: you couldn't run SQL against the visualization

What we learned: the database is the single source of truth, and any visualization layered on top of it is a second copy that starts going stale the moment it's generated.

What We Use Instead

MCP Tools for Memory Access

-- Query at conversation start
SELECT e.name, e.entity_type, COUNT(o.id) as obs_count
FROM entities e
LEFT JOIN observations o ON o.entity_id = e.id
WHERE e.name LIKE '%blockchain%'
GROUP BY e.id
ORDER BY obs_count DESC;

-- Semantic search via pgvector
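-- (query_embedding stands in for the 1536-dim embedding of the search text, supplied as a query parameter)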
SELECT observation_text,
       1 - (embedding <=> query_embedding) AS similarity
FROM observations
WHERE entity_id = 42
ORDER BY embedding <=> query_embedding
LIMIT 10;

The Workflow

1. Conversation starts

2. Claude queries longterm-memory MCP

3. Gets relevant entities and observations

4. Uses context to continue previous work

5. New observations stored automatically

Example query at conversation start:

-- What projects am I working on?
SELECT name, entity_type, observation_count
FROM entities
WHERE entity_type = 'project'
  AND observation_count > 10
ORDER BY created_at DESC;

-- Recent blockchain work?
SELECT e.name, o.observation_text, o.created_at
FROM entities e
JOIN observations o ON o.entity_id = e.id
WHERE e.name LIKE '%MCP%'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC;

The Performance Characteristics

Query Performance

-- pgvector similarity search
EXPLAIN ANALYZE
SELECT observation_text
FROM observations
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- Result: ~50ms for 2,573 observations
-- ivfflat index making this fast

Database Size

Entities: 216 rows (~50KB)
Observations: 2,573 rows (~15MB)
Embeddings: 1536 dims × 4 bytes × 2,573 = ~15MB
Total database: ~35MB

Scalability: Can handle 100,000+ observations before performance degrades.
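
Those figures can be double-checked with PostgreSQL's built-in size functions; a quick sketch:

-- Actual on-disk sizes
SELECT pg_size_pretty(pg_database_size('claude_memory'))      AS database_size,
       pg_size_pretty(pg_total_relation_size('observations')) AS observations_size,
       pg_size_pretty(pg_total_relation_size('entities'))     AS entities_size;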

The Backup Strategy

Daily Automated Backups

#!/bin/bash
# backup_claude_memory_automator.sh

BACKUP_DIR="/Users/m3/Documents/GitHub/claude-memory-system/legacy-backup-system/backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/claude_memory_$TIMESTAMP.sql"

# PostgreSQL dump
pg_dump -h localhost -p 5432 -U m3 -d claude_memory \
  --no-owner --no-privileges \
  > "$BACKUP_FILE"

# Compress
gzip "$BACKUP_FILE"

# Keep only last 30 days
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +30 -delete

echo "Backup completed: $BACKUP_FILE.gz"

Backup schedule: 2:00 AM daily via the LaunchAgent, compressed with gzip, with a rolling 30-day retention window.

Manual Backup

# Quick manual backup
pg_dump -h localhost -p 5432 -U m3 -d claude_memory \
  --no-owner --no-privileges \
  > "claude_memory_manual_$(date +%Y%m%d_%H%M%S).sql"

# Or check system health (doesn't create backup)
./scripts/health_check_passwordless.sh

# Health check includes:
# - Database connection test
# - Entity count
# - Observation count
# - Last backup time

The System Prompt Integration

Memory system configured in Claude’s system context:

SYSTEM_CONTEXT: "PostgreSQL memory for user-specific data.
Query at conversation start for context.

Use for:
- Project continuity
- Personal information
- Technical history
- Previous decisions

NOT for:
- General knowledge
- Current events
- Documentation"

Claude knows to:

  1. Query memory at conversation start
  2. Store important observations
  3. Link observations to entities
  4. Use semantic search when needed

The Numbers

Development timeline:
- Built May-June 2025 as "Claude Memory"; renamed to "Long-Term Memory" in July 2025

Current stats:
- 216+ entities, 2,573+ observations, ~35MB on disk, ~50ms similarity searches

Usage:
- Queried at the start of every conversation; new observations stored automatically as work happens

Lessons Learned

1. PostgreSQL Beats JSON Every Time

JSON file storage:
- Linear search
- No relationships
- No query language
- File corruption risk

PostgreSQL:
- Indexed search
- Foreign keys
- Full SQL
- ACID guarantees

2. pgvector Is Production-Ready

Vector similarity search works reliably: ~50ms over 2,573 observations with a plain ivfflat index, no dedicated vector database required.

3. Visualizations Are Overrated

We spent real time building the Obsidian export, and the SQL interface still turned out to be the better way to explore the data.

4. Auto-Startup Is Essential

A memory store that isn't running when the conversation starts is useless, which is why PostgreSQL, the MCP server, and the backups all launch automatically.

5. iCloud Sync for Privacy

Git is the wrong place for personal memory: the repository tracks only scripts, configuration, and documentation, while the data itself stays local with iCloud-synced backups.

The Alternatives We Rejected

Option 1: Official MCP Memory Server

# @modelcontextprotocol/server-memory
npm install -g @modelcontextprotocol/server-memory

Why we rejected it: flat JSON storage, linear scans, no relationships, and no query language.

Option 2: Vector Database (Pinecone, Weaviate)

Why we rejected them: a separate hosted service is overkill for a ~35MB dataset, and pgvector gives us the same similarity search inside the database we already run.

Option 3: Graph Database (Neo4j)

Why we rejected it: another database to operate and learn, when foreign keys between entities and observations already express the graph we need.

The Future Enhancements

Potential Improvements

  1. Automatic entity extraction - NER from conversations
  2. Relationship inference - Detect entity connections
  3. Observation clustering - Group related memories
  4. Time-aware queries - “What was I working on last week?”
  5. Cross-conversation threading - Link related discussions

What We Won’t Build

  1. Obsidian visualizer - Tried it, abandoned it
  2. Real-time sync across devices - iCloud is enough
  3. Web interface - MCP tools are better
  4. Mobile app - Not needed

The Checklist for Building Your Own

Want to build a similar system? The short version (a bootstrap sketch follows below):

- Install PostgreSQL 15+ and the pgvector extension
- Create the entities and observations tables, with a vector(1536) column for embeddings
- Expose the database to Claude through a PostgreSQL MCP server (we use @henkey/postgres-mcp-server)
- Give Claude a SYSTEM_CONTEXT so it knows when to query memory and when not to
- Automate startup and daily pg_dump backups, and keep the data out of Git
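
A minimal bootstrap sketch for the first two steps (names and dimensions match our setup; adjust as needed):

-- One-time bootstrap from psql, as a user with CREATE DATABASE rights
CREATE DATABASE claude_memory;

-- Then, connected to claude_memory:
CREATE EXTENSION IF NOT EXISTS vector;
-- ...followed by the CREATE TABLE statements and ivfflat index from the schema section above.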

This is part of our ongoing series documenting architectural patterns and insights from building the Blockchain MCP Server Ecosystem. Sometimes the simple solution (PostgreSQL) is better than the fancy one (specialized vector DBs).

