Part 4 of the Journey: Advanced Topics & Deep Dives
Rate Limiting Without State: The MCP Paradox
How we implemented API rate limiting in a completely stateless protocol
Date: August 18, 2025
Author: Myron Koch & Claude Code
Category: Architecture Challenges
The Impossible Problem
MCP servers are stateless. Every request is independent. No memory between calls.
But blockchain APIs have rate limits:
- Infura: 100,000 requests/day
- Alchemy: 300 requests/second
- Binance: 1200 requests/minute
- CoinGecko: 10 requests/minute (free tier)
How do you track request counts when you can’t remember anything?
Why Traditional Solutions Don’t Work
Can’t Use In-Memory Counters
// This doesn't work in MCP
let requestCount = 0; // Resets every request!

async function handleRequest() {
  requestCount++; // Always 1
  if (requestCount > 100) { // Never triggers
    throw new Error('Rate limited');
  }
}
Can’t Use Redis/Database
// MCP servers should be zero-dependency
const redis = require('redis'); // ❌ External dependency
await redis.incr('api:requests'); // ❌ Stateful storage
Can’t Use Global State
// Each tool call is isolated
global.requestTimestamps = []; // ❌ Doesn't persist
process.env.REQUEST_COUNT++; // ❌ Doesn't persist across processes
The Breakthrough: Time-Based Bucketing
We can’t count requests across invocations, but a running server process CAN use time: enforce a minimum interval between calls.
export class StatelessRateLimiter {
  private readonly requestsPerSecond: number;
  private readonly minInterval: number;
  private lastCallTime = 0;
  private mutex: Promise<void> = Promise.resolve();

  constructor(requestsPerSecond: number) {
    this.requestsPerSecond = requestsPerSecond;
    this.minInterval = 1000 / requestsPerSecond;
  }

  async throttle<T>(fn: () => Promise<T>): Promise<T> {
    // Wait for the previous call to release the lock
    await this.mutex;
    // Create a new mutex for the next caller
    let releaseLock!: () => void;
    this.mutex = new Promise(resolve => { releaseLock = resolve; });
    try {
      const now = Date.now();
      const timeSinceLastCall = now - this.lastCallTime;
      // Enforce the minimum interval between calls
      if (timeSinceLastCall < this.minInterval) {
        await this.sleep(this.minInterval - timeSinceLastCall);
      }
      this.lastCallTime = Date.now();
      return await fn();
    } finally {
      releaseLock();
    }
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
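To make the interval math concrete, here's a standalone sketch (the helper name is ours, not from the server code) converting the limits listed earlier into minimum spacing between calls:

```typescript
// Hypothetical helper: turn "N requests per period" into a
// minimum inter-request interval in milliseconds.
function minIntervalMs(requests: number, periodMs: number): number {
  return periodMs / requests;
}

// CoinGecko free tier: 10 requests/minute -> one call every 6 seconds
const coingecko = minIntervalMs(10, 60_000);
// Alchemy: 300 requests/second -> roughly one call every 3.3ms
const alchemy = minIntervalMs(300, 1_000);
console.log(coingecko); // 6000
```

The limiter above is just this division plus a lock so concurrent callers queue up instead of racing.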
The File-Based Counter Pattern
For APIs with daily limits, we use the filesystem:
// The filesystem IS our state!
import fs from 'fs';
import path from 'path';

class FileBasedRateLimiter {
  private readonly limitDir = '/tmp/mcp-rate-limits';

  constructor() {
    // Ensure the directory exists on initialization
    if (!fs.existsSync(this.limitDir)) {
      fs.mkdirSync(this.limitDir, { recursive: true });
    }
  }

  async checkLimit(api: string, limit: number): Promise<boolean> {
    const today = new Date().toISOString().split('T')[0];
    const countFile = path.join(this.limitDir, `${api}-${today}.count`);
    const lockFile = `${countFile}.lock`;

    // Acquire the lock atomically: 'wx' fails if the file already exists,
    // avoiding the check-then-create race of existsSync + writeFileSync
    while (true) {
      try {
        fs.writeFileSync(lockFile, Date.now().toString(), { flag: 'wx' });
        break;
      } catch {
        await new Promise(resolve => setTimeout(resolve, 10));
      }
    }
    try {
      // Read the current count
      let count = 0;
      try {
        count = parseInt(fs.readFileSync(countFile, 'utf8'), 10);
      } catch {
        // File doesn't exist: first request today
      }
      if (count >= limit) {
        throw new Error(`Daily limit reached for ${api}: ${count}/${limit}`);
      }
      // Increment the count while holding the lock
      fs.writeFileSync(countFile, String(count + 1));
      return true;
    } finally {
      // Release the lock
      fs.unlinkSync(lockFile);
    }
  }

  async cleanup(): Promise<void> {
    // Remove count files from previous days
    const files = fs.readdirSync(this.limitDir);
    const today = new Date().toISOString().split('T')[0];
    files.forEach(file => {
      if (!file.includes(today)) {
        fs.unlinkSync(path.join(this.limitDir, file));
      }
    });
  }
}
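A minimal, self-contained demonstration of the idea (the temp directory and API name are ours): each call re-reads the count file, so the tally survives across otherwise independent invocations:

```typescript
import fs from 'fs';
import os from 'os';
import path from 'path';

// Date-bucketed counter file, as in the pattern above
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'mcp-demo-'));
const today = new Date().toISOString().split('T')[0];
const countFile = path.join(dir, `coingecko-${today}.count`);

function increment(): number {
  let count = 0;
  try {
    count = parseInt(fs.readFileSync(countFile, 'utf8'), 10);
  } catch {
    // First request today: no file yet
  }
  fs.writeFileSync(countFile, String(count + 1));
  return count + 1;
}

increment();
increment();
const total = increment();
console.log(total); // 3 — the count persisted between calls via the file
```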
The Circuit Breaker Pattern
When APIs fail, we back off WITHOUT remembering failures. Strictly speaking, this is retry with exponential backoff rather than a true, stateful circuit breaker:
export class StatelessCircuitBreaker {
  async execute<T>(
    fn: () => Promise<T>,
    options: { maxRetries: number } = { maxRetries: 3 }
  ): Promise<T> {
    let lastError: Error | undefined;
    for (let attempt = 0; attempt < options.maxRetries; attempt++) {
      try {
        // Exponential backoff based on the attempt number
        if (attempt > 0) {
          const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
          await this.sleep(delay);
        }
        return await fn();
      } catch (error: any) {
        lastError = error;
        // Check whether the error is a rate limit
        if (this.isRateLimitError(error)) {
          // Honor the Retry-After header if available
          const retryAfter = this.getRetryAfter(error);
          if (retryAfter) {
            await this.sleep(retryAfter * 1000);
            continue;
          }
        }
        // On the last attempt, give up and rethrow
        if (attempt === options.maxRetries - 1) {
          throw error;
        }
      }
    }
    throw lastError || new Error('All attempts failed');
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  private isRateLimitError(error: any): boolean {
    return error.code === 429 ||
      error.status === 429 ||
      error.message?.includes('rate limit') ||
      error.message?.includes('too many requests');
  }

  private getRetryAfter(error: any): number | null {
    // Check the various places APIs put Retry-After
    return error.retryAfter ||
      error.headers?.['retry-after'] ||
      error.response?.headers?.['retry-after'] ||
      null;
  }
}
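The backoff schedule from that loop, computed standalone: delays double per attempt and cap at 10 seconds:

```typescript
// Same formula as in execute(): 1s * 2^attempt, capped at 10s
const delays = [1, 2, 3, 4, 5].map(
  attempt => Math.min(1000 * Math.pow(2, attempt), 10000)
);
console.log(delays); // [2000, 4000, 8000, 10000, 10000]
```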
The Time-Window Queue
For complex APIs with “X requests per Y minutes”:
// Assumes the same fs/path imports as above
class TimeWindowQueue {
  private readonly dir = '/tmp/mcp-rl';

  constructor() {
    if (!fs.existsSync(this.dir)) {
      fs.mkdirSync(this.dir, { recursive: true });
    }
  }

  // Encode the timestamp into a window bucket for the filename
  private encodeWindow(timestamp: number, windowSize: number): string {
    const window = Math.floor(timestamp / windowSize);
    return `window-${window}`;
  }

  async canMakeRequest(
    api: string,
    limit: number,
    windowMs: number
  ): Promise<boolean> {
    const now = Date.now();
    const currentWindow = this.encodeWindow(now, windowMs);
    // Use the filesystem to track requests in the current window
    const windowFile = path.join(this.dir, `${api}-${currentWindow}.json`);
    let timestamps: number[] = [];
    try {
      timestamps = JSON.parse(fs.readFileSync(windowFile, 'utf8'));
    } catch {
      // No file means a new window
    }
    // Drop timestamps that have slid out of the window
    const windowStart = now - windowMs;
    timestamps = timestamps.filter(ts => ts > windowStart);
    if (timestamps.length >= limit) {
      // Calculate when the next request can be made
      const oldestInWindow = Math.min(...timestamps);
      const nextAvailable = oldestInWindow + windowMs;
      const waitTime = nextAvailable - now;
      throw new Error(
        `Rate limit exceeded. Retry in ${Math.ceil(waitTime / 1000)}s`
      );
    }
    // Record the current request
    timestamps.push(now);
    fs.writeFileSync(windowFile, JSON.stringify(timestamps));
    return true;
  }
}
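The sliding-window check can be exercised with synthetic timestamps (the function and values here are ours, mirroring the filter logic above):

```typescript
// Count only timestamps still inside the window, as canMakeRequest does
function underLimit(
  timestamps: number[],
  now: number,
  limit: number,
  windowMs: number
): boolean {
  return timestamps.filter(ts => ts > now - windowMs).length < limit;
}

const now = 100_000;
// Three past requests; the first is older than a 60s window
const calls = [now - 70_000, now - 30_000, now - 5_000];
console.log(underLimit(calls, now, 3, 60_000)); // true: only 2 still count
console.log(underLimit(calls, now, 2, 60_000)); // false: window is full
```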
The Distributed Rate Limiting Pattern
Multiple MCP server instances? Use lock files:
class DistributedRateLimiter {
  private readonly lockDir = '/tmp/mcp-locks';

  constructor() {
    if (!fs.existsSync(this.lockDir)) {
      fs.mkdirSync(this.lockDir, { recursive: true });
    }
  }

  private async acquireLock(resource: string): Promise<() => void> {
    const lockFile = path.join(this.lockDir, `${resource}.lock`);
    const lockId = Math.random().toString(36);
    // Spin until we get the lock
    while (true) {
      try {
        // Atomic file creation: 'wx' fails if the file exists
        fs.writeFileSync(lockFile, lockId, { flag: 'wx' });
        // Return an unlock function
        return () => {
          try {
            const current = fs.readFileSync(lockFile, 'utf8');
            if (current === lockId) {
              fs.unlinkSync(lockFile);
            }
          } catch {
            // Lock already released
          }
        };
      } catch {
        // Lock exists: wait with jitter and retry
        await this.sleep(10 + Math.random() * 40);
      }
    }
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  async executeWithLock<T>(
    resource: string,
    fn: () => Promise<T>
  ): Promise<T> {
    const unlock = await this.acquireLock(resource);
    try {
      return await fn();
    } finally {
      unlock();
    }
  }
}
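The 'wx' flag is what makes this safe: creation fails if the file already exists, so the failure itself is the "lock is taken" signal. A standalone check (paths are ours):

```typescript
import fs from 'fs';
import os from 'os';
import path from 'path';

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'mcp-lock-'));
const lockFile = path.join(dir, 'resource.lock');

fs.writeFileSync(lockFile, 'holder-a', { flag: 'wx' }); // first writer wins

let secondAcquired = true;
try {
  fs.writeFileSync(lockFile, 'holder-b', { flag: 'wx' });
} catch {
  secondAcquired = false; // EEXIST: the lock is already held
}
console.log(secondAcquired); // false
fs.unlinkSync(lockFile); // release
```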
Real-World Implementation
Here’s how we use it in our servers:
// src/utils/rateLimiter.ts
export class APIRateLimiter {
  private limiters = new Map<string, any>();

  constructor() {
    // Configure limits for each API
    this.limiters.set('coingecko', {
      type: 'timeWindow',
      requests: 10,
      window: 60000 // 1 minute
    });
    this.limiters.set('infura', {
      type: 'daily',
      requests: 100000
    });
    this.limiters.set('alchemy', {
      type: 'perSecond',
      rps: 300
    });
  }

  async execute<T>(
    api: string,
    fn: () => Promise<T>
  ): Promise<T> {
    const limiter = this.limiters.get(api);
    if (!limiter) return fn();

    switch (limiter.type) {
      case 'perSecond':
        return this.executeWithRPS(fn, limiter.rps);
      case 'timeWindow':
        await this.checkTimeWindow(api, limiter.requests, limiter.window);
        return fn();
      case 'daily':
        await this.checkDaily(api, limiter.requests);
        return fn();
      default:
        return fn();
    }
  }
}

// Usage in tools (fetchTokenPrice is defined elsewhere)
export async function handleGetTokenPrice(args: any, client: any) {
  const rateLimiter = new APIRateLimiter();
  const price = await rateLimiter.execute('coingecko', async () => {
    return await fetchTokenPrice(args.token);
  });
  return {
    content: [{
      type: 'text',
      text: JSON.stringify({ price }, null, 2)
    }]
  };
}
The Clever Hacks
1. Request Coalescing
Multiple tools requesting the same data? Deduplicate with a short-TTL file cache:
// Assumes the fs/path imports from earlier
class RequestCoalescer {
  private readonly cacheDir = '/tmp/mcp-cache';

  constructor() {
    if (!fs.existsSync(this.cacheDir)) {
      fs.mkdirSync(this.cacheDir, { recursive: true });
    }
  }

  async coalesce<T>(
    key: string,
    fn: () => Promise<T>,
    ttl: number = 1000
  ): Promise<T> {
    const cacheFile = path.join(this.cacheDir, `${key}.json`);
    try {
      const cached = JSON.parse(fs.readFileSync(cacheFile, 'utf8'));
      if (Date.now() - cached.timestamp < ttl) {
        return cached.data;
      }
    } catch {
      // No cache yet
    }
    const result = await fn();
    fs.writeFileSync(cacheFile, JSON.stringify({
      timestamp: Date.now(),
      data: result
    }));
    return result;
  }
}
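A quick standalone check of the TTL logic (paths and values are ours): a just-written entry is served from the cache file, while an expired TTL falls through:

```typescript
import fs from 'fs';
import os from 'os';
import path from 'path';

const cacheDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mcp-cache-'));
const cacheFile = path.join(cacheDir, 'eth-price.json');
fs.writeFileSync(
  cacheFile,
  JSON.stringify({ timestamp: Date.now(), data: { price: 4200 } })
);

// Same freshness test coalesce() applies before calling the API
function readFresh(ttlMs: number): { price: number } | null {
  const cached = JSON.parse(fs.readFileSync(cacheFile, 'utf8'));
  return Date.now() - cached.timestamp < ttlMs ? cached.data : null;
}

const hit = readFresh(60_000); // fresh: written moments ago
const miss = readFresh(0);     // zero TTL: always stale
console.log(hit, miss);
```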
2. Adaptive Delays
Slow down when approaching limits:
function adaptiveDelay(used: number, limit: number): number {
  const usage = used / limit;
  if (usage < 0.5) return 0;    // No delay
  if (usage < 0.7) return 100;  // Slight delay
  if (usage < 0.9) return 500;  // Moderate delay
  return 2000;                  // Heavy delay
}
3. Request Priority
Some requests are more important:
async function prioritizedExecute(priority: 'high' | 'low', fn: () => Promise<any>) {
  if (priority === 'low') {
    // Low-priority requests get an additional randomized delay
    const delay = 500 + Math.random() * 1500;
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  return fn();
}
The Gotchas We Hit
1. Clock Drift
Servers with different times break time-based limiting:
// Date.now() is already UTC-based epoch time; the trap is local-time
// date strings. Use UTC (toISOString) when bucketing by calendar day:
const today = new Date().toISOString().split('T')[0]; // UTC date, not local
2. Filesystem Permissions
// Always check the temp directory is writable
const tempDir = process.env.MCP_TEMP_DIR || '/tmp/mcp';
if (!fs.existsSync(tempDir)) {
  fs.mkdirSync(tempDir, { recursive: true });
}
3. Cleanup
Temp files accumulate:
// Add cleanup on startup
function cleanupOldFiles() {
  const dir = '/tmp/mcp-rate-limits';
  const files = fs.readdirSync(dir);
  const yesterday = Date.now() - 86400000; // 24h in ms
  files.forEach(file => {
    const filePath = path.join(dir, file);
    if (fs.statSync(filePath).mtimeMs < yesterday) {
      fs.unlinkSync(filePath);
    }
  });
}
The Philosophy
Stateless doesn’t mean helpless.
We use:
- Time as state
- Filesystem as memory
- Math instead of counters
- Delays instead of queues
It’s not perfect, but it works. And it keeps MCP servers simple.
The Checklist
Implementing rate limiting in your MCP server:
- Identify API limits (requests/second, daily, etc.)
- Choose appropriate pattern (time-based, file-based, etc.)
- Implement retry logic with backoff
- Add request coalescing for identical calls
- Use filesystem for persistent counters
- Clean up old tracking files
- Test with concurrent requests
- Document limits in error messages
- Provide retry-after information
- Monitor actual API usage
References
- Rate limiter implementations: /src/utils/rateLimiter.ts
- Circuit breaker pattern: /src/utils/circuitBreaker.ts
- Request coalescing: /src/utils/requestCache.ts
- Test scenarios: /tests/rate-limiting/
This is part of our ongoing series documenting architectural patterns and insights from building the Blockchain MCP Server Ecosystem. Sometimes constraints force creativity.
Related Reading
Prerequisites
- Context Window Management: Building AI-Friendly Code - Understanding the constraints of the AI environment is key to understanding why statelessness is a design goal.
Next Steps
- The MCP Inspector Deep Dive: Your Only Debugging Friend - Learn how to debug issues that arise from rate limiting.
Deep Dives
- Error Handling in MCP: Where Do Errors Actually Go? - See how to properly structure and return rate limit errors to the AI client.