#19 Part 4 2025-08-18 18 min

Rate Limiting Without State: The MCP Paradox

How we implemented API rate limiting in a completely stateless protocol

Part 4 of the Journey: Advanced Topics & Deep Dives Previous: Context Window Management | Next: The MCP Inspector Deep Dive


Date: August 18, 2025 Author: Myron Koch & Claude Code Category: Architecture Challenges

The Impossible Problem

MCP servers are stateless. Every request is independent. No memory between calls.

But blockchain APIs have rate limits. The tiers we target later in this post:

- CoinGecko: 10 requests per minute
- Infura: 100,000 requests per day
- Alchemy: 300 requests per second

How do you track request counts when you can’t remember anything?

Why Traditional Solutions Don’t Work

Can’t Use In-Memory Counters

// This doesn't work in MCP
let requestCount = 0;  // Resets every request!

async function handleRequest() {
  requestCount++;  // Always 1
  if (requestCount > 100) {  // Never triggers
    throw new Error('Rate limited');
  }
}

Can’t Use Redis/Database

// MCP servers should be zero-dependency
const redis = require('redis');  // ❌ External dependency
await redis.incr('api:requests');  // ❌ Stateful storage

Can’t Use Global State

// Each tool call is isolated
global.requestTimestamps = [];  // ❌ Doesn't persist
process.env.REQUEST_COUNT++;    // ❌ Doesn't persist across processes

The Breakthrough: Time-Based Bucketing

We can’t count requests, but we CAN use time:

export class StatelessRateLimiter {
  private readonly requestsPerSecond: number;
  private readonly minInterval: number;
  private lastCallTime = 0;
  private mutex: Promise<void> = Promise.resolve();  // not readonly: reassigned below

  constructor(requestsPerSecond: number) {
    this.requestsPerSecond = requestsPerSecond;
    this.minInterval = 1000 / requestsPerSecond;
  }

  async throttle<T>(fn: () => Promise<T>): Promise<T> {
    // Wait for previous call to release the lock
    await this.mutex;

    // Create new mutex for next caller
    let releaseLock!: () => void;  // definitely assigned in the executor below
    this.mutex = new Promise(resolve => { releaseLock = resolve; });

    try {
      const now = Date.now();
      const timeSinceLastCall = now - this.lastCallTime;

      // Enforce minimum interval between calls
      if (timeSinceLastCall < this.minInterval) {
        await this.sleep(this.minInterval - timeSinceLastCall);
      }

      this.lastCallTime = Date.now();
      return await fn();
    } finally {
      releaseLock!();
    }
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
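As a sanity check on the spacing math (a standalone sketch, not the server code): at N requests per second the limiter enforces a gap of 1000/N milliseconds, so a burst of calls is serialized rather than rejected.

```typescript
// Standalone sketch of the interval math behind StatelessRateLimiter
const rps = 10;
const minInterval = 1000 / rps; // 100 ms gap between calls

// Times (relative to t = 0) at which a burst of 5 calls would be released:
const releaseTimes: number[] = [];
let last = -minInterval; // allow the first call immediately
for (let i = 0; i < 5; i++) {
  const t = last + minInterval;
  releaseTimes.push(t);
  last = t;
}
console.log(releaseTimes); // [0, 100, 200, 300, 400]
```

A burst of 5 calls is spread across 400 ms, which is exactly what a 10 rps cap requires without any request counting.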

The File-Based Counter Pattern

For APIs with daily limits, we use the filesystem:

// The filesystem IS our state!
import fs from 'fs';
import path from 'path';

class FileBasedRateLimiter {
  private readonly limitDir = '/tmp/mcp-rate-limits';

  constructor() {
    // Ensure directory exists on initialization
    if (!fs.existsSync(this.limitDir)) {
      fs.mkdirSync(this.limitDir, { recursive: true });
    }
  }

  async checkLimit(api: string, limit: number): Promise<boolean> {
    const today = new Date().toISOString().split('T')[0];
    const countFile = path.join(this.limitDir, `${api}-${today}.count`);
    const lockFile = `${countFile}.lock`;

    // Acquire the lock atomically: 'wx' fails if the file already exists, so
    // only one process can create it (an existsSync check followed by a
    // separate write would race between the check and the write)
    while (true) {
      try {
        fs.writeFileSync(lockFile, Date.now().toString(), { flag: 'wx' });
        break;
      } catch {
        await new Promise(resolve => setTimeout(resolve, 10));
      }
    }

    try {

      // Read current count
      let count = 0;
      try {
        count = parseInt(fs.readFileSync(countFile, 'utf8'), 10);
      } catch {
        // File doesn't exist, first request today
      }

      if (count >= limit) {
        throw new Error(`Daily limit reached for ${api}: ${count}/${limit}`);
      }

      // Increment count (safe because we hold the lock)
      fs.writeFileSync(countFile, String(count + 1));

      return true;
    } finally {
      // Release lock
      fs.unlinkSync(lockFile);
    }
  }

  async cleanup(): Promise<void> {
    // Remove old count files
    const files = fs.readdirSync(this.limitDir);
    const today = new Date().toISOString().split('T')[0];

    files.forEach(file => {
      if (!file.includes(today)) {
        fs.unlinkSync(path.join(this.limitDir, file));
      }
    });
  }
}
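The daily key scheme can be illustrated standalone: one counter file per API per UTC calendar day, so the limit resets naturally at midnight UTC without any cleanup logic in the hot path (the `coingecko` name here is just an example).

```typescript
// Sketch: daily counters are keyed by API name plus UTC calendar day
const api = 'coingecko';
const day = new Date('2025-08-18T23:59:00Z').toISOString().split('T')[0];
const countFile = `${api}-${day}.count`;
console.log(countFile); // "coingecko-2025-08-18.count"
```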

The Circuit Breaker Pattern

When APIs fail, we back off WITHOUT remembering failures:

export class StatelessCircuitBreaker {
  async execute<T>(
    fn: () => Promise<T>,
    options: { maxRetries: number } = { maxRetries: 3 }
  ): Promise<T> {
    let lastError: Error | undefined;

    for (let attempt = 0; attempt < options.maxRetries; attempt++) {
      try {
        // Exponential backoff based on attempt number
        if (attempt > 0) {
          const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
          await this.sleep(delay);
        }

        return await fn();
      } catch (error: any) {
        lastError = error;

        // Check if error is rate limit
        if (this.isRateLimitError(error)) {
          // Extract retry-after header if available
          const retryAfter = this.getRetryAfter(error);
          if (retryAfter) {
            await this.sleep(retryAfter * 1000);
            continue;
          }
        }

        // If not rate limit or last attempt, throw
        if (attempt === options.maxRetries - 1) {
          throw error;
        }
      }
    }

    throw lastError || new Error('All attempts failed');
  }

  private isRateLimitError(error: any): boolean {
    return error.code === 429 ||
           error.message?.includes('rate limit') ||
           error.message?.includes('too many requests');
  }

  private getRetryAfter(error: any): number | null {
    // Check the various places APIs put retry-after (may be a string header)
    const value = error.retryAfter ??
                  error.headers?.['retry-after'] ??
                  error.response?.headers?.['retry-after'];
    return value != null ? Number(value) : null;
  }
  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
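The backoff schedule is worth spelling out. Attempt 0 runs immediately; each retry then doubles the wait, capped at 10 seconds:

```typescript
// Sketch: delays the breaker applies before retry attempts 1 through 4
const delays = [1, 2, 3, 4].map(a => Math.min(1000 * Math.pow(2, a), 10000));
console.log(delays); // [2000, 4000, 8000, 10000]
```

The cap matters: without it, attempt 4 would wait 16 seconds, which is longer than most MCP clients are willing to block on a single tool call.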

The Time-Window Queue

For complex APIs with “X requests per Y minutes”:

import fs from 'fs';

class TimeWindowQueue {
  // Use timestamp encoding to track request windows
  private encodeWindow(timestamp: number, windowSize: number): string {
    const window = Math.floor(timestamp / windowSize);
    return `window:${window}`;
  }

  async canMakeRequest(
    api: string,
    limit: number,
    windowMs: number
  ): Promise<boolean> {
    const now = Date.now();
    const currentWindow = this.encodeWindow(now, windowMs);

    // Use the filesystem to track requests in the current window
    const dir = '/tmp/mcp-rl';
    if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
    const windowFile = `${dir}/${api}-${currentWindow}.json`;

    let timestamps: number[] = [];
    try {
      timestamps = JSON.parse(fs.readFileSync(windowFile, 'utf8'));
    } catch {
      // No file means new window
    }

    // Remove timestamps outside current window
    const windowStart = now - windowMs;
    timestamps = timestamps.filter(ts => ts > windowStart);

    if (timestamps.length >= limit) {
      // Calculate when next request can be made
      const oldestInWindow = Math.min(...timestamps);
      const nextAvailable = oldestInWindow + windowMs;
      const waitTime = nextAvailable - now;

      throw new Error(
        `Rate limit exceeded. Retry in ${Math.ceil(waitTime / 1000)}s`
      );
    }

    // Add current request
    timestamps.push(now);
    fs.writeFileSync(windowFile, JSON.stringify(timestamps));

    return true;
  }
}
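The window encoding deserves a concrete example. Flooring the timestamp by the window size means any two requests inside the same window share a bucket key, and the key changes automatically when the window rolls over:

```typescript
// Sketch: two timestamps in the same minute map to the same bucket key
const windowMs = 60_000;
const encode = (ts: number) => `window:${Math.floor(ts / windowMs)}`;

const t1 = Date.parse('2025-08-18T12:34:05Z');
const t2 = t1 + 10_000; // 10 s later, still within 12:34
const t3 = t1 + 60_000; // next minute → new bucket

const sameBucket = encode(t1) === encode(t2);
const nextBucket = encode(t1) === encode(t3);
console.log(sameBucket, nextBucket); // true false
```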

The Distributed Rate Limiting Pattern

Multiple MCP server instances? Use lock files:

class DistributedRateLimiter {
  private async acquireLock(resource: string): Promise<() => void> {
    const lockDir = '/tmp/mcp-locks';
    if (!fs.existsSync(lockDir)) fs.mkdirSync(lockDir, { recursive: true });
    const lockFile = `${lockDir}/${resource}.lock`;
    const lockId = Math.random().toString(36);

    // Spin until we get the lock
    while (true) {
      try {
        // Atomic file creation
        fs.writeFileSync(lockFile, lockId, { flag: 'wx' });

        // Return unlock function
        return () => {
          try {
            const current = fs.readFileSync(lockFile, 'utf8');
            if (current === lockId) {
              fs.unlinkSync(lockFile);
            }
          } catch {
            // Lock already released
          }
        };
      } catch {
        // Lock exists, wait and retry
        await this.sleep(10 + Math.random() * 40);
      }
    }
  }

  async executeWithLock<T>(
    resource: string,
    fn: () => Promise<T>
  ): Promise<T> {
    const unlock = await this.acquireLock(resource);
    try {
      return await fn();
    } finally {
      unlock();
    }
  }
  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
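The whole pattern rests on one filesystem guarantee: opening with the `wx` flag creates the file only if it does not already exist, and fails otherwise, in a single atomic operation. A standalone demonstration (using a throwaway temp directory rather than the server's lock path):

```typescript
// Sketch: 'wx' makes creation fail if the file exists → an atomic lock
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'mcp-lock-'));
const lockFile = path.join(dir, 'resource.lock');

fs.writeFileSync(lockFile, 'a', { flag: 'wx' });     // first writer wins
let secondSucceeded = true;
try {
  fs.writeFileSync(lockFile, 'b', { flag: 'wx' });   // second writer fails
} catch {
  secondSucceeded = false;
}
console.log(secondSucceeded); // false — the lock held
```

This is why the spin-wait in `acquireLock` retries on the exception path rather than checking `existsSync` first: the check-then-create sequence would reopen the race.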

Real-World Implementation

Here’s how we use it in our servers:

// src/utils/rateLimiter.ts
export class APIRateLimiter {
  private limiters = new Map<string, any>();

  constructor() {
    // Configure limits for each API
    this.limiters.set('coingecko', {
      type: 'timeWindow',
      requests: 10,
      window: 60000  // 1 minute
    });

    this.limiters.set('infura', {
      type: 'daily',
      requests: 100000
    });

    this.limiters.set('alchemy', {
      type: 'perSecond',
      rps: 300
    });
  }

  async execute<T>(
    api: string,
    fn: () => Promise<T>
  ): Promise<T> {
    const limiter = this.limiters.get(api);
    if (!limiter) return fn();

    switch (limiter.type) {
      case 'perSecond':
        return this.executeWithRPS(fn, limiter.rps);

      case 'timeWindow':
        await this.checkTimeWindow(api, limiter.requests, limiter.window);
        return fn();

      case 'daily':
        await this.checkDaily(api, limiter.requests);
        return fn();

      default:
        return fn();
    }
  }
}

// Usage in tools — instantiate the limiter once at module scope so
// per-second state survives across tool calls in the same session
const rateLimiter = new APIRateLimiter();

export async function handleGetTokenPrice(args: any, client: any) {

  const price = await rateLimiter.execute('coingecko', async () => {
    return await fetchTokenPrice(args.token);
  });

  return {
    content: [{
      type: 'text',
      text: JSON.stringify({ price }, null, 2)
    }]
  };
}

The Clever Hacks

1. Request Coalescing

Multiple tools requesting same data? Deduplicate:

class RequestCoalescer {
  async coalesce<T>(
    key: string,
    fn: () => Promise<T>,
    ttl: number = 1000
  ): Promise<T> {
    const cacheDir = '/tmp/mcp-cache';
    if (!fs.existsSync(cacheDir)) fs.mkdirSync(cacheDir, { recursive: true });
    const cacheFile = `${cacheDir}/${key}.json`;

    try {
      const cached = JSON.parse(fs.readFileSync(cacheFile, 'utf8'));
      if (Date.now() - cached.timestamp < ttl) {
        return cached.data;
      }
    } catch {
      // No cache
    }

    const result = await fn();
    fs.writeFileSync(cacheFile, JSON.stringify({
      timestamp: Date.now(),
      data: result
    }));

    return result;
  }
}

2. Adaptive Delays

Slow down when approaching limits:

function adaptiveDelay(used: number, limit: number): number {
  const usage = used / limit;

  if (usage < 0.5) return 0;        // No delay
  if (usage < 0.7) return 100;      // Slight delay
  if (usage < 0.9) return 500;      // Moderate delay
  return 2000;                      // Heavy delay
}
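The thresholds are easiest to read as a table of sample calls. Restating the function so the examples are self-contained:

```typescript
// Re-stating adaptiveDelay so the sample calls below run standalone
function adaptiveDelay(used: number, limit: number): number {
  const usage = used / limit;

  if (usage < 0.5) return 0;        // No delay
  if (usage < 0.7) return 100;      // Slight delay
  if (usage < 0.9) return 500;      // Moderate delay
  return 2000;                      // Heavy delay
}

console.log(adaptiveDelay(40, 100)); // 0    — under half the quota
console.log(adaptiveDelay(60, 100)); // 100  — past the halfway mark
console.log(adaptiveDelay(85, 100)); // 500  — approaching the limit
console.log(adaptiveDelay(95, 100)); // 2000 — nearly exhausted
```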

3. Request Priority

Some requests are more important:

async function prioritizedExecute<T>(
  priority: 'high' | 'low',
  fn: () => Promise<T>
): Promise<T> {
  if (priority === 'low') {
    // Low priority requests get an additional randomized delay
    const delay = 500 + Math.random() * 1500;
    await new Promise(resolve => setTimeout(resolve, delay));
  }

  return fn();
}

The Gotchas We Hit

1. Clock Drift

Servers with different times break time-based limiting:

// Derive bucket keys from UTC date strings, never local-time formatting
const today = new Date().toISOString().split('T')[0];  // not toLocaleDateString()
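An ISO string is derived from UTC, so every machine produces the same bucket key for the same instant, regardless of its local timezone (the fixed epoch below is just for demonstration):

```typescript
// Sketch: a fixed instant yields the same UTC day key on any machine
const fixedInstant = 1755561600000; // 2025-08-19T00:00:00Z
const bucket = new Date(fixedInstant).toISOString().split('T')[0];
console.log(bucket); // "2025-08-19"
```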

2. Filesystem Permissions

// Always check temp directory is writable
const tempDir = process.env.MCP_TEMP_DIR || '/tmp/mcp';
if (!fs.existsSync(tempDir)) {
  fs.mkdirSync(tempDir, { recursive: true });
}

3. Cleanup

Temp files accumulate:

// Add cleanup on startup
function cleanupOldFiles() {
  const dir = '/tmp/mcp-rate-limits';
  if (!fs.existsSync(dir)) return;  // nothing to clean yet

  const yesterday = Date.now() - 86400000;
  fs.readdirSync(dir).forEach(file => {
    const filePath = path.join(dir, file);
    if (fs.statSync(filePath).mtimeMs < yesterday) {
      fs.unlinkSync(filePath);
    }
  });
}

The Philosophy

Stateless doesn’t mean helpless.

We use:

- Time-based throttling instead of request counters
- The filesystem as shared state for daily and windowed limits
- Exponential backoff with retry-after handling when APIs push back
- Atomic lock files to coordinate multiple server instances
- Short-lived file caches to coalesce duplicate requests

It's not perfect, but it works. And it keeps MCP servers simple.

The Checklist

Implementing rate limiting in your MCP server:

- Know each API's limits: per-second, per-window, or daily
- Throttle per-second APIs by enforcing a minimum interval between calls
- Track windowed and daily limits with files keyed by API name and UTC date
- Honor retry-after headers and back off exponentially on 429s
- Use atomic lock files (the wx flag) if multiple instances share a limit
- Verify your temp directory exists and is writable before relying on it
- Clean up stale counter and lock files on startup


This is part of our ongoing series documenting architectural patterns and insights from building the Blockchain MCP Server Ecosystem. Sometimes constraints force creativity.

