Part 4 of the Journey: Advanced Topics & Deep Dives
Previous: Multi-Agent Orchestration | Next: The PostgreSQL Long-Term Memory System
The AI Dream Team: How Claude, GPT-5, and Gemini Fixed 5 Critical Bugs in Parallel
Historical Context (November 2025): Written October 31, 2025, this post documents advanced multi-AI orchestration during production debugging. The approach emerged from months of hands-on server development, showcasing patterns discovered while building infrastructure in parallel with the evolving MCP ecosystem.
By Myron Koch | Date: October 31, 2025 | Reading Time: 18 minutes
The Setup: Friday Evening, Four Critical Failures
It’s Friday evening. My EVM Chains MCP Server—a blockchain tool orchestrator supporting 7 testnets with 113 tools—is almost production-ready. Gemini, my systematic testing AI, has been methodically validating every tool on Polygon Amoy testnet.
The results roll in:
- ✅ 46 tools passing
- ❌ 4 tools failing with critical bugs
- 🚫 10 tools with expected testnet limitations
- ⏸️ 52 tools not yet tested
Four bugs. Doesn’t sound like much, right? But these aren’t typos or minor edge cases. These are architectural issues: parameter mapping mismatches, a persistent BigInt serialization error, and schema validation failures. The kind of bugs that require deep debugging, code review, and comprehensive testing.
Traditional approach: Fix them one by one, sequentially. Estimated time: 12-16 hours.
But what if we didn’t have to work sequentially?
What if I could orchestrate multiple AI models—each with different strengths—working in parallel like a software engineering dream team?
- Claude (via Claude Code): The architect and debugger
- GPT-5 (via ChatGPT): The code reviewer and refiner
- Gemini (via Google AI Studio): The systematic tester
What happened next was a masterclass in AI orchestration. In 8 hours, we went from 4 critical failures to 100% passing, with 10 improvements total. Three models, working in parallel, each contributing their unique strengths, coordinated by one human conductor.
This is that story.
Act 1: Discovery - When Gemini Found the Cracks
The Systematic Approach
Gemini doesn’t rush. It’s methodical, thorough, and relentless—the perfect tester. I gave it clear instructions:
“Test all 113 tools on Polygon Amoy testnet. Start with Priority 1 (Core Operations), then Priority 2 (Advanced Features), then Priority 3 (Experimental). Document everything in tests/tracking/polygon-amoy.md.”
The first 46 tools? Smooth sailing. Balance checks, transactions, token operations, smart contract calls—all passing. Confidence was high.
Then came the advanced tools.
The First Casualties
Bug #1: evm_sign_typed_data
```
Error: Unknown parameter: 'message'
Test: evm_sign_typed_data({ message: {...}, domain: {...} })
```
Simple diagnosis: The tool definition said message, but the implementation expected value. A classic parameter mapping mismatch from refactoring.
Bug #2: evm_get_impermanent_loss
```
Error: Zod validation failed - Expected object with token0 property
Test: evm_get_impermanent_loss({ symbol: "WETH", initialPrice: "2000", ... })
```
Another mismatch. The tool definition had flat parameters, but the implementation expected nested objects:
```
{
  token0: { symbol: "WETH", initialPrice: "2000", ... },
  token1: { symbol: "USDC", initialPrice: "1", ... }
}
```
Bug #3: evm_create_token_stream
```
Error: Unknown parameters: 'amount', 'stopTime'
Test: evm_create_token_stream({ amount: "1000", stopTime: 1234567890 })
```
Same pattern. Tool definition used amount and stopTime, but implementation expected totalAmount and duration.
Bug #4: evm_get_staking_rewards
```
Error: Unknown parameter: 'stakingContract'
Test: evm_get_staking_rewards({ stakingContract: "0x...", stakerAddress: "0x..." })
```
Definition said stakingContract and stakerAddress, implementation wanted address and protocol.
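These "Unknown parameter" errors are exactly what a strict Zod schema produces when the tool definition and the validation schema drift apart. A minimal sketch of the failure mode (the schema and call here are illustrative, not the server's actual code):

```typescript
import { z } from "zod";

// Hypothetical schema mirroring what the implementation expected
const StakingRewardsInput = z
  .object({
    address: z.string(),
    protocol: z.string(),
  })
  .strict(); // reject any keys the schema doesn't know about

// A caller following the stale tool definition instead of the schema:
StakingRewardsInput.parse({
  stakingContract: "0x...",
  stakerAddress: "0x...",
});
// ZodError: Unrecognized key(s) in object: 'stakingContract', 'stakerAddress'
// (plus "Required" issues for the missing 'address' and 'protocol')
```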
The Boss Battle: The BigInt Monster
Then came Bug #5. The one that would haunt me for four debugging rounds.
Bug #5: evm_generate_permit
```
Error: Do not know how to serialize a BigInt
Test: evm_generate_permit({ tokenAddress: "0x41E9...", owner: "0x7eA3...", value: "100" })
```
This tool generates EIP-2612 permit signatures—critical for gasless transactions. It calls token.symbol(), token.decimals(), token.nonces(), creates an EIP-712 typed data structure, signs it, and returns the signature components.
Somewhere in that flow, a BigInt was sneaking through, breaking JSON serialization.
But where?
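The failure itself is trivial to reproduce in isolation: JSON.stringify has no default encoding for BigInt and throws rather than guessing. The hard part was finding which value in the permit flow was a BigInt:

```typescript
// Any BigInt anywhere in the payload kills serialization
JSON.stringify({ nonce: 1n });
// TypeError: Do not know how to serialize a BigInt
```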
Gemini’s Handoff
After completing its testing sweep, Gemini delivered a perfect bug report:
```markdown
## Failing Tools Summary
1. evm_sign_typed_data - Parameter 'message' not recognized
2. evm_get_impermanent_loss - Expected nested object structure
3. evm_create_token_stream - Parameters 'amount'/'stopTime' invalid
4. evm_get_staking_rewards - Parameter 'stakingContract' not recognized
5. evm_generate_permit - BigInt serialization error (CRITICAL)

Test Parameters: [Detailed reproduction steps for each tool]
Environment: Polygon Amoy testnet, USDC token 0x41E9...
Wallet: 0x7eA3... (deployer wallet with test funds)
```
“I’ve documented everything in polygon-amoy.md. Your move, Claude.”
It was time to fix some bugs.
Act 2: Divide and Conquer - The Parallel Strategy
The Realization
Staring at Gemini’s report, I had an epiphany. I didn’t need one AI to do everything. I needed the right AI for each job.
Claude (via Claude Code) had:
- Full codebase context via file system access
- Git management capabilities
- Iterative debugging with persistent memory
- Tool usage (read, write, build, test)
GPT-5 (via ChatGPT) had:
- Pattern recognition across codebases
- Best practices and optimization insights
- Fresh perspective for code review
- No context baggage from debugging
Gemini had:
- Systematic testing methodology
- Edge case discovery
- Clear verification protocols
- Methodical documentation
Why work sequentially when I could orchestrate them in parallel?
The Strategy
Phase 1 - Immediate Fixes (Claude’s Domain)
- Goal: Fix the 5 bugs as quickly as possible
- Approach: Tackle easy parameter mapping issues first (Bugs #1-4), then hunt the BigInt monster (Bug #5)
- Timeline: 2-4 hours
Phase 2 - Code Review (GPT-5’s Domain)
- Goal: Refine and improve the fixes
- Approach: Review fixes while Claude debugs, identify systemic issues
- Timeline: 1-2 hours (parallel to Phase 1)
Verification (Gemini’s Domain)
- Goal: Confirm all fixes work
- Approach: Re-test with updated parameters after fixes deployed
- Timeline: 1 hour
The Handoff Protocol
Here’s how information flowed:
```
Claude fixes bugs → Commits to git
        ↓
Human reviews commits → Crafts GPT-5 prompt
        ↓
GPT-5 analyzes fixes → Returns improvements
        ↓
Claude integrates changes → Builds and commits
        ↓
Human restarts Claude Desktop → MCP cache refresh
        ↓
Gemini re-tests → Final verification
```
The human (me) acted as:
- Context Switcher: Managing 3 separate conversations
- Task Decomposer: Breaking work into AI-appropriate chunks
- Quality Controller: Reviewing all AI outputs before integration
- Orchestrator: Coordinating handoffs and parallel workflows
- Git Manager: Ensuring clean commit history
Parallel Timeline
Here’s what actually happened:
```
Hour 0-2: Claude fixes Bugs #1-4 (parameter mapping)
Hour 1-3: GPT-5 reviews Bug #1 fix (parallel to Claude's work)
Hour 2-4: Claude debugs Bug #5 Round 1 (network.chainId fix)
Hour 3-5: GPT-5 identifies systemic BigInt issue, proposes Phase 2
Hour 4-6: Claude debugs Bug #5 Rounds 2-4 (root cause hunt)
Hour 6-7: GPT-5 implements Phase 2 refinements
Hour 7-8: Gemini verifies all fixes, confirms success
```
Three AIs, working in parallel, coordinated by one human. Let’s dive into what each did.
Act 3: The Bug Hunt - Four Rounds of Debugging
First Pass: The Easy Wins (Bugs #1-4)
Claude tackled the parameter mapping issues first. These were straightforward:
Bug #1: evm_sign_typed_data
```typescript
// File: src/tool-definitions/gasless.ts

// BEFORE (definition exposed 'message', but the implementation expected 'value')
message: {
  type: 'object',
  description: 'Message to sign',
},

// AFTER (renamed to 'value' to match the implementation)
value: {
  type: 'object',
  description: 'Value object to sign',
},

// File: src/tools/gasless/evm_sign_typed_data.ts
// Update all references: validated.message → validated.value
```
Bugs #2, #3, #4: Similar pattern—update tool definitions to match implementations.
Time Elapsed: 45 minutes. Result: 4/5 bugs fixed ✅
“Okay,” I thought, “one bug left. Should be easy.”
Famous last words.
The BigInt Monster: Four Rounds of Debugging
Round 1: The Obvious Suspect
Hypothesis: network.chainId is BigInt in ethers.js v6 (it was number in v5)
```typescript
// File: src/tools/gasless/evm_generate_permit.ts
const network = await provider.getNetwork();
const chainId = Number(network.chainId); // Convert BigInt to number
```
Build. Restart Claude Desktop. Test.
```
Error: Do not know how to serialize a BigInt
```
Result: ❌ Still failing
“Okay,” Claude (and I) reasoned, “maybe it’s not the chainId.”
Round 2: The Signature Components
Hypothesis: Maybe the signature components (v, r, s) are BigInt?
```typescript
const sig = ethers.Signature.from(signature);

// Convert signature components for JSON safety
const sigV = Number(sig.v);
const sigR = sig.r;
const sigS = sig.s;

return {
  // ...
  signature: {
    v: sigV, // Use converted values
    r: sigR,
    s: sigS
  }
};
```
Build. Restart. Test.
```
Error: Do not know how to serialize a BigInt
```
Result: ❌ Still failing
Research revealed that ethers.Signature components are already numbers/strings in v6. Back to the drawing board.
Round 3: The Error Handler
Hypothesis: Maybe the error is happening inside the signTypedData() call, and we need better error context.
```typescript
let signature: string;
try {
  signature = await wallet.signTypedData(domain, types, value);
} catch (signError: any) {
  if (signError.message && signError.message.includes('BigInt')) {
    throw new Error('EIP-712 signing failed. This token may not properly support EIP-2612 permit standard.');
  }
  throw signError;
}
```
Build. Restart. Test.
```
Error: Do not know how to serialize a BigInt
```
Result: ❌ Still failing, and with the exact same error message
At this point, 3 hours had passed. Frustration was setting in.
“damn it” - My message to Claude
The error message should have changed, but it didn’t. Then I realized: Claude Desktop hadn’t restarted properly. The MCP server was still running the old code from cache.
Critical Learning: MCP servers cache compiled JavaScript in memory. You MUST fully restart Claude Desktop after building changes. Cmd+Q → Reopen, not just closing the window.
After a proper restart, the new error handler worked… but we still had the BigInt issue.
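In hindsight, a cheap safeguard would have been a startup marker in the server's entry point, so a stale cached process is immediately obvious. A hypothetical sketch, not something the server had at the time; note that stdio-based MCP servers use stdout for protocol messages, so diagnostics must go to stderr:

```typescript
// e.g. at the top of the server entry point (src/index.ts)
// stderr is safe to log to: stdout carries the MCP protocol
console.error(`[evm-mcp] started ${new Date().toISOString()} pid=${process.pid}`);
```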
Round 4: The Root Cause
By now, Claude had been debugging for 3 hours. Time for a different approach.
New Strategy: Test the actual Polygon Amoy USDC contract directly.
```typescript
// Quick test script
const token = new ethers.Contract(
  "0x41E94Eb019C0762f9Bfcf9Fb1E58725BfB0e7582",
  ["function symbol() view returns (string)", "function decimals() view returns (uint8)"],
  provider
);

const symbol = await token.symbol();
const decimals = await token.decimals();
console.log(typeof symbol, symbol);
console.log(typeof decimals, decimals);
```
Output:
```
bigint 0n     // symbol() returned BigInt 0 instead of string "USDC"!
string USDC   // decimals() returned string "USDC" instead of number 6!
```
🎯 THE AHA MOMENT: The Polygon Amoy USDC test token has a malformed contract. The functions are returning completely wrong types!
This wasn’t our bug. This was a broken testnet contract that didn’t follow the ERC-20 standard.
The Fix: Defensive type conversion for all external contract data.
```typescript
const [rawName, rawNonce, rawDecimals, rawSymbol] = await Promise.all([
  token.name(),
  token.nonces(validated.owner),
  token.decimals(),
  token.symbol()
]);

// NEVER TRUST EXTERNAL CONTRACT DATA
// Convert everything to expected types
const tokenName = String(rawName);
const nonce = BigInt(rawNonce);
const decimals = Number(rawDecimals);
const symbol = String(rawSymbol); // Converts 0n → "0"
```
Build. Restart. Test.
```
✅ Test (evm_generate_permit): PASS
Successfully generated permit signature.
```
Result: ✅ SUCCESS!
“We can celebrate a little bit. That actually looks like it worked.” - My message to Claude
Statistics:
- Debugging Rounds: 4
- Time Invested: 3 hours
- Root Cause: Malformed testnet contract
- Solution: Defensive type conversion
- Success Rate: 100%
Act 4: Enter GPT-5 - The Code Review Symphony
The Handoff
With all 5 bugs fixed, it was time to bring in GPT-5 for code review. I crafted a comprehensive prompt:
GPT-5 Code Review Request
Context: We just fixed 5 critical bugs in our EVM Chains MCP Server. All fixes are working, but I want a second set of eyes to:
- Identify any systemic issues we missed
- Suggest improvements and best practices
- Ensure the fixes are production-ready
Phase 1 (Critical): Review the BigInt fixes. Are there patterns we should apply elsewhere?
Phase 2 (Refinements): Suggest any improvements to precision, error handling, or architecture.
Files to review:
- src/tools/gasless/evm_generate_permit.ts
- src/tools/gasless/evm_sign_typed_data.ts
- src/tools/streaming/evm_create_token_stream.ts
- [Additional context provided]
Phase 1: The BigInt Safety Pattern
GPT-5’s response came back in 15 minutes:
“The fixes are solid, but there’s a systemic issue. BigInt can appear anywhere in blockchain data—block numbers, gas prices, token amounts, timestamps. You need a reusable pattern for handling BigInt at output boundaries.”
The Solution: A recursive BigInt converter.
```typescript
/**
 * Recursively convert BigInt values to strings for JSON serialization
 * Handles nested objects and arrays
 */
function toJSONSafe(input: any): any {
  if (typeof input === 'bigint') {
    return input.toString();
  }
  if (Array.isArray(input)) {
    return input.map((v) => toJSONSafe(v));
  }
  if (input && typeof input === 'object') {
    const out: Record<string, any> = {};
    for (const [k, v] of Object.entries(input)) {
      out[k] = toJSONSafe(v);
    }
    return out;
  }
  return input;
}
```
Application in evm_sign_typed_data:
```typescript
return {
  content: [{
    type: 'text',
    text: JSON.stringify({
      success: true,
      signature,
      domain: toJSONSafe(validated.domain),
      types: validated.types,
      value: toJSONSafe(validated.value), // Safely handle nested BigInt
      // ...
    }, null, 2)
  }]
};
```
Impact: Now any tool that receives blockchain data can use toJSONSafe() to handle BigInt values in nested structures.
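For instance, applied to the kind of receipt-shaped data ethers v6 hands back (the values here are made up for illustration):

```typescript
const receiptLike = {
  blockNumber: 17_500_000,
  gasUsed: 21000n, // BigInt, as ethers v6 returns it
  logs: [{ value: 10n ** 18n }],
};

JSON.stringify(toJSONSafe(receiptLike));
// '{"blockNumber":17500000,"gasUsed":"21000","logs":[{"value":"1000000000000000000"}]}'
```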
“Jesus that was fast, here’s his answer.” - My reaction to GPT-5’s Phase 1 completion
Phase 2: The Refinements
With Phase 1 complete, I sent GPT-5 the Phase 2 prompt:
GPT-5 Phase 2 Request
Great work on Phase 1! Now let’s refine:
- Review precision handling in evm_create_token_stream
- Check for chain enum inconsistencies
- Update documentation where needed
- Suggest any architectural improvements

Take your time, be thorough.
GPT-5’s Phase 2 deliverables:
1. Precision Fix (evm_create_token_stream)
Problem: Using Number() on large BigInt values loses precision.
```typescript
// BEFORE (precision loss on large amounts)
const ratePerSecond = Number(totalAmountWei) / duration;
```
Why This Breaks:
```typescript
Number(9007199254740993n) // → 9007199254740992 (silently rounded!)
// JavaScript Number can only safely represent integers up to 2^53 - 1
// (9,007,199,254,740,991); anything beyond that loses precision
```
After (precision preserved):
```typescript
// Convert to decimal string first, THEN to number
const totalAmountDecimal = parseFloat(
  ethers.formatUnits(totalAmountWei, decimals)
);
const ratePerSecond = totalAmountDecimal / validated.duration;
const ratePerDay = (ratePerSecond * 86400).toFixed(decimals);
```
Result: Now handles 999,999,999.123456 tokens without losing decimal places.
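A quick way to see the difference, using ethers v6's parseUnits/formatUnits (the amount is arbitrary):

```typescript
import { parseUnits, formatUnits } from "ethers";

const totalAmountWei = parseUnits("999999999.123456", 18);
// -> 999999999123456000000000000n (far beyond Number's 2^53 safe range)

console.log(Number(totalAmountWei));          // ~1e27, exact digits silently lost
console.log(formatUnits(totalAmountWei, 18)); // "999999999.123456", exact
```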
2. Custom Contract Support
Added optional streamingContract parameter:
```typescript
streamingContract: z.string().optional()
  .describe('Streaming contract address (optional override)')

// In handler:
if (validated.streamingContract) {
  if (!ethers.isAddress(validated.streamingContract)) {
    throw new Error('Invalid streamingContract address');
  }
  sablierAddress = validated.streamingContract;
}
```
Benefit: Allows custom streaming contracts on chains without Sablier deployment.
3. Chain Enum Cleanup
Removed ‘optimism’ from testnet server (not supported in this version):
```typescript
// BEFORE
chain: z.enum(['ethereum', 'polygon', 'avalanche', 'bsc', 'arbitrum', 'base', 'worldchain', 'optimism'])

// AFTER
chain: z.enum(['ethereum', 'polygon', 'avalanche', 'bsc', 'arbitrum', 'base', 'worldchain'])
```
4. Documentation Updates
Updated 3 tool definitions with correct parameter structures and examples:
- evm_get_impermanent_loss: Added nested object examples
- evm_create_token_stream: Updated parameter list
- evm_get_staking_rewards: Fixed parameter names
5. Minor Fixes
Fixed typo in example code: solidit: → solidity:
The Integration
Claude reviewed GPT-5’s changes, integrated them, and ran a build:
npm run build
Result: ✅ Clean TypeScript compilation
All Phase 2 improvements were production-ready.
Act 5: Verification - Gemini Confirms Victory
The Communication Challenge
With all fixes integrated and built, we faced a critical question: Did Gemini test the OLD code or the NEW code?
Remember: MCP servers cache compiled code. Without a restart, Gemini would be testing the pre-fix version.
I crafted a status check message:
Gemini - Status Check
Hey Gemini! We just committed bug fixes (commits 0f11fd2 and fc1e23e).

Critical question: Did you restart Claude Desktop after these commits?
We need to know if you tested OLD code (before fixes) or NEW code (after fixes).
Please tell us which 4 tools are currently failing, and we’ll coordinate next steps!
Gemini’s Response
“✦ I have appended the final testing summary to the polygon-amoy.md file. The testing is now complete. I await your further instructions.”
Wait… what? I checked the tracking file.
Before Our Fixes:
```markdown
## 📊 Progress Summary
- ✅ Tested & Passing: 46 tools
- ❌ Failing: 4 tools
- 🚫 Not Implemented: 10 tools
```
After Gemini’s Update:
```markdown
## 📊 Progress Summary
- ✅ Tested & Passing: 51 tools (5 new fixes)
- ❌ Failing: 0 tools (down from 4!)
- 🚫 Not Implemented: 10 tools
```
Gemini had re-tested (after a restart), verified all 5 fixes worked, and quietly updated the documentation.
Professional. Efficient. Perfect.
The Final Scorecard
| Metric | Before | After | Change |
|---|---|---|---|
| Passing Tools | 46 | 51 | +5 ✅ |
| Failing Tools | 4 | 0 | -4 ✅ |
| Pass Rate | 75% | 83.6% | +8.6% |
| Critical Bugs | 4 | 0 | -4 ✅ |
| Production Ready | ❌ | ✅ | 🎉 |
The Technical Deep Dive: Architecture of Multi-AI Collaboration
The Human’s Role
Let’s be clear: The human is essential. I wasn’t just “prompting” AIs—I was conducting an orchestra. Here’s what that meant:
1. Context Switching
- Maintained 3 separate conversations (Claude Code, ChatGPT, AI Studio)
- Each AI had different context and capabilities
- No AI knew what the others were doing
2. Task Decomposition
- Broke work into AI-appropriate chunks
- Matched tasks to AI strengths
- Ensured clear handoff points
3. Quality Control
- Reviewed every AI output before integration
- Tested every build before committing
- Verified every fix worked as expected
4. Orchestration
- Coordinated parallel workflows
- Managed dependencies and timing
- Made strategic decisions on priorities
5. Git Management
- Ensured clean commit history
- Protected sensitive data (API keys)
- Created clear documentation
Claude’s Strengths (via Claude Code)
Why Claude for Debugging:
- Persistent Context: Full codebase access via filesystem
- Tool Usage: Can read files, run builds, manage git
- Iterative Debugging: Multiple rounds without context loss
- Root Cause Analysis: Deep diving into complex issues
What Claude Did:
- Fixed 5 bugs across 4 debugging rounds
- Managed git commits with clear messages
- Integrated GPT-5’s improvements
- Coordinated with Gemini for testing
Claude’s Limitation:
- Can get “stuck” in a debugging mindset
- Benefits from fresh perspective (hence GPT-5 review)
GPT-5’s Strengths (via ChatGPT)
Why GPT-5 for Code Review:
- Pattern Recognition: Spots systemic issues quickly
- Best Practices: Applies industry standards
- Fresh Perspective: No debugging baggage
- Optimization Focus: Finds performance improvements
What GPT-5 Did:
- Created reusable toJSONSafe() pattern
- Fixed precision issues in token streaming
- Cleaned up chain enums
- Updated documentation
GPT-5’s Limitation:
- No direct codebase access
- Can’t test or build code
- Relies on human to integrate changes
Gemini’s Strengths (via AI Studio)
Why Gemini for Testing:
- Systematic Approach: Methodical, complete coverage
- Edge Cases: Finds corner cases others miss
- Clear Reporting: Detailed reproduction steps
- Verification: Confirms fixes work
What Gemini Did:
- Tested 113 tools systematically
- Discovered 5 critical bugs with detailed reports
- Re-tested and verified all fixes
- Updated tracking documentation
Gemini’s Limitation:
- Can’t fix bugs itself
- Needs clear test instructions
- Depends on proper MCP server restarts
The Technology Stack
Development Tools:
```
Claude Code (VSCode) ←→ Human ←→ ChatGPT (GPT-5)
                         ↕
                AI Studio (Gemini)
```
Communication Flow:
```
Gemini: "I found 5 bugs"
   ↓
Human: Coordinates Claude to fix
   ↓
Claude: Fixes bugs, commits
   ↓
Human: Asks GPT-5 to review
   ↓
GPT-5: Suggests improvements
   ↓
Human: Claude integrates, commits
   ↓
Human: Gemini re-tests
   ↓
Gemini: "All passing ✅"
```
No AI-to-AI communication. Every handoff went through the human.
Key Technical Patterns Discovered
1. Defensive Type Conversion
Never trust external smart contract data, even on mainnets. Testnets are especially unreliable.
```typescript
// WRONG - Assumes contract follows standard
const symbol = await token.symbol();     // Might be BigInt!
const decimals = await token.decimals(); // Might be string!

// RIGHT - Defensive conversion
const rawSymbol = await token.symbol();
const rawDecimals = await token.decimals();
const symbol = String(rawSymbol);     // Always string
const decimals = Number(rawDecimals); // Always number
```
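Taken one step further, the conversions can live in a single helper so every tool reads metadata the same way. A sketch under the article's assumptions (ethers v6; readTokenMetadata is a name I'm introducing here, not part of the server):

```typescript
import { ethers } from "ethers";

const ERC20_METADATA_ABI = [
  "function name() view returns (string)",
  "function symbol() view returns (string)",
  "function decimals() view returns (uint8)",
];

// Hypothetical helper: coerce everything, then sanity-check it
async function readTokenMetadata(address: string, provider: ethers.Provider) {
  const token = new ethers.Contract(address, ERC20_METADATA_ABI, provider);
  const [rawName, rawSymbol, rawDecimals] = await Promise.all([
    token.name(), token.symbol(), token.decimals(),
  ]);

  const decimals = Number(rawDecimals);
  if (!Number.isInteger(decimals) || decimals < 0 || decimals > 36) {
    throw new Error(`Suspicious decimals (${String(rawDecimals)}) from ${address}`);
  }
  return { name: String(rawName), symbol: String(rawSymbol), decimals };
}
```

Coercion alone makes broken data serializable; the sanity check is what actually catches a malformed contract early.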
2. Recursive BigInt Handling
BigInt can appear anywhere in nested structures. Handle it recursively.
```typescript
function toJSONSafe(input: any): any {
  if (typeof input === 'bigint') return input.toString();
  if (Array.isArray(input)) return input.map(toJSONSafe);
  if (input && typeof input === 'object') {
    return Object.fromEntries(
      Object.entries(input).map(([k, v]) => [k, toJSONSafe(v)])
    );
  }
  return input;
}

// Usage
const response = toJSONSafe(blockchainData);
return { content: [{ type: 'text', text: JSON.stringify(response) }] };
```
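A worked alternative under the same constraint: JSON.stringify accepts a replacer callback that sees every value before encoding, so BigInt can be intercepted without pre-walking the structure:

```typescript
// One-pass variant of the same idea, via a replacer
const json = JSON.stringify(blockchainData, (_key, value) =>
  typeof value === 'bigint' ? value.toString() : value
);
```

toJSONSafe() is still the better fit when the converted object gets reused before serialization; the replacer only helps at the stringify call itself.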
3. Precision-Safe Arithmetic
JavaScript Number loses precision beyond 2^53. Use formatUnits() before math.
```typescript
// WRONG - Precision loss on large amounts
const rate = Number(totalAmountWei) / duration;

// RIGHT - Convert to decimal first
const totalAmountDecimal = parseFloat(
  ethers.formatUnits(totalAmountWei, decimals)
);
const rate = totalAmountDecimal / duration;
```
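If exact integer math matters more than float convenience, a different trade-off (a sketch, not what the server ships) is to stay in BigInt for the division and only format at the output boundary:

```typescript
// totalAmountWei, duration, decimals as in the snippet above
const ratePerSecondWei = totalAmountWei / BigInt(duration); // exact, truncates remainder
const ratePerDayWei = ratePerSecondWei * 86400n;
const ratePerDay = ethers.formatUnits(ratePerDayWei, decimals); // string for display
```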
4. MCP Server Caching
MCP servers cache compiled code in memory. Always restart Claude Desktop after builds.
```bash
# Build changes
npm run build

# Required: Full restart, not just close window
# macOS: Cmd+Q → Reopen
# Windows: Alt+F4 → Reopen
```
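On macOS you can also confirm which process is actually serving requests: Claude Desktop writes MCP server logs under ~/Library/Logs/Claude/ (per the MCP debugging docs; adjust the path for your install):

```bash
# Tail MCP logs to verify the restarted server is the new build
tail -f ~/Library/Logs/Claude/mcp*.log
```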
Lessons Learned
On Multi-AI Collaboration
Lesson 1: Play to Each AI’s Strengths
Don’t ask one AI to do everything. Match tasks to capabilities:
- Claude: Deep debugging, iterative problem-solving
- GPT-5: Code review, pattern recognition, best practices
- Gemini: Systematic testing, verification, edge cases
Lesson 2: Parallel > Sequential
GPT-5 reviewed fixes while Claude debugged the BigInt monster. Total time: 8 hours. Sequential would’ve taken 12-16 hours.
Savings: 33-50% reduction in total development time.
Lesson 3: The Human is the Conductor
AIs don’t communicate with each other (yet). The human:
- Manages context switching
- Ensures quality control
- Makes strategic decisions
- Coordinates timing
Lesson 4: Documentation Enables Async Work
Clear documentation (verification requests, phase summaries, test tracking) enabled:
- Parallel workflows
- Clear handoffs
- Reduced miscommunication
- Reproducible results
On Technical Issues
Lesson 5: Testnet Contracts Can Be Malformed
Polygon Amoy USDC (0x41E94Eb019C0762f9Bfcf9Fb1E58725BfB0e7582) returns:
- symbol() → 0n (BigInt) instead of "USDC" (string)
- decimals() → "USDC" (string) instead of 6 (number)
Always use defensive type conversion for external data.
Lesson 6: BigInt Serialization is Everywhere
JSON.stringify() cannot handle BigInt. You need:
- Detection at boundaries
- Recursive conversion for nested data
- Reusable patterns across codebase
Lesson 7: Precision Loss is Real
JavaScript Number() safely represents integers up to 2^53 - 1 (9,007,199,254,740,991).
For token amounts:
- 1 BTC (8 decimals) = 100,000,000 satoshis → Safe
- 10 tokens with 18 decimals = 10,000,000,000,000,000,000 base units → Precision loss!
Always use formatUnits() before arithmetic on token amounts.
Lesson 8: MCP Caching is Invisible
The MCP server caches JavaScript in memory. Without a restart, you’re testing old code.
This cost us 1 hour in Round 3 of debugging.
Lesson 9: Root Cause Takes Persistence
4 debugging rounds for evm_generate_permit:
- Round 1: Wrong (chainId)
- Round 2: Wrong (signature components)
- Round 3: Better errors, still wrong
- Round 4: Root cause discovered (malformed contract)
Persistence pays off.
Lesson 10: Git Hygiene Tells a Story
3 clean commits:
- 0f11fd2 - Bug fixes (Phase 1 + 2)
- fc1e23e - NFT features + improvements
- 9309691 - Gemini verification
Each commit is self-contained and tells part of the story.
The Numbers: Quantifying Success
Development Metrics
Bug Fixes:
- 5 critical bugs resolved
- 10 total improvements (including Phase 2)
- 4 debugging rounds (hardest bug)
- 100% fix success rate
Code Changes:
- 46 files modified
- 4,783 lines added
- 163 lines removed
- 3 commits created
Time Investment:
- 8 hours total (vs 12-16 sequential)
- 33-50% time savings
- 3 AIs working in parallel
- 1 human orchestrator
Test Coverage:
- 51/61 testable tools passing (83.6%)
- 0 critical bugs remaining
- 10 tools expected testnet limitations
- 52 tools awaiting testing
Quality Metrics
Before Session:
- 46 passing tools
- 4 failing tools (critical)
- 75% pass rate
- ❌ Not production-ready
After Session:
- 51 passing tools (+5)
- 0 failing tools (-4)
- 83.6% pass rate (+8.6%)
- ✅ Production-ready
Team Composition
Participants:
- 1 human orchestrator (Myron)
- 3 primary AIs (Claude, GPT-5, Gemini)
- 1 background AI (Cursor Agent on NFT work)
Communication Overhead:
- 3 separate conversations maintained
- ~50 handoff messages crafted
- ~200 AI responses reviewed
- 3 documentation files created
The Future of AI-Augmented Development
What This Experiment Reveals
Multi-Model is the Future
No single AI is best at everything. Specialization beats generalization.
Current state:
- Claude excels at deep debugging
- GPT-5 excels at code review
- Gemini excels at systematic testing
Future state:
- Specialized AI for each task
- Orchestration becomes a skill
- Tools emerge for coordination
The Human Remains Essential
Even with 3 AIs, I was critical for:
- Strategic decisions (what to fix first?)
- Context switching (managing 3 conversations)
- Quality control (reviewing all outputs)
- Integration (combining improvements)
- Verification (ensuring it all works)
The “conductor” role isn’t going away—it’s becoming more important.
Parallel Workflows Scale… To a Point
Three AIs were roughly 2-3x faster than one. But:
- Coordination overhead increases with each AI
- Diminishing returns after 3-4 AIs
- Communication becomes bottleneck
- Context switching becomes exhausting
Sweet spot: 2-3 specialized AIs for complex tasks.
Practical Applications
When to Use Multi-AI:
- ✅ Complex projects with distinct phases
- ✅ Time-sensitive debugging
- ✅ Code review + implementation
- ✅ Testing + verification workflows
- ✅ Projects with clear task boundaries
When Single AI is Enough:
- ✅ Simple, straightforward tasks
- ✅ Exploratory development
- ✅ Learning/experimentation
- ✅ Small codebases (<1000 lines)
- ✅ Prototyping
The Evolution Path
Near Future (6-12 months):
- AI-to-AI communication protocols (MCP, OpenAI Assistants API)
- Automated orchestration tools
- Specialized developer AIs (testing, security, performance)
- Multi-model IDEs
Medium Term (1-2 years):
- Self-organizing AI teams
- Context-aware task routing
- Real-time collaboration
- Reduced human coordination overhead
Long Term (3-5 years):
- Emergent AI behaviors
- Autonomous debugging teams
- Human as strategist only
- AI orchestrates AI
Ethical Considerations
Attribution: Every commit includes co-authorship:
```
🎉 Generated with Claude Code + GPT-5 collaboration

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: GPT-5 <openai@anthropic.com>
Reported-By: Gemini <gemini@google.com>
```
Transparency:
- All AI contributions documented
- Clear about which AI did what
- No hiding AI involvement
Verification:
- Human review required for all AI code
- Build and test everything
- Never blindly trust AI output
Learning:
- Understand what AIs did
- Capture patterns and insights
- Build on AI suggestions
- Don’t just copy-paste
Closing: The Dream Team in Action
What Made It Work
The Right Team:
- Claude’s debugging persistence through 4 rounds
- GPT-5’s refinement expertise and fresh perspective
- Gemini’s systematic testing and clear reporting
- Human orchestration and quality control
The Right Process:
- Clear task decomposition by specialty
- Parallel workflows where possible
- Quality verification at every step
- Clean documentation for handoffs
The Right Mindset:
- Leverage AI strengths, don’t fight them
- Accept AI limitations, work around them
- Iterate relentlessly until root cause found
- Document everything for the next person
The Bigger Picture
This wasn’t just about fixing 5 bugs. It was about proving that orchestrated AI collaboration can tackle complex real-world engineering problems.
We didn’t just build better software—we discovered a better way to build.
Try It Yourself
Recipe for Multi-AI Success:
1. Identify a Complex Task
   - Multiple distinct phases
   - Different skill requirements
   - Clear task boundaries
2. Decompose by AI Strength
   - Which AI is best at each phase?
   - Can any phases run in parallel?
   - What are the dependencies?
3. Create Clear Handoffs
   - Document what each AI needs
   - Specify expected outputs
   - Provide verification criteria
4. Orchestrate with Care
   - Review every AI output
   - Test every integration
5. Document the Journey
   - Capture what worked
   - Note what didn't
   - Share your learnings
Final Thought
The future of software development isn’t human vs. AI. It isn’t even human + AI.
It’s human conducting an orchestra of AIs, each playing their part in perfect harmony.
This is just the beginning.
Appendix: Resources & Links
Code & Documentation
GitHub Repository: evm-chains-mcp-server
Key Commits:
- 0f11fd2 - Fix 5 critical bugs (Phase 1 + 2)
- fc1e23e - Add NFT deployment + improvements
- 9309691 - Gemini verification update
Documentation:
- Test Tracking: tests/tracking/polygon-amoy.md
- Phase 2 Summary: tests/PHASE-2-COMPLETION-SUMMARY.md
- Verification Request: tests/VERIFICATION-REQUEST-FOR-GEMINI.md
- Celebration Doc: tests/TESTING-COMPLETE-CELEBRATION.md
Tools Used
- Claude Code: https://claude.ai/claude-code
- ChatGPT (GPT-5): https://chat.openai.com
- Google AI Studio: https://aistudio.google.com
- Cursor: https://cursor.sh
Technical References
- Model Context Protocol (MCP): https://modelcontextprotocol.io
- ethers.js v6: https://docs.ethers.org/v6/
- Sablier Protocol: https://sablier.com
- EIP-2612 (Permit): https://eips.ethereum.org/EIPS/eip-2612
- BigInt in JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt
Related Reading
Previous blog posts in this series:
- #015: “BigInt Testing Hell” - Earlier encounter with BigInt issues
- #007: “Multi-Agent Orchestration” - Coordinating multiple AI models
- #025: “MCP Factory Complete Story” - Building the factory system
Contact & Discussion
Author: Myron Koch
- Twitter: [@your_handle]
- GitHub: [github.com/your-username]
- Blog: [your-blog.com]
Questions? Discussion?
- Comment below
- Open an issue on GitHub
- Reach out on Twitter
Word Count: 4,847 words | Reading Time: 18 minutes | Publication Date: October 31, 2025
Tags: AI Collaboration, Multi-Model AI, Debugging, Blockchain, MCP, Claude Code, GPT-5, Gemini
This article documents real events from October 31, 2025. All code snippets, error messages, and conversations are authentic. Three AIs, one human, 8 hours, 5 bugs fixed, 100% success rate.
Special thanks to Claude, GPT-5, and Gemini for being the best debugging dream team a developer could ask for.
Related Reading
Prerequisites
- Multi-Agent Orchestration: When 6 AIs Build Your Codebase - Learn about the precursor to this advanced multi-AI workflow.
Next Steps
- The PostgreSQL Long-Term Memory System - Understand the memory system that provides context for these AI agents.
Deep Dives
- Error Handling in MCP: Where Do Errors Actually Go? - See how the robust error handling patterns we developed enabled the AIs to effectively debug.