Part 4 of the Journey: Advanced Topics & Deep Dives
Previous: Multi-Agent Orchestration | Next: The PostgreSQL Long-Term Memory System
The AI Dream Team: How Claude, GPT-5, and Gemini Fixed 5 Critical Bugs in Parallel
Historical Context (November 2025): Written October 31, 2025, this post documents advanced multi-AI orchestration during production debugging. The approach emerged from months of hands-on server development, showcasing patterns discovered while building infrastructure in parallel with the evolving MCP ecosystem.
By Myron Koch | Date: October 31, 2025 | Reading Time: 18 minutes
The Setup: Friday Evening, Four Critical Failures
It’s Friday evening. My EVM Chains MCP Server—a blockchain tool orchestrator supporting 7 testnets with 113 tools—is almost production-ready. Gemini, my systematic testing AI, has been methodically validating every tool on Polygon Amoy testnet.
The results roll in:
- ✅ 46 tools passing
- ❌ 4 tools failing with critical bugs
- 🚫 10 tools with expected testnet limitations
- ⏸️ 52 tools not yet tested
Four bugs. Doesn’t sound like much, right? But these aren’t typos or minor edge cases. These are architectural issues: parameter mapping mismatches, a persistent BigInt serialization error, and schema validation failures. The kind of bugs that require deep debugging, code review, and comprehensive testing.
Traditional approach: Fix them one by one, sequentially. Estimated time: 12-16 hours.
But what if we didn’t have to work sequentially?
What if I could orchestrate multiple AI models—each with different strengths—working in parallel like a software engineering dream team?
- Claude (via Claude Code): The architect and debugger
- GPT-5 (via ChatGPT): The code reviewer and refiner
- Gemini (via Google AI Studio): The systematic tester
What happened next was a masterclass in AI orchestration. In 8 hours, we went from 4 critical failures to 100% passing, with 10 improvements total. Three models, working in parallel, each contributing their unique strengths, coordinated by one human conductor.
This is that story.
Act 1: Discovery - When Gemini Found the Cracks
The Systematic Approach
Gemini doesn’t rush. It’s methodical, thorough, and relentless—the perfect tester. I gave it clear instructions:
“Test all 113 tools on Polygon Amoy testnet. Start with Priority 1 (Core Operations), then Priority 2 (Advanced Features), then Priority 3 (Experimental). Document everything in tests/tracking/polygon-amoy.md.”
The first 46 tools? Smooth sailing. Balance checks, transactions, token operations, smart contract calls—all passing. Confidence was high.
Then came the advanced tools.
The First Casualties
Bug #1: evm_sign_typed_data
```
Error: Unknown parameter: 'message'
Test: evm_sign_typed_data({ message: {...}, domain: {...} })
```
Simple diagnosis: The tool definition said message, but the implementation expected value. A classic parameter mapping mismatch from refactoring.
Bug #2: evm_get_impermanent_loss
```
Error: Zod validation failed - Expected object with token0 property
Test: evm_get_impermanent_loss({ symbol: "WETH", initialPrice: "2000", ... })
```
Another mismatch. The tool definition had flat parameters, but the implementation expected nested objects:
```
{
  token0: { symbol: "WETH", initialPrice: "2000", ... },
  token1: { symbol: "USDC", initialPrice: "1", ... }
}
```
Bug #3: evm_create_token_stream
```
Error: Unknown parameters: 'amount', 'stopTime'
Test: evm_create_token_stream({ amount: "1000", stopTime: 1234567890 })
```
Same pattern. Tool definition used amount and stopTime, but implementation expected totalAmount and duration.
Bug #4: evm_get_staking_rewards
```
Error: Unknown parameter: 'stakingContract'
Test: evm_get_staking_rewards({ stakingContract: "0x...", stakerAddress: "0x..." })
```
Definition said stakingContract and stakerAddress, implementation wanted address and protocol.
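These "Unknown parameter" errors are exactly what a strict Zod schema produces when the tool definition and the validation schema drift apart. A minimal sketch of the failure mode (the schema and call here are illustrative, not the server's actual code):

```typescript
import { z } from "zod";

// Hypothetical schema mirroring what the implementation expected
const StakingRewardsInput = z
  .object({
    address: z.string(),
    protocol: z.string(),
  })
  .strict(); // reject any keys the schema doesn't know about

// A caller following the stale tool definition instead of the schema:
StakingRewardsInput.parse({
  stakingContract: "0x...",
  stakerAddress: "0x...",
});
// ZodError: Unrecognized key(s) in object: 'stakingContract', 'stakerAddress'
// (plus "Required" issues for the missing 'address' and 'protocol')
```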
The Boss Battle: The BigInt Monster
Then came Bug #5. The one that would haunt me for four debugging rounds.
Bug #5: evm_generate_permit
```
Error: Do not know how to serialize a BigInt
Test: evm_generate_permit({ tokenAddress: "0x41E9...", owner: "0x7eA3...", value: "100" })
```
This tool generates EIP-2612 permit signatures—critical for gasless transactions. It calls token.symbol(), token.decimals(), token.nonces(), creates an EIP-712 typed data structure, signs it, and returns the signature components.
Somewhere in that flow, a BigInt was sneaking through, breaking JSON serialization.
But where?
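The failure itself is trivial to reproduce in isolation: JSON.stringify has no default encoding for BigInt and throws rather than guessing. The hard part was finding which value in the permit flow was a BigInt:

```typescript
// Any BigInt anywhere in the payload kills serialization
JSON.stringify({ nonce: 1n });
// TypeError: Do not know how to serialize a BigInt
```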
Gemini’s Handoff
After completing its testing sweep, Gemini delivered a perfect bug report:
```markdown
## Failing Tools Summary
1. evm_sign_typed_data - Parameter 'message' not recognized
2. evm_get_impermanent_loss - Expected nested object structure
3. evm_create_token_stream - Parameters 'amount'/'stopTime' invalid
4. evm_get_staking_rewards - Parameter 'stakingContract' not recognized
5. evm_generate_permit - BigInt serialization error (CRITICAL)

Test Parameters: [Detailed reproduction steps for each tool]
Environment: Polygon Amoy testnet, USDC token 0x41E9...
Wallet: 0x7eA3... (deployer wallet with test funds)
```
“I’ve documented everything in polygon-amoy.md. Your move, Claude.”
It was time to fix some bugs.
Act 2: Divide and Conquer - The Parallel Strategy
The Realization
Staring at Gemini’s report, I had an epiphany. I didn’t need one AI to do everything. I needed the right AI for each job.
Claude (via Claude Code) had:
- Full codebase context via file system access
- Git management capabilities
- Iterative debugging with persistent memory
- Tool usage (read, write, build, test)
GPT-5 (via ChatGPT) had:
- Pattern recognition across codebases
- Best practices and optimization insights
- Fresh perspective for code review
- No context baggage from debugging
Gemini had:
- Systematic testing methodology
- Edge case discovery
- Clear verification protocols
- Methodical documentation
Why work sequentially when I could orchestrate them in parallel?
The Strategy
Phase 1 - Immediate Fixes (Claude’s Domain)
- Goal: Fix the 5 bugs as quickly as possible
- Approach: Tackle easy parameter mapping issues first (Bugs #1-4), then hunt the BigInt monster (Bug #5)
- Timeline: 2-4 hours
Phase 2 - Code Review (GPT-5’s Domain)
- Goal: Refine and improve the fixes
- Approach: Review fixes while Claude debugs, identify systemic issues
- Timeline: 1-2 hours (parallel to Phase 1)
Verification (Gemini’s Domain)
- Goal: Confirm all fixes work
- Approach: Re-test with updated parameters after fixes deployed
- Timeline: 1 hour
The Handoff Protocol
Here’s how information flowed:
```
Claude fixes bugs → Commits to git
        ↓
Human reviews commits → Crafts GPT-5 prompt
        ↓
GPT-5 analyzes fixes → Returns improvements
        ↓
Claude integrates changes → Builds and commits
        ↓
Human restarts Claude Desktop → MCP cache refresh
        ↓
Gemini re-tests → Final verification
```
The human (me) acted as:
- Context Switcher: Managing 3 separate conversations
- Task Decomposer: Breaking work into AI-appropriate chunks
- Quality Controller: Reviewing all AI outputs before integration
- Orchestrator: Coordinating handoffs and parallel workflows
- Git Manager: Ensuring clean commit history
Parallel Timeline
Here’s what actually happened:
```
Hour 0-2: Claude fixes Bugs #1-4 (parameter mapping)
Hour 1-3: GPT-5 reviews Bug #1 fix (parallel to Claude's work)
Hour 2-4: Claude debugs Bug #5 Round 1 (network.chainId fix)
Hour 3-5: GPT-5 identifies systemic BigInt issue, proposes Phase 2
Hour 4-6: Claude debugs Bug #5 Rounds 2-4 (root cause hunt)
Hour 6-7: GPT-5 implements Phase 2 refinements
Hour 7-8: Gemini verifies all fixes, confirms success
```
Three AIs, working in parallel, coordinated by one human. Let’s dive into what each did.
Act 3: The Bug Hunt - Four Rounds of Debugging
First Pass: The Easy Wins (Bugs #1-4)
Claude tackled the parameter mapping issues first. These were straightforward:
Bug #1: evm_sign_typed_data
```typescript
// File: src/tool-definitions/gasless.ts

// BEFORE (definition exposed 'message', but the implementation expected 'value')
message: {
  type: 'object',
  description: 'Message to sign',
},

// AFTER (renamed to 'value' to match the implementation)
value: {
  type: 'object',
  description: 'Value object to sign',
},

// File: src/tools/gasless/evm_sign_typed_data.ts
// Update all references: validated.message → validated.value
```
Bugs #2, #3, #4: Similar pattern—update tool definitions to match implementations.
Time Elapsed: 45 minutes. Result: 4/5 bugs fixed ✅
“Okay,” I thought, “one bug left. Should be easy.”
Famous last words.
The BigInt Monster: Four Rounds of Debugging
Round 1: The Obvious Suspect
Hypothesis: network.chainId is BigInt in ethers.js v6 (it was number in v5)
```typescript
// File: src/tools/gasless/evm_generate_permit.ts
const network = await provider.getNetwork();
const chainId = Number(network.chainId); // Convert BigInt to number
```
Build. Restart Claude Desktop. Test.
```
Error: Do not know how to serialize a BigInt
```
Result: ❌ Still failing
“Okay,” Claude (and I) reasoned, “maybe it’s not the chainId.”
Round 2: The Signature Components
Hypothesis: Maybe the signature components (v, r, s) are BigInt?
```typescript
const sig = ethers.Signature.from(signature);

// Convert signature components for JSON safety
const sigV = Number(sig.v);
const sigR = sig.r;
const sigS = sig.s;

return {
  // ...
  signature: {
    v: sigV, // Use converted values
    r: sigR,
    s: sigS
  }
};
```
Build. Restart. Test.
```
Error: Do not know how to serialize a BigInt
```
Result: ❌ Still failing
Research revealed that ethers.Signature components are already numbers/strings in v6. Back to the drawing board.
Round 3: The Error Handler
Hypothesis: Maybe the error is happening inside the signTypedData() call, and we need better error context.
```typescript
let signature: string;
try {
  signature = await wallet.signTypedData(domain, types, value);
} catch (signError: any) {
  if (signError.message && signError.message.includes('BigInt')) {
    throw new Error('EIP-712 signing failed. This token may not properly support EIP-2612 permit standard.');
  }
  throw signError;
}
```
Build. Restart. Test.
```
Error: Do not know how to serialize a BigInt
```
Result: ❌ Still failing, and with the exact same error message
At this point, 3 hours had passed. Frustration was setting in.
“damn it” - My message to Claude
The error message should have changed, but it didn’t. Then I realized: Claude Desktop hadn’t restarted properly. The MCP server was still running the old code from cache.
Critical Learning: MCP servers cache compiled JavaScript in memory. You MUST fully restart Claude Desktop after building changes. Cmd+Q → Reopen, not just closing the window.
After a proper restart, the new error handler worked… but we still had the BigInt issue.
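In hindsight, a cheap safeguard would have been a startup marker in the server's entry point, so a stale cached process is immediately obvious. A hypothetical sketch, not something the server had at the time; note that stdio-based MCP servers use stdout for protocol messages, so diagnostics must go to stderr:

```typescript
// e.g. at the top of the server entry point (src/index.ts)
// stderr is safe to log to: stdout carries the MCP protocol
console.error(`[evm-mcp] started ${new Date().toISOString()} pid=${process.pid}`);
```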
Round 4: The Root Cause
By now, Claude had been debugging for 3 hours. Time for a different approach.
New Strategy: Test the actual Polygon Amoy USDC contract directly.
```typescript
// Quick test script
const token = new ethers.Contract(
  "0x41E94Eb019C0762f9Bfcf9Fb1E58725BfB0e7582",
  ["function symbol() view returns (string)", "function decimals() view returns (uint8)"],
  provider
);

const symbol = await token.symbol();
const decimals = await token.decimals();
console.log(typeof symbol, symbol);
console.log(typeof decimals, decimals);
```
Output:
```
bigint 0n     // symbol() returned BigInt 0 instead of string "USDC"!
string USDC   // decimals() returned string "USDC" instead of number 6!
```
🎯 THE AHA MOMENT: The Polygon Amoy USDC test token has a malformed contract. The functions are returning completely wrong types!
This wasn’t our bug. This was a broken testnet contract that didn’t follow the ERC-20 standard.
The Fix: Defensive type conversion for all external contract data.
```typescript
const [rawName, rawNonce, rawDecimals, rawSymbol] = await Promise.all([
  token.name(),
  token.nonces(validated.owner),
  token.decimals(),
  token.symbol()
]);

// NEVER TRUST EXTERNAL CONTRACT DATA
// Convert everything to expected types
const tokenName = String(rawName);
const nonce = BigInt(rawNonce);
const decimals = Number(rawDecimals);
const symbol = String(rawSymbol); // Converts 0n → "0"
```
Build. Restart. Test.
```
✅ Test (evm_generate_permit): PASS
Successfully generated permit signature.
```
Result: ✅ SUCCESS!
“We can celebrate a little bit. That actually looks like it worked.” - My message to Claude
Statistics:
- Debugging Rounds: 4
- Time Invested: 3 hours
- Root Cause: Malformed testnet contract
- Solution: Defensive type conversion
- Success Rate: 100%
Act 4: Enter GPT-5 - The Code Review Symphony
The Handoff
With all 5 bugs fixed, it was time to bring in GPT-5 for code review. I crafted a comprehensive prompt:
GPT-5 Code Review Request
Context: We just fixed 5 critical bugs in our EVM Chains MCP Server. All fixes are working, but I want a second set of eyes to:
- Identify any systemic issues we missed
- Suggest improvements and best practices
- Ensure the fixes are production-ready
Phase 1 (Critical): Review the BigInt fixes. Are there patterns we should apply elsewhere?
Phase 2 (Refinements): Suggest any improvements to precision, error handling, or architecture.
Files to review:
- src/tools/gasless/evm_generate_permit.ts
- src/tools/gasless/evm_sign_typed_data.ts
- src/tools/streaming/evm_create_token_stream.ts
- [Additional context provided]
Phase 1: The BigInt Safety Pattern
GPT-5’s response came back in 15 minutes:
“The fixes are solid, but there’s a systemic issue. BigInt can appear anywhere in blockchain data—block numbers, gas prices, token amounts, timestamps. You need a reusable pattern for handling BigInt at output boundaries.”
The Solution: A recursive BigInt converter.
```typescript
/**
 * Recursively convert BigInt values to strings for JSON serialization
 * Handles nested objects and arrays
 */
function toJSONSafe(input: any): any {
  if (typeof input === 'bigint') {
    return input.toString();
  }
  if (Array.isArray(input)) {
    return input.map((v) => toJSONSafe(v));
  }
  if (input && typeof input === 'object') {
    const out: Record<string, any> = {};
    for (const [k, v] of Object.entries(input)) {
      out[k] = toJSONSafe(v);
    }
    return out;
  }
  return input;
}
```
Application in evm_sign_typed_data:
```typescript
return {
  content: [{
    type: 'text',
    text: JSON.stringify({
      success: true,
      signature,
      domain: toJSONSafe(validated.domain),
      types: validated.types,
      value: toJSONSafe(validated.value), // Safely handle nested BigInt
      // ...
    }, null, 2)
  }]
};
```
Impact: Now any tool that receives blockchain data can use toJSONSafe() to handle BigInt values in nested structures.
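For instance, applied to the kind of receipt-shaped data ethers v6 hands back (the values here are made up for illustration):

```typescript
const receiptLike = {
  blockNumber: 17_500_000,
  gasUsed: 21000n, // BigInt, as ethers v6 returns it
  logs: [{ value: 10n ** 18n }],
};

JSON.stringify(toJSONSafe(receiptLike));
// '{"blockNumber":17500000,"gasUsed":"21000","logs":[{"value":"1000000000000000000"}]}'
```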
“Jesus that was fast, here’s his answer.” - My reaction to GPT-5’s Phase 1 completion
Phase 2: The Refinements
With Phase 1 complete, I sent GPT-5 the Phase 2 prompt:
GPT-5 Phase 2 Request
Great work on Phase 1! Now let’s refine:
- Review precision handling in evm_create_token_stream
- Check for chain enum inconsistencies
- Update documentation where needed
- Suggest any architectural improvements

Take your time, be thorough.
GPT-5’s Phase 2 deliverables:
1. Precision Fix (evm_create_token_stream)
Problem: Using Number() on large BigInt values loses precision.
```typescript
// BEFORE (precision loss on large amounts)
const ratePerSecond = Number(totalAmountWei) / duration;
```
Why This Breaks:
```typescript
Number(9007199254740993n) // → 9007199254740992 (silently rounded!)
// JavaScript Number can only safely represent integers up to 2^53 - 1
// (9,007,199,254,740,991); anything beyond that loses precision
```
After (precision preserved):
```typescript
// Convert to decimal string first, THEN to number
const totalAmountDecimal = parseFloat(
  ethers.formatUnits(totalAmountWei, decimals)
);
const ratePerSecond = totalAmountDecimal / validated.duration;
const ratePerDay = (ratePerSecond * 86400).toFixed(decimals);
```
Result: Now handles 999,999,999.123456 tokens without losing decimal places.
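A quick way to see the difference, using ethers v6's parseUnits/formatUnits (the amount is arbitrary):

```typescript
import { parseUnits, formatUnits } from "ethers";

const totalAmountWei = parseUnits("999999999.123456", 18);
// -> 999999999123456000000000000n (far beyond Number's 2^53 safe range)

console.log(Number(totalAmountWei));          // ~1e27, exact digits silently lost
console.log(formatUnits(totalAmountWei, 18)); // "999999999.123456", exact
```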
2. Custom Contract Support
Added optional streamingContract parameter:
```typescript
streamingContract: z.string().optional()
  .describe('Streaming contract address (optional override)')

// In handler:
if (validated.streamingContract) {
  if (!ethers.isAddress(validated.streamingContract)) {
    throw new Error('Invalid streamingContract address');
  }
  sablierAddress = validated.streamingContract;
}
```
Benefit: Allows custom streaming contracts on chains without Sablier deployment.
3. Chain Enum Cleanup
Removed ‘optimism’ from testnet server (not supported in this version):
```typescript
// BEFORE
chain: z.enum(['ethereum', 'polygon', 'avalanche', 'bsc', 'arbitrum', 'base', 'worldchain', 'optimism'])

// AFTER
chain: z.enum(['ethereum', 'polygon', 'avalanche', 'bsc', 'arbitrum', 'base', 'worldchain'])
```
4. Documentation Updates
Updated 3 tool definitions with correct parameter structures and examples:
- evm_get_impermanent_loss: Added nested object examples
- evm_create_token_stream: Updated parameter list
- evm_get_staking_rewards: Fixed parameter names
5. Minor Fixes
Fixed typo in example code: solidit: → solidity:
The Integration
Claude reviewed GPT-5’s changes, integrated them, and ran a build:
npm run build
Result: ✅ Clean TypeScript compilation
All Phase 2 improvements were production-ready.
Act 5: Verification - Gemini Confirms Victory
The Communication Challenge
With all fixes integrated and built, we faced a critical question: Did Gemini test the OLD code or the NEW code?
Remember: MCP servers cache compiled code. Without a restart, Gemini would be testing the pre-fix version.
I crafted a status check message:
Gemini - Status Check
Hey Gemini! We just committed bug fixes (commits 0f11fd2 and fc1e23e).

Critical question: Did you restart Claude Desktop after these commits?
We need to know if you tested OLD code (before fixes) or NEW code (after fixes).
Please tell us which 4 tools are currently failing, and we’ll coordinate next steps!
Gemini’s Response
“✦ I have appended the final testing summary to the polygon-amoy.md file. The testing is now complete. I await your further instructions.”
Wait… what? I checked the tracking file.
Before Our Fixes:
```markdown
## 📊 Progress Summary
- ✅ Tested & Passing: 46 tools
- ❌ Failing: 4 tools
- 🚫 Not Implemented: 10 tools
```
After Gemini’s Update:
```markdown
## 📊 Progress Summary
- ✅ Tested & Passing: 51 tools (5 new fixes)
- ❌ Failing: 0 tools (down from 4!)
- 🚫 Not Implemented: 10 tools
```
Gemini had re-tested (after a restart), verified all 5 fixes worked, and quietly updated the documentation.
Professional. Efficient. Perfect.
The Final Scorecard
| Metric | Before | After | Change |
|---|---|---|---|
| Passing Tools | 46 | 51 | +5 ✅ |
| Failing Tools | 4 | 0 | -4 ✅ |
| Pass Rate | 75% | 83.6% | +8.6% |
| Critical Bugs | 4 | 0 | -4 ✅ |
| Production Ready | ❌ | ✅ | 🎉 |
The Technical Deep Dive: Architecture of Multi-AI Collaboration
The Human’s Role
Let’s be clear: The human is essential. I wasn’t just “prompting” AIs—I was conducting an orchestra. Here’s what that meant:
1. Context Switching
- Maintained 3 separate conversations (Claude Code, ChatGPT, AI Studio)
- Each AI had different context and capabilities
- No AI knew what the others were doing
2. Task Decomposition
- Broke work into AI-appropriate chunks
- Matched tasks to AI strengths
- Ensured clear handoff points
3. Quality Control
- Reviewed every AI output before integration
- Tested every build before committing
- Verified every fix worked as expected
4. Orchestration
- Coordinated parallel workflows
- Managed dependencies and timing
- Made strategic decisions on priorities
5. Git Management
- Ensured clean commit history
- Protected sensitive data (API keys)
- Created clear documentation
Claude’s Strengths (via Claude Code)
Why Claude for Debugging:
- Persistent Context: Full codebase access via filesystem
- Tool Usage: Can read files, run builds, manage git
- Iterative Debugging: Multiple rounds without context loss
- Root Cause Analysis: Deep diving into complex issues
What Claude Did:
- Fixed 5 bugs across 4 debugging rounds
- Managed git commits with clear messages
- Integrated GPT-5’s improvements
- Coordinated with Gemini for testing
Claude’s Limitation:
- Can get “stuck” in a debugging mindset
- Benefits from fresh perspective (hence GPT-5 review)
GPT-5’s Strengths (via ChatGPT)
Why GPT-5 for Code Review:
- Pattern Recognition: Spots systemic issues quickly
- Best Practices: Applies industry standards
- Fresh Perspective: No debugging baggage
- Optimization Focus: Finds performance improvements
What GPT-5 Did:
- Created reusable toJSONSafe() pattern
- Fixed precision issues in token streaming
- Cleaned up chain enums
- Updated documentation
GPT-5’s Limitation:
- No direct codebase access
- Can’t test or build code
- Relies on human to integrate changes
Gemini’s Strengths (via AI Studio)
Why Gemini for Testing:
- Systematic Approach: Methodical, complete coverage
- Edge Cases: Finds corner cases others miss
- Clear Reporting: Detailed reproduction steps
- Verification: Confirms fixes work
What Gemini Did:
- Tested 113 tools systematically
- Discovered 5 critical bugs with detailed reports
- Re-tested and verified all fixes
- Updated tracking documentation
Gemini’s Limitation:
- Can’t fix bugs itself
- Needs clear test instructions
- Depends on proper MCP server restarts
The Technology Stack
Development Tools:
```
Claude Code (VSCode) ←→ Human ←→ ChatGPT (GPT-5)
                         ↕
                AI Studio (Gemini)
```
Communication Flow:
```
Gemini: "I found 5 bugs"
   ↓
Human: Coordinates Claude to fix
   ↓
Claude: Fixes bugs, commits
   ↓
Human: Asks GPT-5 to review
   ↓
GPT-5: Suggests improvements
   ↓
Human: Claude integrates, commits
   ↓
Human: Gemini re-tests
   ↓
Gemini: "All passing ✅"
```
No AI-to-AI communication. Every handoff went through the human.
Key Technical Patterns Discovered
1. Defensive Type Conversion
Never trust external smart contract data, even on mainnets. Testnets are especially unreliable.
```typescript
// WRONG - Assumes contract follows standard
const symbol = await token.symbol();     // Might be BigInt!
const decimals = await token.decimals(); // Might be string!

// RIGHT - Defensive conversion
const rawSymbol = await token.symbol();
const rawDecimals = await token.decimals();
const symbol = String(rawSymbol);     // Always string
const decimals = Number(rawDecimals); // Always number
```
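Taken one step further, the conversions can live in a single helper so every tool reads metadata the same way. A sketch under the article's assumptions (ethers v6; readTokenMetadata is a name I'm introducing here, not part of the server):

```typescript
import { ethers } from "ethers";

const ERC20_METADATA_ABI = [
  "function name() view returns (string)",
  "function symbol() view returns (string)",
  "function decimals() view returns (uint8)",
];

// Hypothetical helper: coerce everything, then sanity-check it
async function readTokenMetadata(address: string, provider: ethers.Provider) {
  const token = new ethers.Contract(address, ERC20_METADATA_ABI, provider);
  const [rawName, rawSymbol, rawDecimals] = await Promise.all([
    token.name(), token.symbol(), token.decimals(),
  ]);

  const decimals = Number(rawDecimals);
  if (!Number.isInteger(decimals) || decimals < 0 || decimals > 36) {
    throw new Error(`Suspicious decimals (${String(rawDecimals)}) from ${address}`);
  }
  return { name: String(rawName), symbol: String(rawSymbol), decimals };
}
```

Coercion alone makes broken data serializable; the sanity check is what actually catches a malformed contract early.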
2. Recursive BigInt Handling
BigInt can appear anywhere in nested structures. Handle it recursively.
```typescript
function toJSONSafe(input: any): any {
  if (typeof input === 'bigint') return input.toString();
  if (Array.isArray(input)) return input.map(toJSONSafe);
  if (input && typeof input === 'object') {
    return Object.fromEntries(
      Object.entries(input).map(([k, v]) => [k, toJSONSafe(v)])
    );
  }
  return input;
}

// Usage
const response = toJSONSafe(blockchainData);
return { content: [{ type: 'text', text: JSON.stringify(response) }] };
```
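A worked alternative under the same constraint: JSON.stringify accepts a replacer callback that sees every value before encoding, so BigInt can be intercepted without pre-walking the structure:

```typescript
// One-pass variant of the same idea, via a replacer
const json = JSON.stringify(blockchainData, (_key, value) =>
  typeof value === 'bigint' ? value.toString() : value
);
```

toJSONSafe() is still the better fit when the converted object gets reused before serialization; the replacer only helps at the stringify call itself.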
3. Precision-Safe Arithmetic
JavaScript Number loses precision beyond 2^53. Use formatUnits() before math.
```typescript
// WRONG - Precision loss on large amounts
const rate = Number(totalAmountWei) / duration;

// RIGHT - Convert to decimal first
const totalAmountDecimal = parseFloat(
  ethers.formatUnits(totalAmountWei, decimals)
);
const rate = totalAmountDecimal / duration;
```
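If exact integer math matters more than float convenience, a different trade-off (a sketch, not what the server ships) is to stay in BigInt for the division and only format at the output boundary:

```typescript
// totalAmountWei, duration, decimals as in the snippet above
const ratePerSecondWei = totalAmountWei / BigInt(duration); // exact, truncates remainder
const ratePerDayWei = ratePerSecondWei * 86400n;
const ratePerDay = ethers.formatUnits(ratePerDayWei, decimals); // string for display
```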
4. MCP Server Caching
MCP servers cache compiled code in memory. Always restart Claude Desktop after builds.
```bash
# Build changes
npm run build

# Required: Full restart, not just close window
# macOS: Cmd+Q → Reopen
# Windows: Alt+F4 → Reopen
```
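On macOS you can also confirm which process is actually serving requests: Claude Desktop writes MCP server logs under ~/Library/Logs/Claude/ (per the MCP debugging docs; adjust the path for your install):

```bash
# Tail MCP logs to verify the restarted server is the new build
tail -f ~/Library/Logs/Claude/mcp*.log
```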
Lessons Learned
On Multi-AI Collaboration
Lesson 1: Play to Each AI’s Strengths
Don’t ask one AI to do everything. Match tasks to capabilities:
- Claude: Deep debugging, iterative problem-solving
- GPT-5: Code review, pattern recognition, best practices
- Gemini: Systematic testing, verification, edge cases
Lesson 2: Parallel > Sequential
GPT-5 reviewed fixes while Claude debugged the BigInt monster. Total time: 8 hours. Sequential would’ve taken 12-16 hours.
Savings: 33-50% reduction in total development time.
Lesson 3: The Human is the Conductor
AIs don’t communicate with each other (yet). The human:
- Manages context switching
- Ensures quality control
- Makes strategic decisions
- Coordinates timing
Lesson 4: Documentation Enables Async Work
Clear documentation (verification requests, phase summaries, test tracking) enabled:
- Parallel workflows
- Clear handoffs
- Reduced miscommunication
- Reproducible results
On Technical Issues
Lesson 5: Testnet Contracts Can Be Malformed
Polygon Amoy USDC (0x41E94Eb019C0762f9Bfcf9Fb1E58725BfB0e7582) returns:
- symbol() → 0n (BigInt) instead of "USDC" (string)
- decimals() → "USDC" (string) instead of 6 (number)
Always use defensive type conversion for external data.
Lesson 6: BigInt Serialization is Everywhere
JSON.stringify() cannot handle BigInt. You need:
- Detection at boundaries
- Recursive conversion for nested data
- Reusable patterns across codebase
Lesson 7: Precision Loss is Real
JavaScript Number() safely represents integers up to 2^53 - 1 (9,007,199,254,740,991).
For token amounts:
- 1 BTC (8 decimals) = 100,000,000 satoshis → Safe
- 10 tokens with 18 decimals = 10,000,000,000,000,000,000 base units → Precision loss!
Always use formatUnits() before arithmetic on token amounts.
Lesson 8: MCP Caching is Invisible
The MCP server caches JavaScript in memory. Without a restart, you’re testing old code.
This cost us 1 hour in Round 3 of debugging.
Lesson 9: Root Cause Takes Persistence
4 debugging rounds for evm_generate_permit:
- Round 1: Wrong (chainId)
- Round 2: Wrong (signature components)
- Round 3: Better errors, still wrong
- Round 4: Root cause discovered (malformed contract)
Persistence pays off.
Lesson 10: Git Hygiene Tells a Story
3 clean commits:
- 0f11fd2 - Bug fixes (Phase 1 + 2)
- fc1e23e - NFT features + improvements
- 9309691 - Gemini verification
Each commit is self-contained and tells part of the story.
The Numbers: Quantifying Success
Development Metrics
Bug Fixes:
- 5 critical bugs resolved
- 10 total improvements (including Phase 2)
- 4 debugging rounds (hardest bug)
- 100% fix success rate
Code Changes:
- 46 files modified
- 4,783 lines added
- 163 lines removed
- 3 commits created
Time Investment:
- 8 hours total (vs 12-16 sequential)
- 33-50% time savings
- 3 AIs working in parallel
- 1 human orchestrator
Test Coverage:
- 51/61 testable tools passing (83.6%)
- 0 critical bugs remaining
- 10 tools expected testnet limitations
- 52 tools awaiting testing
Quality Metrics
Before Session:
- 46 passing tools
- 4 failing tools (critical)
- 75% pass rate
- ❌ Not production-ready
After Session:
- 51 passing tools (+5)
- 0 failing tools (-4)
- 83.6% pass rate (+8.6%)
- ✅ Production-ready
Team Composition
Participants:
- 1 human orchestrator (Myron)
- 3 primary AIs (Claude, GPT-5, Gemini)
- 1 background AI (Cursor Agent on NFT work)
Communication Overhead:
- 3 separate conversations maintained
- ~50 handoff messages crafted
- ~200 AI responses reviewed
- 3 documentation files created
The Future of AI-Augmented Development
What This Experiment Reveals
Multi-Model is the Future
No single AI is best at everything. Specialization beats generalization.
Current state:
- Claude excels at deep debugging
- GPT-5 excels at code review
- Gemini excels at systematic testing
Future state:
- Specialized AI for each task
- Orchestration becomes a skill
- Tools emerge for coordination
The Human Remains Essential
Even with 3 AIs, I was critical for:
- Strategic decisions (what to fix first?)
- Context switching (managing 3 conversations)
- Quality control (reviewing all outputs)
- Integration (combining improvements)
- Verification (ensuring it all works)
The “conductor” role isn’t going away—it’s becoming more important.
Parallel Workflows Scale… To a Point
Three AIs were roughly 2-3x faster than one. But:
- Coordination overhead increases with each AI
- Diminishing returns after 3-4 AIs
- Communication becomes bottleneck
- Context switching becomes exhausting
Sweet spot: 2-3 specialized AIs for complex tasks.
Practical Applications
When to Use Multi-AI:
- ✅ Complex projects with distinct phases
- ✅ Time-sensitive debugging
- ✅ Code review + implementation
- ✅ Testing + verification workflows
- ✅ Projects with clear task boundaries
When Single AI is Enough:
- ✅ Simple, straightforward tasks
- ✅ Exploratory development
- ✅ Learning/experimentation
- ✅ Small codebases (<1000 lines)
- ✅ Prototyping
The Evolution Path
Near Future (6-12 months):
- AI-to-AI communication protocols (MCP, OpenAI Assistants API)
- Automated orchestration tools
- Specialized developer AIs (testing, security, performance)
- Multi-model IDEs
Medium Term (1-2 years):
- Self-organizing AI teams
- Context-aware task routing
- Real-time collaboration
- Reduced human coordination overhead
Long Term (3-5 years):
- Emergent AI behaviors
- Autonomous debugging teams
- Human as strategist only
- AI orchestrates AI
Ethical Considerations
Attribution: Every commit includes co-authorship:
```
🎉 Generated with Claude Code + GPT-5 collaboration

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: GPT-5 <openai@anthropic.com>
Reported-By: Gemini <gemini@google.com>
```
Transparency:
- All AI contributions documented
- Clear about which AI did what
- No hiding AI involvement
Verification:
- Human review required for all AI code
- Build and test everything
- Never blindly trust AI output
Learning:
- Understand what AIs did
- Capture patterns and insights
- Build on AI suggestions
- Don’t just copy-paste
Closing: The Dream Team in Action
What Made It Work
The Right Team:
- Claude’s debugging persistence through 4 rounds
- GPT-5’s refinement expertise and fresh perspective
- Gemini’s systematic testing and clear reporting
- Human orchestration and quality control
The Right Process:
- Clear task decomposition by specialty
- Parallel workflows where possible
- Quality verification at every step
- Clean documentation for handoffs
The Right Mindset:
- Leverage AI strengths, don’t fight them
- Accept AI limitations, work around them
- Iterate relentlessly until root cause found
- Document everything for the next person
The Bigger Picture
This wasn’t just about fixing 5 bugs. It was about proving that orchestrated AI collaboration can tackle complex real-world engineering problems.
We didn’t just build better software—we discovered a better way to build.
Try It Yourself
Recipe for Multi-AI Success:
1. Identify a Complex Task
   - Multiple distinct phases
   - Different skill requirements
   - Clear task boundaries
2. Decompose by AI Strength
   - Which AI is best at each phase?
   - Can any phases run in parallel?
   - What are the dependencies?
3. Create Clear Handoffs
   - Document what each AI needs
   - Specify expected outputs
   - Provide verification criteria
4. Orchestrate with Care
   - Review every AI output
   - Test every integration
5. Document the Journey
   - Capture what worked
   - Note what didn't
   - Share your learnings
Final Thought
The future of software development isn’t human vs. AI. It isn’t even human + AI.
It’s human conducting an orchestra of AIs, each playing their part in perfect harmony.
This is just the beginning.
Appendix: Resources & Links
Code & Documentation
GitHub Repository: evm-chains-mcp-server
Key Commits:
- 0f11fd2 - Fix 5 critical bugs (Phase 1 + 2)
- fc1e23e - Add NFT deployment + improvements
- 9309691 - Gemini verification update
Documentation:
- Test Tracking: tests/tracking/polygon-amoy.md
- Phase 2 Summary: tests/PHASE-2-COMPLETION-SUMMARY.md
- Verification Request: tests/VERIFICATION-REQUEST-FOR-GEMINI.md
- Celebration Doc: tests/TESTING-COMPLETE-CELEBRATION.md
Tools Used
- Claude Code: https://claude.ai/claude-code
- ChatGPT (GPT-5): https://chat.openai.com
- Google AI Studio: https://aistudio.google.com
- Cursor: https://cursor.sh
Technical References
- Model Context Protocol (MCP): https://modelcontextprotocol.io
- ethers.js v6: https://docs.ethers.org/v6/
- Sablier Protocol: https://sablier.com
- EIP-2612 (Permit): https://eips.ethereum.org/EIPS/eip-2612
- BigInt in JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt
Related Reading
Previous blog posts in this series:
- #015: “BigInt Testing Hell” - Earlier encounter with BigInt issues
- #007: “Multi-Agent Orchestration” - Coordinating multiple AI models
- #025: “MCP Factory Complete Story” - Building the factory system
Contact & Discussion
Author: Myron Koch
- Twitter: [@your_handle]
- GitHub: [github.com/your-username]
- Blog: [your-blog.com]
Questions? Discussion?
- Comment below
- Open an issue on GitHub
- Reach out on Twitter
Word Count: 4,847 words | Reading Time: 18 minutes | Publication Date: October 31, 2025
Tags: AI Collaboration, Multi-Model AI, Debugging, Blockchain, MCP, Claude Code, GPT-5, Gemini
This article documents real events from October 31, 2025. All code snippets, error messages, and conversations are authentic. Three AIs, one human, 8 hours, 5 bugs fixed, 100% success rate.
Special thanks to Claude, GPT-5, and Gemini for being the best debugging dream team a developer could ask for.
Related Reading
Prerequisites
- Multi-Agent Orchestration: When 6 AIs Build Your Codebase - Learn about the precursor to this advanced multi-AI workflow.
Next Steps
- The PostgreSQL Long-Term Memory System - Understand the memory system that provides context for these AI agents.
Deep Dives
- Error Handling in MCP: Where Do Errors Actually Go? - See how the robust error handling patterns we developed enabled the AIs to effectively debug.