The Chatbot Architecture: A Blog That Answers Questions About Itself
My blog can answer questions about itself. You can ask it “how does the chatbot work?” and it retrieves this very post to answer you. That’s either very cool or very meta. Probably both.
The Full Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│                           USER'S BROWSER                            │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ operationalsemantics.dev (Cloudflare Pages)                   │  │
│  │   ┌──────────────────┐                                        │  │
│  │   │  BlogChat.astro  │ ◄── Chat widget component              │  │
│  │   │    (Frontend)    │                                        │  │
│  │   └────────┬─────────┘                                        │  │
│  └────────────┼──────────────────────────────────────────────────┘  │
└───────────────┼─────────────────────────────────────────────────────┘
                │ POST { query: "..." }
                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         CLOUDFLARE WORKERS                          │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │  blog-chat-worker (Proxy)                                     │  │
│  │    - CORS handling                                            │  │
│  │    - Auth token injection                                     │  │
│  └────────┬──────────────────────────────────────────────────────┘  │
└───────────┼─────────────────────────────────────────────────────────┘
            │ POST + Bearer token
            ▼
┌─────────────────────────────────────────────────────────────────────┐
│                   CLOUDFLARE AI SEARCH (AutoRAG)                    │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐     │
│  │  Crawler   │─▶│ Vectorize  │─▶│  Reranker  │─▶│ Workers AI │     │
│  │ (Sitemap)  │  │  (BGE-M3)  │  │   (BGE)    │  │  (Qwen3)   │     │
│  └────────────┘  └────────────┘  └────────────┘  └────────────┘     │
└─────────────────────────────────────────────────────────────────────┘
```
Three components:
- Chat Widget - The frontend UI on the blog
- Proxy Worker - Handles CORS and hides the API token
- AutoRAG - The AI brain that retrieves and generates answers
Component 1: The Chat Widget
The chat interface is a single Astro component: BlogChat.astro
What it does:
- Renders a floating “Ask AI” button in the bottom-right corner
- Expands into a chat window when clicked
- Sends user questions to the proxy worker
- Displays AI responses
The core logic (simplified):
```js
const WORKER_URL = 'https://blog-chat-worker.myronkoch-dev.workers.dev';

async function askQuestion(query) {
  try {
    const response = await fetch(WORKER_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query }),
    });
    const data = await response.json();
    if (data.success && data.result?.response) {
      return data.result.response;
    }
    return 'Sorry, I encountered an error.';
  } catch {
    // Network failures and non-JSON responses land here
    return 'Sorry, I encountered an error.';
  }
}
```
No API keys in the frontend. No complex state management. Just a fetch call to the proxy.
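For context, here's a minimal sketch of how a button and input could wire into askQuestion. The element IDs and the appendMessage helper are illustrative, not the actual BlogChat.astro internals:

```js
// Illustrative wiring only -- element IDs are hypothetical, not BlogChat.astro's real markup.
const button = document.querySelector('#ask-ai-button');
const panel = document.querySelector('#chat-panel');
const input = document.querySelector('#chat-input');
const log = document.querySelector('#chat-log');

// Floating button toggles the chat window
button.addEventListener('click', () => panel.classList.toggle('open'));

// Enter sends the question and renders the answer
input.addEventListener('keydown', async (event) => {
  if (event.key !== 'Enter' || !input.value.trim()) return;
  const question = input.value.trim();
  input.value = '';
  appendMessage('user', question);
  appendMessage('ai', await askQuestion(question));
});

function appendMessage(role, text) {
  const bubble = document.createElement('div');
  bubble.className = `chat-message ${role}`;
  bubble.textContent = text; // textContent, so responses can't inject HTML
  log.appendChild(bubble);
}
```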
Component 2: The Proxy Worker
Why do we need this?
Two reasons:
- CORS - Browsers block cross-origin requests. The worker adds the right headers.
- Security - The AI Search API needs an auth token. We can’t put that in frontend code.
The worker code (about 45 lines that do everything):
```js
export default {
  async fetch(request, env) {
    // Handle CORS preflight
    if (request.method === 'OPTIONS') {
      return new Response(null, {
        headers: {
          'Access-Control-Allow-Origin': '*',
          'Access-Control-Allow-Methods': 'POST, OPTIONS',
          'Access-Control-Allow-Headers': 'Content-Type',
        },
      });
    }

    // Anything other than POST carries no query to answer
    if (request.method !== 'POST') {
      return new Response('Method Not Allowed', { status: 405 });
    }

    // Get the query from the request body
    const { query } = await request.json();

    // Call the AI Search API with auth
    const response = await fetch(
      `https://api.cloudflare.com/client/v4/accounts/${env.ACCOUNT_ID}/autorag/rags/${env.RAG_ID}/ai-search`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${env.AI_SEARCH_TOKEN}`,
        },
        body: JSON.stringify({ query }),
      }
    );
    const result = await response.json();

    // Return with CORS headers
    return new Response(JSON.stringify(result), {
      headers: {
        'Content-Type': 'application/json',
        'Access-Control-Allow-Origin': '*',
      },
    });
  },
};
```
Configuration (wrangler.toml):
name = "blog-chat-worker"
main = "src/index.js"
compatibility_date = "2024-11-12"
[vars]
ACCOUNT_ID = "your-cloudflare-account-id"
RAG_ID = "your-autorag-instance-name"
# Secret - set via: wrangler secret put AI_SEARCH_TOKEN
The AI_SEARCH_TOKEN is stored as a secret, never in code.
Component 3: AutoRAG (The Brain)
Cloudflare’s AI Search (AutoRAG) handles the hard parts:
- Crawling the sitemap
- Chunking content
- Generating embeddings
- Storing vectors
- Retrieving relevant content
- Generating answers
Setup in Cloudflare Dashboard:
1. Go to AI → AI Search
2. Create a new instance
3. Configure:
| Setting | Value |
|---|---|
| Source Type | Sitemap |
| Sitemap URL | https://operationalsemantics.dev/sitemap-index.xml |
| Embedding Model | BGE-M3 |
| Reranker | bge-reranker-base (enabled) |
| Generation Model | Qwen3 30B |
| Query Rewrite | Enabled |
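Side note: if your proxy Worker lives in the same Cloudflare account, AI Search can also be called through the Workers AI binding instead of the REST API, which skips token management entirely. A minimal sketch, assuming the AI binding is enabled in wrangler.toml and the instance name matches yours:

```js
// Sketch only -- assumes wrangler.toml declares the binding:
//   [ai]
//   binding = "AI"
export default {
  async fetch(request, env) {
    const { query } = await request.json();
    // aiSearch() runs the same rewrite → retrieve → rerank → generate pipeline
    const answer = await env.AI.autorag('your-autorag-instance-name').aiSearch({ query });
    return Response.json(answer);
  },
};
```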
What happens when someone asks a question:
1. Query Rewrite - The model reformulates the question for better retrieval
2. Embedding - The question becomes a vector using BGE-M3
3. Retrieval - Vectorize finds similar content chunks
4. Reranking - The BGE reranker scores and sorts results
5. Generation - Qwen3 30B generates an answer using retrieved context
All of this happens in ~2-3 seconds.
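To make step 3 concrete: retrieval is just nearest-neighbor search over embedding vectors. A toy sketch of the scoring (my illustration, not AutoRAG's internals):

```js
// Toy illustration of vector retrieval -- not AutoRAG's actual code.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// chunks: [{ text, vector }], where vector is the 1024-dim BGE-M3 embedding
function retrieveTopK(queryVector, chunks, k = 5) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```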
The Query Flow in Detail
User: "How does the MCP Factory work?"
│
▼
┌─────────────────────────────────────────┐
│ 1. QUERY REWRITE │
│ "How does the MCP Factory work?" │
│ → "MCP Factory architecture process │
│ blockchain server generation" │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 2. EMBEDDING (BGE-M3) │
│ Query → 1024-dim vector │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 3. RETRIEVAL (Vectorize) │
│ Find top-k similar content chunks │
│ Returns: Post 11, 12, 13 chunks │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 4. RERANKING (BGE-reranker) │
│ Score each chunk against query │
│ Reorder by relevance │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 5. GENERATION (Qwen3 30B) │
│ Context: [retrieved chunks] │
│ Question: [user query] │
│ → Answer with citations │
└─────────────────────────────────────────┘
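Step 5 is ordinary prompt assembly: retrieved chunks become context, and the model is told to answer from them. Roughly like this (illustrative wording; AutoRAG's actual prompt template is internal):

```js
// Illustrative prompt assembly -- AutoRAG's real template is not public.
function buildPrompt(chunks, query) {
  // chunks use the shape from the API response below: { filename, content: [...] }
  const context = chunks
    .map((chunk, i) => `[${i + 1}] ${chunk.filename}\n${chunk.content.join('\n')}`)
    .join('\n\n');
  return `Answer the question using only the context below. Cite sources by number.

Context:
${context}

Question: ${query}`;
}
```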
Model Selection Deep Dive
I tested three models for generation:
| Model | Accuracy | Style | Speed |
|---|---|---|---|
| Llama 3.3 70B | Good | Verbose | Medium |
| Llama 4 Scout 17B | Poor | Very verbose | Fast |
| Qwen3 30B | Best | Concise | Medium |
Qwen3 30B won because:
- Answers are confident and direct
- Doesn’t hallucinate when context is missing
- Better at synthesizing information from multiple chunks
The model choice matters more than you’d expect. Same retrieval, vastly different answer quality.
The Recursive Beauty
Here’s where it gets interesting:
This blog post explains how the chatbot works. The chatbot indexes this blog post. When you ask “how does the chatbot work?”, the chatbot retrieves this post to answer you.
The documentation is self-referential:
- The system explains itself using itself
- Updates to the docs immediately update the chatbot’s knowledge
- You can verify the explanation by asking the chatbot
Try it: Ask the chatbot “how does the blog chatbot work?” and see if it references this post.
API Reference
Request
```sh
curl -X POST https://blog-chat-worker.myronkoch-dev.workers.dev \
  -H "Content-Type: application/json" \
  -d '{"query": "What is MCP Factory?"}'
```
Success Response
```json
{
  "success": true,
  "result": {
    "response": "The MCP Factory is a server that generates other MCP servers...",
    "data": [
      {
        "file_id": "...",
        "filename": "https://operationalsemantics.dev/posts/12-mcp-factory-fantom-success/",
        "score": 0.624,
        "content": ["...relevant chunks..."]
      }
    ]
  }
}
```
Error Response
```json
{
  "success": false,
  "errors": [
    { "code": 7002, "message": "ai_search_not_found" }
  ]
}
```
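If you're consuming the API from your own code, both shapes are easy to branch on. A small helper (my convention, not part of the API) that reuses WORKER_URL from the widget snippet and surfaces the answer along with its sources:

```js
// Unwraps the worker's response -- throws on the error shape, returns answer + sources otherwise.
async function askWithSources(query) {
  const response = await fetch(WORKER_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  const data = await response.json();
  if (!data.success) {
    const [error] = data.errors ?? [];
    throw new Error(error ? `${error.code}: ${error.message}` : 'Unknown AI Search error');
  }
  return {
    answer: data.result.response,
    // Each source carries the post URL (filename) and a relevance score
    sources: (data.result.data ?? []).map(({ filename, score }) => ({ filename, score })),
  };
}
```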
Indexing New Content
When I publish a new post:
1. Git push triggers a Cloudflare Pages deploy
2. A new sitemap generates with the new post's URL
3. AutoRAG's scheduled sync detects the new URL
4. The crawler fetches and chunks the new content
5. Embeddings generate and store in Vectorize
6. The chatbot can now answer questions about the new post

Time from publish to searchable: ~5-10 minutes (scheduled sync) or immediate (manual sync).
Troubleshooting
“Sorry, I encountered an error”
Test the worker directly:
```sh
curl -s -X POST https://blog-chat-worker.myronkoch-dev.workers.dev \
  -H "Content-Type: application/json" \
  -d '{"query": "test"}' | jq
```
Common issues:
- ai_search_not_found → the Worker's RAG_ID doesn't match the instance name
- unauthorized → AI_SEARCH_TOKEN is wrong or expired
- CORS errors → the Worker isn't deployed or the URL is wrong
Chatbot gives irrelevant answers
- Check if content is indexed: AI Search dashboard → Data tab
- Verify sitemap is current: curl https://operationalsemantics.dev/sitemap-0.xml
- Try a manual sync: AI Search dashboard → Jobs tab → Sync
New posts aren’t searchable
- Verify sitemap includes new post
- Wait for scheduled sync OR trigger manual sync
- Check AI Search metrics for indexing errors
Cost
All on Cloudflare’s free tier:
| Component | Free Tier |
|---|---|
| Workers | 100k requests/day |
| Vectorize | 5M vector dimensions |
| Workers AI | Generous limits |
| AI Search | Beta (currently free) |
For a blog with moderate traffic, you won’t hit limits.
The Complete Code
- Chat Widget: [BlogChat.astro on GitHub]
- Proxy Worker: [blog-chat-worker on GitHub]
- AutoRAG Config: Cloudflare Dashboard
The whole system is ~100 lines of code plus configuration. Most of the complexity is handled by Cloudflare’s managed services.
Summary
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Astro component | Chat UI |
| Proxy | Cloudflare Worker | CORS + Auth |
| Retrieval | Vectorize + BGE-M3 | Find relevant content |
| Reranking | BGE-reranker | Sort by relevance |
| Generation | Qwen3 30B | Generate answers |
| Indexing | AutoRAG crawler | Keep knowledge current |
Ongoing maintenance: Near zero. Monthly cost: $0.
Next up: AI-Powered Cross-Posting - How AI publishes to Substack while I drink coffee.