After this lesson you'll be able to repoint an existing Anthropic (or OpenAI, or Workers AI) SDK call through an AI Gateway endpoint and get logging, caching, rate limiting, and cost tracking for free, with no change to how you call the model.
AI Gateway is a reverse proxy that sits between your code and whatever LLM provider you're calling. You don't rewrite your integration — you change one string, the baseURL, so requests that used to go straight to api.anthropic.com now go through gateway.ai.cloudflare.com first, then on to Anthropic. The gateway forwards the request and streams back the response, but along the way it logs the request/response, can serve a cached response instead of calling the provider at all, can enforce a rate limit, and can retry or fall back to a different provider if the first one errors or times out. It's the same pattern as a database connection pool or an API gateway in front of microservices, applied to the specific pain points of calling LLMs: unpredictable cost, provider outages, and zero built-in observability.
Every AI Gateway you create gets an account- and gateway-scoped URL prefix:
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}
{provider} is a slug like anthropic, openai, workers-ai, google-ai-studio, or azure-openai. The gateway preserves each provider's own request/response schema — it's not translating Anthropic's API into OpenAI's — so your existing SDK code, types, and error handling keep working. The only change is the base URL and, optionally, an extra header if the gateway has authenticated mode turned on.
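As a minimal sketch of how that prefix is assembled, the helper below joins the three parts into a base URL. The function name `gatewayBaseUrl` and the example IDs are hypothetical; only the URL shape and the provider slugs come from the text above.

```typescript
// Provider slugs named in the text above; the gateway may support others.
type ProviderSlug =
  | "anthropic"
  | "openai"
  | "workers-ai"
  | "google-ai-studio"
  | "azure-openai";

// Hypothetical helper: joins the account ID, gateway ID, and provider slug
// into the documented URL prefix.
function gatewayBaseUrl(
  accountId: string,
  gatewayId: string,
  provider: ProviderSlug
): string {
  const parts = [accountId, gatewayId, provider].map(encodeURIComponent);
  return `https://gateway.ai.cloudflare.com/v1/${parts.join("/")}`;
}

// Placeholder IDs, for illustration only.
console.log(gatewayBaseUrl("abc123", "my-gateway", "anthropic"));
// prints "https://gateway.ai.cloudflare.com/v1/abc123/my-gateway/anthropic"
```

The returned string is what you would pass as the SDK's baseURL option.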
Request flow: your code calls the gateway URL with your normal provider API key → Cloudflare's edge receives it → checks cache (if caching is on and the request matches a cached entry, it returns immediately without touching the provider) → otherwise forwards to the real provider with your API key attached → logs the request metadata, token counts, and cost → streams the response back to you, cached for next time if caching applies.
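The request flow above can be modeled with a toy, in-memory gateway. This is a sketch for building intuition, not Cloudflare's implementation: every name here is hypothetical, the provider call is a synchronous stub, and the cache key is a simplified stand-in (API key plus body).

```typescript
// Stand-in for a real provider call; synchronous to keep the sketch small.
type ProviderCall = (apiKey: string, body: string) => string;

interface LogEntry {
  body: string;
  response: string;
}

function createToyGateway(callProvider: ProviderCall, cachingOn: boolean) {
  const cache = new Map<string, string>();
  const logs: LogEntry[] = [];

  function handle(apiKey: string, body: string): { response: string; cacheHit: boolean } {
    const cacheKey = `${apiKey}\n${body}`; // simplified key, an assumption

    // 1. Cache check: on a hit, return without touching the provider.
    if (cachingOn) {
      const cached = cache.get(cacheKey);
      if (cached !== undefined) {
        return { response: cached, cacheHit: true };
      }
    }

    // 2. Forward to the provider with the caller's API key attached.
    const response = callProvider(apiKey, body);

    // 3. Log the forwarded request (only the forwarded path is modeled here).
    logs.push({ body, response });

    // 4. Store the response for next time if caching applies.
    if (cachingOn) cache.set(cacheKey, response);
    return { response, cacheHit: false };
  }

  return { handle, logs };
}

// Usage: two identical requests, one provider call.
let providerCalls = 0;
const gateway = createToyGateway((_apiKey, body) => {
  providerCalls += 1;
  return `echo: ${body}`;
}, true);

const first = gateway.handle("sk-example", "hello");
const second = gateway.handle("sk-example", "hello");
console.log(first.cacheHit, second.cacheHit, providerCalls); // prints "false true 1"
```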
An optional Cloudflare token (sent as cf-aig-authorization: Bearer {token}) controls who can call your gateway endpoint at all. You can run a gateway unauthenticated (anyone with the URL can use it, provided they supply their own valid provider key) or require the Cloudflare token. Lock this down before you put a gateway URL in client-side code.
AI Gateway's core features — dashboard analytics, caching, and rate limiting — are free on every plan; you only pay the underlying provider (Anthropic, OpenAI, etc.) for the tokens you actually use. Cloudflare's own charges are limited to a few adjacent pieces:
| Item | Free plan | Workers Paid plan |
|---|---|---|
| Persistent request logs | 100,000 logs across all gateways | 10,000,000 logs per gateway |
| Logpush export of gateway logs | Not available | $0.05 per million requests beyond the included 10M/month |
| Guardrails (content moderation) | Uses Workers AI under the hood | Billed by Workers AI token consumption |
| Unified Billing (pay providers via Cloudflare) | — | 5% fee on credit purchases |
Cloudflare notes it may add premium features later. Treat the numbers above as a snapshot — confirm current figures on the live pricing page linked below before budgeting.
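To make the table's arithmetic concrete, here is a small estimator for the two metered items. It is a sketch built on the snapshot figures above; the function names are hypothetical, and linear proration of the Logpush overage is an assumption rather than a documented billing rule.

```typescript
// Snapshot figures copied from the table above; confirm before budgeting.
const LOGPUSH_INCLUDED_REQUESTS = 10_000_000; // included per month on Workers Paid
const LOGPUSH_USD_PER_MILLION = 0.05;         // overage price per million requests
const UNIFIED_BILLING_FEE_RATE = 0.05;        // 5% fee on credit purchases

// Assumes overage is prorated linearly per request.
function logpushOverageUsd(requestsPerMonth: number): number {
  const billable = Math.max(0, requestsPerMonth - LOGPUSH_INCLUDED_REQUESTS);
  return (billable / 1_000_000) * LOGPUSH_USD_PER_MILLION;
}

function unifiedBillingFeeUsd(creditPurchaseUsd: number): number {
  return creditPurchaseUsd * UNIFIED_BILLING_FEE_RATE;
}

// 30M requests: 20M billable at $0.05 per million.
console.log(logpushOverageUsd(30_000_000).toFixed(2)); // prints "1.00"
// $200 of credits at a 5% fee.
console.log(unifiedBillingFeeUsd(200).toFixed(2)); // prints "10.00"
```

At these rates the Cloudflare-side charges stay small next to provider token costs, which is where most of the bill comes from.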
Create a gateway once in the dashboard (AI > AI Gateway > Create Gateway), note your account ID and the gateway name you chose, then repoint the Anthropic SDK's baseURL:
```typescript
// Before: calling Anthropic directly
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});
```

```typescript
// After: same call, routed through AI Gateway
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY, // still your real Anthropic key
  baseURL: `https://gateway.ai.cloudflare.com/v1/${process.env.CF_ACCOUNT_ID}/my-gateway/anthropic`,
  defaultHeaders: {
    // Only needed if the gateway requires Cloudflare auth
    "cf-aig-authorization": `Bearer ${process.env.CF_AIG_TOKEN}`,
  },
});

const message = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Summarize this changelog in two sentences." }],
});

console.log(message.content);
```
No change to messages.create(), its response shape, or your error handling. Open the AI Gateway dashboard and this call now shows up as a logged request with its model, token counts, latency, and cost. To bypass the cache for a request you know must be fresh, add a header per-request:
```typescript
const message = await anthropic.messages.create(
  {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [{ role: "user", content: "What time is it in Tokyo right now?" }],
  },
  // Per-request header: skip the gateway cache for this call only
  { headers: { "cf-aig-skip-cache": "true" } }
);
```
If you set a gateway-wide cache_ttl (say, one day) and a prompt pattern legitimately needs a fresh answer on every call (a "what's the latest status" query, anything time-sensitive, anything that should reflect data that changed since the last call), you'll keep getting the first response back for the whole TTL window. Set cf-aig-cache-ttl per-request (or skip the cache entirely with cf-aig-skip-cache) for any prompt whose answer can legitimately change between identical-looking requests; don't rely on the gateway-wide default being right for every route through it.
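One way to apply this is to choose the cache headers per route rather than per gateway. The helper below is a sketch: `cacheHeadersFor` and the three freshness categories are hypothetical, the header names come from the text above, and it assumes cf-aig-cache-ttl takes a value in seconds (check the caching configuration page before relying on that).

```typescript
// Hypothetical freshness categories for routes that share one gateway.
type Freshness = "always-fresh" | "short-lived" | "gateway-default";

// Returns the per-request headers for a route. ttlSeconds is only used for
// "short-lived" routes and is assumed to be expressed in seconds.
function cacheHeadersFor(freshness: Freshness, ttlSeconds = 300): Record<string, string> {
  switch (freshness) {
    case "always-fresh":
      return { "cf-aig-skip-cache": "true" };
    case "short-lived":
      return { "cf-aig-cache-ttl": String(ttlSeconds) };
    case "gateway-default":
      return {}; // fall back to whatever the gateway is configured with
  }
}

console.log(cacheHeadersFor("always-fresh")); // { 'cf-aig-skip-cache': 'true' }
console.log(cacheHeadersFor("short-lived", 60)); // { 'cf-aig-cache-ttl': '60' }

// With the Anthropic SDK, the result goes in the per-request options:
//   anthropic.messages.create(params, { headers: cacheHeadersFor("always-fresh") })
```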
Further reading, from the Cloudflare Docs: the AI Gateway overview, for supported providers; the caching configuration page, for cache-key construction and the cf-aig-* headers; and the pricing page, for current costs and limits.
The cache key is a hash of the provider, the endpoint, the model, the provider auth header (your API key/bearer token), and the full request body. Because the auth header is part of the key, two users sending the byte-identical prompt but authenticating with different API keys produce different cache keys — so caching is scoped per credential, not globally shared across everyone hitting the gateway.
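The per-credential scoping can be illustrated with a short sketch. The choice of SHA-256, the field encoding, and the name `sketchCacheKey` are assumptions made for illustration; the text above only says which inputs feed the hash, not how Cloudflare computes it.

```typescript
import { createHash } from "node:crypto";

// The five inputs the text above lists as part of the cache key.
interface CacheKeyParts {
  provider: string;
  endpoint: string;
  model: string;
  authHeader: string;
  body: string;
}

// Hypothetical key function: hashes all five fields with SHA-256.
function sketchCacheKey(parts: CacheKeyParts): string {
  const hash = createHash("sha256");
  const fields = [parts.provider, parts.endpoint, parts.model, parts.authHeader, parts.body];
  for (const field of fields) {
    // Length-prefix each field so adjacent fields cannot blur together.
    hash.update(`${field.length}:${field}`);
  }
  return hash.digest("hex");
}

// Same prompt, two different (placeholder) credentials.
const shared = {
  provider: "anthropic",
  endpoint: "/v1/messages",
  model: "claude-sonnet-4-5",
  body: JSON.stringify({ messages: [{ role: "user", content: "Hello" }] }),
};
const keyA = sketchCacheKey({ ...shared, authHeader: "example-key-user-a" });
const keyB = sketchCacheKey({ ...shared, authHeader: "example-key-user-b" });
const keyAAgain = sketchCacheKey({ ...shared, authHeader: "example-key-user-a" });

console.log(keyA === keyB); // prints false: different credentials, different entries
console.log(keyA === keyAAgain); // prints true: same credential and body, same entry
```

Under this model, a shared service that calls the gateway with a single API key gets cache hits across all of its users, while users who each bring their own key do not share entries.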