After this lesson you'll be able to create a Vectorize index with the right dimension/metric config, insert embeddings generated by Workers AI, and run a similarity query filtered by metadata.
Vectorize is Cloudflare's globally-distributed vector database: a place to store embeddings — the numeric arrays that represent the "meaning" of a piece of text, an image, or audio — and run fast similarity search over them. It doesn't generate embeddings itself; you produce those with an embedding model (Workers AI's @cf/baai/bge-* family, or an external model like an OpenAI embedding endpoint), then hand the resulting vectors to Vectorize to store and query. Vectorize is the retrieval half of a system; something else always has to be the embedding half.
Three concepts define a Vectorize index, and the first two are locked in at creation time and cannot be changed later:
- Dimensions: the length of every vector in the index, which must equal the output size of your embedding model (768 for bge-base-en-v1.5, for example).
- Distance metric: cosine (the angle between vectors; the default choice for most text embedding models, since it ignores magnitude and compares direction), euclidean (straight-line distance), or dot-product (magnitude-sensitive, common when the embedding model was trained specifically for it). Use whatever metric your embedding model's documentation recommends. A mismatch won't raise an error, but it will quietly degrade result quality.
- Metadata: optional key-value data attached to each vector (for example { url, category, publishedAt }). Metadata rides along with the vector and can be returned with query results. If you create a metadata index on a given property first, it can also be used to filter results at query time.

The request lifecycle is: embed → insert → query. You generate a vector externally, insert() or upsert() it into an index by ID with optional metadata, and later query() the index with a fresh vector to get back the topK nearest matches, each with a similarity score and, optionally, its metadata and original values.
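To make the three metrics concrete, here is a minimal sketch of how each one scores a pair of vectors. It is illustrative only: Vectorize computes scores on the server, and the helper names below (dot, cosine, euclidean) are made up for this example.

```typescript
// Illustrative only: Vectorize computes these scores server-side.
// The function names are hypothetical helpers, not part of any API.
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Cosine similarity: dot product of the two vectors after normalizing length.
function cosine(a: Vec, b: Vec): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Euclidean distance: straight-line distance between the two points.
function euclidean(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

const a = [1, 0];
const b = [3, 0]; // same direction as a, three times the magnitude

console.log(cosine(a, b));    // 1 (direction is identical, magnitude ignored)
console.log(dot(a, b));       // 3 (grows with magnitude)
console.log(euclidean(a, b)); // 2 (the points are two units apart)
```

The same pair of vectors looks identical under cosine but not under dot-product or euclidean, which is why the metric has to match what the embedding model was trained for.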
Create an index sized for the Workers AI bge-base-en-v1.5 embedding model, which outputs 768-dimensional vectors:
npx wrangler vectorize create product-search --dimensions=768 --metric=cosine
# Create the metadata index before inserting vectors: only vectors written
# after the index exists can be filtered on that property
npx wrangler vectorize create-metadata-index product-search --property-name=category --type=string
Bind the index (and Workers AI) in wrangler.toml:
[ai]
binding = "AI"
[[vectorize]]
binding = "VECTORIZE"
index_name = "product-search"
Embed and insert a product description:
export interface Env {
  AI: Ai;
  VECTORIZE: Vectorize;
}

export default {
  async fetch(req: Request, env: Env) {
    const { id, text, category } = await req.json<{
      id: string; text: string; category: string;
    }>();

    // Embed the text; the model returns one vector per input string
    const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [text],
    });

    await env.VECTORIZE.insert([
      {
        id,
        values: data[0], // 768 numbers; must match the index's --dimensions
        metadata: { category, text },
      },
    ]);

    return Response.json({ inserted: id });
  },
};
Query with a metadata filter — find products similar to a search phrase, restricted to one category:
// Inside your Worker's fetch handler, where env is in scope:
const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: ["waterproof hiking boots"],
});

const results = await env.VECTORIZE.query(data[0], {
  topK: 5,
  filter: { category: "footwear" }, // requires the metadata index on category
  returnMetadata: "all",
});

// results.matches: [{ id, score, metadata: { category, text } }, ...]
return Response.json(results.matches);
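As a mental model for what a filtered query returns, the following sketch reimplements the idea in memory with a brute-force scan: keep the vectors whose metadata matches the filter, score them against the query vector, and return the top K. It is a conceptual stand-in, not how Vectorize works internally, and every name in it (StoredVector, Match, queryTopK) is hypothetical.

```typescript
// Conceptual model only: a brute-force, in-memory stand-in for a filtered
// top-K query. All names here are hypothetical, not the Vectorize API.
interface StoredVector {
  id: string;
  values: number[];
  metadata: Record<string, string>;
}

interface Match {
  id: string;
  score: number;
  metadata: Record<string, string>;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function queryTopK(
  store: StoredVector[],
  query: number[],
  opts: { topK: number; filter?: Record<string, string> },
): Match[] {
  return store
    // Keep only vectors whose metadata equals every key in the filter.
    .filter((v) =>
      Object.entries(opts.filter ?? {}).every(([k, want]) => v.metadata[k] === want),
    )
    .map((v) => ({ id: v.id, score: cosineSimilarity(query, v.values), metadata: v.metadata }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, opts.topK);
}

const store: StoredVector[] = [
  { id: "boot", values: [0.9, 0.1], metadata: { category: "footwear" } },
  { id: "sandal", values: [0.2, 0.8], metadata: { category: "footwear" } },
  { id: "tent", values: [1, 0], metadata: { category: "camping" } },
];

const matches = queryTopK(store, [1, 0], { topK: 2, filter: { category: "footwear" } });
// "boot" ranks first, then "sandal"; "tent" is excluded by the filter
// even though it is the closest vector overall.
console.log(matches.map((m) => m.id));
```

Note that the filter narrows the candidate set before ranking, so a closer vector in another category never appears in the results.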
Vectorize bills on two metrics, stored dimensions and queried dimensions, not on storage size, CPU, or index count:
| Metric | Workers Free | Workers Paid |
|---|---|---|
| Stored dimensions | 5 million total | 10 million included, then $0.05 per 100 million |
| Queried dimensions | 30 million/month | 50 million/month included, then $0.01 per million |
"Stored dimensions" is vector count × dimensions, summed across your indexes: a 768-dimension index with 10,000 vectors uses 7.68 million stored dimensions. "Queried dimensions" is computed over the billing period as roughly (queries + stored vectors) × dimensions, so higher-dimension embeddings, larger indexes, and heavier query traffic all raise the bill. There's no charge for empty indexes, data transfer, or idle time. Confirm current numbers on the live pricing page below before quoting them; this is exactly the kind of figure that changes between plan revisions.
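The stored-dimension arithmetic can be sketched as a small estimator. This is a rough illustration, not an official calculator: the rates are copied from the table above and may be out of date, the function and type names are hypothetical, and queried dimensions are passed in as a total you have measured or estimated rather than derived here.

```typescript
// Rates copied from the table above; treat them as assumptions and check the
// live pricing page. Function and type names are hypothetical.
interface IndexShape {
  vectors: number;
  dimensions: number;
}

const PAID_PLAN = {
  includedStored: 10_000_000,
  includedQueried: 50_000_000,
  storedRatePer100M: 0.05, // USD per 100 million stored dimensions
  queriedRatePer1M: 0.01,  // USD per 1 million queried dimensions
};

// Stored dimensions = vector count × dimensions, summed across indexes.
function storedDimensions(indexes: IndexShape[]): number {
  return indexes.reduce((sum, ix) => sum + ix.vectors * ix.dimensions, 0);
}

// Monthly overage in USD on Workers Paid, given usage totals.
function paidOverageUsd(stored: number, queried: number): number {
  const storedOver = Math.max(0, stored - PAID_PLAN.includedStored);
  const queriedOver = Math.max(0, queried - PAID_PLAN.includedQueried);
  return (
    (storedOver / 100_000_000) * PAID_PLAN.storedRatePer100M +
    (queriedOver / 1_000_000) * PAID_PLAN.queriedRatePer1M
  );
}

const stored = storedDimensions([{ vectors: 10_000, dimensions: 768 }]);
console.log(stored); // 7680000, matching the worked example in the text
console.log(paidOverageUsd(stored, 40_000_000)); // 0, both totals are within the included amounts
```

Running the numbers before you choose a model is worthwhile, since doubling the embedding dimension doubles both billed quantities for the same data.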
You set dimensions and metric at create time and cannot change them afterward. If you create an index with --dimensions=1536 (an OpenAI text-embedding-3-small shape) but later switch to Workers AI's bge-base-en-v1.5 (768 dimensions), every insert() or query() call will fail outright because the vector length doesn't match what the index expects. There's no silent truncation or padding. The fix is to pick your embedding model first, confirm its output dimension in its model card, size the index to match, and, if you ever change embedding models, create a new index (and re-embed everything) rather than trying to reuse the old one.
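One way to surface this mistake early is to check vector length in your own code before calling the index. The sketch below assumes a 768-dimension index; assertDimensions and INDEX_DIMENSIONS are hypothetical names, not part of the Vectorize API.

```typescript
// Hypothetical helper: fail fast with a clear message instead of waiting
// for the index to reject a wrong-sized vector.
const INDEX_DIMENSIONS = 768; // assumed to match the index's --dimensions

function assertDimensions(
  values: number[],
  expected: number = INDEX_DIMENSIONS,
): number[] {
  if (values.length !== expected) {
    throw new Error(
      `Embedding has ${values.length} dimensions, but the index expects ${expected}`,
    );
  }
  return values;
}

// A correctly sized vector passes through unchanged.
const ok = assertDimensions(new Array<number>(768).fill(0));
console.log(ok.length); // 768

// A 1536-dimension vector (the wrong model's shape) is rejected.
try {
  assertDimensions(new Array<number>(1536).fill(0));
} catch (err) {
  console.log((err as Error).message);
  // Embedding has 1536 dimensions, but the index expects 768
}
```

In a Worker you might wrap the embedding output with this check just before insert() or query(), so a model swap shows up as a readable error in your own code.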
The Vectorize documentation is the canonical reference for concepts and API shape; pair it with the Vectorize pricing page and limits page for current numbers, since both are subject to change.
You created an index with --dimensions=1536 for an OpenAI embedding model, then decided to switch to a Workers AI model that outputs 768-dimensional vectors. What happens when you insert a 768-dimension vector into that index?

Vectorize bills on stored dimensions (vector count × dimensions, summed across your indexes) and queried dimensions (which grow with query volume, index size, and vector dimensions). There's no charge for CPU, index count, or idle time; you pay only for stored and queried vector volume.