Store and query vector embeddings with Vectorize

After this lesson you'll be able to create a Vectorize index with the right dimension/metric config, insert embeddings generated by Workers AI, and run a similarity query filtered by metadata.

Vectorize is Cloudflare's globally distributed vector database: a place to store embeddings (the numeric arrays that represent the "meaning" of a piece of text, an image, or audio) and run fast similarity search over them. It doesn't generate embeddings itself; you produce those with an embedding model (Workers AI's @cf/baai/bge-* family, or an external model such as an OpenAI embedding endpoint), then hand the resulting vectors to Vectorize to store and query. Vectorize is the retrieval half of a system; something else always has to be the embedding half.

Why not just store vectors in a normal database? Similarity search — "find the 5 vectors closest to this one" — isn't something a row-oriented or key-value store does efficiently at scale. Vectorize is purpose-built with an index structure for approximate nearest-neighbor search, so a query stays fast whether the index holds thousands or millions of vectors.

How it works

Three concepts define a Vectorize index, and the first two are locked in at creation time and cannot be changed later:

Dimensions — the length of every vector the index stores, e.g. 768. This must exactly match the output size of whatever embedding model you use. An index doesn't inspect or transform vectors to fit; it just expects every vector inserted or queried to have precisely this many numbers.
Metric — how "closeness" between two vectors is calculated: cosine (angle between vectors — the default choice for most text embedding models, since it ignores magnitude and compares direction), euclidean (straight-line distance), or dot-product (magnitude-sensitive, common when the embedding model was trained specifically for it). Use whatever metric your embedding model's documentation recommends — mismatching it won't error, but it will quietly degrade result quality.
Metadata — arbitrary JSON you attach to each vector alongside its ID (e.g. { url, category, publishedAt }). Metadata rides along with the vector and can be returned with query results, and — if you create a metadata index on a given property first — used to filter results at query time.
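
To make the three metrics concrete, here is a minimal sketch that computes each one in plain TypeScript. The function names are illustrative only and are not part of the Vectorize API; Vectorize computes the chosen metric internally.

```typescript
// Dot product: grows with both alignment and magnitude.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: dot product of the two vectors after normalizing their lengths,
// so only direction matters.
function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Euclidean distance: straight-line distance between the two points.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

const u = [1, 0];
const v = [2, 0]; // same direction as u, twice the magnitude

console.log(cosineSimilarity(u, v));  // 1: identical direction, magnitude ignored
console.log(dot(u, v));               // 2: magnitude changes the result
console.log(euclideanDistance(u, v)); // 1: the points are one unit apart
```

The same pair of vectors looks identical under cosine but not under the other two metrics, which is why the metric should follow the embedding model's recommendation.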

The request lifecycle is: embed → insert → query. You generate a vector with an embedding model, write it to an index by ID with optional metadata using insert() (which leaves any vector already stored under that ID unchanged) or upsert() (which overwrites it), and later query() the index with a fresh vector to get back the topK nearest matches, each with a similarity score and, optionally, its metadata and original values.

Worked example

Create an index sized for the Workers AI bge-base-en-v1.5 embedding model, which outputs 768-dimensional vectors:

npx wrangler vectorize create product-search --dimensions=768 --metric=cosine

# Enable filtering on a metadata field before you rely on it in queries
npx wrangler vectorize create-metadata-index product-search --property-name=category --type=string

Bind the index (and Workers AI) in wrangler.toml:

[ai]
binding = "AI"

[[vectorize]]
binding = "VECTORIZE"
index_name = "product-search"

Embed and insert a product description:

export interface Env {
  AI: Ai;
  VECTORIZE: Vectorize;
}

export default {
  async fetch(req: Request, env: Env) {
    const { id, text, category } = await req.json<{
      id: string; text: string; category: string;
    }>();

    const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [text],
    });

    await env.VECTORIZE.insert([
      {
        id,
        values: data[0], // 768 numbers — must match the index's --dimensions
        metadata: { category, text },
      },
    ]);

    return Response.json({ inserted: id });
  },
};

Query with a metadata filter — find products similar to a search phrase, restricted to one category:

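export default {
  async fetch(req: Request, env: Env) {
    // Embed the search phrase with the same model used at insert time
    const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: ["waterproof hiking boots"],
    });

    const results = await env.VECTORIZE.query(data[0], {
      topK: 5,
      filter: { category: "footwear" }, // relies on the metadata index on "category"
      returnMetadata: "all",
    });

    // results.matches: [{ id, score, metadata: { category, text } }, ...]
    return Response.json(results.matches);
  },
};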

Pricing

Vectorize bills on two measures, not on storage size, CPU, or index count:

Billing measure	Workers Free	Workers Paid
Stored dimensions	5 million total	10 million included, then $0.05 per 100 million
Queried dimensions	30 million/month	50 million/month included, then $0.01 per million

"Stored dimensions" is vector count × dimensions summed across your indexes — a 768-dimension index with 10,000 vectors uses 7.68 million stored dimensions. "Queried dimensions" is charged per query as roughly (vectors scanned + 1) × dimensions, so higher-dimension embeddings and larger indexes both cost more per query. There's no charge for empty indexes, data transfer, or idle time. Confirm current numbers on the live pricing page below before quoting them — this is exactly the kind of figure that changes between plan revisions.

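The arithmetic can be sketched as a small estimator. This is a rough illustration, not a billing tool: the constants are the Workers Paid figures quoted in this lesson and may be out of date, and the function and constant names are hypothetical. It takes queried dimensions as an input rather than deriving them, since how they accrue depends on your query volume and index size.

```typescript
// Workers Paid figures as quoted in this lesson; verify against the live pricing page.
const PAID_PLAN = {
  includedStoredDims: 10_000_000,
  includedQueriedDims: 50_000_000,
  usdPerStoredDim: 0.05 / 100_000_000,
  usdPerQueriedDim: 0.01 / 1_000_000,
};

// Stored dimensions for one index: vector count × dimensions.
function storedDimensions(vectorCount: number, dimensions: number): number {
  return vectorCount * dimensions;
}

// Estimated monthly charge in USD: only usage above the included amounts is billed.
function estimateMonthlyUsd(storedDims: number, queriedDims: number): number {
  const billableStored = Math.max(0, storedDims - PAID_PLAN.includedStoredDims);
  const billableQueried = Math.max(0, queriedDims - PAID_PLAN.includedQueriedDims);
  return (
    billableStored * PAID_PLAN.usdPerStoredDim +
    billableQueried * PAID_PLAN.usdPerQueriedDim
  );
}

console.log(storedDimensions(10_000, 768)); // 7680000, under the 10 million included
```

For example, 110 million stored dimensions and 150 million queried dimensions in a month would leave 100 million billable on each measure, which under these figures comes to about $0.05 for storage plus about $1.00 for queries.
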
Use cases

Semantic search — match a user's query to documents/products by meaning, not just keyword overlap.
RAG pipelines — embed a knowledge base once, then at request time retrieve the most relevant chunks to stuff into an LLM prompt (pairs naturally with Workers AI and AI Gateway).
Recommendation systems — "more like this": embed items, and recommend nearest neighbors to whatever a user is currently viewing.
Deduplication / near-duplicate detection — embed incoming records and check whether a near-identical vector already exists before treating something as new.
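
For the RAG case, the step between query() and the LLM call is usually just string assembly. A minimal sketch, assuming a cosine index (where a higher score means a closer match) and assuming the original text was stored in metadata.text, as in the worked example. The Match type, buildContext, and the 0.7 cutoff used in the usage note are hypothetical, not part of the Vectorize API.

```typescript
// Hypothetical local type: only the fields this sketch reads from a query match.
interface Match {
  id: string;
  score: number;
  metadata?: { text?: string };
}

// Keep matches scoring at or above minScore and join their stored text
// into a single context string, preserving the order returned by the query.
function buildContext(matches: Match[], minScore: number): string {
  const chunks: string[] = [];
  for (const m of matches) {
    const text = m.metadata?.text;
    if (m.score >= minScore && typeof text === "string") chunks.push(text);
  }
  return chunks.join("\n---\n");
}
```

Calling buildContext(results.matches, 0.7) would drop weak matches and matches without stored text before the context reaches the prompt. The right cutoff depends on the embedding model and the data, so treat any fixed number as a starting point to tune.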

Pitfall: index dimensions don't match your embedding model's output. Vectorize locks in dimensions and metric at create time and cannot change them afterward. If you create an index with --dimensions=1536 (an OpenAI text-embedding-3-small shape) but later switch to Workers AI's bge-base-en-v1.5 (768 dimensions), every insert() or query() call will fail outright — the vector length doesn't match what the index expects. There's no silent truncation or padding. The fix is to pick your embedding model first, confirm its output dimension in its model card, size the index to match, and if you ever change embedding models, create a new index (and re-embed everything) rather than trying to reuse the old one.
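
One way to surface this mismatch earlier, with a clearer message, is a length check before each insert() or query() call. A minimal sketch; assertDimensions and INDEX_DIMENSIONS are hypothetical names, and the constant is assumed to mirror the --dimensions value used when the index was created.

```typescript
// Assumption: kept in sync by hand with the --dimensions flag used at create time.
const INDEX_DIMENSIONS = 768;

// Returns the vector unchanged if its length matches, otherwise throws
// with a message naming both sizes.
function assertDimensions(
  values: number[],
  expected: number = INDEX_DIMENSIONS,
): number[] {
  if (values.length !== expected) {
    throw new Error(
      `Embedding has ${values.length} dimensions, but the index expects ${expected}`,
    );
  }
  return values;
}
```

In the worked example this would wrap the embedding at the call site, for instance values: assertDimensions(data[0]), so a model swap fails with an explicit message in your own code.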

Primary source

The Vectorize documentation is the canonical reference for concepts and API shape; pair it with the Vectorize pricing page and limits page for current numbers, since both are subject to change.

You created a Vectorize index with --dimensions=1536 for an OpenAI embedding model, then decided to switch to a Workers AI model that outputs 768-dimensional vectors. What happens when you insert a 768-dimension vector into that index?

Without scrolling up: what are the two things Vectorize charges for, and what does "queried dimensions" actually scale with?

Vectorize bills on stored dimensions (vector count × dimensions, summed across your indexes) and queried dimensions (roughly (vectors scanned + 1) × dimensions per query, so it scales with both index size and embedding dimension). There's no charge for CPU, index count, or idle time; you pay only for stored and queried vector volume.
