Embeddings

Status: 🟩 COMPLETE
Last updated: 2026-06-19
Plain-English tagline: Turning text (or images, or audio) into a list of numbers that captures meaning. The foundation under RAG, semantic search, recommendations, and a lot more.


In plain English

An embedding is a piece of text (or an image, or anything else) represented as a fixed-size list of numbers — typically 768, 1024, 1536, or 3072 floating-point values. That list is called a vector.

The trick: an embedding model is trained so that pieces of text with similar meaning produce vectors that are mathematically close to each other in the multi-dimensional space those numbers live in.

So:

  • "how do I deploy to Vercel?" and "steps to push my site live on Vercel" produce vectors that are close
  • "how do I deploy to Vercel?" and "what is the capital of France?" produce vectors that are far apart

That single property, “similar meaning → close vectors”, unlocks an enormous range of applications. Search, recommendations, clustering, deduplication, classification, and RAG are all built on top of “compute embeddings, compare distances.”


Why it matters

Three reasons:

  1. Embeddings are how modern semantic search works. “Find me docs about X” is “embed X, find the nearest stored embeddings.” This is the basis of RAG.

  2. They’re cheap. Computing an embedding is way cheaper than running a full LLM. You can embed millions of docs once and search them forever.

  3. They generalize. The same technique works for text, images, audio, code, mixed media — anywhere you can produce a vector representation, you can use the same comparison tools.


How embeddings actually work (conceptually)

An embedding model is a neural network. Input: a piece of text. Output: a vector.

During training, the model is shown pairs of text and told which pairs are “similar” (e.g. a question and its correct answer, two paraphrases of the same sentence, a query and a relevant document). The model adjusts its parameters until similar pairs produce close vectors and dissimilar pairs produce far ones.
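The training signal described above can be sketched with a toy loss function. This is only an illustration: the source does not say which loss real embedding models use, and the triplet-style margin loss, the `margin` value, and the three vectors below are all assumptions chosen to show the idea of "pull similar pairs together, push dissimilar pairs apart."

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is at least `margin` more similar than the negative.

    Illustrative stand-in for a real training objective (assumption).
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, margin - sim_pos + sim_neg)

# Hypothetical vectors for a question, its answer, and an unrelated sentence.
question = [0.9, 0.1, 0.2]
answer = [0.8, 0.2, 0.1]
unrelated = [-0.2, 0.8, 0.1]

well_separated = triplet_loss(question, answer, unrelated)  # nothing left to fix
labels_swapped = triplet_loss(question, unrelated, answer)  # large loss to reduce
print(well_separated, labels_swapped)
```

During training, a loss like this would be computed over many examples and the model's parameters nudged to reduce it; the sketch only evaluates the loss, it does not train anything.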

At the end of training, you have a model that can take text it has never seen before and produce a vector whose position in space is meaningful: texts with similar meaning land near each other.

You don’t have to think about what each of the 1536 numbers in the vector represents. The model has learned a useful representation; the numbers themselves are uninterpretable to humans.


What “close” means — similarity metrics

To compare two vectors, you use a similarity metric:

Cosine similarity (the standard)

Measures the angle between two vectors. Range: -1 to 1. Closer to 1 = more similar. Ignores vector magnitude (only direction matters). Most embedding models are trained for this.

cosine_similarity(a, b) = (a · b) / (||a|| × ||b||)
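The formula above translates almost directly into code. This is a minimal pure-Python sketch; the function name matches the formula, and the input checks are an addition for safety rather than part of the definition.

```python
import math

def cosine_similarity(a, b):
    """(a · b) / (||a|| × ||b||) for two equal-length lists of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Same direction, different length: magnitude is ignored.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors have nothing in common.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite directions sit at the bottom of the range.
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

The three calls cover the range described above: about 1 for the same direction, 0 for perpendicular vectors, and -1 for opposite ones.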

Euclidean (L2) distance

Straight-line distance between the tips of the vectors. Often used for normalized embeddings (where magnitude is constant), in which case it produces the same ranking as cosine.

Dot product

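Sum of element-wise products. Considers both direction and magnitude. Faster to compute than cosine because it skips the normalization step; on unit-length vectors it gives exactly the cosine similarity, which is why it is sometimes preferred for indexing.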

For RAG and semantic search, cosine is the default and you rarely need to think about it. Vector databases handle the math; you just store and query.
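The claim that the metrics agree on normalized vectors can be checked with a small sketch. The vectors and document names here are hypothetical; the point is that once every vector has length 1, the dot product equals cosine similarity and squared L2 distance equals 2 - 2 × (dot product), so all three produce the same ranking.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 2-D vectors with deliberately different magnitudes.
query = normalize([3.0, 4.0])
docs = {
    "doc_a": normalize([6.0, 8.0]),
    "doc_b": normalize([1.0, 0.0]),
    "doc_c": normalize([-4.0, 3.0]),
}

# Highest dot product first; smallest L2 distance first.
by_dot = sorted(docs, key=lambda name: dot(query, docs[name]), reverse=True)
by_l2 = sorted(docs, key=lambda name: euclidean(query, docs[name]))
print(by_dot, by_l2)
```

Both lists come out in the same order, which is why a vector database can pick whichever metric is cheapest for its index when the stored embeddings are unit length.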


A concrete example

Imagine a 3-dimensional embedding space (real ones have hundreds or thousands of dimensions, but 3D is visualizable):

| Text | Vector |
|---|---|
| “cat” | [0.9, 0.1, 0.2] |
| “kitten” | [0.85, 0.15, 0.18] |
| “dog” | [0.7, 0.3, 0.2] |
| “Italian food” | [-0.2, 0.8, 0.1] |
| “pasta recipes” | [-0.25, 0.85, 0.05] |

Notice:

  • cat and kitten are very close — they’re about the same concept
  • cat and dog are reasonably close — both animals
  • Italian food and pasta recipes are very close — same topic
  • cat and pasta recipes are far — totally different topics
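The observations above can be verified by computing cosine similarity on the toy vectors. This sketch uses only the five vectors from the example; the helper function is a plain-Python illustration, not a library call.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# The toy 3-D vectors from the example above.
vectors = {
    "cat": [0.9, 0.1, 0.2],
    "kitten": [0.85, 0.15, 0.18],
    "dog": [0.7, 0.3, 0.2],
    "Italian food": [-0.2, 0.8, 0.1],
    "pasta recipes": [-0.25, 0.85, 0.05],
}

pairs = [
    ("cat", "kitten"),
    ("cat", "dog"),
    ("Italian food", "pasta recipes"),
    ("cat", "pasta recipes"),
]
scores = {pair: cosine_similarity(vectors[pair[0]], vectors[pair[1]]) for pair in pairs}
for (left, right), score in scores.items():
    print(f"{left} vs {right}: {score:.3f}")
```

The scores come out at roughly 0.998 for cat and kitten, 0.959 for cat and dog, 0.997 for Italian food and pasta recipes, and about -0.158 for cat and pasta recipes, matching the ordering in the list.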

Real embeddings work the same way but with 768+ dimensions, so they can capture much more nuance — domain, tone, formality, entities, time period, all overlapping in the high-dimensional space.


Embedding models you’ll actually use

The main offerings in mid-2026:

| Provider | Model | Dimensions | Notes |
|---|---|---|---|
| OpenAI | text-embedding-3-large | 3072 | Strong general-purpose; can be truncated to lower dims |
| OpenAI | text-embedding-3-small | 1536 | Cheaper, good for most uses |
| Cohere | embed-v3 | 1024 | Strong multilingual; competitive quality |
| Anthropic | (no first-party model; points to Voyage AI) | varies | Anthropic-recommended for Claude pipelines |
| Voyage AI | voyage-3-large, voyage-code-3 | 1024 | Excellent for code and English |
| Open-source | nomic-embed-text-v2, bge-large | 768–1024 | Free to self-host; competitive on benchmarks |

Quality varies but for most uses, any modern model from this list works. Differences become noticeable at scale or in specialized domains (code, legal, medical).
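The "can be truncated to lower dims" note in the table deserves a quick sketch. Assuming a model trained so that the leading dimensions carry most of the signal (as the text-embedding-3 models are described), shortening a vector client-side means keeping the first N values and re-normalizing to unit length. The function name and the sample vector are hypothetical, and this does not call any provider API.

```python
import math

def truncate_embedding(vector, dims):
    """Keep the first `dims` values and rescale the result to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and the vector length")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]

# Hypothetical unit-length 3-D embedding shortened to 2 dimensions.
full = [3 / 13, 4 / 13, 12 / 13]
short = truncate_embedding(full, 2)
print(short)
```

Re-normalizing matters because a truncated vector is no longer unit length, which would skew dot-product and L2 comparisons. Truncating a model that was not trained for it is likely to hurt quality much more.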


What embeddings are used for

Semantic search

“Find docs similar to this query.” The basis of RAG, enterprise search, customer support knowledge bases.
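A minimal sketch of "find docs similar to this query" is a brute-force scan: score every stored vector against the query vector and keep the top few. The document IDs and 3-D vectors are hypothetical stand-ins for real embeddings, and a real system would usually let a vector database do this with an approximate index instead of a full scan.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, store, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:k]]

# Hypothetical pre-computed document embeddings.
store = {
    "deploy-guide": [0.9, 0.1, 0.1],
    "billing-faq": [0.1, 0.9, 0.2],
    "rollback-howto": [0.8, 0.2, 0.3],
}
# Hypothetical embedding of the query "how do I deploy?".
query_vector = [0.9, 0.1, 0.05]

results = top_k(query_vector, store, k=2)
print(results)
```

In a RAG pipeline, the text behind the returned IDs would then be pasted into the LLM prompt as context.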

Deduplication

Cluster similar items (near-identical vectors) and keep one. Useful for cleaning datasets of news articles, comments, support tickets.

Recommendations

“Users who liked X also liked Y” — embed items, find nearest neighbors. Often simpler and cheaper to run than traditional collaborative filtering for many use cases.

Classification

Embed your inputs once, train a tiny classifier on top of the embeddings. Often outperforms training a model from scratch.
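As a sketch of "a tiny classifier on top of the embeddings", here is a nearest-centroid classifier: average the vectors for each label, then assign new inputs to the closest average. This is a stand-in chosen because it fits in a few lines; in practice something like logistic regression is a common choice. The labels and vectors are hypothetical.

```python
import math
from collections import defaultdict

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def fit_centroids(examples):
    """Average the embeddings of each label into one centroid per label."""
    grouped = defaultdict(list)
    for vector, label in examples:
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def predict(centroids, vector):
    """Return the label whose centroid is most similar to the vector."""
    return max(centroids, key=lambda label: cosine_similarity(vector, centroids[label]))

# Hypothetical labelled embeddings of support tickets.
training = [
    ([0.9, 0.1, 0.0], "bug"),
    ([0.8, 0.2, 0.1], "bug"),
    ([0.1, 0.9, 0.1], "billing"),
    ([0.2, 0.8, 0.0], "billing"),
]
centroids = fit_centroids(training)
print(predict(centroids, [0.7, 0.3, 0.1]), predict(centroids, [0.2, 0.7, 0.1]))
```

The embedding model does the heavy lifting here; the classifier itself has no learned weights beyond the per-label averages.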

Clustering and topic modeling

Group similar items without labels. Useful for surfacing themes in user feedback, organizing large document collections.

Anomaly detection

Items whose embeddings are far from any cluster are unusual. Useful for spam detection, content moderation triage.
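A rough sketch of this idea: compare each item with the centroid of every known cluster and flag it when even the best match is weak. The cluster centroids, the sample items, and the 0.5 threshold are all hypothetical; a usable threshold depends on the model and the data.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_outlier(vector, centroids, threshold=0.5):
    """True when the vector is not similar enough to any known cluster."""
    best_match = max(cosine_similarity(vector, c) for c in centroids.values())
    return best_match < threshold

# Hypothetical cluster centroids learned from normal content.
centroids = {
    "animals": [0.85, 0.2, 0.2],
    "food": [-0.2, 0.8, 0.1],
}

normal_item = [0.85, 0.15, 0.18]  # sits inside the "animals" cluster
odd_item = [0.1, -0.1, 0.95]      # points somewhere no cluster covers

print(is_outlier(normal_item, centroids), is_outlier(odd_item, centroids))
```

Flagged items would normally go to a review queue rather than being rejected outright, since "far from every cluster" can also mean "new legitimate topic".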

Cross-modal search

Some models (CLIP, SigLIP) embed both images and text into the same space. You can search images by text query, or vice versa.


Vector databases — where embeddings live

You don’t typically search embeddings yourself. You store them in a vector database that’s optimized for fast nearest-neighbor search.

Common choices:

| Option | When to use |
|---|---|
| pgvector (Postgres extension) | Default for solo devs / small teams. Already running Postgres? Just add the extension. Works great in Supabase. |
| Pinecone | Hosted, easy to start, scales well. Costs more than self-hosted. |
| Weaviate | Open source + hosted. Strong feature set. |
| Chroma | Lightweight, often used locally for development. |
| Qdrant | Fast, Rust-based, good open-source option. |
| Milvus | Mature, used at scale. |

For your stack (Supabase + Next.js + Vercel), pgvector inside Supabase is the natural choice — same database, no new service, no new bill, full SQL access alongside vector ops.
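To make the pgvector option concrete, here is a sketch of the SQL involved, held as Python strings so it can be passed to whatever Postgres client you use. The table name `documents`, its columns, and the 1536 dimension are hypothetical; the `vector` type and the `<=>` cosine-distance operator come from pgvector itself. The `%(name)s` placeholders assume a psycopg-style driver, and nothing here connects to a database.

```python
# One-time setup: enable the extension and create a table with a vector column.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536)  -- must match your embedding model's dimensions
);
"""

# Nearest-neighbor query: <=> is cosine distance, so smaller means closer.
SEARCH_SQL = """
SELECT id, content, 1 - (embedding <=> %(query)s::vector) AS cosine_similarity
FROM documents
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s;
"""

def to_pgvector_literal(vector):
    """Format a list of numbers as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"

# Hypothetical query embedding; a real one would come from your embedding model.
params = {"query": to_pgvector_literal([0.1, 0.2, 0.3]), "limit": 5}
print(params["query"])
```

With a driver such as psycopg, the query would be run with something like `cursor.execute(SEARCH_SQL, params)`. Because the vectors live in an ordinary table, the same query can add regular `WHERE` filters on other columns.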


What embeddings cost

Embedding APIs are priced per million tokens, much cheaper than LLM inference:

  • OpenAI text-embedding-3-small: ~$0.02 per 1M tokens
  • OpenAI text-embedding-3-large: ~$0.13 per 1M tokens
  • Voyage AI: similar order of magnitude

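Embedding 1 million tokens of content costs cents. A 1000-page book is very roughly 500,000 to 700,000 tokens, so embedding it costs somewhere between about $0.01 and $0.10 depending on the model. You can ingest large corpora cheaply.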
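The arithmetic behind those estimates is simple enough to sketch. The prices are the approximate figures listed above; the 650 tokens per page is an assumption (about 500 words per page at about 1.3 tokens per word), so treat the results as order-of-magnitude only.

```python
# Approximate prices in USD per 1M tokens, taken from the list above.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def embedding_cost(tokens, model):
    """Estimated one-time cost in USD to embed `tokens` tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

# Assumption: ~500 words per page × ~1.3 tokens per word ≈ 650 tokens per page.
TOKENS_PER_PAGE = 650
book_tokens = 1000 * TOKENS_PER_PAGE

small_cost = embedding_cost(book_tokens, "text-embedding-3-small")
large_cost = embedding_cost(book_tokens, "text-embedding-3-large")
print(f"small: ${small_cost:.3f}, large: ${large_cost:.3f}")
```

Under these assumptions the book costs a bit over one cent with the small model and a bit under nine cents with the large one, in line with the estimate above.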

One-time cost: embed everything once → store the vectors. Searching afterward costs only the vector DB’s storage and compute, plus one tiny embedding call per query, since each query has to be embedded with the same model.


Common gotchas

  • Don’t mix embedding models. Vectors from text-embedding-3-large are not comparable to vectors from text-embedding-3-small. If you change models, re-embed everything.

  • Vector dimension matters. Higher-dimension vectors store more info but cost more in storage and search time. For most uses, 1024 or 1536 is plenty.

  • Chunk size for embeddings. Every embedding model has a max input size, ranging from about 512 tokens for some models to roughly 8,000 for others. For RAG, you usually want shorter chunks (~200–800 tokens) anyway, both for retrieval quality and to fit more in the LLM’s context window.

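  • Embedding queries vs documents. Some models (like OpenAI’s) treat queries and documents the same way. Others (like Cohere’s embed-v3) take an input_type parameter that tells the model whether the text is a search query or a document to be indexed. Set it correctly on both sides; retrieval quality suffers if you don’t.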

  • Token vs character limits. Embedding limits are usually in tokens, not characters. Depending on the provider, over-long input is either rejected with an error or silently truncated, so chunk long text before embedding it.

  • Embeddings are effectively deterministic per model version. Same input → same output, up to tiny floating-point differences. But model versions change. Pin the model version if you care about reproducibility.

  • Cosine similarity scores aren’t absolute. A score of 0.85 doesn’t mean “85% similar” in any human sense. It means “high relative to other distances in this space.” Calibrate by example, not by absolute threshold.

  • Lost in similarity. Two unrelated docs can have high cosine similarity by coincidence (especially in low-dimensional spaces). Hybrid search (vectors + keywords) and re-ranking guard against this.

  • Sensitive data in embeddings. Embeddings are derived from text, and they leak information about it. You can’t simply decode an embedding back to its source, but inversion attacks can recover some of the original content. Treat them as semi-sensitive: don’t expose embeddings of private data publicly.
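The chunk-size and truncation gotchas above usually come down to splitting long text before embedding it. Here is a minimal sketch that splits on whitespace and counts words as a rough proxy for tokens. The function name, the default sizes, and the overlap between neighboring chunks are assumptions (overlap is a common but optional choice, not something the notes above require); a real pipeline would count tokens with the embedding model's own tokenizer.

```python
def chunk_words(text, max_words=300, overlap=50):
    """Split text into chunks of up to `max_words` words.

    Consecutive chunks share `overlap` words so a sentence cut at a
    boundary still appears whole in one of them.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reached the end of the text
    return chunks

# Tiny demonstration with 10 placeholder "words".
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, max_words=4, overlap=1)
print(chunks)
```

Each chunk would then be embedded and stored separately, along with a pointer back to its source document.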

