Embeddings
Status: 🟩 COMPLETE
Last updated: 2026-06-19
Plain-English tagline: Turning text (or images, or audio) into a list of numbers that captures meaning. The foundation under RAG, semantic search, recommendations, and a lot more.
In plain English
An embedding is a piece of text (or an image, or anything else) represented as a fixed-size list of numbers — typically 768, 1024, 1536, or 3072 floating-point values. That list is called a vector.
The trick: an embedding model is trained so that pieces of text with similar meaning produce vectors that are mathematically close to each other in the multi-dimensional space those numbers live in.
So:
- "how do I deploy to Vercel?" and "steps to push my site live on Vercel" produce vectors that are close
- "how do I deploy to Vercel?" and "what is the capital of France?" produce vectors that are far apart
That single property — “similar meaning → close vectors” — unlocks an enormous range of applications. Search, recommendations, clustering, deduplication, classification, RAG, all built on top of “compute embeddings, compare distances.”
Why it matters
Three reasons:
- Embeddings are how modern semantic search works. “Find me docs about X” is “embed X, find the nearest stored embeddings.” This is the basis of RAG.
- They’re cheap. Computing an embedding is way cheaper than running a full LLM. You can embed millions of docs once and search them forever.
- They generalize. The same technique works for text, images, audio, code, and mixed media: anywhere you can produce a vector representation, you can use the same comparison tools.
How embeddings actually work (conceptually)
An embedding model is a neural network. Input: a piece of text. Output: a vector.
During training, the model is shown pairs of text and told which pairs are “similar” (e.g. a question and its correct answer, two paraphrases of the same sentence, a query and a relevant document). The model adjusts its parameters until similar pairs produce close vectors and dissimilar pairs produce far ones.
At the end of training, you have a model that — given any text it’s never seen before — can produce a vector whose position in space is meaningful: text that’s similar to other examples produces similar vectors.
You don’t have to think about what each of the 1536 numbers in the vector represents. The model has learned a useful representation; the numbers themselves are uninterpretable to humans.
What “close” means — similarity metrics
To compare two vectors, you use a similarity metric:
Cosine similarity (the standard)
Measures the angle between two vectors. Range: -1 to 1. Closer to 1 = more similar. Ignores vector magnitude (only direction matters). Most embedding models are trained for this.
cosine_similarity(a, b) = (a · b) / (||a|| × ||b||)
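The formula above translates almost line for line into code. This is a minimal sketch in plain Python (no libraries), meant for illustration only; in practice a vector database or a numeric library does this for you. The variable names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Same direction, different magnitude: similarity is 1 (up to float rounding).
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: similarity is 0.
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite directions: similarity is -1.
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

The first case shows why magnitude is ignored: doubling every number in a vector leaves its direction, and therefore its cosine similarity to anything else, unchanged.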
Euclidean (L2) distance
Straight-line distance between the tips of the vectors. Often used for normalized embeddings (where magnitude is constant), in which case it produces the same ranking as cosine.
Dot product
Sum of element-wise products. Considers both direction and magnitude. Faster to compute than cosine; sometimes preferred for indexing.
For RAG and semantic search, cosine is the default and you rarely need to think about it. Vector databases handle the math; you just store and query.
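To see why the choice of metric often doesn't matter, here is a small sketch showing that once vectors are normalized to length 1, dot product and Euclidean distance rank candidates identically. The vectors and names ("near", "mid", "far") are made up for the illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to length 1 so only its direction remains."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = normalize([0.9, 0.1, 0.2])
candidates = {
    "near": normalize([0.85, 0.15, 0.18]),
    "mid": normalize([0.7, 0.3, 0.2]),
    "far": normalize([-0.25, 0.85, 0.05]),
}

# Higher dot product = more similar; lower distance = more similar.
by_dot = sorted(candidates, key=lambda name: dot(query, candidates[name]), reverse=True)
by_l2 = sorted(candidates, key=lambda name: euclidean(query, candidates[name]))

# For unit vectors: ||a - b||^2 = 2 - 2 * (a . b), so the two orderings must agree.
identity_holds = all(
    math.isclose(euclidean(query, vec) ** 2, 2 - 2 * dot(query, vec), abs_tol=1e-9)
    for vec in candidates.values()
)
```

For unit-length vectors the dot product is also exactly the cosine similarity, which is why some indexes normalize on write and then use the cheaper dot product at query time.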
A concrete example
Imagine a 3-dimensional embedding space (real ones have hundreds or thousands of dimensions, but 3D is visualizable):
| Text | Vector |
|---|---|
| “cat” | [0.9, 0.1, 0.2] |
| “kitten” | [0.85, 0.15, 0.18] |
| “dog” | [0.7, 0.3, 0.2] |
| “Italian food” | [-0.2, 0.8, 0.1] |
| “pasta recipes” | [-0.25, 0.85, 0.05] |
Notice:
- “cat” and “kitten” are very close: they’re about the same concept
- “cat” and “dog” are reasonably close: both animals
- “Italian food” and “pasta recipes” are very close: same topic
- “cat” and “pasta recipes” are far: totally different topics
Real embeddings work the same way but with 768+ dimensions, so they can capture much more nuance — domain, tone, formality, entities, time period, all overlapping in the high-dimensional space.
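The "close" and "far" claims for the toy table can be checked numerically. This sketch computes cosine similarity for the pairs discussed above, using the made-up 3D vectors from the table (they are not output from any real model).

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy vectors copied from the table above.
vectors = {
    "cat": [0.9, 0.1, 0.2],
    "kitten": [0.85, 0.15, 0.18],
    "dog": [0.7, 0.3, 0.2],
    "Italian food": [-0.2, 0.8, 0.1],
    "pasta recipes": [-0.25, 0.85, 0.05],
}

pairs = [
    ("cat", "kitten"),                  # about 0.998
    ("cat", "dog"),                     # about 0.959
    ("Italian food", "pasta recipes"),  # about 0.997
    ("cat", "pasta recipes"),           # about -0.158
]

scores = {pair: cosine_similarity(vectors[pair[0]], vectors[pair[1]]) for pair in pairs}
for (left, right), score in scores.items():
    print(f"{left} vs {right}: {score:.3f}")
```

Note that "reasonably close" still scores around 0.96 here. Absolute values depend heavily on the space, so what matters is the ordering of the scores, not the numbers themselves.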
Embedding models you’ll actually use
The main offerings in mid-2026:
| Provider | Model | Dimensions | Notes |
|---|---|---|---|
| OpenAI | text-embedding-3-large | 3072 | Strong general-purpose; can be truncated to lower dims |
| OpenAI | text-embedding-3-small | 1536 | Cheaper, good for most uses |
| Cohere | embed-v3 | 1024 | Strong multilingual; competitive quality |
| Anthropic | (no first-party model; docs point to Voyage AI) | varies | Anthropic-recommended for Claude pipelines |
| Voyage AI | voyage-3-large, voyage-code-3 | 1024 | Excellent for code and English |
| Open-source | nomic-embed-text-v2, bge-large | 768–1024 | Free to self-host; competitive on benchmarks |
Quality varies but for most uses, any modern model from this list works. Differences become noticeable at scale or in specialized domains (code, legal, medical).
What embeddings are used for
Semantic search
“Find docs similar to this query.” The basis of RAG, enterprise search, customer support knowledge bases.
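Stripped of the infrastructure, semantic search is "score every stored vector against the query vector, keep the best few." A brute-force sketch, assuming the vectors have already been computed by some embedding model; the document IDs and vectors below are hypothetical toy values.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, index, k=2):
    """Return the k (score, doc_id) pairs most similar to the query, best first."""
    scored = ((cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored)

# Hypothetical pre-computed document embeddings (toy 3D values).
index = {
    "deploy-guide": [0.9, 0.1, 0.1],
    "vercel-faq": [0.8, 0.2, 0.1],
    "france-facts": [0.1, 0.9, 0.2],
}

# Stand-in for the embedding of "how do I deploy to Vercel?"
query_vec = [0.85, 0.15, 0.1]

results = top_k(query_vec, index, k=2)
result_ids = [doc_id for _, doc_id in results]
```

This scans every vector, which is fine for a few thousand documents. Vector databases exist to avoid that full scan at larger scale by using approximate nearest-neighbor indexes.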
Deduplication
Cluster similar items (near-identical vectors) and keep one. Useful for cleaning datasets of news articles, comments, support tickets.
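One simple way to do this, sketched below, is a greedy pass: keep an item only if it is not too similar to anything already kept. The 0.98 threshold and the toy vectors are assumptions for the example; a real threshold needs tuning against your own data and model.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def deduplicate(items, threshold=0.98):
    """Greedy near-duplicate removal over (text, vector) pairs.

    Keeps the first item of each near-duplicate group, in input order.
    """
    kept = []
    for text, vec in items:
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return [text for text, _ in kept]

# Hypothetical items with toy embeddings; the first two are near-duplicates.
items = [
    ("Server outage reported in EU region", [0.9, 0.1, 0.2]),
    ("EU region server outage reported", [0.89, 0.11, 0.2]),
    ("New pricing page launched", [0.1, 0.9, 0.3]),
]

unique_texts = deduplicate(items)
```

The greedy pass compares each item against everything kept so far, so it is quadratic in the worst case. For large datasets you would usually query a vector index for near neighbors instead.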
Recommendations
“Users who liked X also liked Y”: embed items, find nearest neighbors. Often simpler and cheaper to stand up than traditional collaborative filtering.
Classification
Embed your inputs once, train a tiny classifier on top of the embeddings. Often outperforms training a model from scratch.
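As an illustration of how small the "tiny classifier" can be, here is a nearest-centroid sketch: average the embeddings for each label, then assign new inputs to the label whose average is most similar. This is one possible choice, not the only one (logistic regression on the embeddings is another common option); the labels and toy vectors are made up.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train(labeled_vectors):
    """'Training' is just computing one centroid per label."""
    return {label: centroid(vecs) for label, vecs in labeled_vectors.items()}

def predict(model, vec):
    """Pick the label whose centroid is most similar to the input vector."""
    return max(model, key=lambda label: cosine_similarity(vec, model[label]))

# Hypothetical labeled embeddings (toy 3D values).
model = train({
    "animals": [[0.9, 0.1, 0.2], [0.7, 0.3, 0.2]],
    "food": [[-0.2, 0.8, 0.1], [-0.25, 0.85, 0.05]],
})

animal_guess = predict(model, [0.85, 0.15, 0.18])
food_guess = predict(model, [-0.1, 0.9, 0.1])
```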
Clustering and topic modeling
Group similar items without labels. Useful for surfacing themes in user feedback, organizing large document collections.
Anomaly detection
Items whose embeddings are far from any cluster are unusual. Useful for spam detection, content moderation triage.
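A rough sketch of that idea: compare each new embedding to a set of known cluster centers and flag it when even the best match is weak. The centroids, the 0.5 cutoff, and the function name are assumptions for illustration; real cutoffs have to be calibrated on your own data.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def is_anomaly(vec, centroids, min_similarity=0.5):
    """Flag a vector whose best match among the cluster centroids is still weak."""
    best = max(cosine_similarity(vec, center) for center in centroids.values())
    return best < min_similarity

# Hypothetical cluster centers (toy 3D values).
centroids = {
    "animals": [0.8, 0.2, 0.2],
    "food": [-0.225, 0.825, 0.075],
}

typical_flagged = is_anomaly([0.85, 0.15, 0.18], centroids)  # sits inside the "animals" cluster
outlier_flagged = is_anomaly([0.0, 0.1, 0.95], centroids)    # points away from both clusters
```

Flagged items are candidates for review, not verdicts, which is why this works best as a triage step.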
Multi-modal search
Embedding models exist that embed both images and text into the same space (CLIP, SigLIP). You can search images by text query, or vice versa.
Vector databases — where embeddings live
You don’t typically search embeddings yourself. You store them in a vector database that’s optimized for fast nearest-neighbor search.
Common choices:
| Option | When to use |
|---|---|
| pgvector (Postgres extension) | Default for solo devs / small teams. Already running Postgres? Just add the extension. Works great in Supabase. |
| Pinecone | Hosted, easy to start, scales well. Costs more than self-hosted. |
| Weaviate | Open source + hosted. Strong feature set. |
| Chroma | Lightweight, often used locally for development. |
| Qdrant | Fast, Rust-based, good open-source option. |
| Milvus | Mature, used at scale. |
For your stack (Supabase + Next.js + Vercel), pgvector inside Supabase is the natural choice — same database, no new service, no new bill, full SQL access alongside vector ops.
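To give a feel for what "SQL access alongside vector ops" looks like, here is a sketch of the SQL involved, kept as Python strings so the example stays in one language. The table name `documents`, its columns, and the 1536 dimension are hypothetical; `<=>` is pgvector's cosine-distance operator (lower means more similar). The `%s` placeholders assume a psycopg-style driver, which is also an assumption. Nothing here connects to a database.

```python
# One-time setup: enable the extension and create a table with a vector column.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536)  -- must match your embedding model's dimension
);
"""

# Nearest-neighbor query: order by cosine distance to the query embedding.
SEARCH_SQL = """
SELECT id, content, 1 - (embedding <=> %s::vector) AS cosine_similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def to_pgvector_literal(vec):
    """Format a Python list as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

def search_params(query_vec, limit=5):
    """Build the parameter tuple for SEARCH_SQL (the vector is passed twice)."""
    literal = to_pgvector_literal(query_vec)
    return (literal, literal, limit)

params = search_params([0.1, 0.2, 0.3], limit=5)
```

With a driver in hand, the call would look roughly like `cursor.execute(SEARCH_SQL, params)`. For large tables you would also add an approximate index (pgvector supports HNSW and IVFFlat) so the query avoids a full scan.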
What embeddings cost
Embedding APIs are priced per million tokens, much cheaper than LLM inference:
- OpenAI text-embedding-3-small: ~$0.02 per 1M tokens
- OpenAI text-embedding-3-large: ~$0.13 per 1M tokens
- Voyage AI: similar order of magnitude
Embedding 1 million tokens of content costs cents. Embedding a 1000-page book costs maybe $0.10. You can ingest large corpora cheaply.
One-time cost: embed everything once, then store the vectors. Searching afterward costs only the vector DB’s storage and compute, plus embedding each incoming query, which is usually just a handful of tokens.
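The book estimate can be reproduced with back-of-the-envelope arithmetic. The prices are the approximate ones listed above; the 500 words per page and 1.3 tokens per word figures are rough rules of thumb assumed for this sketch, not measured values.

```python
# Approximate prices from the list above, in USD per 1M tokens.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def estimate_tokens(pages, words_per_page=500, tokens_per_word=1.3):
    """Rough token count for English prose (both ratios are assumptions)."""
    return round(pages * words_per_page * tokens_per_word)

def embedding_cost(tokens, model):
    """One-time cost in USD to embed the given number of tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

book_tokens = estimate_tokens(pages=1000)  # 650,000 tokens under these assumptions
small_cost = embedding_cost(book_tokens, "text-embedding-3-small")  # about $0.013
large_cost = embedding_cost(book_tokens, "text-embedding-3-large")  # about $0.085

print(f"{book_tokens:,} tokens: ${small_cost:.3f} (small), ${large_cost:.3f} (large)")
```

Under these assumptions the 1000-page book lands between roughly one cent and nine cents depending on the model, consistent with the "maybe $0.10" figure as an upper bound.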
Common gotchas
- Don’t mix embedding models. Vectors from text-embedding-3-large are not comparable to vectors from text-embedding-3-small. If you change models, re-embed everything.
- Vector dimension matters. Higher-dimension vectors store more info but cost more in storage and search time. For most uses, 1024 or 1536 is plenty.
- Chunk size for embeddings. Most embedding models have a max input size (often ~8000 tokens). For RAG, you usually want shorter chunks (~200–800 tokens) anyway, both for retrieval quality and to fit more in the LLM’s context window.
- Embedding queries vs documents. Some models (like OpenAI’s) treat queries and documents the same way. Others (like Cohere’s embed-v3) have separate “input_type” parameters. Use them; retrieval quality depends on it.
- Token vs character limits. Embedding limits are usually in tokens, not characters. Long text without enough chunking gets silently truncated.
- Embeddings are deterministic per model version. Same input → same output. But model versions change. Pin the model version if you care about reproducibility.
- Cosine similarity scores aren’t absolute. A score of 0.85 doesn’t mean “85% similar” in any human sense. It means “high relative to other distances in this space.” Calibrate by example, not by absolute threshold.
- Lost in similarity. Two unrelated docs can have high cosine similarity by coincidence (especially in low-dimensional spaces). Hybrid search (vectors + keywords) and re-ranking guard against this.
- Sensitive data in embeddings. Embeddings are derived from text. While not directly reversible to the original (you can’t decode an embedding back to its source text without an inversion attack), they leak information. Treat them as semi-sensitive: don’t expose embeddings of private data publicly.
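The chunk-size and truncation gotchas above are usually handled by splitting text before embedding it. This is a minimal sketch of fixed-size chunking with overlap. It counts whitespace-separated words as a stand-in for tokens, which is an approximation: a real pipeline would count with the tokenizer that matches its embedding model. The function name and default sizes are illustrative.

```python
def chunk_words(text, max_words=300, overlap=50):
    """Split text into overlapping chunks of at most max_words words.

    Words are a rough proxy for tokens here; swap in a real tokenizer
    for accurate limits.
    """
    if max_words <= 0 or overlap < 0 or overlap >= max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reached the end of the text
    return chunks

# A 700-word dummy document: w0 w1 ... w699
document = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(document, max_words=300, overlap=50)
chunk_sizes = [len(chunk.split()) for chunk in chunks]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice. Storing the model name next to each chunk's vector is a cheap way to catch the "don't mix models" problem later.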
See also
- RAG 🟩 — the most common use of embeddings
- What is an LLM? 🟩
- How LLMs work 🟩 — transformer internals
- Tokens & context windows 🟩
- The Claude API 🟩 🟦
- Multimodal (vision, audio) 🟩 🟦 — multimodal embeddings (CLIP)
- Postgres 🟩 🟦 — pgvector extension
- Supabase 🟩 🟦 — pgvector lives here
- What is a database? 🟩
- Glossary: Embedding, Vector
Sources
- OpenAI — Embeddings guide
- Cohere — Embed v3 announcement
- Massive Text Embedding Benchmark (MTEB) — current rankings of embedding models
- pgvector docs
- Supabase Vector docs
- Vector Databases comparison (Pinecone blog)