Vector Databases — The Infrastructure Behind RAG and Semantic Search

Status: 🟩 COMPLETE 🟦 LIVING Section: 10 — AI and LLMs Tags: vector-databases, embeddings, RAG, semantic-search, pinecone, weaviate, chroma, pgvector


What it is

A vector database is a specialised database for storing and searching embeddings — numerical representations of text, images, audio, or other data that capture meaning. When you build an AI application that needs to find “things similar to this,” a vector database is the infrastructure that makes it fast.

Vector databases are the plumbing behind Retrieval-Augmented Generation (RAG) — the technique that lets AI answer questions grounded in your specific documents. They’re also behind semantic search, recommendation systems, and any AI feature where similarity matters.

If you’re building AI applications beyond simple chat, you’ll almost certainly use one.


Why this matters

Traditional databases search by exact match: “find rows where name = ‘John’.” Vector databases search by similarity: “find documents most similar in meaning to this query.”

For example:

  • User asks: “How do I cancel my subscription?”
  • Your help docs might phrase it: “Membership termination process”
  • Traditional keyword search: misses it (different words)
  • Vector search: finds it (similar meaning)

This semantic understanding is what makes AI applications feel intelligent — they find relevant information even when phrased differently.


How vector search works (plain English)

The basic idea:

1. Embeddings

An embedding model (a kind of AI) converts text (or images, or audio) into a list of numbers called a vector. The length depends on the model: commonly a few hundred to a few thousand numbers (1,536 for OpenAI’s text-embedding-3-small, for example).

The clever part: embeddings are designed so that similar meanings produce similar vectors. Two sentences about cancelling subscriptions will have vectors close together; a sentence about pizza will have a very different vector.

See embeddings for the full explanation.
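
To make “similar meanings produce similar vectors” concrete, here is a minimal sketch using cosine similarity, one common way to compare vectors. The three-number vectors are hand-picked for illustration and are not output from any real embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
cancel_subscription = [0.9, 0.1, 0.0]
terminate_membership = [0.8, 0.2, 0.1]
pizza_recipe = [0.0, 0.2, 0.9]

similar = cosine_similarity(cancel_subscription, terminate_membership)
different = cosine_similarity(cancel_subscription, pizza_recipe)
print(f"cancel vs terminate: {similar:.3f}")
print(f"cancel vs pizza:     {different:.3f}")
```

The first score comes out close to 1 and the second close to 0, which is the property a vector database relies on when it ranks results.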

2. Storage

A vector database stores millions or billions of these vectors efficiently, along with the original text (or pointers to it) and any metadata you attach.

3. Search

When a user asks a question, you:

  • Convert their question into a vector (using the same embedding model)
  • Ask the vector database: “Find the 10 vectors most similar to this query vector”
  • Get back the matching documents
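
The query steps above can be sketched as an exact (brute-force) search over an in-memory list. This is only an illustration of the idea: the `Record` class, the `top_k` helper, and the tiny vectors are all hypothetical, and a real vector database would use an index rather than scoring every record.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Record:
    doc_id: str
    vector: list
    text: str
    metadata: dict = field(default_factory=dict)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vector, records, k=10):
    """Score every record against the query and return the k best (score, record) pairs."""
    scored = [(cosine_similarity(query_vector, r.vector), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

records = [
    Record("doc-1", [0.8, 0.2, 0.1], "Membership termination process"),
    Record("doc-2", [0.0, 0.2, 0.9], "Our favourite pizza recipes"),
    Record("doc-3", [0.7, 0.3, 0.0], "Pausing or ending your plan"),
]

# In a real system this vector would come from the same embedding model
# that embedded the documents; here it is hard-coded for illustration.
query_vector = [0.9, 0.1, 0.0]  # stands in for "How do I cancel my subscription?"

for score, record in top_k(query_vector, records, k=2):
    print(f"{score:.3f}  {record.doc_id}  {record.text}")
```

With these made-up vectors the two subscription-related documents rank above the pizza document, even though none of them shares the word “cancel” with the query.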

4. Use with AI

You pass those matched documents to an AI (Claude, GPT, etc.) along with the user’s question. The AI answers using the relevant information you found.

This is RAG (Retrieval-Augmented Generation) — see rag.
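
As a rough sketch of that last step, the function below assembles retrieved chunks and the user’s question into a single prompt string. The function name, the prompt wording, and the sample help-doc text are assumptions made for illustration; the call to the AI model itself is left out because it depends on the provider.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Combine retrieved text and the user's question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Made-up help-doc text standing in for what the vector database returned.
prompt = build_rag_prompt(
    "How do I cancel my subscription?",
    ["Membership termination process: open Settings, then Billing, then choose End membership."],
)
print(prompt)
```

Numbering the chunks makes it easier to ask the model to cite which passage it used.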


The major vector databases

Pinecone

🇺🇸 United States | https://pinecone.io

The leading commercial vector database. Fully managed cloud service; no infrastructure to manage. Strong performance at scale, mature product, popular default choice for production applications.

Pricing: Free tier (limited); pay-per-use scaling; serverless and pod-based options.

Strengths: Mature, reliable, well-documented, hybrid search (vectors + keywords), strong ecosystem integration.

Weaknesses: Closed-source, US-hosted (data residency considerations for Australia), can get expensive at scale.

Weaviate

🇳🇱 Netherlands | https://weaviate.io

Open-source vector database with strong enterprise features. EU-origin (GDPR-aligned). Can run self-hosted or use Weaviate Cloud.

Pricing: Open-source free; managed cloud paid (from ~$25/month).

Strengths: Open-source, EU origin (good for privacy), modular architecture, hybrid search, GraphQL API.

Weaknesses: More complex than simpler alternatives; self-hosting requires expertise.

Chroma

🇺🇸 United States | https://trychroma.com

Open-source vector database focused on developer experience. Particularly popular for prototyping and smaller applications.

Pricing: Open-source free; Chroma Cloud (newer) paid.

Strengths: Very simple to start with, runs in-memory or on disk, excellent for prototyping, lightweight.

Weaknesses: Less mature for production at scale than Pinecone/Weaviate.

Qdrant

🇩🇪 Germany | https://qdrant.tech

Open-source, written in Rust (fast). Self-hosted or Qdrant Cloud.

Pricing: Open-source free; cloud from ~$25/month.

Strengths: Performance, open-source, EU origin, modern architecture.

Weaknesses: Smaller community than Pinecone/Weaviate.

Milvus / Zilliz

🇺🇸 / 🇨🇳 | Milvus is open-source (originally Chinese-founded, now widely used globally); Zilliz Cloud is the commercial managed version.

Note: Origin considerations apply — Zilliz Cloud is generally fine but verify current data handling for sensitive use cases.

Strengths: Mature, large-scale capable, multi-cloud.

pgvector

Extension to PostgreSQL | https://github.com/pgvector/pgvector

A PostgreSQL extension that adds vector search to standard Postgres. Significant because most applications already use Postgres — you don’t need a separate vector database.

Pricing: Free (it’s a Postgres extension); pay for whatever Postgres you use.

Strengths: Use your existing database, transactional consistency, mature operational tools, available on Supabase / managed Postgres.

Weaknesses: May not scale to billions of vectors as well as specialised databases; slower for very large workloads.

Australian relevance: Available on Supabase Australia regions; AWS RDS Sydney; Azure Postgres Australia East.
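
For a sense of what pgvector looks like in practice, here is a minimal sketch. The SQL is held in Python strings so the example runs without a database. The table name `chunks`, the 1,536-dimension column, and the `%s` placeholder (the style used by drivers such as psycopg) are assumptions; `vector(n)`, the `<=>` cosine-distance operator, and the `[x,y,z]` text format come from pgvector itself.

```python
# Schema: enable the extension and add a vector column next to ordinary columns.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536)  -- must equal the embedding model's output size
);
"""

# Query: <=> is cosine distance, so smaller values mean more similar rows.
SEARCH_SQL = """
SELECT id, content
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT 10;
"""

def to_vector_literal(values):
    """Format a list of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# A driver would pass this string as the query parameter for SEARCH_SQL.
print(to_vector_literal([0.25, -1, 3.5]))
```

Because the vectors live in an ordinary table, the same query can also join or filter on your other columns.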

Redis

Vector capabilities added to Redis. If you already use Redis for caching, you can use it for vector search too.

Elasticsearch / OpenSearch

Search platforms with vector capabilities added. Good if you already use them for keyword search.

Vertex AI Vector Search (Google)

🇺🇸 | Managed vector search on Google Cloud. Available in Sydney region.

Azure AI Search

🇺🇸 | Vector search on Azure. Australia East region.

Amazon OpenSearch Service

🇺🇸 | Vector search on AWS. Sydney region.


How to choose

Start with pgvector if you already use Postgres

Simplest path. Adequate for most applications. Stays in your existing infrastructure.

Pinecone if you want managed and don’t want to think about it

Easiest production path. Mature. Accept the US hosting and per-use pricing.

Weaviate or Qdrant if you want open-source with EU origin

Good privacy posture. Self-host for full control or use their managed cloud.

Chroma for prototyping

Easiest to get started. Move to something else for production if scale matters.

Your hyperscale cloud provider’s offering if you’re already there

If you’re committed to AWS / Azure / GCP, their native vector search is usually fine and simplifies your stack.


Privacy considerations for Australian users

Vector databases store your data:

  • Document content (or pointers to it)
  • Embeddings (which can in some cases be reverse-engineered to approximate the original text)
  • Metadata you attach

For sensitive Australian use:

Best privacy

  • pgvector on Supabase Sydney or self-hosted Postgres on AWS Sydney
  • Qdrant or Weaviate self-hosted on Australian infrastructure
  • Local development with Chroma

Good privacy

  • Vertex AI Vector Search (Sydney region)
  • Azure AI Search (Australia East)
  • AWS OpenSearch (Sydney)

Standard

  • Pinecone, Chroma Cloud, Weaviate Cloud (US/EU hosted)
  • Verify enterprise DPA for sensitive data

Avoid for sensitive AU data

  • Chinese cloud vector services (consistent with this encyclopedia’s general recommendation)
  • Free tiers with unclear data handling

Typical costs

For a typical small-to-medium application:

Stage | Approximate cost
Prototyping (Chroma local) | $0
Small production (pgvector on Supabase free tier) | $0-25/month
Growing app (Pinecone serverless) | $50-200/month
Larger app (dedicated Pinecone pods or self-hosted) | $200-1,000+/month
Enterprise scale | Custom enterprise pricing

Hidden costs:

  • Embedding generation (OpenAI/Voyage embeddings cost per token)
  • Storage scales with your document corpus
  • Query volume affects pricing
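
The hidden costs above can be estimated with simple arithmetic. The sketch below is a back-of-envelope calculator only: the workload figures and the price per million tokens are placeholders, not current provider rates, and the storage figure covers raw 32-bit vectors without index or metadata overhead.

```python
def embedding_cost_usd(num_chunks, avg_tokens_per_chunk, usd_per_million_tokens):
    """One-off cost of embedding a corpus at a given per-token price."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1_000_000 * usd_per_million_tokens

def raw_vector_storage_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Size of the vectors alone as 32-bit floats; indexes and metadata add more."""
    return num_vectors * dimensions * bytes_per_value

# Hypothetical workload and a placeholder price; check your provider's current rate.
cost = embedding_cost_usd(
    num_chunks=1_000_000, avg_tokens_per_chunk=750, usd_per_million_tokens=0.10
)
size = raw_vector_storage_bytes(num_vectors=1_000_000, dimensions=1536)

print(f"one-off embedding cost: ${cost:,.2f}")
print(f"raw vector storage:     {size / 1e9:.2f} GB")
```

Re-embedding after a model change incurs the first cost again, which is one reason to choose the embedding model carefully.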

Common patterns

Simple Q&A on documents

  1. Load documents
  2. Split into chunks (~500-1000 tokens)
  3. Generate embeddings for each chunk
  4. Store in vector database
  5. On query: embed query, find similar chunks, pass to LLM
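
Step 2 is the one most often tuned, so here is a minimal chunking sketch. It counts whitespace-separated words as a stand-in for tokens (a real pipeline would use the embedding model’s tokenizer), and the overlap between neighbouring chunks is a common addition that is assumed here rather than taken from the steps above.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into overlapping chunks of at most `chunk_size` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

# A synthetic 500-word document for demonstration.
document = " ".join(f"word{i}" for i in range(500))
chunks = chunk_words(document, chunk_size=200, overlap=20)
print(len(chunks), [len(c.split()) for c in chunks])
```

Each chunk would then be embedded and stored together with a pointer back to its source document.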

Semantic search

Same as above, but return matching documents rather than AI-generated answers.

Hybrid search

Combine vector similarity (semantic) with keyword matching (lexical). Often better than either alone. Most major vector DBs support this.
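
One widely used way to combine the two result lists is reciprocal rank fusion (RRF). The sketch below is an illustration of that technique, not a description of how any particular database implements hybrid search; the document ids are made up, and `k=60` is the conventional default constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

vector_ranking = ["doc-a", "doc-b", "doc-c"]   # hypothetical semantic results
keyword_ranking = ["doc-b", "doc-d", "doc-a"]  # hypothetical keyword results

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

Documents that appear in both lists rise to the top, and because only ranks are used, the two scoring scales never need to be normalised against each other.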

Multi-modal search

Embed images and text in same space. Search for “images similar to this description” or “documents related to this image.”

Recommendations

Embed user preferences and items. Find items similar to what users liked.
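
A minimal sketch of this pattern, assuming the user’s taste is represented as the average of the vectors of items they liked (one simple choice among several). The catalogue, item names, and vectors are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(liked_ids, catalogue, k=2):
    """Return the k items closest to the average of the liked items' vectors."""
    profile = mean_vector([catalogue[item_id] for item_id in liked_ids])
    candidates = [
        (cosine_similarity(profile, vector), item_id)
        for item_id, vector in catalogue.items()
        if item_id not in liked_ids  # do not recommend what was already liked
    ]
    candidates.sort(reverse=True)
    return [item_id for _, item_id in candidates[:k]]

# Hypothetical item embeddings.
catalogue = {
    "hiking-boots": [0.9, 0.1, 0.0],
    "trail-map": [0.8, 0.2, 0.1],
    "tent": [0.7, 0.1, 0.3],
    "espresso-machine": [0.0, 0.9, 0.2],
}

print(recommend({"hiking-boots", "trail-map"}, catalogue, k=1))
```

In a vector database the same idea becomes a single nearest-neighbour query using the profile vector.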


Embedding models matter

The vector database is only as good as the embeddings you put in:

  • OpenAI text-embedding-3-small — cheap, fast, widely-used default
  • OpenAI text-embedding-3-large — better quality, more expensive
  • Voyage AI embeddings — excellent quality, specialised for retrieval
  • Cohere Embed v3 — strong multilingual support
  • Open-weights (e.g., BAAI/bge-large) — free, run locally

Different embedding models produce different-quality results. Test for your use case.


Common gotchas

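  • Embedding models can’t be mixed — vectors from different models aren’t comparable, even when their dimensions happen to match, so switching models means re-embedding everything. The index dimension must also equal the model’s output size.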
  • Chunk size matters — too small loses context; too large dilutes meaning. Typical sizes run from about 500 to 1,500 tokens; 500-1,000 is a common starting point.
  • Metadata filters are important — pure semantic search isn’t enough; usually combine with metadata (date ranges, document types, etc.).
  • Hybrid search often beats pure vector — keyword + vector combined.
  • Re-indexing is expensive — design your schema thoughtfully upfront.
  • Embeddings cost real money at scale — millions of documents add up.
  • Quality matters more than quantity — clean, well-structured documents beat noisy data.
  • Vector search is approximate — uses approximate nearest neighbour algorithms; “the most similar” might not be exact.
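
The metadata-filter point above can be sketched as “filter first, then rank by similarity”. This is an illustration under assumptions: the record layout, the `filtered_search` helper, and the sample data are hypothetical, and real databases differ in whether they filter before or after the vector search.

```python
import math
from datetime import date

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query_vector, records, predicate, k=5):
    """Keep records whose metadata passes `predicate`, then rank by similarity."""
    allowed = [r for r in records if predicate(r["metadata"])]
    allowed.sort(key=lambda r: cosine_similarity(query_vector, r["vector"]), reverse=True)
    return allowed[:k]

records = [
    {"id": "policy-2021", "vector": [0.9, 0.1],
     "metadata": {"type": "policy", "published": date(2021, 3, 1)}},
    {"id": "policy-2023", "vector": [0.2, 0.9],
     "metadata": {"type": "policy", "published": date(2023, 5, 1)}},
    {"id": "policy-2024", "vector": [0.8, 0.2],
     "metadata": {"type": "policy", "published": date(2024, 6, 1)}},
    {"id": "blog-2024", "vector": [0.9, 0.1],
     "metadata": {"type": "blog", "published": date(2024, 7, 1)}},
]

def is_recent_policy(metadata):
    return metadata["type"] == "policy" and metadata["published"] >= date(2023, 1, 1)

results = filtered_search([1.0, 0.0], records, is_recent_policy)
print([r["id"] for r in results])
```

Note that the 2021 policy and the blog post are the closest vectors to the query, yet neither is returned, because the filter excludes them before ranking.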

When you DON’T need a vector database

Vector databases are infrastructure. Don’t add them unnecessarily:

  • Small document set (under ~1,000 chunks) — if it all fits in the model’s context window, just put it in your LLM prompt directly
  • Exact matching needed — Use a regular database
  • Frequently updated content — Vector index maintenance has overhead
  • Very specific queries — Keyword search may suffice

Adding a vector database adds complexity. Use it when its benefits exceed its cost.


Australian usage

In Australia, vector databases are commonly used:

  • In AI consultancies building client RAG systems
  • In Australian SaaS products with AI features
  • In government AI pilots (with appropriate data residency)
  • In Australian universities for research applications
  • In healthcare and legal AI (with appropriate sensitive data handling)

For Australian developers: pgvector + Supabase Sydney is often the simplest path.


Recent changes (LIVING)

  • Serverless pricing at Pinecone — easier scaling
  • pgvector improvements — substantial performance gains
  • Hybrid search becoming standard across providers
  • More multi-modal capabilities (images alongside text)
  • Better integration with LangChain and LlamaIndex
  • Cloud providers’ native offerings maturing


Sources

  • Pinecone, Weaviate, Chroma, Qdrant, Milvus official documentation
  • pgvector GitHub and documentation
  • Industry analyses of vector database adoption (2024-2026)
  • Personal experience building RAG applications
  • Developer community discussions (Hacker News, r/MachineLearning)
  • Australian developer community AI infrastructure discussions