Vector Databases — The Infrastructure Behind RAG and Semantic Search
Status: 🟩 COMPLETE 🟦 LIVING
Section: 10 — AI and LLMs
Tags: vector-databases, embeddings, RAG, semantic-search, pinecone, weaviate, chroma, pgvector
What it is
A vector database is a specialised database for storing and searching embeddings — numerical representations of text, images, audio, or other data that capture meaning. When you build an AI application that needs to find “things similar to this,” a vector database is the infrastructure that makes it fast.
Vector databases are the plumbing behind Retrieval-Augmented Generation (RAG) — the technique that lets AI answer questions grounded in your specific documents. They’re also behind semantic search, recommendation systems, and any AI feature where similarity matters.
If you’re building AI applications beyond simple chat, you’ll almost certainly use one.
Why this matters
Traditional databases search by exact match: “find rows where name = ‘John’.” Vector databases search by similarity: “find documents most similar in meaning to this query.”
For example:
- User asks: “How do I cancel my subscription?”
- Your help docs might phrase it: “Membership termination process”
- Traditional keyword search: misses it (different words)
- Vector search: finds it (similar meaning)
This semantic understanding is what makes AI applications feel intelligent — they find relevant information even when phrased differently.
How vector search works (plain English)
The basic idea:
1. Embeddings
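An embedding model (a kind of AI) converts text (or images, or audio) into a list of numbers. The length depends on the model: commonly a few hundred to a few thousand values (1,536 for OpenAI’s text-embedding-3-small, for example). This list is called a vector.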
The clever part: embeddings are designed so that similar meanings produce similar vectors. Two sentences about cancelling subscriptions will have vectors close together; a sentence about pizza will have a very different vector.
See embeddings for the full explanation.
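To make "close together" concrete, here is a minimal sketch of cosine similarity, one common way to compare two vectors. The three-number vectors are made up for illustration; they are not output from a real embedding model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings" (real ones are far longer).
cancel_subscription = [0.9, 0.1, 0.0]
end_membership = [0.8, 0.2, 0.1]
pizza_recipe = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(cancel_subscription, end_membership)
sim_unrelated = cosine_similarity(cancel_subscription, pizza_recipe)
print(f"related: {sim_related:.2f}, unrelated: {sim_unrelated:.2f}")
```

With these toy values the two subscription sentences score close to 1, while the pizza sentence scores close to 0.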
2. Storage
A vector database stores millions or billions of these vectors efficiently, along with the original text (or pointers to it) and any metadata you attach.
3. Search
When a user asks a question, you:
- Convert their question into a vector (using the same embedding model)
- Ask the vector database: “Find the 10 vectors most similar to this query vector”
- Get back the matching documents
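The search step can be sketched as an exhaustive comparison against every stored vector. This is only an illustration of the idea: production vector databases typically use approximate nearest neighbour indexes (HNSW, for example) instead of scanning everything. The document ids and vectors below are hypothetical.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, index, k=10):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical stored vectors, keyed by document id.
index = {
    "cancel-membership": [0.8, 0.2, 0.1],
    "update-billing": [0.6, 0.5, 0.1],
    "pizza-recipe": [0.0, 0.1, 0.9],
}
query_vec = [0.9, 0.1, 0.0]  # stand-in for the embedded user question

results = top_k(query_vec, index, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The scan costs one comparison per stored vector, which is fine for thousands of vectors and is the part a vector database replaces with an index at larger scale.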
4. Use with AI
You pass those matched documents to an AI (Claude, GPT, etc.) along with the user’s question. The AI answers using the relevant information you found.
This is RAG (Retrieval-Augmented Generation) — see rag.
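As a rough sketch of this last step, the retrieved chunks and the question can be combined into a single prompt string. The function name, the prompt wording, and the sample chunks are illustrative assumptions; the call to the model itself is left out because it depends on the provider.

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt string."""
    numbered = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}"
    )

# Hypothetical chunks, as if returned by the vector search step.
retrieved = [
    "Membership termination process: open Settings, then Billing, then choose End membership.",
    "Refunds are issued within 10 business days of termination.",
]
prompt = build_rag_prompt("How do I cancel my subscription?", retrieved)
print(prompt)
```

Numbering the chunks makes it possible to ask the model to cite which chunk supported its answer.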
The major vector databases
Pinecone
🇺🇸 United States | https://pinecone.io
One of the most widely used commercial vector databases. Fully managed cloud service; no infrastructure to manage. Strong performance at scale, mature product, popular default choice for production applications.
Pricing: Free tier (limited); pay-per-use scaling; serverless and pod-based options.
Strengths: Mature, reliable, well-documented, hybrid search (vectors + keywords), strong ecosystem integration.
Weaknesses: Closed-source, US-hosted (data residency considerations for Australia), can get expensive at scale.
Weaviate
🇳🇱 Netherlands | https://weaviate.io
Open-source vector database with strong enterprise features. EU-origin (GDPR-aligned). Can run self-hosted or use Weaviate Cloud.
Pricing: Open-source free; managed cloud paid (from ~$25/month).
Strengths: Open-source, EU origin (good for privacy), modular architecture, hybrid search, GraphQL API.
Weaknesses: More complex than simpler alternatives; self-hosting requires expertise.
Chroma
🇺🇸 United States | https://trychroma.com
Open-source vector database focused on developer experience. Particularly popular for prototyping and smaller applications.
Pricing: Open-source free; Chroma Cloud (newer) paid.
Strengths: Very simple to start with, runs in-memory or on disk, excellent for prototyping, lightweight.
Weaknesses: Less mature for production at scale than Pinecone/Weaviate.
Qdrant
🇩🇪 Germany | https://qdrant.tech
Open-source, written in Rust (fast). Self-hosted or Qdrant Cloud.
Pricing: Open-source free; cloud from ~$25/month.
Strengths: Performance, open-source, EU origin, modern architecture.
Weaknesses: Smaller community than Pinecone/Weaviate.
Milvus / Zilliz
🇺🇸 / 🇨🇳 | Milvus is open-source (originally Chinese-founded, now widely used globally); Zilliz is the commercial cloud version.
Note: Origin considerations apply — Zilliz Cloud is generally fine but verify current data handling for sensitive use cases.
Strengths: Mature, large-scale capable, multi-cloud.
pgvector
Extension to PostgreSQL | https://github.com/pgvector/pgvector
A PostgreSQL extension that adds vector search to standard Postgres. Significant because most applications already use Postgres — you don’t need a separate vector database.
Pricing: Free (it’s a Postgres extension); pay for whatever Postgres you use.
Strengths: Use your existing database, transactional consistency, mature operational tools, available on Supabase / managed Postgres.
Weaknesses: May not scale to billions of vectors as well as specialised databases; slower for very large workloads.
Australian relevance: Available on Supabase Australia regions; AWS RDS Sydney; Azure Postgres Australia East.
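For orientation, a pgvector setup might look like the sketch below. The `vector` column type and the `<=>` cosine-distance operator come from pgvector itself; the table and column names are hypothetical, and the `%s` placeholders assume a driver such as psycopg. The SQL is held in Python strings and is not executed here.

```python
# Hypothetical schema: one row per text chunk, with its embedding alongside.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536)  -- must equal your embedding model's dimension
);
"""

# Nearest chunks by cosine distance (smaller distance means more similar).
SEARCH_SQL = """
SELECT id, content
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def to_vector_literal(values):
    """Format a Python sequence in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# Parameters you would pass with SEARCH_SQL to a database cursor.
params = (to_vector_literal([0.1, 0.2, 0.3]), 10)
print(params)
```

Because the vectors live in an ordinary table, the same query can join against other tables or add a `WHERE` clause on normal columns.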
Redis Vector Search
Vector capabilities added to Redis. If you already use Redis for caching, you can use it for vector search too.
Elasticsearch / OpenSearch
Search platforms with vector capabilities added. Good if you already use them for keyword search.
Vertex AI Vector Search (Google)
🇺🇸 | Managed vector search on Google Cloud. Available in Sydney region.
Azure AI Search
🇺🇸 | Vector search on Azure. Australia East region.
Amazon OpenSearch Service
🇺🇸 | Vector search on AWS. Sydney region.
How to choose
Start with pgvector if you already use Postgres
Simplest path. Adequate for most applications. Stays in your existing infrastructure.
Pinecone if you want managed and don’t want to think about it
Easiest production path. Mature. Accept the US hosting and per-use pricing.
Weaviate or Qdrant if you want open-source with EU origin
Good privacy posture. Self-host for full control or use their managed cloud.
Chroma for prototyping
Easiest to get started. Move to something else for production if scale matters.
Your cloud provider’s native offering if you’re already there
If you’re committed to AWS / Azure / GCP, their native vector search is usually fine and simplifies your stack.
Privacy considerations for Australian users
Vector databases store your data:
- Document content (or pointers to it)
- Embeddings (which can in some cases be reverse-engineered to approximate the original text)
- Metadata you attach
For sensitive Australian use:
Best privacy
- pgvector on Supabase Sydney or self-hosted Postgres on AWS Sydney
- Qdrant or Weaviate self-hosted on Australian infrastructure
- Local development with Chroma
Good privacy
- Vertex AI Vector Search (Sydney region)
- Azure AI Search (Australia East)
- AWS OpenSearch (Sydney)
Standard
- Pinecone, Chroma Cloud, Weaviate Cloud (US/EU hosted)
- Verify enterprise DPA for sensitive data
Avoid for sensitive AU data
- Chinese cloud vector services (in line with this encyclopedia’s general recommendation)
- Free tiers with unclear data handling
Typical costs
For a typical small-to-medium application:
| Stage | Approximate cost |
|---|---|
| Prototyping (Chroma local) | $0 |
| Small production (pgvector on Supabase free or entry-level paid tier) | $0-25/month |
| Growing app (Pinecone serverless) | $50-200/month |
| Larger app (dedicated Pinecone pods or self-hosted) | $200-1,000+/month |
| Enterprise scale | Custom enterprise pricing |
Hidden costs:
- Embedding generation (OpenAI/Voyage embeddings cost per token)
- Storage scales with your document corpus
- Query volume affects pricing
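A back-of-envelope estimate for the first two hidden costs can be sketched as follows. The per-token price is a placeholder, not a quoted rate, so check your provider's current price list; the storage figure covers raw 32-bit floats only and ignores index overhead, metadata, and the original text.

```python
def embedding_cost_usd(total_tokens: int, price_per_million_tokens: float) -> float:
    """One-off cost of embedding a corpus, given a per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million_tokens

def raw_vector_storage_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (float32 by default)."""
    return num_vectors * dimensions * bytes_per_value

# Hypothetical corpus: 100,000 chunks of about 750 tokens each, 1,536-dimension vectors.
chunks = 100_000
tokens_per_chunk = 750
cost = embedding_cost_usd(chunks * tokens_per_chunk, price_per_million_tokens=0.02)  # placeholder price
storage_gb = raw_vector_storage_bytes(chunks, 1536) / 1e9

print(f"embedding cost: ${cost:.2f}, raw vector storage: {storage_gb:.2f} GB")
```

Under these assumptions the one-off embedding bill is small; the recurring costs are storage, queries, and re-embedding whenever the corpus or the model changes.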
Common patterns
Simple Q&A on documents
- Load documents
- Split into chunks (~500-1000 tokens)
- Generate embeddings for each chunk
- Store in vector database
- On query: embed query, find similar chunks, pass to LLM
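The chunking step can be sketched as below. Two simplifications are assumed: words stand in for tokens (a real pipeline would count tokens with the embedding model's tokenizer), and a small overlap between neighbouring chunks is added, which is a common practice but not something the steps above require.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of up to chunk_size words; each chunk repeats the last `overlap` words of the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Tiny demonstration with 10 placeholder words.
sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(sample, chunk_size=4, overlap=1)
print(pieces)
```

Each resulting chunk is then embedded and stored with a pointer back to its source document.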
Semantic search
Same as above, but return matching documents rather than AI-generated answers.
Hybrid search
Combine vector similarity (semantic) with keyword matching (lexical). Often better than either alone. Most major vector DBs support this.
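One simple way to combine the two result lists is reciprocal rank fusion (RRF), sketched below. This is an illustration of one fusion technique, not a description of how any particular database implements hybrid search; the document ids are hypothetical, and `k=60` is a commonly used default for the smoothing constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document ids; documents ranked well in several lists rise to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from a vector search and a keyword search, best first.
vector_hits = ["doc-a", "doc-b", "doc-c"]
keyword_hits = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF uses only rank positions, so it avoids having to put vector similarity scores and keyword scores on a common scale.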
Multi-modal search
Embed images and text in same space. Search for “images similar to this description” or “documents related to this image.”
Recommendations
Embed user preferences and items. Find items similar to what users liked.
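A minimal sketch of this pattern: average the vectors of items a user liked to get a profile vector, then rank the remaining items by similarity to it. Averaging is only one simple way to build a profile, and the item names and two-number vectors are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(liked_ids, item_vectors, k=3):
    """Rank items the user has not liked yet by similarity to the average of the liked ones."""
    profile = mean_vector([item_vectors[item_id] for item_id in liked_ids])
    candidates = [
        (item_id, cosine(profile, vec))
        for item_id, vec in item_vectors.items()
        if item_id not in liked_ids
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in candidates[:k]]

# Hypothetical item embeddings.
item_vectors = {
    "sci-fi-novel": [0.9, 0.1],
    "space-documentary": [0.8, 0.3],
    "space-opera-film": [0.85, 0.2],
    "cookbook": [0.1, 0.9],
}
picks = recommend({"sci-fi-novel", "space-documentary"}, item_vectors, k=1)
print(picks)
```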
Embedding models matter
The vector database is only as good as the embeddings you put in:
- OpenAI text-embedding-3-small — cheap, fast, widely used default
- OpenAI text-embedding-3-large — better quality, more expensive
- Voyage AI embeddings — excellent quality, specialised for retrieval
- Cohere Embed v3 — strong multilingual support
- Open-weights (e.g., BAAI/bge-large) — free, run locally
Different embedding models produce different-quality results. Test for your use case.
Common gotchas
- Embedding dimensions must match — and matching dimensions are not enough: vectors from different models are not comparable, so switching models means re-embedding everything.
- Chunk size matters — too small loses context; too large dilutes meaning. A common starting range is 500-1,000 tokens, and some setups go up to about 1,500; test on your own documents.
- Metadata filters are important — pure semantic search isn’t enough; usually combine with metadata (date ranges, document types, etc.).
- Hybrid search often beats pure vector — keyword + vector combined.
- Re-indexing is expensive — design your schema thoughtfully upfront.
- Embeddings cost real money at scale — millions of documents add up.
- Quality matters more than quantity — clean, well-structured documents beat noisy data.
- Vector search is approximate — uses approximate nearest neighbour algorithms; “the most similar” might not be exact.
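The metadata-filter point can be sketched as "filter first, then rank by similarity". The record layout and field names (`type`, `published`, `vector`) are hypothetical; real vector databases expose their own filter syntax and usually apply the filter inside the index rather than in application code.

```python
import math
from datetime import date

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query_vec, records, doc_type, not_before, k=5):
    """Keep only records matching the metadata filter, then rank the survivors by similarity."""
    eligible = [
        r for r in records
        if r["type"] == doc_type and r["published"] >= not_before
    ]
    eligible.sort(key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return [r["id"] for r in eligible[:k]]

# Hypothetical records: an id, some metadata, and a toy vector.
records = [
    {"id": "policy-2021", "type": "policy", "published": date(2021, 3, 1), "vector": [0.9, 0.1]},
    {"id": "policy-2024", "type": "policy", "published": date(2024, 6, 1), "vector": [0.8, 0.3]},
    {"id": "blog-2024", "type": "blog", "published": date(2024, 7, 1), "vector": [0.9, 0.1]},
]
hits = filtered_search([0.9, 0.1], records, doc_type="policy", not_before=date(2023, 1, 1))
print(hits)
```

Without the filter, the outdated policy and the blog post would both outrank the current policy on similarity alone.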
When you DON’T need a vector database
Vector databases are infrastructure. Don’t add them unnecessarily:
- Small document set (under ~1,000 chunks) — put it directly in the LLM prompt if it fits the context window; otherwise a simple in-memory similarity search is enough
- Exact matching needed — Use a regular database
- Frequently updated content — Vector index maintenance has overhead
- Very specific queries — Keyword search may suffice
Adding a vector database adds complexity. Use it when its benefits exceed its cost.
Australian usage
In Australia, vector databases are commonly used:
- In AI consultancies building client RAG systems
- In Australian SaaS products with AI features
- In government AI pilots (with appropriate data residency)
- In Australian universities for research applications
- In healthcare and legal AI (with appropriate sensitive data handling)
For Australian developers: pgvector + Supabase Sydney is often the simplest path.
Recent changes (LIVING)
- Serverless pricing at Pinecone — easier scaling
- pgvector improvements — substantial performance gains
- Hybrid search becoming standard across providers
- More multi-modal capabilities (images alongside text)
- Better integration with LangChain and LlamaIndex
- Cloud providers’ native offerings maturing
See also
- embeddings — the underlying numerical representations
- rag — the primary use case
- langchain — orchestrates vector DBs in apps
- hugging-face — source of embedding models
- supabase — provides pgvector
- aws-bedrock — AWS RAG infrastructure
- vertex-ai — GCP RAG infrastructure
- azure-openai-service — Azure RAG
- open-weights-vs-closed — for local embedding models
- australian-privacy-considerations
Sources
- Pinecone, Weaviate, Chroma, Qdrant, Milvus official documentation
- pgvector GitHub and documentation
- Industry analyses of vector database adoption (2024-2026)
- Personal experience building RAG applications
- Developer community discussions (Hacker News, r/MachineLearning)
- Australian developer community AI infrastructure discussions