Indexes & performance

Status: 🟩 COMPLETE Last updated: 2026-06-19 Plain-English tagline: Indexes are how databases find rows fast. The single biggest lever for query performance. Like the index in a book — without it, you read every page; with it, you jump straight to what you need.


In plain English

Without indexes, finding a specific row in a table means scanning every row in order until you find it (or finish). For a 100-row table, that’s instant. For a 10-million-row table, that’s seconds. Multiply by every query in your app, and your database grinds.

An index is a separate data structure the database maintains alongside the table. It’s organized so the database can find specific values fast — in milliseconds, not seconds, even on huge tables.

Think of a book’s index. The book itself has chapters and pages in narrative order. The index at the back is alphabetical, with page numbers. To find “Postgres” you don’t read the whole book — you flip to “P” in the index, find “Postgres … 47,” go to page 47.

Databases do the same. The table is the book; the index is the index. You create indexes on the columns you query.

For most webapps, the right indexes are the difference between “feels instant” and “loads slowly under any load.” Missing indexes are the #1 source of unexpected slowness in real systems.


Why it matters

  • Performance ceiling. No amount of caching or scaling rescues a slow query against an unindexed table.
  • It’s free to fix. Adding an index is one SQL command. The right index turns 5-second queries into 5-millisecond ones.
  • Hard to add later under load. Creating indexes on huge tables can lock them briefly. Easier to add before scale.
  • Cost of having too many. Each index slows writes and uses disk. The art is in picking the right set.

When the database uses an index

The database’s query planner decides per-query: “should I scan the whole table, or use this index?” The decision depends on:

  • Is there a usable index for the query’s WHERE / JOIN conditions?
  • Is the index efficient enough vs. just scanning?
  • How current are the table statistics (which guide the planner)?

You can see the decision with EXPLAIN:

EXPLAIN ANALYZE
SELECT * FROM posts WHERE author_id = '...';

Output includes things like:

  • Seq Scan — full table scan (bad on large tables, fine on small)
  • Index Scan — using an index (usually good)
  • Index Only Scan — answered from index alone (very good)
  • Bitmap Heap Scan — combining multiple indexes (also good)

If you see Seq Scan on a large table for a query you’d expect to use an index, you’re missing an index or have a planner issue.


Index types (Postgres)

Postgres supports many index types. The defaults handle most cases:

B-tree (the default)

CREATE INDEX posts_created_idx ON posts(created_at);

Excellent for: equality (=), range queries (<, >, BETWEEN), sorting (ORDER BY). Used for ~90% of indexes.

Hash

CREATE INDEX users_email_hash_idx ON users USING HASH (email);

Faster than B-tree for pure equality lookups. Rarely worth it; B-tree is general-purpose enough.

GIN (Generalized Inverted Index)

CREATE INDEX posts_tags_idx ON posts USING GIN (tags);
CREATE INDEX posts_search_idx ON posts USING GIN (to_tsvector('english', body));

For: arrays, JSONB, full-text search. Slower to update but powerful for “contains” / “matches any of” queries.

GiST (Generalized Search Tree)

For geometric data (PostGIS), range types, fuzzy matching (pg_trgm).

BRIN (Block Range Index)

CREATE INDEX events_time_idx ON events USING BRIN (created_at);

Tiny index, fast on tables with natural physical ordering (time-series tables where rows are inserted in time order). Trade-off: less precise.

For most webapps with ordinary tables, B-tree covers 95% of cases. Reach for the others when specific needs (full-text search, JSON, time-series at scale) apply.


What to index

High-value indexes

  1. Primary keys — automatic.
  2. Foreign keys — Postgres does NOT auto-index foreign keys. Add them.
  3. Columns in WHERE clauses — anything you filter on regularly.
  4. Columns in JOIN ON clauses — anything two tables join on.
  5. Columns in ORDER BY — speeds sorting + can avoid sorting altogether.
  6. Unique columns — UNIQUE constraint creates an index automatically.

Composite indexes

When queries filter on multiple columns:

CREATE INDEX tasks_project_status_idx ON tasks(project_id, status);

Order matters. This index helps:

  • WHERE project_id = ?
  • WHERE project_id = ? AND status = ?

But NOT:

  • WHERE status = ? (need a separate index for that, or a different column order)

Rule: put the more selective column first, or the one most queries filter on first.

Partial indexes

When queries only care about a subset of rows:

CREATE INDEX posts_published_idx ON posts(published_at DESC)
  WHERE published_at IS NOT NULL;

Smaller index (only indexes published posts). Faster to maintain. Used for queries that match the WHERE clause.

Expression indexes

CREATE INDEX users_email_lower_idx ON users(lower(email));
 
-- Then query with:
SELECT * FROM users WHERE lower(email) = lower($1);

For: case-insensitive lookups, indexed JSON paths, computed values.


A concrete example: indexing for common queries

For our blog schema:

-- Common queries:
-- 1. SELECT * FROM posts WHERE author_id = ? ORDER BY created_at DESC LIMIT 10;
-- 2. SELECT * FROM posts WHERE published_at IS NOT NULL ORDER BY published_at DESC;
-- 3. SELECT * FROM comments WHERE post_id = ?;
-- 4. SELECT * FROM users WHERE email = ?;
 
-- Indexes:
CREATE INDEX posts_author_created_idx ON posts(author_id, created_at DESC);
CREATE INDEX posts_published_idx ON posts(published_at DESC) WHERE published_at IS NOT NULL;
CREATE INDEX comments_post_idx ON comments(post_id);
CREATE UNIQUE INDEX users_email_idx ON users(email);

Each query now has a matching index. Sub-millisecond lookups even on millions of rows.


What NOT to index

  • Tiny tables. A 50-row table is faster to scan than index-lookup. Postgres planner figures this out, but indexes are still wasted.
  • Columns with very few distinct values. A boolean is_active indexed alone is rarely useful (planner often skips it). Composite indexes with other columns can help.
  • Columns rarely used in queries. No query → no benefit → all cost.
  • Frequently-updated columns. Each update means updating the index. High-write columns with low-read benefit may not pay off.

The N+1 problem

Not strictly indexing, but the most common app-level performance bug.

// Bad: N+1
const posts = await db.posts.findMany();        // 1 query
for (const post of posts) {
  post.author = await db.users.find(post.author_id);  // N queries
}
// Total: N+1 queries
// Good: 1 query with JOIN (or 2 with batched IN)
const posts = await db.posts.findMany({ include: { author: true } });
// Or in raw SQL:
SELECT posts.*, users.* FROM posts JOIN users ON users.id = posts.author_id;

ORMs help with this via include / with patterns. Be aware.


EXPLAIN — your debugging tool

EXPLAIN SELECT * FROM posts WHERE author_id = '...';

Shows the planned execution. Useful but doesn’t run the query.

EXPLAIN ANALYZE SELECT * FROM posts WHERE author_id = '...';

Runs the query AND shows actual times. Use this on slow queries to see where time goes.

Things to look for:

  • Seq Scan on big tables — missing index
  • High Rows Removed by Filter — index not selective enough
  • Sort step — could be eliminated with an index in the right order
  • Many Loops — N+1 or correlated subquery
  • Hash Join vs Nested Loop vs Merge Join — different strategies; planner picks based on stats

For Supabase, you can run EXPLAIN ANALYZE in the SQL Editor.


Other performance levers

Connection pooling

Each DB connection has overhead. Apps that open a fresh connection per request exhaust the DB’s connection limit. Use a pooler (PgBouncer, Supabase’s built-in).

Query rewriting

Sometimes the fix isn’t an index but rewriting the query:

  • OR conditions → UNION
  • Correlated subqueries → joins
  • SELECT * → specific columns

Caching

Frequently-accessed data → in-memory cache (Redis) → avoid hitting DB.

Materialized views

Pre-compute expensive queries; refresh periodically:

CREATE MATERIALIZED VIEW user_stats AS
SELECT user_id, COUNT(*) AS post_count
FROM posts
GROUP BY user_id;
 
REFRESH MATERIALIZED VIEW user_stats;  -- on schedule

Queries against user_stats are instant.

Partitioning

For huge tables (billions of rows), Postgres can split a logical table into many physical partitions. Specialized; rarely needed for solo / small-team projects.

Read replicas

For read-heavy workloads, replicas can serve queries while primary handles writes.


Vacuum and autovacuum

Postgres uses MVCC (multi-version concurrency control). Deleted/updated rows leave “dead tuples” that need cleanup via VACUUM.

autovacuum runs automatically. Mostly fine; in some heavy-write workloads needs tuning.

Without VACUUM, indexes bloat, queries slow, transaction IDs can wrap around (catastrophic). Monitor.


Common gotchas

  • Foreign keys without indexes. Postgres doesn’t auto-index them. JOIN performance suffers. Always add.

  • Missing index on the columns you join / filter most. The most common cause of mysterious slowness.

  • LIKE '%foo%' can’t use standard indexes. Use pg_trgm or full-text search for leading-wildcard patterns.

  • ORDER BY not matching index order. CREATE INDEX ... (created_at) doesn’t help ORDER BY created_at DESC directly — though Postgres can reverse-scan. Better: index in the order you query.

  • Composite index column order matters. INDEX (a, b) helps queries with WHERE a = ? but not WHERE b = ?. Plan column order based on common queries.

  • UNIQUE constraints already create indexes. Adding another is wasteful.

  • Indexing on every column “just in case.” Each index slows writes. Strategic > exhaustive.

  • OR clauses defeating indexes. WHERE a = ? OR b = ? may not use indexes well. UNION of two queries sometimes faster.

  • Type mismatches. WHERE id = '42' with id being integer → Postgres may not use the index efficiently. Match types.

  • Implicit type casts in joins. Joining bigint = text makes the index unusable. Match types.

  • Function call on indexed column. WHERE LOWER(email) = ? won’t use an index on email. Either index on the function (CREATE INDEX ... ON users(lower(email))) or normalize on insert.

  • Forgetting to ANALYZE after big changes. Statistics drive the planner. After bulk loads, run ANALYZE.

  • Indexes that bloat over time. Heavy updates can fragment indexes. REINDEX occasionally; or use Postgres 12+ REINDEX CONCURRENTLY.

  • N+1 ORM queries. Profile what your ORM generates. Eager-load relationships.

  • SELECT * with too many columns. Pulls big rows over the network. List columns.

  • Pagination with large OFFSET. LIMIT 10 OFFSET 100000 scans 100,010 rows. Use keyset pagination (WHERE id > last_id).

  • No LIMIT on listings. Returning unbounded rows = potential disaster as data grows.

  • Long transactions blocking VACUUM. Big batch jobs that hold transactions for hours prevent dead-tuple cleanup. Split into smaller chunks.

  • Locks during index creation. CREATE INDEX locks the table by default. CREATE INDEX CONCURRENTLY doesn’t, but takes longer.

  • Random UUID inserts fragmenting indexes. Non-sequential UUIDs scatter writes across the index, fragmenting it. UUIDv7 (time-ordered) avoids this; or use bigint IDs.

  • JSONB queries without GIN indexes. Slow on large tables.

  • Storing huge values (TOAST) inline. Postgres handles huge columns via TOAST (separate storage), but accessing them is slower. Consider object storage for large blobs.

  • Connection limit exhaustion. Without a pooler, each app server connection counts. Default 100 connections runs out fast.

  • Not measuring. “I think this is slow” is different from “EXPLAIN ANALYZE says X is taking 4 seconds.” Measure before optimizing.

  • Premature optimization. A query running 50ms isn’t worth tuning if it’s only run rarely. Profile your real bottlenecks.


See also

Sources