Schema design
Status: đźź© COMPLETE Last updated: 2026-06-19 Plain-English tagline: The art of deciding what tables and columns your database should have. Get it right early; changes later are expensive. Most architecture decisions live here.
In plain English
Your database’s schema is the structure: which tables exist, what columns each has, what types those columns are, what constraints apply, what relationships connect them. Schema design is the upfront decision-making about all that.
It’s important because:
- Schema decisions ripple through everything. App code is shaped by schema. Performance is shaped by schema. Security (via RLS policies) is shaped by schema.
- Changes are expensive. Renaming a column on a million-row table can lock the table for minutes. Splitting a table after launch is a project. Better to think upfront.
- The right schema makes hard problems easy. A well-shaped schema makes complex queries simple. A poorly-shaped schema makes simple queries gnarly.
The good news: there are well-established patterns. Normalize for canonical data. Use foreign keys for relationships. Add indexes where queries need them. Add created_at / updated_at on everything. Use UUIDs for IDs in distributed setups. These cover most cases.
The bad news: every project has edge cases that don’t fit the patterns. Schema design rewards thinking carefully and iterating.
Why it matters
- The single highest-leverage architecture decision in most webapps.
- Cheap to get right at the start; expensive to fix later. Spending a day on schema at week 1 saves weeks later.
- Performance ceiling. A poorly-normalized schema or missing indexes caps how fast your app can ever go.
- Security ceiling. RLS works with schemas that have clear ownership semantics. Without that, security is harder.
The fundamentals
A table is a thing
Tables represent entities — concrete or conceptual things your app cares about. Users, posts, orders, products, sessions.
Name tables in plural (users, not user). Consistent convention; standard in most ecosystems.
A column is a property
Columns represent attributes of those entities. A user has an email, a name, a created_at.
Name columns in snake_case (first_name, not firstName). Standard SQL convention.
A row is a specific instance
Each row is one entity. One user. One post. Identified by a unique primary key.
A foreign key represents a relationship
A posts table with an author_id column referencing users(id) says: each post belongs to one user.
Choosing primary keys
A primary key uniquely identifies each row. Three common choices:
Auto-incrementing integer (SERIAL / BIGSERIAL)
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEYPros: small, fast to index, naturally ordered. Cons: reveals scale (“user #42” means there are at least 42 users), can’t be generated client-side, conflicts on merging databases.
Use for: internal-only IDs where scale isn’t sensitive.
UUID
id uuid PRIMARY KEY DEFAULT gen_random_uuid()Pros: non-sequential (no info leak), generated anywhere (client, multiple servers, no coordination), can’t collide in practice. Cons: 16 bytes (vs 4 or 8 for integers), slightly slower in indexes, harder to read in URLs.
Use for: anything user-facing, anything distributed, anything you might merge across systems.
Modern default for new webapps: UUID. Supabase uses UUIDs by default for auth.users and recommends them throughout.
Natural keys (email, slug)
email text PRIMARY KEYPros: meaningful, easy to debug. Cons: can change (users change emails!), can collide, limited types.
Use for: lookup tables with truly stable keys (country codes, currency codes). Rarely the right choice otherwise.
Normalization vs denormalization
Normalization
The classical practice: store each piece of information in one place. If a user’s name appears in 100 posts, store it once in users and reference it via a foreign key.
-- Normalized
users: id, name, email
posts: id, author_id (FK→users), title, bodyPros: updates touch one row, no inconsistency, smaller storage. Cons: every query needs a join, can be slower.
Denormalization
Intentionally duplicate data for read performance.
-- Denormalized
posts: id, author_id, author_name (copy), title, bodyPros: reads avoid joins. Cons: updates have to touch every duplicate, easy to get inconsistent.
The pragmatic answer
- Default to normalized. It’s simpler to maintain.
- Denormalize specifically when profiling shows you need it.
- Common denormalization patterns: caching counts (
comment_counton posts), caching computed values (average_rating). - Use triggers or app-level discipline to keep duplicates in sync.
For most webapps: normalize first; only denormalize if a profiled real problem demands it.
Common schema patterns
Every table gets a primary key + timestamps
CREATE TABLE anything (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
created_at timestamptz DEFAULT now(),
updated_at timestamptz DEFAULT now(),
-- domain columns...
);created_at is set once. updated_at updates on changes (via trigger). Both are useful for debugging, auditing, sorting.
Soft delete
Instead of actually deleting rows, mark them deleted:
CREATE TABLE posts (
id uuid PRIMARY KEY,
-- ...
deleted_at timestamptz
);
-- Query for active rows
SELECT * FROM posts WHERE deleted_at IS NULL;
-- "Delete" by setting the column
UPDATE posts SET deleted_at = now() WHERE id = '...';Pros: recoverable, preserves history, foreign keys still valid.
Cons: every query needs the WHERE deleted_at IS NULL; easy to forget.
In practice, soft delete via RLS or a view that filters it makes the discipline easier.
Polymorphic associations (use sparingly)
One table referring to “any of several types”:
-- Bad pattern
CREATE TABLE comments (
id uuid PRIMARY KEY,
commentable_type text, -- 'post' or 'video' or 'image'
commentable_id uuid -- references different tables based on type
);Can’t use foreign keys cleanly. Many query patterns get awkward. Prefer:
-- Better: separate comment tables OR a single content table
CREATE TABLE content (
id uuid PRIMARY KEY,
type text,
-- shared columns
);
CREATE TABLE comments (
id uuid PRIMARY KEY,
content_id uuid REFERENCES content(id),
-- ...
);Polymorphism in databases is usually a smell. Think hard before reaching for it.
Junction tables for many-to-many
A user can be in many teams; a team has many users:
CREATE TABLE team_members (
team_id uuid REFERENCES teams(id) ON DELETE CASCADE,
user_id uuid REFERENCES users(id) ON DELETE CASCADE,
role text,
joined_at timestamptz DEFAULT now(),
PRIMARY KEY (team_id, user_id)
);Composite primary key prevents duplicate memberships.
Status / lifecycle columns
For things with discrete states (orders, subscriptions):
CREATE TABLE orders (
-- ...
status text NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'delivered', 'cancelled'))
);CHECK constraint enforces valid values. Better than untyped strings.
For Postgres specifically, you can use ENUM types — cleaner but harder to add new values later.
Audit trail (when needed)
For tables where you need history of changes:
CREATE TABLE posts_history (
history_id uuid PRIMARY KEY,
post_id uuid,
changed_at timestamptz,
changed_by uuid,
old_data jsonb,
new_data jsonb
);Often filled by triggers. Heavy but necessary for some compliance / debugging needs.
Naming conventions
Pick a convention; stick to it. Some common patterns:
| Element | Convention | Example |
|---|---|---|
| Table | plural snake_case | users, blog_posts |
| Column | snake_case | first_name, created_at |
| Primary key | id | id |
| Foreign key | <thing>_id | user_id, post_id |
| Boolean | is_<x> or has_<x> | is_active, has_paid |
| Timestamp | <verb>_at | created_at, published_at |
| Junction table | <a>_<b> | team_members, tag_posts |
| Index | <table>_<col>_idx | users_email_idx |
Consistency matters more than exact convention. A schema where some tables use userId and others user_id is harder to read than one consistently using either.
Constraints — the database’s safety net
The database can enforce rules. Use it.
CREATE TABLE users (
id uuid PRIMARY KEY,
email text UNIQUE NOT NULL,
age integer CHECK (age >= 0 AND age <= 150),
role text NOT NULL CHECK (role IN ('user', 'admin')),
created_at timestamptz DEFAULT now()
);PRIMARY KEY— uniqueness + indexUNIQUE— no duplicatesNOT NULL— can’t be emptyCHECK— custom validationDEFAULT— value if not providedREFERENCES— foreign key
Defense in depth: validate at app level for good error messages AND at DB level so bad data can never sneak in.
A concrete example: a typical SaaS schema sketch
A multi-user todo app:
-- Users (managed by Supabase Auth, referenced by app data)
-- auth.users is built-in; we attach app-specific data:
CREATE TABLE profiles (
id uuid PRIMARY KEY REFERENCES auth.users(id) ON DELETE CASCADE,
display_name text,
avatar_url text,
created_at timestamptz DEFAULT now(),
updated_at timestamptz DEFAULT now()
);
CREATE TABLE workspaces (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
name text NOT NULL,
owner_id uuid REFERENCES auth.users(id) ON DELETE CASCADE,
created_at timestamptz DEFAULT now()
);
CREATE TABLE workspace_members (
workspace_id uuid REFERENCES workspaces(id) ON DELETE CASCADE,
user_id uuid REFERENCES auth.users(id) ON DELETE CASCADE,
role text NOT NULL CHECK (role IN ('owner', 'admin', 'member', 'viewer')),
joined_at timestamptz DEFAULT now(),
PRIMARY KEY (workspace_id, user_id)
);
CREATE TABLE projects (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
workspace_id uuid REFERENCES workspaces(id) ON DELETE CASCADE,
name text NOT NULL,
description text,
created_by uuid REFERENCES auth.users(id),
created_at timestamptz DEFAULT now(),
updated_at timestamptz DEFAULT now()
);
CREATE TABLE tasks (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
project_id uuid REFERENCES projects(id) ON DELETE CASCADE,
title text NOT NULL,
description text,
status text NOT NULL DEFAULT 'todo' CHECK (status IN ('todo', 'in_progress', 'done', 'cancelled')),
assigned_to uuid REFERENCES auth.users(id) ON DELETE SET NULL,
due_at timestamptz,
created_by uuid REFERENCES auth.users(id),
created_at timestamptz DEFAULT now(),
updated_at timestamptz DEFAULT now()
);
-- Indexes for common queries
CREATE INDEX workspace_members_user_idx ON workspace_members(user_id);
CREATE INDEX projects_workspace_idx ON projects(workspace_id);
CREATE INDEX tasks_project_idx ON tasks(project_id);
CREATE INDEX tasks_assigned_idx ON tasks(assigned_to) WHERE assigned_to IS NOT NULL;
-- Enable RLS on all tables (DON'T skip)
ALTER TABLE profiles ENABLE ROW LEVEL SECURITY;
ALTER TABLE workspaces ENABLE ROW LEVEL SECURITY;
ALTER TABLE workspace_members ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;This sketch covers: users, multi-tenant workspaces, per-workspace permissions, projects under workspaces, tasks under projects, ownership tracking, status enums, timestamps. RLS would then define who can do what.
When the schema needs to change
It will. Use migrations (Migrations). Some changes are easier than others:
| Change | Difficulty |
|---|---|
| Add a nullable column | Easy |
| Add an index | Easy (Postgres has CONCURRENTLY for online) |
| Add a NOT NULL column with default | Easy in modern Postgres |
| Rename a column | Medium — may break app code that still uses old name |
| Change a column’s type | Medium-hard — may require rewriting all rows |
| Drop a column | Easy in DB; coordinate with app |
| Split a table | Hard — data migration + app changes |
| Merge tables | Hard |
| Change primary key type | Very hard |
Patterns for zero-downtime schema changes:
- Add new column / table
- Have app write to both old and new
- Backfill data from old to new
- Switch app to read from new
- Stop app writes to old
- Drop old
Each step is safe and reversible. Skipping steps causes outages.
Common gotchas
-
No primary key on a table. Some tools assume one exists. Always have one (even if synthetic).
-
Floating-point for money.
0.1 + 0.2 = 0.30000000000000004. Use integers (cents) ornumeric. -
Storing timestamps as strings. Sort wrong, can’t do date math. Use
timestamptz(timezone-aware). -
timestampwithout timezone. Defaults to whatever the connection’s TZ is. Subtle bugs across regions. Usetimestamptz. -
NULL columns where empty string suffices. Or vice versa. NULL means “unknown”; empty string means “known and empty.” Pick deliberately.
-
Missing NOT NULL. Lets garbage in. Be strict where appropriate.
-
No CHECK constraints on enum-like fields.
status textwith no constraint = anyone can insert any string. Enforce. -
Polymorphic associations. Hard to do well. Almost always a sign the schema needs rethinking.
-
EAV (entity-attribute-value). “We’ll store everything as key-value pairs so it’s flexible.” Performance nightmare. Use JSON columns if you really need flexibility.
-
Storing arrays when you should have a related table. Postgres has arrays, but
tags text[]makes searching for “posts tagged X” awkward. Junction table is usually better. -
Forgetting indexes. Queries on million-row tables without indexes = pain. Index columns you query/join on.
-
Indexing everything. Each index has write cost. Index thoughtfully.
-
Cascade deletes you didn’t think through.
ON DELETE CASCADEis powerful — and dangerous. Deleting a user might delete everything they ever touched. Sometimes right, sometimes catastrophic. -
No timestamps. Adding
created_at/updated_atlater is harder than including from the start. -
Snake_case vs camelCase inconsistency. Pick one. Be consistent.
-
Reserved words as column names.
user,order,type— sometimes valid, sometimes need quoting. Just avoid them. -
No documentation on what columns mean. “What does
flag_3indicate?” Future you will hate past you. Use comments:COMMENT ON COLUMN posts.is_pinned IS 'Pinned to top of feed'; -
Long column / table names. Postgres has a 63-char limit. Plan within it.
-
Allowing JSON for things that should be columns. Querying / indexing inside JSON is harder than first-class columns. Reach for JSON when truly variable.
-
Designing for “what if” features. “We might want to track X someday.” YAGNI. Design for what you know; iterate.
-
Schemas that don’t match how the app queries. Profile your actual queries. Indexes and structures should serve them.
-
Forgetting to plan for tenant isolation. Multi-tenant SaaS without
workspace_id/org_idon every relevant table = painful retrofit. -
Hardcoded roles in CHECK constraints. Adding a new role later requires migration. Sometimes worth a separate
rolestable. -
No “auto-updated
updated_at.” App code has to remember to update it. Use a trigger or it’ll get stale.
See also
- What is a database? đźź©
- SQL — the language 🟩 — the syntax you use to define schemas
- Postgres 🟩 🟦 — what the schema runs on
- Supabase 🟩 🟦
- Row-Level Security 🟩 — schema design + RLS go together
- Indexes & performance đźź©
- Joins & relationships đźź©
- Migrations 🟩 — how to evolve schemas safely
- Transactions & ACID đźź©
- SQL vs NoSQL đźź©
- SQL injection đźź©
- Glossary: Schema