Schema design

Status: đźź© COMPLETE Last updated: 2026-06-19 Plain-English tagline: The art of deciding what tables and columns your database should have. Get it right early; changes later are expensive. Most architecture decisions live here.


In plain English

Your database’s schema is the structure: which tables exist, what columns each has, what types those columns are, what constraints apply, what relationships connect them. Schema design is the upfront decision-making about all that.

It’s important because:

  • Schema decisions ripple through everything. App code is shaped by schema. Performance is shaped by schema. Security (via RLS policies) is shaped by schema.
  • Changes are expensive. Renaming a column on a million-row table can lock the table for minutes. Splitting a table after launch is a project. Better to think upfront.
  • The right schema makes hard problems easy. A well-shaped schema makes complex queries simple. A poorly-shaped schema makes simple queries gnarly.

The good news: there are well-established patterns. Normalize for canonical data. Use foreign keys for relationships. Add indexes where queries need them. Add created_at / updated_at on everything. Use UUIDs for IDs in distributed setups. These cover most cases.

The bad news: every project has edge cases that don’t fit the patterns. Schema design rewards thinking carefully and iterating.


Why it matters

  • The single highest-leverage architecture decision in most webapps.
  • Cheap to get right at the start; expensive to fix later. Spending a day on schema at week 1 saves weeks later.
  • Performance ceiling. A poorly-normalized schema or missing indexes caps how fast your app can ever go.
  • Security ceiling. RLS works with schemas that have clear ownership semantics. Without that, security is harder.

The fundamentals

A table is a thing

Tables represent entities — concrete or conceptual things your app cares about. Users, posts, orders, products, sessions.

Name tables in plural (users, not user). Consistent convention; standard in most ecosystems.

A column is a property

Columns represent attributes of those entities. A user has an email, a name, a created_at.

Name columns in snake_case (first_name, not firstName). Standard SQL convention.

A row is a specific instance

Each row is one entity. One user. One post. Identified by a unique primary key.

A foreign key represents a relationship

A posts table with an author_id column referencing users(id) says: each post belongs to one user.


Choosing primary keys

A primary key uniquely identifies each row. Three common choices:

Auto-incrementing integer (SERIAL / BIGSERIAL)

id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY

Pros: small, fast to index, naturally ordered. Cons: reveals scale (“user #42” means there are at least 42 users), can’t be generated client-side, conflicts on merging databases.

Use for: internal-only IDs where scale isn’t sensitive.

UUID

id uuid PRIMARY KEY DEFAULT gen_random_uuid()

Pros: non-sequential (no info leak), generated anywhere (client, multiple servers, no coordination), can’t collide in practice. Cons: 16 bytes (vs 4 or 8 for integers), slightly slower in indexes, harder to read in URLs.

Use for: anything user-facing, anything distributed, anything you might merge across systems.

Modern default for new webapps: UUID. Supabase uses UUIDs by default for auth.users and recommends them throughout.

Natural keys (email, slug)

email text PRIMARY KEY

Pros: meaningful, easy to debug. Cons: can change (users change emails!), can collide, limited types.

Use for: lookup tables with truly stable keys (country codes, currency codes). Rarely the right choice otherwise.


Normalization vs denormalization

Normalization

The classical practice: store each piece of information in one place. If a user’s name appears in 100 posts, store it once in users and reference it via a foreign key.

-- Normalized
users:  id, name, email
posts:  id, author_id (FK→users), title, body

Pros: updates touch one row, no inconsistency, smaller storage. Cons: every query needs a join, can be slower.

Denormalization

Intentionally duplicate data for read performance.

-- Denormalized
posts:  id, author_id, author_name (copy), title, body

Pros: reads avoid joins. Cons: updates have to touch every duplicate, easy to get inconsistent.

The pragmatic answer

  • Default to normalized. It’s simpler to maintain.
  • Denormalize specifically when profiling shows you need it.
  • Common denormalization patterns: caching counts (comment_count on posts), caching computed values (average_rating).
  • Use triggers or app-level discipline to keep duplicates in sync.

For most webapps: normalize first; only denormalize if a profiled real problem demands it.


Common schema patterns

Every table gets a primary key + timestamps

CREATE TABLE anything (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  created_at timestamptz DEFAULT now(),
  updated_at timestamptz DEFAULT now(),
  -- domain columns...
);

created_at is set once. updated_at updates on changes (via trigger). Both are useful for debugging, auditing, sorting.

Soft delete

Instead of actually deleting rows, mark them deleted:

CREATE TABLE posts (
  id uuid PRIMARY KEY,
  -- ...
  deleted_at timestamptz
);
 
-- Query for active rows
SELECT * FROM posts WHERE deleted_at IS NULL;
 
-- "Delete" by setting the column
UPDATE posts SET deleted_at = now() WHERE id = '...';

Pros: recoverable, preserves history, foreign keys still valid. Cons: every query needs the WHERE deleted_at IS NULL; easy to forget.

In practice, soft delete via RLS or a view that filters it makes the discipline easier.

Polymorphic associations (use sparingly)

One table referring to “any of several types”:

-- Bad pattern
CREATE TABLE comments (
  id uuid PRIMARY KEY,
  commentable_type text,  -- 'post' or 'video' or 'image'
  commentable_id uuid     -- references different tables based on type
);

Can’t use foreign keys cleanly. Many query patterns get awkward. Prefer:

-- Better: separate comment tables OR a single content table
CREATE TABLE content (
  id uuid PRIMARY KEY,
  type text,
  -- shared columns
);
CREATE TABLE comments (
  id uuid PRIMARY KEY,
  content_id uuid REFERENCES content(id),
  -- ...
);

Polymorphism in databases is usually a smell. Think hard before reaching for it.

Junction tables for many-to-many

A user can be in many teams; a team has many users:

CREATE TABLE team_members (
  team_id uuid REFERENCES teams(id) ON DELETE CASCADE,
  user_id uuid REFERENCES users(id) ON DELETE CASCADE,
  role text,
  joined_at timestamptz DEFAULT now(),
  PRIMARY KEY (team_id, user_id)
);

Composite primary key prevents duplicate memberships.

Status / lifecycle columns

For things with discrete states (orders, subscriptions):

CREATE TABLE orders (
  -- ...
  status text NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'delivered', 'cancelled'))
);

CHECK constraint enforces valid values. Better than untyped strings.

For Postgres specifically, you can use ENUM types — cleaner but harder to add new values later.

Audit trail (when needed)

For tables where you need history of changes:

CREATE TABLE posts_history (
  history_id uuid PRIMARY KEY,
  post_id uuid,
  changed_at timestamptz,
  changed_by uuid,
  old_data jsonb,
  new_data jsonb
);

Often filled by triggers. Heavy but necessary for some compliance / debugging needs.


Naming conventions

Pick a convention; stick to it. Some common patterns:

ElementConventionExample
Tableplural snake_caseusers, blog_posts
Columnsnake_casefirst_name, created_at
Primary keyidid
Foreign key<thing>_iduser_id, post_id
Booleanis_<x> or has_<x>is_active, has_paid
Timestamp<verb>_atcreated_at, published_at
Junction table<a>_<b>team_members, tag_posts
Index<table>_<col>_idxusers_email_idx

Consistency matters more than exact convention. A schema where some tables use userId and others user_id is harder to read than one consistently using either.


Constraints — the database’s safety net

The database can enforce rules. Use it.

CREATE TABLE users (
  id uuid PRIMARY KEY,
  email text UNIQUE NOT NULL,
  age integer CHECK (age >= 0 AND age <= 150),
  role text NOT NULL CHECK (role IN ('user', 'admin')),
  created_at timestamptz DEFAULT now()
);
  • PRIMARY KEY — uniqueness + index
  • UNIQUE — no duplicates
  • NOT NULL — can’t be empty
  • CHECK — custom validation
  • DEFAULT — value if not provided
  • REFERENCES — foreign key

Defense in depth: validate at app level for good error messages AND at DB level so bad data can never sneak in.


A concrete example: a typical SaaS schema sketch

A multi-user todo app:

-- Users (managed by Supabase Auth, referenced by app data)
-- auth.users is built-in; we attach app-specific data:
 
CREATE TABLE profiles (
  id uuid PRIMARY KEY REFERENCES auth.users(id) ON DELETE CASCADE,
  display_name text,
  avatar_url text,
  created_at timestamptz DEFAULT now(),
  updated_at timestamptz DEFAULT now()
);
 
CREATE TABLE workspaces (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  name text NOT NULL,
  owner_id uuid REFERENCES auth.users(id) ON DELETE CASCADE,
  created_at timestamptz DEFAULT now()
);
 
CREATE TABLE workspace_members (
  workspace_id uuid REFERENCES workspaces(id) ON DELETE CASCADE,
  user_id uuid REFERENCES auth.users(id) ON DELETE CASCADE,
  role text NOT NULL CHECK (role IN ('owner', 'admin', 'member', 'viewer')),
  joined_at timestamptz DEFAULT now(),
  PRIMARY KEY (workspace_id, user_id)
);
 
CREATE TABLE projects (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  workspace_id uuid REFERENCES workspaces(id) ON DELETE CASCADE,
  name text NOT NULL,
  description text,
  created_by uuid REFERENCES auth.users(id),
  created_at timestamptz DEFAULT now(),
  updated_at timestamptz DEFAULT now()
);
 
CREATE TABLE tasks (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  project_id uuid REFERENCES projects(id) ON DELETE CASCADE,
  title text NOT NULL,
  description text,
  status text NOT NULL DEFAULT 'todo' CHECK (status IN ('todo', 'in_progress', 'done', 'cancelled')),
  assigned_to uuid REFERENCES auth.users(id) ON DELETE SET NULL,
  due_at timestamptz,
  created_by uuid REFERENCES auth.users(id),
  created_at timestamptz DEFAULT now(),
  updated_at timestamptz DEFAULT now()
);
 
-- Indexes for common queries
CREATE INDEX workspace_members_user_idx ON workspace_members(user_id);
CREATE INDEX projects_workspace_idx ON projects(workspace_id);
CREATE INDEX tasks_project_idx ON tasks(project_id);
CREATE INDEX tasks_assigned_idx ON tasks(assigned_to) WHERE assigned_to IS NOT NULL;
 
-- Enable RLS on all tables (DON'T skip)
ALTER TABLE profiles ENABLE ROW LEVEL SECURITY;
ALTER TABLE workspaces ENABLE ROW LEVEL SECURITY;
ALTER TABLE workspace_members ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;

This sketch covers: users, multi-tenant workspaces, per-workspace permissions, projects under workspaces, tasks under projects, ownership tracking, status enums, timestamps. RLS would then define who can do what.


When the schema needs to change

It will. Use migrations (Migrations). Some changes are easier than others:

ChangeDifficulty
Add a nullable columnEasy
Add an indexEasy (Postgres has CONCURRENTLY for online)
Add a NOT NULL column with defaultEasy in modern Postgres
Rename a columnMedium — may break app code that still uses old name
Change a column’s typeMedium-hard — may require rewriting all rows
Drop a columnEasy in DB; coordinate with app
Split a tableHard — data migration + app changes
Merge tablesHard
Change primary key typeVery hard

Patterns for zero-downtime schema changes:

  1. Add new column / table
  2. Have app write to both old and new
  3. Backfill data from old to new
  4. Switch app to read from new
  5. Stop app writes to old
  6. Drop old

Each step is safe and reversible. Skipping steps causes outages.


Common gotchas

  • No primary key on a table. Some tools assume one exists. Always have one (even if synthetic).

  • Floating-point for money. 0.1 + 0.2 = 0.30000000000000004. Use integers (cents) or numeric.

  • Storing timestamps as strings. Sort wrong, can’t do date math. Use timestamptz (timezone-aware).

  • timestamp without timezone. Defaults to whatever the connection’s TZ is. Subtle bugs across regions. Use timestamptz.

  • NULL columns where empty string suffices. Or vice versa. NULL means “unknown”; empty string means “known and empty.” Pick deliberately.

  • Missing NOT NULL. Lets garbage in. Be strict where appropriate.

  • No CHECK constraints on enum-like fields. status text with no constraint = anyone can insert any string. Enforce.

  • Polymorphic associations. Hard to do well. Almost always a sign the schema needs rethinking.

  • EAV (entity-attribute-value). “We’ll store everything as key-value pairs so it’s flexible.” Performance nightmare. Use JSON columns if you really need flexibility.

  • Storing arrays when you should have a related table. Postgres has arrays, but tags text[] makes searching for “posts tagged X” awkward. Junction table is usually better.

  • Forgetting indexes. Queries on million-row tables without indexes = pain. Index columns you query/join on.

  • Indexing everything. Each index has write cost. Index thoughtfully.

  • Cascade deletes you didn’t think through. ON DELETE CASCADE is powerful — and dangerous. Deleting a user might delete everything they ever touched. Sometimes right, sometimes catastrophic.

  • No timestamps. Adding created_at / updated_at later is harder than including from the start.

  • Snake_case vs camelCase inconsistency. Pick one. Be consistent.

  • Reserved words as column names. user, order, type — sometimes valid, sometimes need quoting. Just avoid them.

  • No documentation on what columns mean. “What does flag_3 indicate?” Future you will hate past you. Use comments: COMMENT ON COLUMN posts.is_pinned IS 'Pinned to top of feed';

  • Long column / table names. Postgres has a 63-char limit. Plan within it.

  • Allowing JSON for things that should be columns. Querying / indexing inside JSON is harder than first-class columns. Reach for JSON when truly variable.

  • Designing for “what if” features. “We might want to track X someday.” YAGNI. Design for what you know; iterate.

  • Schemas that don’t match how the app queries. Profile your actual queries. Indexes and structures should serve them.

  • Forgetting to plan for tenant isolation. Multi-tenant SaaS without workspace_id / org_id on every relevant table = painful retrofit.

  • Hardcoded roles in CHECK constraints. Adding a new role later requires migration. Sometimes worth a separate roles table.

  • No “auto-updated updated_at.” App code has to remember to update it. Use a trigger or it’ll get stale.


See also

Sources