SQL — the language

Status: 🟩 COMPLETE Last updated: 2026-06-19 Plain-English tagline: A declarative language for talking to relational databases. SELECT, INSERT, UPDATE, DELETE plus joins, grouping, and constraints. ~50 years old. Looks dated; works beautifully.

In plain English

SQL (Structured Query Language, pronounced “sequel” or “S-Q-L” depending on who you ask) is the language you use to talk to relational databases. You write a SQL statement; the database parses it, figures out how to fulfill it, and returns a result.

SQL is declarative, not procedural — you describe what you want, not how to get it. “Give me all users who signed up after January and have at least 5 posts, sorted by most recent post.” The database’s query planner figures out the most efficient way.

Standard SQL has been around since the 1970s. It’s been reliably stable for decades. Postgres, MySQL, SQL Server, SQLite, Oracle all speak it with minor dialect differences. Learn it once; use everywhere.

You’ll rarely write raw SQL in modern webapps — Supabase SDK, Prisma, Drizzle generate it for you. But understanding SQL lets you debug what’s actually happening, read query plans, and reach for raw SQL when ORMs hit their limits.

This entry is a working tour. Not a complete reference — that’s Postgres docs — but the parts you’ll use 95% of the time.

Why it matters

Universal language for relational data. Every Postgres, MySQL, SQL Server runs SQL. Skills transfer across databases.
Powerful in surprising ways. Joins, window functions, CTEs, full-text search, JSON queries — all in plain SQL.
Often more readable than ORM equivalents once you know it.
Debugging requires it. When something’s slow or returning wrong data, you read the SQL the ORM generated. Understanding it is essential.
You’ll see it in every database tutorial, Stack Overflow answer, and docs page. Skipping it = skipping a huge chunk of the docs ecosystem.

The four basic operations: CRUD

Almost everything is built on these four.

SELECT — read data

-- Get everything from a table
SELECT * FROM users;
 
-- Specific columns
SELECT id, name, email FROM users;
 
-- With a filter
SELECT * FROM users WHERE active = true;
 
-- Multiple conditions
SELECT * FROM users WHERE active = true AND created_at > '2026-01-01';
 
-- Sorted
SELECT * FROM posts ORDER BY created_at DESC;
 
-- Limited
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10;
 
-- Paginated
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10 OFFSET 20;
 
-- Distinct values
SELECT DISTINCT country FROM users;
 
-- With aggregates
SELECT COUNT(*) FROM users;
SELECT COUNT(*), country FROM users GROUP BY country;

INSERT — add data

-- One row
INSERT INTO users (email, name) VALUES ('new@example.com', 'New User');
 
-- Multiple rows
INSERT INTO users (email, name) VALUES
  ('a@example.com', 'Alice'),
  ('b@example.com', 'Bob');
 
-- Return the inserted row (Postgres)
INSERT INTO users (email, name) VALUES ('c@example.com', 'Carol')
RETURNING id, created_at;
 
-- Insert or do nothing if conflict
INSERT INTO users (email, name) VALUES ('a@example.com', 'Alice 2')
ON CONFLICT (email) DO NOTHING;
 
-- Upsert (insert or update on conflict)
INSERT INTO users (email, name) VALUES ('a@example.com', 'Alice 2')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;

UPDATE — modify data

-- Update specific column
UPDATE users SET name = 'George Barsom' WHERE id = '...';
 
-- Update multiple columns
UPDATE users SET name = 'George', active = true WHERE id = '...';
 
-- With expression
UPDATE posts SET view_count = view_count + 1 WHERE id = '...';
 
-- ALWAYS include WHERE unless you really mean to update every row
UPDATE users SET last_login = now() WHERE id = '...';  -- ✓
UPDATE users SET last_login = now();                    -- updates EVERYTHING — dangerous

DELETE — remove data

-- Delete specific rows
DELETE FROM posts WHERE published = false;
 
-- ALWAYS include WHERE
DELETE FROM users WHERE last_login < '2020-01-01';  -- ✓
DELETE FROM users;                                   -- deletes everything — DON'T

WHERE clauses — filtering

WHERE column = value           -- equals
WHERE column != value          -- not equals
WHERE column > value           -- greater than
WHERE column BETWEEN a AND b   -- range
WHERE column IN (1, 2, 3)      -- in a set
WHERE column NOT IN (...)      -- not in
WHERE column IS NULL           -- null check (NOT = NULL)
WHERE column IS NOT NULL
WHERE column LIKE 'prefix%'    -- starts with
WHERE column LIKE '%word%'     -- contains
WHERE column ILIKE 'PREFIX%'   -- case-insensitive (Postgres)
WHERE column ~ 'regex'         -- regex match (Postgres)
WHERE EXISTS (subquery)        -- exists in subquery
WHERE column = ANY (array)     -- match any element of array

Combine with AND, OR, NOT, parentheses for grouping.

JOIN — combining tables

The killer feature of SQL. Real data lives across multiple tables; joins combine them.

INNER JOIN (the default — only matching rows)

-- Posts with their author info
SELECT posts.title, users.name
FROM posts
JOIN users ON users.id = posts.author_id;

Returns one row for each post where the author exists.

LEFT JOIN (all from left, matches from right)

-- All users, with their post count (including users with 0 posts)
SELECT users.name, COUNT(posts.id) AS post_count
FROM users
LEFT JOIN posts ON posts.author_id = users.id
GROUP BY users.id;

A user with no posts shows post_count = 0.

Multiple joins

SELECT users.name AS author, posts.title, comments.body AS comment
FROM users
JOIN posts ON posts.author_id = users.id
JOIN comments ON comments.post_id = posts.id
WHERE users.id = '...';

Self-join (table joins itself)

-- Employees and their managers
SELECT e.name AS employee, m.name AS manager
FROM employees e
JOIN employees m ON m.id = e.manager_id;

GROUP BY + aggregates

Group rows together and compute summaries:

-- Count posts per user
SELECT author_id, COUNT(*) AS post_count
FROM posts
GROUP BY author_id;
 
-- Average rating per product
SELECT product_id, AVG(rating) AS avg_rating, COUNT(*) AS num_reviews
FROM reviews
GROUP BY product_id;
 
-- Most active users
SELECT author_id, COUNT(*) AS posts
FROM posts
GROUP BY author_id
HAVING COUNT(*) > 10              -- filter on aggregates
ORDER BY posts DESC
LIMIT 10;

Aggregate functions: COUNT(), SUM(), AVG(), MIN(), MAX(), STRING_AGG() (Postgres), ARRAY_AGG().

WHERE filters rows before grouping; HAVING filters after grouping.

Subqueries

A query inside another query:

-- Users who have at least one post
SELECT * FROM users
WHERE id IN (SELECT DISTINCT author_id FROM posts);
 
-- Each user with their latest post's title
SELECT users.name,
  (SELECT title FROM posts
   WHERE author_id = users.id
   ORDER BY created_at DESC
   LIMIT 1) AS latest_post
FROM users;
 
-- Posts with above-average length
SELECT * FROM posts
WHERE LENGTH(body) > (SELECT AVG(LENGTH(body)) FROM posts);

CTEs — Common Table Expressions

A named subquery you can reference multiple times. Reads cleaner than nested subqueries.

WITH recent_users AS (
  SELECT * FROM users WHERE created_at > now() - interval '30 days'
)
SELECT name, email FROM recent_users WHERE active = true;

Recursive CTEs for tree/graph data:

WITH RECURSIVE descendants AS (
  SELECT id, name, parent_id FROM categories WHERE id = 'root'
  UNION ALL
  SELECT c.id, c.name, c.parent_id
  FROM categories c
  JOIN descendants d ON c.parent_id = d.id
)
SELECT * FROM descendants;

Window functions

Computations across rows without grouping (advanced but powerful):

-- Rank each post by views
SELECT title, view_count,
  RANK() OVER (ORDER BY view_count DESC) AS rank
FROM posts;
 
-- Running total
SELECT created_at, amount,
  SUM(amount) OVER (ORDER BY created_at) AS running_total
FROM transactions;
 
-- Most recent post per user
SELECT DISTINCT ON (author_id) author_id, title, created_at
FROM posts
ORDER BY author_id, created_at DESC;

Schema operations (DDL — Data Definition Language)

Creating, modifying, dropping things.

-- Create a table
CREATE TABLE posts (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  title text NOT NULL,
  author_id uuid REFERENCES users(id) ON DELETE CASCADE,
  published boolean DEFAULT false,
  created_at timestamptz DEFAULT now()
);
 
-- Add a column
ALTER TABLE posts ADD COLUMN view_count integer DEFAULT 0;
 
-- Rename
ALTER TABLE posts RENAME COLUMN body TO content;
 
-- Change type (sometimes requires USING clause)
ALTER TABLE posts ALTER COLUMN view_count TYPE bigint;
 
-- Drop column
ALTER TABLE posts DROP COLUMN view_count;
 
-- Add constraint
ALTER TABLE posts ADD CONSTRAINT title_length CHECK (LENGTH(title) > 0);
 
-- Create index
CREATE INDEX posts_author_idx ON posts(author_id);
 
-- Drop a table
DROP TABLE posts;

DDL is for schema; DML (Data Manipulation Language) is for data.

NULL — the special value

NULL means “unknown” or “missing.” It behaves surprisingly:

SELECT NULL = NULL;        -- returns NULL, not true
SELECT NULL IS NULL;       -- true
SELECT NULL + 1;           -- NULL
SELECT 'foo' || NULL;      -- NULL (concatenation)
SELECT COUNT(*);           -- counts all rows
SELECT COUNT(column);      -- counts non-null rows

The mental rule: NULL “infects” anything it touches with NULL. To check for it, use IS NULL or COALESCE(value, fallback).

SELECT COALESCE(name, 'Anonymous') FROM users;
-- If name is NULL, returns 'Anonymous'; otherwise returns name

Indexes — making queries fast

Indexes are separate data structures the database maintains to find rows quickly.

-- Create a basic index
CREATE INDEX users_email_idx ON users(email);
 
-- Composite index (multiple columns)
CREATE INDEX posts_author_published_idx ON posts(author_id, published);
 
-- Partial index (only matching rows)
CREATE INDEX posts_published_idx ON posts(published_at)
  WHERE published_at IS NOT NULL;
 
-- Unique constraint = unique index
CREATE UNIQUE INDEX users_email_unique ON users(email);

Index columns that you frequently filter or join on. See Indexes & performance.

Transactions

Group multiple statements as one atomic unit:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 'alice';
UPDATE accounts SET balance = balance + 100 WHERE id = 'bob';
COMMIT;
 
-- Or rollback if anything goes wrong
ROLLBACK;

If anything fails between BEGIN and COMMIT, the whole thing rolls back. See Transactions & ACID.

JSON in Postgres

Postgres has excellent JSON support:

-- Store JSON
INSERT INTO users (data) VALUES ('{"prefs": {"theme": "dark", "lang": "en"}}');
 
-- Access nested values
SELECT data->'prefs'->>'theme' FROM users;    -- text
SELECT data->'prefs'->>'theme' = 'dark' FROM users;
 
-- Filter on JSON
SELECT * FROM users WHERE data->'prefs'->>'theme' = 'dark';
 
-- Index JSON
CREATE INDEX users_theme_idx ON users((data->'prefs'->>'theme'));

-> returns JSON; ->> returns text. Combine as needed.

Common gotchas

SELECT * in production code. Brittle (breaks when columns change), wasteful. List columns.
NULL = NULL is NULL, not true. Use IS NULL / IS NOT NULL.
Forgetting WHERE on UPDATE/DELETE. Updates/deletes every row. Production-killing bug. Always include WHERE unless you really mean it.
String concatenation building SQL. SQL injection. Use parameterized queries. See SQL injection.
GROUP BY pitfalls. Selecting non-aggregated columns that aren’t in GROUP BY — error in strict SQL; surprising results in lenient ones.
COUNT(*) vs COUNT(column). COUNT(*) counts all rows; COUNT(column) counts non-null values.
LIKE without leading %. LIKE 'prefix%' can use an index. LIKE '%suffix' and LIKE '%middle%' usually can’t.
Implicit type coercion. WHERE id = '42' when id is integer — varies by DB. Be explicit.
OR conditions can be slow. Sometimes UNION is faster. Test.
SELECT * from joined tables. All columns from all tables — quickly bloated. List specific columns.
JOIN without ON. Cross join — every row × every row. Disaster.
Subquery vs JOIN performance. Sometimes equivalent, sometimes not. Profile.
Forgetting indexes. A query on a million rows without an index = full table scan = seconds. With index = milliseconds.
Over-indexing. Each index slows writes. Index thoughtfully.
Storing dates as strings. Sorts wrong, can’t do date math. Use timestamptz or date.
Floats for money. 0.1 + 0.2 = 0.30000000000000004. Use integer cents or DECIMAL.
Variable column types between databases. boolean works on Postgres; SQL Server uses BIT. Stay aware of dialect.
Ordering without LIMIT can be huge. ORDER BY on a billion-row table sorts the whole thing. LIMIT helps if the DB can stop early.
Cursors / pagination via OFFSET. OFFSET 1000000 scans 1M rows then discards them. Use keyset pagination (WHERE id > last_id) for large offsets.
Trusting query plans across DBs. Postgres plans differ from MySQL. Use EXPLAIN (or EXPLAIN ANALYZE) on your DB.
Not using transactions for multi-step operations. Money transfer in two UPDATE statements — without transaction, one might succeed and one fail.
Letting users craft SQL via “search” features. SQL injection waiting to happen. Use parameterized queries / safe DSL.

Sources

PostgreSQL docs — SQL language
SQLBolt — free interactive tutorial
Use The Index, Luke! — free book on database indexes
Mode SQL Tutorial
PostgreSQL Tutorial

Tech & AI, Explained

Explorer

sql-the-language