What is a database?

Status: 🟩 COMPLETE Last updated: 2026-06-19 Plain-English tagline: A persistent, organized place for your app’s data. Survives restarts. Answers queries fast. The thing your app talks to whenever it needs to remember something.


In plain English

When your webapp needs to remember anything β€” users, posts, settings, scores, orders β€” that information has to live somewhere that survives the app being restarted. A regular variable in your code is gone when the server restarts. A file on disk would work but quickly gets clunky for anything substantial. You need a database.

A database is, at its core, a program that:

  1. Stores data persistently on disk
  2. Organizes the data in a defined structure
  3. Answers queries quickly (much faster than scanning a file every time)
  4. Handles concurrent access so multiple users don’t trample each other
  5. Maintains integrity so half-finished operations don’t leave data corrupt

Databases come in many shapes. The two main families:

  • Relational (SQL) β€” Data lives in tables of rows and columns, strictly typed, related to each other via shared keys. You query in SQL. Examples: Postgres, MySQL, SQLite, SQL Server.
  • Non-relational (NoSQL) β€” Looser structures: documents (JSON-like), key-value pairs, graphs, time-series. Examples: MongoDB, Redis, DynamoDB, Neo4j.

For ~99% of webapps in 2026, start with a relational database β€” specifically Postgres. The β€œNoSQL because scaling” arguments of the 2010s have mostly been settled in Postgres’s favor. Reach for NoSQL when you have a specific need NoSQL serves uniquely.

In your stack, that means Supabase β€” which is Postgres-as-a-service plus auth, storage, and APIs. See Supabase.


Why it matters

  • Every real webapp uses one. Skipping the database means your app can’t remember anything between requests.
  • The choice ripples through everything. Database choice affects what you can do, what’s hard, what’s fast.
  • Most bugs and outages trace here. Database performance, missing indexes, schema design mistakes β€” major source of headaches.
  • Knowing the basics demystifies a lot. Once β€œthe database has tables, you query with SQL, schemas have rules” lands, half of webapp architecture clicks.

What a database actually does (the mechanics)

When your app says β€œgive me all posts by user 42”:

  1. Your app forms a query β€” typically a SQL statement: SELECT * FROM posts WHERE author_id = 42
  2. Your app sends the query to the database server (over a network connection or, for embedded DBs like SQLite, directly to a file)
  3. The database parses the query
  4. The database plans how to fulfill it (use an index? scan the whole table? join in what order?)
  5. The database executes the plan, reading from disk and memory
  6. The database returns the result set β€” rows of data

This whole process typically takes 1-50 milliseconds for simple queries. For complex ones, much longer.

The database server is its own process, usually on its own machine (or container), with its own memory and disk. Your app talks to it over a network.


Tables, rows, columns (relational basics)

A table is a named collection of records, like a spreadsheet:

users:
β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ id  β”‚ name     β”‚ email                β”‚ created_at          β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ George   β”‚ george@example.com   β”‚ 2026-05-12 10:00:00 β”‚
β”‚ 2   β”‚ Sarah    β”‚ sarah@example.com    β”‚ 2026-06-01 14:32:00 β”‚
β”‚ 3   β”‚ Marcus   β”‚ marcus@example.com   β”‚ 2026-06-19 09:15:00 β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • Rows are individual records (each user)
  • Columns are fields (id, name, email, etc.)
  • Each column has a type (id is integer, name is text, created_at is timestamp)
  • One column is usually the primary key β€” unique identifier (here, id)

Tables can reference each other via foreign keys β€” a column that holds the primary key of another table. This is how relationships work:

posts:
β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ id  β”‚ title         β”‚ body         β”‚ author_id β”‚  ← foreign key to users.id
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 100 β”‚ Hello world   β”‚ My first...  β”‚ 1         β”‚
β”‚ 101 β”‚ Tuesday       β”‚ Another...   β”‚ 1         β”‚
β”‚ 102 β”‚ Welcome       β”‚ Hi everyone  β”‚ 2         β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β€œFind all posts by George” β†’ look up George’s id (1), then SELECT * FROM posts WHERE author_id = 1.


What a schema is

The schema is the definition of your tables: which tables exist, what columns each has, what types those columns are, what constraints apply, what relationships exist.

A small schema in SQL:

CREATE TABLE users (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  email text NOT NULL UNIQUE,
  name text,
  created_at timestamptz DEFAULT now()
);
 
CREATE TABLE posts (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  title text NOT NULL,
  body text,
  author_id uuid REFERENCES users(id) ON DELETE CASCADE,
  published boolean DEFAULT false,
  created_at timestamptz DEFAULT now()
);
 
CREATE INDEX posts_author_idx ON posts(author_id);

This says:

  • Users have unique emails and optional names
  • Posts have an author who must exist in users
  • If a user is deleted, their posts are deleted too (CASCADE)
  • We’ve added an index for fast lookup of posts by author

Schema design is hard. Get it right early; changes later are expensive. See Schema design.


Queries (a tiny SQL tour)

-- Select: read data
SELECT * FROM users WHERE created_at > '2026-06-01';
 
-- Insert: add data
INSERT INTO users (email, name) VALUES ('new@example.com', 'New User');
 
-- Update: modify data
UPDATE users SET name = 'George Barsom' WHERE id = 1;
 
-- Delete: remove data
DELETE FROM posts WHERE published = false AND created_at < '2026-01-01';
 
-- Join: combine tables
SELECT posts.title, users.name AS author
FROM posts
JOIN users ON users.id = posts.author_id
WHERE posts.published = true
ORDER BY posts.created_at DESC
LIMIT 10;

That’s most of what you’ll write daily. See SQL β€” the language for more depth.

In modern webapps, you rarely write raw SQL β€” you use a library (Supabase SDK, Prisma, Drizzle) that generates it. But the SQL is still happening underneath; understanding it helps you debug.


ACID β€” what a real database promises

Relational databases are designed for ACID properties:

  • Atomicity β€” operations succeed completely or fail completely. No half-finished state.
  • Consistency β€” data always meets defined rules (constraints, foreign keys, types).
  • Isolation β€” concurrent operations don’t see each other’s intermediate state.
  • Durability β€” committed data survives crashes and power failures.

These aren’t free β€” they require careful engineering inside the database. But they’re what let you trust the database with money, identity, healthcare.

See Transactions & ACID.


What a database is NOT

  • Not the same as a backend. A backend is your server code; the database is one thing the backend talks to.
  • Not magical scaling. A poorly-designed schema or missing indexes can grind to a halt under load.
  • Not a free file storage. Databases store small structured records well; binary blobs (images, videos) better in object storage (S3, Supabase Storage, R2).
  • Not a search engine. Databases can do basic text search; for serious search, use Elasticsearch / Meilisearch / Typesense.
  • Not a cache. Hitting the DB for every read is fine for most apps; for very high traffic, you add caching layers (Redis, in-memory).
  • Not analytics. Transactional databases (Postgres, MySQL) handle β€œwhat’s George’s profile” well. For β€œhow many users signed up last month grouped by source” at scale, analytical databases (BigQuery, Snowflake, DuckDB) are better.

A concrete example: building a tiny blog database

-- Schema
CREATE TABLE users (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  email text UNIQUE NOT NULL,
  name text,
  created_at timestamptz DEFAULT now()
);
 
CREATE TABLE posts (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  author_id uuid REFERENCES users(id) ON DELETE CASCADE,
  title text NOT NULL,
  body text,
  published_at timestamptz,
  created_at timestamptz DEFAULT now()
);
 
CREATE TABLE comments (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  post_id uuid REFERENCES posts(id) ON DELETE CASCADE,
  author_id uuid REFERENCES users(id) ON DELETE SET NULL,
  body text NOT NULL,
  created_at timestamptz DEFAULT now()
);
 
-- Indexes for common queries
CREATE INDEX posts_author_idx ON posts(author_id);
CREATE INDEX posts_published_idx ON posts(published_at DESC) WHERE published_at IS NOT NULL;
CREATE INDEX comments_post_idx ON comments(post_id);

A few example queries:

-- All published posts, newest first
SELECT * FROM posts WHERE published_at IS NOT NULL ORDER BY published_at DESC;
 
-- A post with its author
SELECT posts.*, users.name AS author_name
FROM posts JOIN users ON users.id = posts.author_id
WHERE posts.id = '...';
 
-- Comment count per post
SELECT post_id, COUNT(*) AS comment_count
FROM comments GROUP BY post_id;

That’s a functioning blog schema in 30 lines. Add Supabase, and you have it as a service with auth and APIs.


Common gotchas

  • No primary key. Every table should have one. Postgres can add a synthetic id if you don’t pick. Don’t skip.

  • SELECT * in production code. Wasteful (pulls all columns), brittle (breaks if columns change). List the columns you need.

  • Missing indexes. A query that scans the whole table works for 100 rows; dies at 1M rows. Index columns you query/join on.

  • Over-indexing. Each index speeds reads but slows writes. Indexes also take space. Index thoughtfully, not exhaustively.

  • String IDs vs integers vs UUIDs. Each has trade-offs. UUIDs (Supabase default) are non-sequential (good for distributed) but slightly larger and slower in indexes. Integers are smaller and faster but predictable (info disclosure risk). Pick once; consistency matters.

  • Storing structured data as JSON when it could be columns. Postgres has JSON, but for data you query on regularly, columns are faster.

  • Not using foreign keys. Without them, you can have orphaned records (posts whose author was deleted). Use them, configure ON DELETE behavior.

  • Storing files in DB. Large binaries belong in object storage. Store URL references in DB.

  • N+1 queries. Fetching a list, then making a separate query for each item. 100 items = 101 queries. Use joins or ORM features to fetch in one go.

  • Long-running transactions. Holding locks too long blocks other operations. Keep transactions short.

  • No backups, or backups never tested. Backup that’s never restored isn’t a backup. Periodically restore to verify.

  • No connection pooling. Opening a fresh DB connection per request is slow and exhausts the DB’s connection limit. Use a pooler (PgBouncer, Supabase’s built-in pooler).

  • Putting business logic in stored procedures. Sometimes right; usually painful to maintain compared to app code. Use sparingly.

  • Not normalizing. Storing the same data in multiple places means updates have to touch all copies. Normalize (split into tables) for canonical data; consider denormalization only for performance.

  • Normalizing too much. Strict normalization can require many joins for common queries. Some denormalization is OK if it pays off.

  • Storing prices as floats. 0.1 + 0.2 = 0.30000000000000004. Use integers (cents) or DECIMAL.

  • Storing timestamps without timezone. Future you (in a different time zone) will hate past you. Use timestamptz (Postgres) or always store UTC.

  • No constraints. A text column for status that can be β€œpending”, β€œactive”, β€œcancelled” β€” but nothing stops β€œPending” or β€œactve” from being inserted. Use CHECK constraints or enum types.

  • Allowing direct DB access from frontend. Skip RLS at your peril. Either use RLS (Row-Level Security) or route everything through your backend.


See also

Sources