When to write tests (and when not to): a framework

Status: 🟩 COMPLETE
Last updated: 2026-06-21
Plain-English tagline: Tests aren’t free, and not every line of code deserves one. This is a practical rule for when the test pays for itself and when “just ship and iterate” is the better call.


What this decides

For any given chunk of code, do you write automated tests for it? And if so, which kind?

The choices:

  • Don’t test — skip automated tests entirely
  • Manual test — try it in the browser; trust observation
  • Unit test — small, fast tests of pure functions or isolated logic
  • Integration test — tests that touch real subsystems (database, API)
  • End-to-end (E2E) test — automated browser drives the app

For background: Why test 🟩, Unit tests 🟩, Integration tests 🟩, End-to-end tests 🟩.


The short answer

For a solo project on a small codebase (like Bible Quest):

  • Manual test most things. Click around; verify it works.
  • Write a unit test when a piece of logic fails the “would I remember how this works in 6 months?” check. Usually that means:
    • Pure functions with non-trivial behavior
    • Bugs you’ve already fixed once
    • Anything you’re tempted to call “tricky”
  • Don’t bother with E2E tests until something breaks in a way that justifies writing one.

For a team / production app:

  • Unit tests for pure logic.
  • Integration tests for the boundaries (API endpoints, database queries with RLS).
  • E2E tests for the 2-3 most critical user flows (signup, primary feature, payment).
  • Manual exploratory testing for everything else.

The factors that matter

  1. What’s the cost of a bug shipping? Critical infrastructure → tests are non-optional. Hobby project → a shipped bug may cost nothing worse than a quick fix.
  2. How often will this code change? Often-changed code benefits from tests (catches regressions). Write-once code rarely needs them.
  3. How testable is it? Pure functions are cheap to test. Functions that hit the database need setup, teardown, and seed data, so they are often several times harder.
  4. Is the behavior obvious by reading the code? If so, tests duplicate the obvious. If the code is subtle, tests document the intent.
  5. Have you already fixed this bug once? If yes, write a test so you never fix it again.

When tests are clearly worth writing

  • Pure functions with non-trivial logic. Scoring algorithms, date math, parsing, similarity calculations. Easy to test; high-value.
  • Anything you’ve debugged for more than an hour. You shouldn’t have to pay for that confusion twice.
  • Code at security boundaries. Auth checks, permission logic, input validation. Bugs here are expensive.
  • Algorithms with edge cases. Off-by-one bugs, boundary conditions, empty/null/single-element inputs.
  • Library code you publish. Other people will depend on it; the test IS the contract.

Bible Quest examples worth testing:

  • lib/streak.ts::calculateStreak(dates, today) — pure function, important behavior, edge cases (gap in middle, today not yet read).
  • lib/memory.ts::similarity() — Levenshtein scoring; many edge cases.
  • lib/bible-ref.ts::parseReference() — 280+ aliases; impossible to memorize correctness manually.
  • lib/character-match.ts::buildDailyChallenge(date) — deterministic seeded RNG; should produce the same challenge for the same date.
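
To make the first example concrete, here is a minimal sketch of table-driven checks for a streak function. The implementation is a hypothetical stand-in, not the real lib/streak.ts: it assumes dates are ISO YYYY-MM-DD strings and that a streak stays alive when yesterday was read but today has not been yet. The checks are framework-free so the file runs on its own; under Vitest each row would likely become one expect(...).toBe(...) assertion.

```typescript
// Hypothetical stand-in for lib/streak.ts; the real implementation may differ.
// Assumption: dates are ISO "YYYY-MM-DD" strings.
const DAY_MS = 24 * 60 * 60 * 1000;

function previousDay(isoDate: string): string {
  // A date-only ISO string parses as UTC midnight, so subtracting one day
  // in milliseconds is not affected by daylight-saving shifts.
  return new Date(Date.parse(isoDate) - DAY_MS).toISOString().slice(0, 10);
}

function calculateStreak(dates: string[], today: string): number {
  const read = new Set(dates);
  // If today has not been read yet, the streak may still be alive from yesterday.
  let cursor = read.has(today) ? today : previousDay(today);
  let streak = 0;
  while (read.has(cursor)) {
    streak += 1;
    cursor = previousDay(cursor);
  }
  return streak;
}

// Each row: [edge case being guarded, dates read, today, expected streak].
const streakCases: Array<[string, string[], string, number]> = [
  ["no reading history", [], "2026-06-21", 0],
  ["read today only", ["2026-06-21"], "2026-06-21", 1],
  ["today not yet read", ["2026-06-19", "2026-06-20"], "2026-06-21", 2],
  ["gap in the middle", ["2026-06-17", "2026-06-18", "2026-06-20", "2026-06-21"], "2026-06-21", 2],
  ["last read two days ago", ["2026-06-19"], "2026-06-21", 0],
  ["streak crosses a month boundary", ["2026-05-31", "2026-06-01"], "2026-06-01", 2],
];

for (const [description, dates, today, expected] of streakCases) {
  const actual = calculateStreak(dates, today);
  if (actual !== expected) {
    throw new Error(`${description}: expected ${expected}, got ${actual}`);
  }
}
```

Each row names the edge case it guards, so a failure message says which behavior regressed rather than just that something did.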

When tests are NOT worth writing

  • Display-only React components. <Header /> rendering <h1> is not interesting.
  • Trivial getters/setters or wrappers around well-tested libraries.
  • Prototype code you’ll throw away in a week.
  • Throwaway scripts. Build a CSV, run once, done.
  • Anything where the test would just duplicate the code. “Test that this function returns X” when the function literally returns X is meaningless.
  • Configuration files. tailwind.config.ts doesn’t need a test.
  • Code where you have no clear specification. If you don’t know what “correct” means, you can’t write a meaningful test.
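
The “test duplicates the code” case is easier to spot side by side. The sketch below uses made-up names (TAX_RATE, taxRate, priceWithTax are illustrative assumptions, not Bible Quest code) to contrast a check that restates the implementation with one whose expected value was worked out independently.

```typescript
// All names here are hypothetical and exist only for illustration.
const TAX_RATE = 0.08;

function taxRate(): number {
  return TAX_RATE;
}

// Tautological check: it restates the implementation, so it passes no matter
// what value TAX_RATE holds and cannot tell a right rate from a wrong one.
const tautologicalCheckPasses = taxRate() === TAX_RATE;

function priceWithTax(cents: number): number {
  // Round to whole cents after applying the rate.
  return Math.round(cents * (1 + TAX_RATE));
}

// Meaningful check: 1080 was computed by hand (1000 cents plus 8%), so the
// check fails if the rate, the formula, or the rounding changes.
const meaningfulCheckPasses = priceWithTax(1000) === 1080;

if (!tautologicalCheckPasses || !meaningfulCheckPasses) {
  throw new Error("unexpected check failure");
}
```

The first check would keep passing if someone changed the rate to 0.8 by mistake; the second would not.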

When to pick UNIT TESTS

  • Pure functions — input → output, no side effects.
  • Logic isolated from frameworks (db, fetch, browser APIs).
  • Code you can refactor confidently because the test guards behavior.
  • Quick feedback loop — runs in milliseconds, on every save.

Tool: Vitest (a common choice for modern Next.js projects) or Jest.
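
As a sketch of what “input → output, no side effects” buys you, here is a Levenshtein-based similarity score with its checks. This is an assumed shape for illustration only, not the actual lib/memory.ts code: the normalisation to a 0 to 1 range and the treatment of two empty strings as identical are both assumptions. The checks avoid a test framework so the block runs as-is; in Vitest or Jest they would sit inside test(...) blocks.

```typescript
// Hypothetical sketch; the real lib/memory.ts::similarity() may score differently.
function levenshtein(a: string, b: string): number {
  // At the start of iteration i, prev[j] is the edit distance between the
  // first i-1 characters of a and the first j characters of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, substitution);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumption: score is normalised to [0, 1], where 1 means identical.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1; // two empty strings count as identical
  return 1 - levenshtein(a, b) / longest;
}

// Each entry: [what is being checked, whether it holds].
const similarityChecks: Array<[string, boolean]> = [
  ["known distance", levenshtein("kitten", "sitting") === 3],
  ["identical strings", similarity("faith", "faith") === 1],
  ["empty vs non-empty", similarity("", "grace") === 0],
  ["both empty", similarity("", "") === 1],
  ["one substitution in four", similarity("love", "live") === 0.75],
  ["symmetry", similarity("hope", "hop") === similarity("hop", "hope")],
];

for (const [description, passed] of similarityChecks) {
  if (!passed) throw new Error(`similarity check failed: ${description}`);
}
```

Because nothing here touches a database, the network, or the DOM, the whole file runs in milliseconds and can be re-run on every save.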


When to pick INTEGRATION TESTS

  • Database queries you can’t mock cleanly — RLS policies, complex SQL, triggers.
  • API endpoints — verify the full request/response cycle.
  • Cross-system contracts — your code + Supabase + auth flow working together.
  • Migrations — verify the migration produces the expected schema state.

Tools: Vitest + Testcontainers, or Supabase’s local DB stack with supabase start.


When to pick END-TO-END (E2E) TESTS

  • Critical user flows that span the whole stack — signup, login, checkout, the one feature your product exists for.
  • Smoke tests in CI — verify the deployed app actually loads.
  • Regression tests for specific bugs — automated reproduction of “this thing broke once.”

Don’t try to cover every page or every edge case in E2E: these tests are slow, flaky, and expensive to maintain. 5-10 well-chosen E2E tests are more valuable than 50 mediocre ones.

Tool: Playwright (modern default).


The testing pyramid (still useful in 2026)

        /\
       /E2E\         5-10 tests, slow, brittle
      /------\
     /  Int   \      maybe 20-50 tests, medium speed
    /----------\
   /   Unit     \    hundreds; fast; cheap
  /--------------\

Lots of unit tests at the base, a moderate layer of integration, a thin sliver of E2E at the top.

If your test suite is inverted (lots of E2E, no unit tests), refactor — the suite will be slow, flaky, and maintenance-hostile.


What if I’ve already chosen?

“I’ve written tests for the wrong things”: delete the low-value ones. Tests are liabilities as well as assets. A test that doesn’t catch real bugs but slows the suite is net negative.

“I have no tests and now things keep breaking”: start with regression tests. For each bug that’s bitten you twice, write a test BEFORE fixing it the second time. The test ensures it doesn’t bite a third time.
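
A sketch of that regression-first workflow, using an invented bug: suppose a simplified reference parser once split on the first space, so “1 John 3:16” came back with the book “1”. The parser below is a hypothetical, cut-down illustration and not the real lib/bible-ref.ts; the point is that the checks at the bottom were written to fail against the buggy version and are kept after the fix.

```typescript
// Hypothetical, simplified parser used only to illustrate the workflow.
interface Reference {
  book: string;
  chapter: number;
  verse: number;
}

function parseReference(input: string): Reference | null {
  const trimmed = input.trim();
  // The (invented) buggy version split on the FIRST space, which broke
  // numbered books. The fix: split on the LAST space instead.
  const lastSpace = trimmed.lastIndexOf(" ");
  if (lastSpace === -1) return null;
  const book = trimmed.slice(0, lastSpace);
  const match = /^(\d+):(\d+)$/.exec(trimmed.slice(lastSpace + 1));
  if (!match) return null;
  return { book, chapter: Number(match[1]), verse: Number(match[2]) };
}

// Regression checks: written before the fix, kept afterwards.
const numbered = parseReference("1 John 3:16");
if (numbered === null || numbered.book !== "1 John") {
  throw new Error("regression: numbered book was split incorrectly");
}
if (numbered.chapter !== 3 || numbered.verse !== 16) {
  throw new Error("regression: chapter/verse parsed incorrectly");
}

// A guard that the fix did not break the simple case.
const plain = parseReference("John 3:16");
if (plain === null || plain.book !== "John") {
  throw new Error("fix broke single-word book names");
}
```

Running the checks against the old code first confirms they actually reproduce the bug; a regression test that never failed proves nothing.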

“My E2E suite is flaky”: flaky tests are worse than no tests (they erode trust). Diagnose: is it timing? Is it state contamination between tests? Is it a real bug surfacing intermittently? Fix or delete; don’t ignore.

“My team requires 100% coverage”: push back politely. Coverage is a proxy metric; high coverage with bad tests is meaningless. Aim for meaningful coverage of the parts that matter.
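
To see why coverage is only a proxy, consider this small sketch. The function and both “tests” are hypothetical: applyDiscount contains a deliberate bug, the first test executes every line of it (100% line coverage) without asserting anything, and the second compares against a value computed by hand.

```typescript
// Hypothetical function with a deliberate bug: the discount is added
// instead of subtracted.
function applyDiscount(priceCents: number, percent: number): number {
  return priceCents + (priceCents * percent) / 100; // BUG: should subtract
}

// Coverage-only test: runs every line of applyDiscount but checks nothing,
// so it passes regardless of what the function returns.
function coverageOnlyTest(): boolean {
  applyDiscount(1000, 10);
  return true;
}

// Meaningful test: 900 was worked out by hand (1000 cents minus 10%).
function meaningfulTest(): boolean {
  return applyDiscount(1000, 10) === 900;
}

const coverageOnlyPassed = coverageOnlyTest(); // passes despite the bug
const meaningfulPassed = meaningfulTest(); // fails, which exposes the bug

console.log({ coverageOnlyPassed, meaningfulPassed });
```

Both tests produce identical coverage numbers; only one of them would have stopped the bug from shipping.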

