End-to-end (E2E) tests

Status: 🟩 COMPLETE
Last updated: 2026-06-19
Plain-English tagline: Tests that drive a real browser the way a real user would (click buttons, fill forms, watch the screen change), exercising the entire deployed system as one black box.

In plain English

A unit test checks one function. An integration test checks a slice. An end-to-end test checks the entire user-facing flow, by literally automating a browser.

Concrete example: an E2E test for a signup flow might:

Open Chrome
Navigate to https://your-app.com/signup
Type “george@example.com” into the email field
Type “password123” into the password field
Click “Sign up”
Wait for the page to redirect to /dashboard
Assert that “Welcome, George” appears on the page

That’s it. The test doesn’t know or care HOW the signup works — what database is used, what API routes exist, what server actions get called. It just verifies “as a real user, I can sign up.”

If anything in the entire stack — frontend, backend, database, deploy config — is broken, the E2E test fails. That’s its power: it tests reality.

It’s also its cost: E2E tests are slow (5-60 seconds each), brittle (they fail when the UI changes, when networks are slow, when timing varies), and expensive to write and maintain. The pyramid principle says: have just enough of them, focused on the user paths that absolutely must work.

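In 2026, the dominant E2E framework for webapps is Playwright. Cypress is the long-running second. Both automate real browsers, though by different mechanisms: Playwright controls the browser from outside over its automation protocol, while Cypress runs its test code inside the browser.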

Why it matters

Three reasons E2E tests are worth the investment, despite their cost:

They catch what nothing else does. A wiring bug between frontend, backend, and database that no unit or integration test notices. A broken deploy. A misconfigured env var in production. E2E tests are the reality check.
They cover critical user paths. Signup, login, payment, the core feature of your app — these MUST work, every deploy. Manual testing them every time is slow and unreliable. E2E tests automate the critical-path check.
They give confidence to deploy. A green E2E suite on the preview deploy is the strongest signal you have that the change is safe to promote to production. For solo developers shipping fast, this matters enormously.

The trade-off: E2E tests are real engineering. Setting up the environment, dealing with flakiness, maintaining selectors as the UI evolves — none of it is free. Treat E2E tests as a thin, targeted layer, not a comprehensive one.

The two big tools — Playwright vs Cypress

Playwright (Microsoft, 2020)

Modern, multi-browser (Chromium, Firefox, and WebKit, the engine behind Safari), multi-language (JS/TS/Python/.NET/Java), parallel by default.

import { test, expect } from "@playwright/test";
 
test("user can sign up", async ({ page }) => {
  await page.goto("/signup");
  await page.getByLabel("Email").fill("george@example.com");
  await page.getByLabel("Password").fill("password123");
  await page.getByRole("button", { name: "Sign up" }).click();
  await expect(page).toHaveURL("/dashboard");
  await expect(page.getByText("Welcome, George")).toBeVisible();
});

Cypress (2014, dominant before Playwright)

JavaScript-only, runs IN the browser (different architecture), excellent dev experience (interactive runner, time-travel debugging), single-tab only by default.

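// The findBy* commands come from @testing-library/cypress (a separate package),
// registered with: import "@testing-library/cypress/add-commands";
describe("signup", () => {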
  it("user can sign up", () => {
    cy.visit("/signup");
    cy.findByLabelText("Email").type("george@example.com");
    cy.findByLabelText("Password").type("password123");
    cy.findByRole("button", { name: "Sign up" }).click();
    cy.url().should("include", "/dashboard");
    cy.findByText("Welcome, George").should("exist");
  });
});

Aspect	Playwright	Cypress
Browsers	Chromium, Firefox, WebKit	Chromium (incl. Chrome, Edge), Firefox, WebKit (experimental)
Languages	JS/TS/Python/.NET/Java	JS/TS only
Parallelism	Built-in	Requires Cypress Cloud recording (--parallel needs --record)
Multi-tab / multi-window	Yes	No (Cypress runs IN one tab)
iFrame support	Native	Limited
Network mocking	Strong	Strong
Developer experience	Modern, fast	Excellent interactive runner
Maintained by	Microsoft	Cypress.io (company)
Community trajectory in 2026	Growing	Stable

For new projects in 2026: Playwright is the default. Cypress is fine for legacy projects and has its devoted users, but Playwright has more momentum.

For the rest of this entry I’ll use Playwright examples.

A concrete example: a critical-path E2E test

For a Bible Quest-style project, a meaningful E2E test:

// e2e/login-and-progress.spec.ts
import { test, expect } from "@playwright/test";
 
test.describe("Logged-in user can complete a question", () => {
  test.beforeEach(async ({ page }) => {
    // Log in via a test-only seed account
    await page.goto("/login");
    await page.getByLabel("Email").fill(process.env.E2E_TEST_EMAIL!);
    await page.getByLabel("Password").fill(process.env.E2E_TEST_PASSWORD!);
    await page.getByRole("button", { name: "Log in" }).click();
    await page.waitForURL("/dashboard");
  });
 
  test("answering a Bible question increments progress", async ({ page }) => {
    // Navigate to a specific lesson
    await page.goto("/bible/genesis/1");
 
    // Note progress before
    const progressBefore = await page.getByLabel("Progress").textContent();
 
    // Answer the first question
    await page.getByRole("radio", { name: /In the beginning/i }).check();
    await page.getByRole("button", { name: "Submit" }).click();
 
    // Wait for the success indicator
    await expect(page.getByText("Correct!")).toBeVisible();
 
    // Verify progress changed. The assertion auto-waits, so a UI update that
    // lands slightly after "Correct!" appears does not cause a flaky failure.
    await expect(page.getByLabel("Progress")).not.toHaveText(progressBefore ?? "");
  });
});

What this exercises:

Login (auth route, session cookies, redirect)
Navigation to a deep page
A real radio button + form submission
A server action / API call that updates progress in the database
The UI’s reactive update after the action

If anything in that whole chain is broken, this test fails. One test covers an enormous amount of real behavior.

How to run E2E tests against a real environment

Three patterns:

1. Against a locally-running dev server

# Terminal 1
npm run dev
 
# Terminal 2
npx playwright test

Fast iteration, easy to debug. But your local environment may differ from production.

2. Against a Vercel preview deployment (the gold pattern)

Every PR gets a preview URL. Run E2E tests against that URL:

# .github/workflows/e2e.yml
on:
  pull_request:
 
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps
      - name: Wait for Vercel preview
        run: # ... wait for the PR's preview URL to be ready
      - name: Run E2E tests
        env:
          BASE_URL: ${{ env.VERCEL_PREVIEW_URL }}
        run: npx playwright test

This tests a real production-mode build on the real Vercel runtime, with the env vars and Supabase project configured for the Preview environment. It is the closest you get to “what users will experience” without testing against production itself.
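The “wait for the preview” step is left open in the workflow above. One way to fill it is a small polling helper, sketched here under assumptions: the name waitForUrl, the timeout defaults, and the injectable fetchFn parameter are all illustrative, and it relies on the global fetch available in Node 18+. A preview behind deployment protection may never return a 2xx to an unauthenticated request, so treat this as a starting point rather than a drop-in.

```typescript
// Hypothetical helper: poll a URL until it answers with a 2xx status.
// `fetchFn` is injectable so the helper can be exercised without a network.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

async function waitForUrl(
  url: string,
  options: { timeoutMs?: number; intervalMs?: number; fetchFn?: FetchLike } = {},
): Promise<number> {
  const { timeoutMs = 300_000, intervalMs = 5_000, fetchFn = fetch } = options;
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;

  while (true) {
    attempts += 1;
    try {
      const response = await fetchFn(url);
      if (response.ok) return attempts; // the deployment is serving traffic
    } catch {
      // Connection errors are expected while the deploy is still building.
    }
    // Give up if another sleep would take us past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`${url} not ready after ${timeoutMs} ms (${attempts} attempts)`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A CI script could call waitForUrl with the preview URL before invoking npx playwright test; the returned attempt count is only there to make logs easier to read.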

3. Against a dedicated staging environment

A persistent staging deploy that mirrors production. Pre-warmed, with test data already seeded. Faster than spinning up a preview, more controlled. Common in larger teams.

For solo developers: Vercel preview deployments + Playwright + GitHub Actions is the right setup. It usually fits within the free tiers of each service, and it provides a reality check on every PR.
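The tests in this entry call page.goto with relative paths such as "/signup", which only works when a baseURL is configured. A minimal config sketch that serves both the local and the preview pattern might look like the following. The BASE_URL variable matches the workflow above; the localhost port, the ./e2e directory, and the single retry in CI are assumptions to adjust.

```typescript
// playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

// BASE_URL comes from the CI workflow; the localhost fallback and port 3000
// are assumptions about the local dev setup.
const baseURL = process.env.BASE_URL ?? "http://localhost:3000";

export default defineConfig({
  testDir: "./e2e",
  retries: process.env.CI ? 1 : 0,
  use: {
    baseURL, // lets tests call page.goto("/signup") with a relative path
    trace: "retain-on-failure",
    screenshot: "only-on-failure",
  },
  projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
  // Start the dev server only when no external BASE_URL was supplied.
  webServer: process.env.BASE_URL
    ? undefined
    : { command: "npm run dev", url: baseURL, reuseExistingServer: !process.env.CI },
});
```

With this in place, running npx playwright test locally starts (or reuses) the dev server, while the same command in CI targets whatever URL BASE_URL points at.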

Locating elements — the most important skill

Most E2E test brittleness comes from how you locate elements. Three approaches, in order of preference:

1. Accessible queries (PREFERRED)

Query elements by what a user sees and how assistive tech sees them:

page.getByRole("button", { name: "Submit" })
page.getByLabel("Email")
page.getByText("Welcome")
page.getByAltText("Profile picture")
page.getByPlaceholder("Search")

These survive UI refactors. They don’t care about CSS classes, HTML structure, or component naming.

2. Test IDs (when accessibility queries don’t work)

// In the component
<div data-testid="user-menu">…</div>
 
// In the test
page.getByTestId("user-menu")

Stable across refactors, but adds clutter to your code and doesn’t verify accessibility.

3. CSS selectors / XPath (LAST RESORT)

page.locator(".user-menu > .dropdown")
page.locator("xpath=//div[@class='menu']")

Maximally fragile — every CSS change breaks tests. Use only when nothing else works.

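The discipline: write your UI to be accessible by default, and then E2E tests get reliable selectors for free. A native <button> with a visible text label (or an aria-label, for icon-only buttons) is testable AND accessible; it already has the button role, so no explicit role="button" is needed. Win-win.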

Waiting — the #1 source of flaky tests

The hardest problem in E2E testing: deciding WHEN to assert. Modern web apps load asynchronously. A test that runs faster than the page can respond fails. Two patterns:

DON’T: sleep / hardcoded waits

await page.click("button");
await page.waitForTimeout(2000);   // ❌ flaky on slow CI; wasteful on fast CI
await expect(page.getByText("Done")).toBeVisible();

This is the canonical flaky test antipattern.

DO: wait for explicit conditions

await page.click("button");
await expect(page.getByText("Done")).toBeVisible();   // ✅ auto-waits up to default timeout

Playwright’s assertions auto-wait. expect(locator).toBeVisible() polls for up to 5 seconds (default) for the condition. If it becomes true, the test passes; if not, fails with a clear message.
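To make “auto-waits” concrete, here is a conceptual sketch of a polling assertion. It is not Playwright’s implementation: the name expectEventually and the 100 ms interval are assumptions chosen for illustration, and only the 5-second default mirrors the figure quoted above.

```typescript
// Conceptual sketch of an auto-waiting assertion; not Playwright's actual code.
async function expectEventually(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 5_000, intervalMs = 100 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // Pass as soon as the condition holds, however early that is.
    if (await condition()) return;
    // Fail only once the whole time budget has been used up.
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The contrast with a fixed sleep is that a fast page passes after one check, and a slow page gets the full timeout before the test fails.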

Other useful explicit waits:

await page.waitForURL("/dashboard");
await page.waitForResponse(resp => resp.url().includes("/api/posts"));
await page.waitForLoadState("networkidle"); // discouraged in the Playwright docs; prefer a specific condition

NEVER waitForTimeout. It’s always wrong.

Test data — how to set up state without slow setup

A real challenge: an E2E test needs the app to be in a known state. Three patterns:

1. UI-driven setup (slow but realistic)

The test signs up a user, creates a post, navigates, and asserts. Mirrors real user behavior but each setup step adds seconds.

2. API-driven setup (faster)

The test calls API endpoints directly to seed data, then drives the UI to verify the flow:

test.beforeEach(async ({ request }) => {
  await request.post("/api/test/seed", {
    data: { user: "test-user", posts: ["one", "two", "three"] },
  });
});

You expose a “test-only” seed endpoint, guarded by an env-var flag.
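A sketch of what that guard could look like, written as a framework-neutral function so it can be wrapped by whatever route handler the app uses. Everything named here is hypothetical: the ENABLE_TEST_SEED flag, the handleSeedRequest function, and the payload shape (borrowed from the request above).

```typescript
// Hypothetical guard for a test-only seed endpoint. ENABLE_TEST_SEED is an
// assumed flag name; use whatever your deployment actually defines.
type SeedPayload = { user: string; posts: string[] };
type SeedResult = { status: number; body: string };

function isSeedPayload(value: unknown): value is SeedPayload {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { user?: unknown; posts?: unknown };
  return (
    typeof candidate.user === "string" &&
    Array.isArray(candidate.posts) &&
    candidate.posts.every((post) => typeof post === "string")
  );
}

function handleSeedRequest(
  env: Record<string, string | undefined>,
  payload: unknown,
  seed: (data: SeedPayload) => void,
): SeedResult {
  // Behave as if the route does not exist unless the flag is explicitly on.
  if (env["ENABLE_TEST_SEED"] !== "true") {
    return { status: 404, body: "Not found" };
  }
  if (!isSeedPayload(payload)) {
    return { status: 400, body: "Expected { user: string, posts: string[] }" };
  }
  seed(payload);
  return { status: 201, body: "Seeded" };
}
```

Returning 404 rather than 403 when the flag is off is a design choice: it avoids advertising that the endpoint exists in environments where it should never be reachable.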

3. Direct database seeding (fastest)

Before the test, write directly to the test database:

test.beforeEach(async () => {
  await db.users.create({ data: { email: "test@example.com" } });
});

Skips even the API. Fastest setup; least realistic.

For solo projects, mix 2 and 3 — use direct seeding for setup, drive the UI for the actual test.
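Whichever seeding route you pick, rows left behind by an earlier run (or created by a parallel worker) can collide with the next one. A small helper that makes seeded identities unique is one way around that. This is a sketch: uniqueTestEmail and its prefix and clock parameters are hypothetical names, and the example.com domain matches the placeholder used above.

```typescript
// Hypothetical helper: build an email that is unique per call, so seeded
// users from earlier runs or parallel tests in the same process never collide.
let sequence = 0;

function uniqueTestEmail(prefix = "test", now: () => number = Date.now): string {
  sequence += 1; // distinguishes calls that land in the same millisecond
  return `${prefix}-${now()}-${sequence}@example.com`;
}
```

The counter only guarantees uniqueness within one process; across parallel worker processes the timestamp does most of the work, so adding a worker index to the prefix is a reasonable extra precaution.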

Common Playwright features worth knowing

// Each snippet below is independent; they are not meant to live in one file.
import { test, expect, devices } from "@playwright/test";
 
// Multiple browsers
test.use({ browserName: "firefox" });
 
// Mobile viewport
test.use({ ...devices["iPhone 13"] });
 
// Authenticate once, reuse across tests (huge speed win)
test.use({ storageState: "auth.json" });
 
// Screenshots, video, and traces on failure are opt-in, not automatic.
// In playwright.config.ts, under `use`:
//   screenshot: "only-on-failure",
//   video: "retain-on-failure",
//   trace: "retain-on-failure",
 
// Run the tests in this file in parallel
test.describe.configure({ mode: "parallel" });
 
// API testing without a browser
test("API endpoint works", async ({ request }) => {
  const res = await request.get("/api/posts");
  expect(res.status()).toBe(200);
});

The Playwright trace viewer (npx playwright show-trace trace.zip) is one of the best debugging experiences in any test framework — every action, every assertion, every screenshot, replayable.

What to E2E test (and what not to)

Test these end-to-end:

Critical user journeys (signup, login, key feature, payment)
High-value paths that span auth + DB + UI in one flow
Recently-broken paths — once a path has broken in production, an E2E test for it pays dividends forever
Cross-page flows that can’t be captured in a single integration test

Don’t E2E test these:

Individual form validation rules (use unit/integration)
Edge cases of business logic (use unit)
Visual styling (use visual regression tools or just manual review)
Every page (only the ones that MUST work)

Roughly: 5-15 E2E tests for the most important paths, not 500 covering every interaction. In the testing trophy, integration tests do the heavy lifting; in the pyramid, unit tests do. Both put E2E as the thin smoke-test layer on top.

Common gotchas

Flaky tests are worse than no tests. A test that fails 5% of the time will get retried by everyone until it goes away. Investigate every flaky test; fix root cause; never .skip and forget.
Don’t use waitForTimeout. It’s the canonical flaky-test antipattern. Always wait for explicit conditions.
Hardcoded selectors break on any UI change. Use accessible queries (getByRole, getByLabel) wherever possible.
Auth setup is the slow part. Logging in via UI in every test wastes minutes. Set up auth once, save the storage state, reuse across tests.
Race conditions between actions and assertions. A button click that triggers a 500ms API call — the next line of test code can run before the UI updates. Auto-waiting assertions help; explicit waitForResponse is sometimes needed.
CI is slower than local. A test that runs in 8 seconds locally can take 30 seconds in CI. Set generous timeouts in CI config; don’t optimize for local speed at the expense of CI reliability.
Mobile vs desktop matters. A test that passes in Chrome desktop may fail on iPhone Safari. Either test in multiple browsers/viewports or accept the gap.
Test data leaks between runs. A test that creates a user with test@example.com and doesn’t clean up will break the next run. Use unique-per-run emails (test-${Date.now()}@example.com) or full cleanup.
Don’t run E2E against production. Real Stripe charges. Real emails sent. Real data corrupted. Always run against staging or preview.
The Vercel preview URL takes time to be ready. A naive “run E2E after PR” can run before Vercel’s deploy is live. Use the Vercel webhook signal or poll the URL until 200.
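Cookies and localStorage persist between tests that share a context. Playwright Test gives each test a fresh context by default, so this mainly bites when you create contexts manually or share one page across tests (for example in beforeAll). In that case, create a new context per test or explicitly clear state in beforeEach.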
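Animations break tests. A modal that takes 300ms to fade in can be asserted on or screenshotted mid-transition. Playwright waits for an element to stop moving before clicking it, but visual checks and custom waits do not get that protection. Either emulate prefers-reduced-motion: reduce (Playwright’s reducedMotion: 'reduce' option) or wait for the animation to complete.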
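page.fill() doesn’t trigger every handler a real keystroke would. fill() sets the value and dispatches an input event, which is enough for most React controlled inputs; components that listen for keydown/keyup (input masks, autocomplete widgets) may need real keyboard events. For those, use locator.pressSequentially() (the replacement for the deprecated page.type()) or locator.press().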
Shadow DOM is a special case. Playwright’s CSS, role, and text locators pierce open shadow roots automatically, but XPath selectors do not, and closed shadow roots are not reachable at all.
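File upload tests need a file source. locator.setInputFiles("path/to/file.png") requires the file to exist on disk in the test environment (relative paths resolve against the current working directory); alternatively, pass an in-memory { name, mimeType, buffer } object.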
Screenshot comparisons are fragile when visible content changes between deploys. A baseline comparison fails if anything rendered on screen includes a build hash, timestamp, or other value that differs on each deploy. Mask or hide such regions before comparing.
Print errors and screenshots on failure. Configure Playwright (screenshot: 'only-on-failure', video: 'retain-on-failure', trace: 'retain-on-failure') so you have artifacts to debug failures.
Parallel E2E hits database concurrency limits. Each test connecting to the same Supabase project + same user can step on each other. Use per-test data isolation or run a subset serially.
Headless vs headed browsers behave slightly differently. Some things only work headed (some animations, devtools-required APIs). Always run headless in CI for speed.
localhost:3000 vs 127.0.0.1:3000 — cookies aren’t shared. A test that visits one and then the other may see no auth cookie. Pick one consistently.
Maintenance burden grows with test count. Every UI change requires updating selectors and flows. Keep the E2E suite small and focused on critical paths.
Browser context vs page. A BrowserContext is like a fresh browser session. Each test should get its own context (Playwright does this by default). Within one context, multiple page objects can run (multi-tab tests).
Time zones and locales differ in CI. Tests that check “today’s date” can fail if the CI runner is in a different timezone. Set locale and timezone explicitly in the test config.
Don’t write E2E tests for logic bugs. Write a unit/integration test for the underlying logic. Use E2E only for “user journey works end-to-end” (including journeys that have broken in production before), not for “this specific edge case in business logic.”
CI parallelism is a math problem. 4 workers running 12 tests at 5s each is 60s of test time, which divides to 15s wall-clock only in the ideal case. Worker startup, retries, and uneven test durations push the real number well above that. Profile before assuming.
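To put numbers on the parallelism point, the sketch below estimates wall-clock time by handing each test to whichever worker frees up first. It is a simplification: real runners schedule by file, and the function name, the overhead parameter, and the sample durations are made up for illustration.

```typescript
// Estimate wall-clock time by assigning each test, in order, to the worker
// that becomes free earliest. Durations and overhead are in seconds.
function estimateWallClock(
  durationsSec: number[],
  workers: number,
  perWorkerOverheadSec = 0,
): number {
  if (workers < 1) throw new Error("workers must be >= 1");
  const busyUntil = new Array<number>(workers).fill(perWorkerOverheadSec);
  for (const duration of durationsSec) {
    let next = 0;
    for (let i = 1; i < workers; i++) {
      if (busyUntil[i] < busyUntil[next]) next = i;
    }
    busyUntil[next] += duration;
  }
  return Math.max(...busyUntil);
}

const even = new Array<number>(12).fill(5);
console.log(estimateWallClock(even, 4)); // 15: the ideal case

const uneven = [30, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]; // one slow test
console.log(estimateWallClock(uneven, 4)); // 30: the slow test sets the floor
console.log(estimateWallClock(uneven, 4, 3)); // 33: plus 3s startup per worker
```

The takeaway is that the slowest single test, not the average, bounds the run once there are enough workers, which is why profiling individual test durations matters more than adding workers.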

When to start writing E2E tests

For a brand-new prototype: skip E2E. You’ll throw it all away anyway.

For a project you’ll keep: as soon as you have ONE critical path that absolutely must work (login + first action), write that E2E test. From then, add an E2E test whenever you introduce a new critical path.

Most production webapps end up with 10-50 E2E tests covering the top user journeys. Not 500. Resist the urge to E2E everything.

Sources

Playwright docs — canonical reference
Cypress docs
Playwright vs Cypress comparison
Kent C. Dodds — How to use React Testing Library and Cypress
web.dev — E2E testing best practices — broader Google-published guidance
Vercel — E2E testing with Playwright
