Serverless functions

Status: 🟩 COMPLETE
Last updated: 2026-06-19
Plain-English tagline: Backend code that doesn’t live on a long-running server. It sleeps until a request arrives, the platform spins it up just-in-time, runs it, and shuts it back down, so you pay (and configure) for executions, not for a server.

In plain English

The traditional backend model is: rent a server, install your code on it, leave it running 24/7. The server consumes electricity, RAM, and CPU even when nobody’s using your app. You pay the same whether your service handles 1 request a day or 1 million.

Serverless functions flip that model. You don’t rent a server. You upload a single function — a chunk of code that handles one specific job — to a platform like Vercel, AWS Lambda, Netlify, or Cloudflare Workers. The platform stores it.

When a request arrives, the platform:

Spins up a runtime (a tiny container or V8 isolate) with your code in it
Runs your function against the request
Returns the response
Either keeps the runtime warm for the next request, or shuts it down after a few seconds of idle

You pay only for the time your function actually ran, measured in milliseconds and scaled by the memory it used. When no requests come in, you pay nothing. When 10,000 requests arrive simultaneously, the platform can spin up thousands of instances in parallel (subject to the provider’s concurrency limits) and bills you for the actual work done.

The name “serverless” is misleading — there ARE servers. You just don’t see them, don’t manage them, don’t pay for them when idle. The platform abstracts the server away.

For the kinds of webapps George builds, every “API route” in Next.js, every “Vercel function,” every “server action” is a serverless function under the hood. It’s the dominant backend hosting model in 2026.

Why it matters

Three concrete reasons serverless dominates the modern stack:

Cost. A side project with no users costs $0 to run. Even at modest traffic, costs stay tiny. Compare to a $5/month VPS that runs 24/7 even when nobody visits.
Scaling is automatic. A traffic spike that would crush a fixed server gets absorbed: the platform spins up more function instances in parallel. You don’t configure auto-scaling; it just happens.
Operations near zero. No OS patches, no security updates to the runtime, no firewall rules, no log rotation. The platform handles all of it. You write code and push.

The trade-offs: cold starts, time limits on each invocation, no persistent in-memory state, harder local development, vendor lock-in for some features. For most webapps these are minor; for some workloads they’re showstoppers.

The serverless mental shift

Coming from “I run a server” thinking, serverless requires some mental rewiring:

Server-based	Serverless
Process runs forever	Process exists only during a request
In-memory cache survives across requests	In-memory cache may not — new instance any time
Open one DB connection at startup, reuse it	Each cold instance opens its own connection (use a pooler!)
Background tasks run inside the process	Background tasks need a separate queue/cron
You pay per server-hour	You pay per millisecond of actual execution
You configure CPU/memory	You configure memory; CPU scales with it
Logs persist on disk	Logs stream to a central log system
Crashes mean downtime	Crashes affect one request; next request starts fresh

The biggest mental shift: assume nothing about state between requests. Module-level variables, in-memory caches, file system writes — none of them are reliable. Treat each request like a fresh process (because it might be).
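
To make the "assume nothing about state" rule concrete, here is a minimal sketch that simulates two runtimes in plain TypeScript. The createInstance helper and the "user-1" id are made up for illustration; each call stands in for one platform-managed instance, and the Map plays the role of a module-level variable.

```typescript
// Each call to createInstance() stands in for one platform-managed runtime.
// Variables inside it play the role of module-level state in a real function.
type Handler = (userId: string) => number;

function createInstance(): Handler {
  // "Module-level" cache, private to this one instance.
  const hits = new Map<string, number>();
  return (userId) => {
    const next = (hits.get(userId) ?? 0) + 1;
    hits.set(userId, next);
    return next;
  };
}

const instanceA = createInstance();
const instanceB = createInstance();

const first = instanceA("user-1");     // instance A records its first hit
const second = instanceA("user-1");    // warm reuse: A still remembers the earlier hit
const elsewhere = instanceB("user-1"); // a parallel instance starts from an empty map
console.log({ first, second, elsewhere });
```

The two instances disagree about how many times the same user has been seen, which is why shared counters, sessions, and caches belong in an external store rather than in memory.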

A concrete example: a Vercel function

In Next.js’s App Router, a serverless function is just a route.ts file:

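// app/api/posts/route.ts
import { NextRequest, NextResponse } from "next/server";
import { z } from "zod";
// Assumption: `db` is your own database client module; the import path is illustrative.
import { db } from "@/lib/db";

const PostSchema = z.object({
  title: z.string().min(1),
  body: z.string(),
});

export async function POST(req: NextRequest) {
  // safeParse returns a result object instead of throwing,
  // so malformed input becomes a 400 rather than a generic 500.
  const parsed = PostSchema.safeParse(await req.json().catch(() => null));
  if (!parsed.success) {
    return NextResponse.json({ error: "Invalid post payload" }, { status: 400 });
  }

  // Illustrative only: a client-supplied header is not real authentication.
  const authorId = req.headers.get("x-user-id");
  if (!authorId) {
    return NextResponse.json({ error: "Missing x-user-id header" }, { status: 401 });
  }

  const post = await db.posts.create({
    data: { title: parsed.data.title, body: parsed.data.body, authorId },
  });

  return NextResponse.json(post, { status: 201 });
}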

You write this file. You git push. Vercel builds it into a serverless function. When POST /api/posts arrives at your domain, Vercel runs this code in an isolated container, returns the response, and is done.

No npm start. No process you manage. No port to bind. The platform handles all of it.

The function lifecycle (cold vs warm starts)

When a request arrives, one of two things happens:

Cold start

No instance of your function is warm. The platform must:

Allocate a container (or isolate)
Load your code + dependencies
Run initialization (top-level imports, DB clients, etc.)
Run your handler

This can add anywhere from a few milliseconds to a few seconds, depending on:

Runtime (V8 isolate: ~5ms; Node container: 200–800ms; Python: 500–1500ms; Java: 1000–3000ms)
Code size (heavy npm deps slow this)
What you do at module scope (loading large files, opening connections)

Warm start

An instance is already loaded. The platform routes the new request to it. Response in single-digit milliseconds (plus your actual work).

A function typically stays warm for ~5-15 minutes of idle before being recycled. So if your app has steady traffic, most requests are warm. If it’s bursty (overnight quiet, morning rush), users may pay for cold starts.
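
The steady-versus-bursty difference can be sketched with a toy model of a single instance. This is not how any platform actually schedules work; the 10-minute idle limit is an assumed value inside the range mentioned above, and classifyStarts is a hypothetical helper.

```typescript
// Toy model of one function instance: a request is "warm" if the previous
// request arrived less than IDLE_LIMIT_MS ago. The 10-minute figure is an
// assumption for illustration, not a platform guarantee.
const IDLE_LIMIT_MS = 10 * 60 * 1000;

type StartKind = "cold" | "warm";

function classifyStarts(arrivalTimesMs: number[]): StartKind[] {
  let lastSeen: number | null = null;
  return arrivalTimesMs.map((t) => {
    const warm = lastSeen !== null && t - lastSeen < IDLE_LIMIT_MS;
    lastSeen = t;
    return warm ? "warm" : "cold";
  });
}

const minute = 60 * 1000;
// Steady traffic: a request every two minutes.
const steady = classifyStarts([0, 2 * minute, 4 * minute, 6 * minute]);
// Bursty traffic: two requests, a 44-minute gap, then two more.
const bursty = classifyStarts([0, 1 * minute, 45 * minute, 46 * minute]);
console.log(steady, bursty);
```

In this model only the very first steady request is cold, while the bursty pattern pays a second cold start after the long gap.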

Modern mitigations:

Vercel Fluid Compute (default since April 2025) — one function instance handles many concurrent requests. Far fewer cold starts.
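AWS Lambda SnapStart: restores pre-initialized snapshots of the function (originally Java only, later extended to Python and .NET; Node.js is not covered); cold start drops to on the order of a hundred ms.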
Cloudflare Workers — V8 isolates have ~5ms cold starts; effectively no cold-start problem.
Provisioned concurrency — pay for a baseline of warm instances always ready.

For the Bible Quest stack (Vercel + Fluid Compute), cold starts are usually invisible.

The platforms — who does what

Platform	Runtime	Cold start	Time limit	Best for
Vercel Functions	Node.js / Bun / Python (newer)	Fast (Fluid)	10s (Hobby), 300s (Pro), 800s (custom)	Next.js apps, webapps
AWS Lambda	Node, Python, Go, Java, Ruby, .NET, custom	100ms–2s	900s (15 min)	The original, used by everyone
Cloudflare Workers	V8 isolate (JS/WASM)	<5ms	30s (default), longer paid	Edge, ultra-fast, global
Netlify Functions	Node	Similar to Lambda (it IS Lambda underneath)	10s (free), 26s (Pro)	Sites + occasional functions
Google Cloud Functions	Node, Python, Go, Java, Ruby, .NET	Similar to Lambda	540s (Gen 1), 3600s (Gen 2)	GCP-centric stacks
Azure Functions	Multi-runtime	Similar	10 min (Consumption), 60+ min (Premium)	.NET-heavy / Microsoft stacks
Fly Machines	Container-based	Slower than isolates	Long-running	Hybrid: serverless feel, full Linux underneath
AWS Fargate	Container	Slower	Long-running	Workloads that need full container, less so “real serverless”

For George’s stack: Vercel Functions are the default. They ARE the backend.

What lives well on serverless — and what doesn’t

Great fits

HTTP request handlers (API routes, REST endpoints)
Webhook receivers
Auth flows
Image processing (resize, watermark)
LLM API proxies
Cron jobs that fire occasionally
Email send triggers
Form submissions
Scheduled report generation

Bad fits

Persistent connections (websockets, long-poll) — cap on duration
Long-running computations (video transcoding, ML training) — time limits
Stateful workloads (in-memory caching at scale, sticky sessions) — no shared state
Workloads that need a static IP — instances rotate
Anything requiring more memory than the platform allows (about 10GB at the top end on Vercel and Lambda)

For “bad fits,” reach for managed services (Inngest, Trigger.dev for jobs; Pusher/Ably for realtime; Modal/Runpod for ML).

Memory and CPU — the bundled dial

In most serverless platforms, you don’t pick CPU separately. You pick memory, and CPU scales proportionally.

Vercel: 256MB (default) to 10GB
Lambda: 128MB to 10240MB
Cloudflare Workers: fixed 128MB

Pricing is roughly proportional to memory × time. Doubling memory doubles the per-second cost, but if the doubled memory makes the function complete in half the time, total cost is the same — and latency is better.
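
The memory × time arithmetic can be checked with a few lines of code. This is a sketch under stated assumptions: PRICE_PER_GB_SECOND is a made-up placeholder rather than any provider's real rate, and invocationCost is a hypothetical helper.

```typescript
// Toy cost model: billed cost = memory (GB) x duration (s) x price per GB-second.
// PRICE_PER_GB_SECOND is a placeholder value, not a real provider rate.
const PRICE_PER_GB_SECOND = 0.00002;

function invocationCost(memoryMb: number, durationMs: number): number {
  const gb = memoryMb / 1024;
  const seconds = durationMs / 1000;
  return gb * seconds * PRICE_PER_GB_SECOND;
}

const small = invocationCost(1024, 800); // 1 GB for 800 ms
const large = invocationCost(2048, 400); // 2 GB for 400 ms: twice the memory, half the time
console.log(small, large);
```

Both configurations come out at the same cost in this model, but the larger one answers in half the time. Real workloads rarely speed up perfectly linearly, so it is worth measuring before raising memory.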

For most webapp handlers, default settings are fine. For image processing or heavy work, raise memory to make execution faster.

Connection pooling — the recurring trap

Serverless instances are independent. Each cold instance opens its own database connection. A traffic spike that scales to 1000 concurrent instances opens 1000 connections — which most databases can’t tolerate.

The standard solution: a connection pooler in front of the database.

Supabase ships with PgBouncer built-in. Use the pooler URL (port 6543) for serverless code, the direct URL (port 5432) for long-running connections.
Neon has a separate “pooled connection” endpoint.
Cloudflare Hyperdrive acts as a connection pooler for any Postgres database, optimized for Workers.
Prisma Data Proxy / Accelerate can pool connections behind a managed proxy.

Whichever path, your serverless code should never open direct unpooled connections to Postgres.
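
Alongside a pooler, a common pattern is to create the client lazily and keep it in module scope, so warm invocations of the same instance reuse one connection instead of opening a new one per request. The sketch below uses a stand-in client: DbClient, connect, and the pooler URL are all hypothetical, and a real driver would replace them.

```typescript
// Hypothetical stand-in for a real database client; only the shape matters here.
interface DbClient {
  query(sql: string): string;
}

let connectionsOpened = 0;

// Assumed factory: in real code this would call your driver with the POOLER url.
function connect(poolerUrl: string): DbClient {
  connectionsOpened += 1;
  return { query: (sql) => `ran "${sql}" via ${poolerUrl}` };
}

// Module-scope slot: survives across warm invocations of the same instance.
let client: DbClient | undefined;

function getClient(): DbClient {
  if (client === undefined) {
    client = connect("postgres://app@pooler.example.internal:6543/app");
  }
  return client;
}

// Three requests handled by one warm instance share a single connection.
function handleRequest(): string {
  return getClient().query("select 1");
}

handleRequest();
handleRequest();
handleRequest();
console.log(connectionsOpened);
```

This only limits connections per instance. A spike that creates many instances still creates many connections, which is the part the pooler absorbs.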

Cold start mitigation strategies

If cold starts are hurting you:

Reduce bundle size. Tree-shake aggressively. Avoid bundling huge libraries you only need conditionally. Vercel and Lambda both report bundle sizes; aim for under 10MB unzipped.
Lazy-load expensive imports. Don’t import a giant SDK at module scope if only one route uses it. Import inside the handler.
Use edge runtime where possible. V8 isolates start in ~5ms vs Node containers in 200–800ms. Trade-off: limited APIs.
Warm the function. A cron job that pings critical endpoints every minute keeps them warm. Hacky but works.
Use Vercel Fluid Compute / Lambda SnapStart. Newer compute models that reduce cold starts dramatically. Check whether they are already enabled for your project; SnapStart has to be turned on per function.
Provisioned concurrency. Pay to keep N instances always warm. Lambda, Vercel “Always Allocated” memory. Costs add up.

Common gotchas

Module-level code runs on every cold start. Heavy imports, file reads, network calls at module scope all add to cold-start latency. Defer to inside the handler when possible.
In-memory state doesn’t persist across instances. A const cache = new Map() at module scope works for the lifetime of one instance — but a parallel instance has its own empty map. Use Redis, KV, or the database for shared state.
Module-level state CAN leak between different users’ requests. On Fluid Compute or Lambda warm starts, one instance may handle requests for User A then User B in sequence. Don’t cache per-user data in module scope.
Filesystem is ephemeral. /tmp is usually writable but disappears between cold starts. Don’t store anything long-term locally.
Each instance opens its own DB connection. Without a pooler, you’ll exhaust the database’s connection limit at the worst moment (a traffic spike).
Time limits are real. A long-running LLM call, video processing, or batch job can exceed the limit. Plan: break into smaller steps, queue background jobs, or use a different runtime.
Logs need to be structured to be useful. console.log works but produces unsearchable text. Use a structured logger (pino) and aggregate logs in a tool (Vercel Logs, Datadog, Better Stack).
Errors should be caught, not crashed. An unhandled exception terminates the function and returns a generic 500. Wrap handlers in try/catch and surface meaningful errors.
process.env.X is undefined silently if you forgot to set it. Always validate critical env vars at startup. Crash loudly rather than serving misbehaving requests.
Outbound HTTP requests have no short default timeout. A fetch call to a hung upstream service can run until it hits the function’s time limit. Always pass a timeout signal, e.g. AbortSignal.timeout(5000).
The function billed time is wall-clock, not CPU time. An await waiting for a slow API call costs you. Some platforms (Vercel Fluid) bill differently when the function is waiting; check the docs.
Streaming responses need careful framework support. Returning a stream from a handler works in some runtimes but not all. Server-Sent Events (SSE) work well on Vercel; raw HTTP/2 streams need more care.
Local dev != production environment. npm run dev runs your code in your laptop’s Node; production runs in a container with different system libs, networking, env vars. Always test against a preview deploy before relying on production behavior.
Some npm packages don’t work serverless. Headless Chrome, FFmpeg with custom codecs, anything needing root filesystem write. Either work around (use a serverless-compatible variant) or move that workload to a long-running runtime.
Background jobs need a separate system. A serverless handler that returns 200 after starting an async task gets killed mid-task. Use a queue (Inngest, Trigger.dev, Vercel Queues, Supabase pg_cron + a poller).
Concurrency limits exist. Vercel: per-team and per-region limits. Lambda: per-account concurrency reservations. A burst of 10,000 requests might be throttled even if your code can handle it.
Cold-start time isn’t just a UX concern. A slow cold start can push your function past a downstream timeout (e.g. Stripe’s webhook timeout). Optimize aggressively.
Bundle splitting per-route happens by default. Next.js splits your routes into separate bundles. Each route’s cold start only includes its own deps. Don’t accidentally import a huge global util that pulls in everything.
API Gateway / proxy layers can add their own timeouts. Vercel’s edge proxy times out at 60 seconds even if your function allows longer. Match your function timeout to the proxy timeout to avoid mysterious cutoffs.
Cost surprises are easy. A misconfigured loop or an attacker hammering your endpoint can rack up millions of invocations. Set spending caps where the provider supports them; rate-limit at the edge.
Vendor lock-in is real. Vercel functions, Lambda, Cloudflare Workers all have different APIs, different bindings. Code written for one rarely runs unmodified on another. Use abstractions (Hono, Web Standard APIs) to reduce lock-in if portability matters.
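
One of the gotchas above, silently undefined environment variables, can be guarded against with a small startup check. This is a minimal sketch: requireEnv is a hypothetical helper, the variable names are invented, and the environment is passed in as a parameter so the example runs anywhere (in a real function you would pass process.env).

```typescript
// Takes the environment as a parameter so the sketch runs anywhere;
// in a real function you would pass process.env.
type Env = Record<string, string | undefined>;

function requireEnv<K extends string>(env: Env, keys: readonly K[]): Record<K, string> {
  // Treat both undefined and empty strings as missing.
  const missing = keys.filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const out = {} as Record<K, string>;
  for (const k of keys) out[k] = env[k] as string;
  return out;
}

// Hypothetical variable names, for illustration only.
const fakeEnv: Env = { DATABASE_URL: "postgres://example", API_KEY: "" };

const ok = requireEnv(fakeEnv, ["DATABASE_URL"]);

let failure = "";
try {
  requireEnv(fakeEnv, ["DATABASE_URL", "API_KEY", "WEBHOOK_SECRET"]);
} catch (err) {
  failure = err instanceof Error ? err.message : String(err);
}
console.log(ok.DATABASE_URL, failure);
```

Calling a check like this at module scope means a misconfigured deploy fails on its first cold start, with an error that names the missing variables, instead of serving half-working requests.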

When to use serverless

Any modern webapp with bursty or moderate traffic
Side projects, prototypes, MVPs (free tiers cover it)
API endpoints whose requests each finish in under 30s
Webhook receivers
Cron-style scheduled tasks
Anything event-driven

When NOT to use serverless

Persistent connections (websockets at scale, MQTT brokers)
Heavy CPU work over long periods (video transcoding, batch ML)
Workloads that need predictable, low-latency cold-start guarantees beyond what providers offer
Workloads that consume more than ~10GB memory
Cost-sensitive massive-traffic workloads where dedicated servers would be cheaper at scale

Sources

Vercel Functions docs
AWS Lambda docs — the original, still the most-deployed
Vercel Fluid Compute — the modern Vercel runtime
Cloudflare Workers docs
Martin Fowler — Serverless architectures — broader essay
Anatomy of a Lambda cold start (AWS re:Invent talks)
