Serverless functions

Status: 🟩 COMPLETE
Last updated: 2026-06-19
Plain-English tagline: Backend code that doesn’t live on a long-running server. It sleeps until a request arrives; the platform spins it up just in time, runs it, and shuts it back down, so you pay (and configure) for executions, not for a server.


In plain English

The traditional backend model is: rent a server, install your code on it, leave it running 24/7. The server consumes electricity, RAM, and CPU even when nobody’s using your app. You pay the same whether your service handles 1 request a day or 1 million.

Serverless functions flip that model. You don’t rent a server. You upload a single function — a chunk of code that handles one specific job — to a platform like Vercel, AWS Lambda, Netlify, or Cloudflare Workers. The platform stores it.

When a request arrives, the platform:

  1. Spins up a runtime (a tiny container or V8 isolate) with your code in it
  2. Runs your function against the request
  3. Returns the response
  4. Either keeps the runtime warm for the next request, or shuts it down after a few seconds of idle

You pay only for the time your function actually ran, measured in milliseconds and scaled by the memory it used. When no requests come in, you pay nothing. When 10,000 requests arrive simultaneously, the platform can spin up thousands of instances in parallel (subject to the provider’s concurrency limits) and bills you for the actual work done.

The name “serverless” is misleading — there ARE servers. You just don’t see them, don’t manage them, don’t pay for them when idle. The platform abstracts the server away.

For the kinds of webapps George builds, every “API route” in Next.js, every “Vercel function,” every “server action” runs as a serverless function under the hood once it is deployed to Vercel. It’s the dominant backend hosting model in 2026.


Why it matters

Three concrete reasons serverless dominates the modern stack:

  1. Cost. A side project with no users costs close to nothing on serverless, versus a roughly $5/month VPS that runs 24/7 even when nobody visits.

  2. Scaling is automatic. A traffic spike that would crush a fixed server gets absorbed: the platform spins up more function instances in parallel. You don’t configure auto-scaling; it just happens.

  3. Operations near zero. No OS patches, no security updates to the runtime, no firewall rules, no log rotation. The platform handles all of it. You write code and push.

The trade-offs: cold starts, time limits on each invocation, no persistent in-memory state, harder local development, vendor lock-in for some features. For most webapps these are minor; for some workloads they’re showstoppers.


The serverless mental shift

Coming from “I run a server” thinking, serverless requires some mental rewiring:

Server-based | Serverless
------------ | ----------
Process runs forever | Process exists only during a request
In-memory cache survives across requests | In-memory cache may not (a new instance can appear at any time)
Open one DB connection at startup, reuse it | Each cold instance opens its own connection (use a pooler!)
Background tasks run inside the process | Background tasks need a separate queue/cron
You pay per server-hour | You pay per millisecond of actual execution
You configure CPU/memory | You configure memory; CPU scales with it
Logs persist on disk | Logs stream to a central log system
Crashes mean downtime | Crashes affect one request; next request starts fresh

The biggest mental shift: assume nothing about state between requests. Module-level variables, in-memory caches, file system writes — none of them are reliable. Treat each request like a fresh process (because it might be).
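To make the "assume nothing about state" rule concrete, here is a toy model rather than real platform code. Each call to the hypothetical createInstance stands in for one copy of your module running in its own instance, and sharedStore stands in for an external store such as Redis, KV, or the database. All names are illustrative.

```typescript
// Toy model: each "instance" is a separate copy of the module, so each one
// gets its own module-level Map. The shared Map stands in for Redis/KV/a DB.

type Store = Map<string, number>;

// Simulates one warm function instance with its own module-scope state.
function createInstance(shared: Store) {
  const localHits: Store = new Map(); // lives and dies with this instance

  return {
    // Counts a visit in instance-local memory (unreliable across instances).
    hitLocal(key: string): number {
      const next = (localHits.get(key) ?? 0) + 1;
      localHits.set(key, next);
      return next;
    },
    // Counts a visit in the shared store (visible to every instance).
    hitShared(key: string): number {
      const next = (shared.get(key) ?? 0) + 1;
      shared.set(key, next);
      return next;
    },
  };
}

const sharedStore: Store = new Map(); // stand-in for an external store
const instanceA = createInstance(sharedStore);
const instanceB = createInstance(sharedStore);

// Three requests for the same key, load-balanced across two instances.
const localCounts = [
  instanceA.hitLocal("home"),
  instanceB.hitLocal("home"),
  instanceA.hitLocal("home"),
];
const sharedCounts = [
  instanceA.hitShared("home"),
  instanceB.hitShared("home"),
  instanceA.hitShared("home"),
];

console.log(localCounts);  // [1, 1, 2]
console.log(sharedCounts); // [1, 2, 3]
```

The local counter undercounts because instance B never sees what instance A recorded; only the shared store gives the true total of three.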


A concrete example: a Vercel function

In Next.js’s App Router, a serverless function is just a route.ts file:

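// app/api/posts/route.ts
import { NextRequest, NextResponse } from "next/server";
import { z } from "zod";
// Assumption: `db` is your own database client (for example a Prisma-style
// wrapper) exported from a local module; the path below is hypothetical.
import { db } from "@/lib/db";

const PostSchema = z.object({
  title: z.string().min(1),
  body: z.string(),
});

export async function POST(req: NextRequest) {
  // Assumption: upstream middleware sets x-user-id after authenticating.
  const authorId = req.headers.get("x-user-id");
  if (!authorId) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  // safeParse returns a result object instead of throwing on bad input,
  // so malformed payloads become a 400 rather than a generic 500.
  const parsed = PostSchema.safeParse(await req.json().catch(() => null));
  if (!parsed.success) {
    return NextResponse.json({ error: "Invalid post payload" }, { status: 400 });
  }

  const post = await db.posts.create({
    data: { title: parsed.data.title, body: parsed.data.body, authorId },
  });

  return NextResponse.json(post, { status: 201 });
}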

You write this file. You git push. Vercel builds it into a serverless function. When POST /api/posts arrives at your domain, Vercel runs this code in an isolated container, returns the response, and is done.

No npm start. No process you manage. No port to bind. The platform handles all of it.


The function lifecycle (cold vs warm starts)

When a request arrives, one of two things happens:

Cold start

No instance of your function is warm. The platform must:

  1. Allocate a container (or isolate)
  2. Load your code + dependencies
  3. Run initialization (top-level imports, DB clients, etc.)
  4. Run your handler

This can add anywhere from a few milliseconds to a few seconds (rough figures; they vary by platform and bundle), depending on:

  • Runtime (V8 isolate: ~5ms; Node container: 200–800ms; Python: 500–1500ms; Java: 1000–3000ms)
  • Code size (heavy npm deps slow this)
  • What you do at module scope (loading large files, opening connections)

Warm start

An instance is already loaded. The platform routes the new request to it. Response in single-digit milliseconds (plus your actual work).

A function typically stays warm for ~5-15 minutes of idle before being recycled. So if your app has steady traffic, most requests are warm. If it’s bursty (overnight quiet, morning rush), users may pay for cold starts.
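The steady-versus-bursty difference can be sketched with a small simulation. This is a simplified model under stated assumptions: a single instance, and a fixed idle timeout of 10 minutes (platforms do not guarantee any particular value). The function name countColdStarts and the traffic patterns are illustrative.

```typescript
// Counts cold starts for a single-instance function, given request arrival
// times (in minutes) and an assumed idle timeout. Real platforms do not
// publish or guarantee this timeout; 10 minutes is an illustrative value.
function countColdStarts(arrivalsMin: number[], idleTimeoutMin: number): number {
  let coldStarts = 0;
  let lastSeen: number | null = null;

  // Sort a copy so the caller's array is left untouched.
  const sorted = arrivalsMin.slice().sort((a, b) => a - b);
  for (const t of sorted) {
    // Cold if nothing has run yet, or the instance sat idle past the timeout.
    if (lastSeen === null || t - lastSeen > idleTimeoutMin) {
      coldStarts += 1;
    }
    lastSeen = t;
  }
  return coldStarts;
}

// Steady traffic: one request every 5 minutes for an hour.
const steady = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60];
// Bursty traffic: a few requests, a long quiet gap, then a morning rush.
const bursty = [0, 1, 2, 480, 481, 482, 600];

console.log(countColdStarts(steady, 10)); // 1
console.log(countColdStarts(bursty, 10)); // 3
```

Thirteen steady requests pay for one cold start, while seven bursty requests pay for three, because every long gap lets the instance be recycled.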

Modern mitigations:

  • Vercel Fluid Compute (default since April 2025): one function instance handles many concurrent requests. Far fewer cold starts.
  • AWS Lambda SnapStart: restores the function from a pre-initialized snapshot instead of booting it from scratch. It launched for Java and was later extended to Python and .NET (not Node.js); cold starts typically drop into the low hundreds of milliseconds.
  • Cloudflare Workers: V8 isolates have ~5ms cold starts; effectively no cold-start problem.
  • Provisioned concurrency: pay for a baseline of warm instances always ready.

For the Bible Quest stack (Vercel + Fluid Compute), cold starts are usually invisible.
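A rough way to see why in-instance concurrency reduces cold starts: the number of requests in flight is approximately arrival rate times average duration (Little's law), and each extra instance the platform has to start is a potential cold start. The sketch below is back-of-the-envelope arithmetic, not a model of any provider's scheduler; the figure of 20 requests per instance is an assumption chosen for illustration.

```typescript
// Rough capacity estimate: how many instances are busy at once?
// In-flight requests ≈ arrival rate × average duration (Little's law).
function instancesNeeded(
  requestsPerSecond: number,
  avgDurationSeconds: number,
  requestsPerInstance: number, // 1 = classic one-request-per-instance model
): number {
  const inFlight = requestsPerSecond * avgDurationSeconds;
  return Math.max(1, Math.ceil(inFlight / requestsPerInstance));
}

// 200 req/s, each taking 0.5 s (mostly waiting on a database or an LLM API).
const classicModel = instancesNeeded(200, 0.5, 1);  // 100 instances
const sharedModel = instancesNeeded(200, 0.5, 20);  // 5 instances

console.log({ classicModel, sharedModel });
```

Fewer instances to start means fewer chances for a request to land on a cold one, which is the effect described above for I/O-heavy handlers.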


The platforms — who does what

Platform | Runtime | Cold start | Time limit | Best for
-------- | ------- | ---------- | ---------- | --------
Vercel Functions | Node.js / Bun / Python (newer) | Fast (Fluid) | 10s (Hobby), 300s (Pro), 800s (custom) | Next.js apps, webapps
AWS Lambda | Node, Python, Go, Java, Ruby, .NET, custom | 100ms–2s | 900s (15 min) | The original, used by everyone
Cloudflare Workers | V8 isolate (JS/WASM) | <5ms | 30s (default), longer paid | Edge, ultra-fast, global
Netlify Functions | Node | Similar to Lambda (it IS Lambda underneath) | 10s (free), 26s (Pro) | Sites + occasional functions
Google Cloud Functions | Node, Python, Go, Java, Ruby, .NET | Similar to Lambda | 540s (Gen 1), 3600s (Gen 2) | GCP-centric stacks
Azure Functions | Multi-runtime | Similar | 10 min (Consumption), 60+ min (Premium) | .NET-heavy / Microsoft stacks
Fly Machines | Container-based | Slower than isolates | Long-running | Hybrid: serverless feel, full Linux underneath
AWS Fargate | Container | Slower | Long-running | Workloads that need full container, less so “real serverless”

For George’s stack: Vercel Functions are the default. They ARE the backend.


What lives well on serverless — and what doesn’t

Great fits

  • HTTP request handlers (API routes, REST endpoints)
  • Webhook receivers
  • Auth flows
  • Image processing (resize, watermark)
  • LLM API proxies
  • Cron jobs that fire occasionally
  • Email send triggers
  • Form submissions
  • Scheduled report generation

Bad fits

  • Persistent connections (websockets, long-poll) — cap on duration
  • Long-running computations (video transcoding, ML training) — time limits
  • Stateful workloads (in-memory caching at scale, sticky sessions) — no shared state
  • Workloads that need a static IP — instances rotate
  • Anything requiring more memory than the platform allows (Vercel: 3008MB default, 10GB max)

For “bad fits,” reach for managed services (Inngest, Trigger.dev for jobs; Pusher/Ably for realtime; Modal/Runpod for ML).


Memory and CPU — the bundled dial

In most serverless platforms, you don’t pick CPU separately. You pick memory, and CPU scales proportionally.

  • Vercel: configurable per function; the default and the ceiling depend on your plan, so check the current limits
  • Lambda: 128MB to 10240MB
  • Cloudflare Workers: fixed 128MB

Pricing is roughly proportional to memory × time. Doubling memory doubles the per-second cost, but if the doubled memory makes the function complete in half the time, total cost is the same — and latency is better.
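The memory-times-time trade-off is easy to check with arithmetic. The sketch below assumes a simple "GB-seconds" pricing model; the rate of 0.00002 per GB-second is a made-up round number, not any provider's actual price, and real bills usually add a per-request fee on top.

```typescript
// Cost of one invocation under a "GB-seconds" pricing model.
// pricePerGbSecond is a made-up round number, not any provider's real rate.
function invocationCost(
  memoryMb: number,
  durationMs: number,
  pricePerGbSecond: number,
): number {
  const gbSeconds = (memoryMb / 1024) * (durationMs / 1000);
  return gbSeconds * pricePerGbSecond;
}

const price = 0.00002; // hypothetical $/GB-second

const small = invocationCost(1024, 800, price); // 1 GB for 800 ms
const large = invocationCost(2048, 400, price); // 2 GB for 400 ms

// Same cost per invocation, but the 2 GB configuration answers twice as fast.
console.log(small, large);

// Scaled to a million invocations a month (about $16 at this made-up rate).
const monthly = small * 1e6;
console.log(monthly.toFixed(2));
```

If doubling memory only cuts the duration by a quarter, the larger configuration costs more; the break-even holds only when speed scales with memory.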

For most webapp handlers, default settings are fine. For image processing or heavy work, raise memory to make execution faster.


Connection pooling — the recurring trap

Serverless instances are independent. Each cold instance opens its own database connection. A traffic spike that scales to 1000 concurrent instances opens 1000 connections — which most databases can’t tolerate.

The standard solution: a connection pooler in front of the database.

  • Supabase ships with a built-in connection pooler (Supavisor on current projects; older setups used PgBouncer). Use the pooler URL (port 6543) for serverless code, the direct URL (port 5432) for long-running connections.
  • Neon has a separate “pooled connection” endpoint.
  • Cloudflare Hyperdrive acts as a connection pooler for any Postgres database, optimized for Workers.
  • Prisma Accelerate (the successor to the older Data Proxy) can pool connections behind a managed proxy.

Whichever path, your serverless code should never open direct unpooled connections to Postgres.
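One way to keep that rule from being broken by accident is to make the choice of connection string explicit in code. The sketch below is an assumption-laden illustration: the variable names DATABASE_URL_POOLED and DATABASE_URL_DIRECT, the helper names, and the example hostnames are all hypothetical; only the 6543/5432 port convention comes from the Supabase note above.

```typescript
type Env = Record<string, string | undefined>;

// Picks the connection string for the current workload.
// DATABASE_URL_POOLED / DATABASE_URL_DIRECT are hypothetical variable names.
function databaseUrlFor(env: Env, workload: "serverless" | "long-running"): string {
  const name = workload === "serverless" ? "DATABASE_URL_POOLED" : "DATABASE_URL_DIRECT";
  const url = env[name];
  if (!url) {
    throw new Error(`Missing ${name}`);
  }
  return url;
}

// Worst-case number of connections the database would see without a pooler.
function peakConnections(instances: number, connectionsPerInstance: number): number {
  return instances * connectionsPerInstance;
}

// Example values only; in real code you would pass process.env instead.
const exampleEnv: Env = {
  DATABASE_URL_POOLED: "postgres://app:secret@db.example.com:6543/postgres",
  DATABASE_URL_DIRECT: "postgres://app:secret@db.example.com:5432/postgres",
};

const serverlessUrl = databaseUrlFor(exampleEnv, "serverless");
const migrationUrl = databaseUrlFor(exampleEnv, "long-running");

console.log(serverlessUrl); // the :6543 pooled endpoint
console.log(migrationUrl);  // the :5432 direct endpoint
console.log(peakConnections(1000, 1)); // 1000
```

For scale, a stock PostgreSQL install defaults to max_connections = 100, so the 1000-instance spike above would overshoot it tenfold without a pooler in between.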


Cold start mitigation strategies

If cold starts are hurting you:

  1. Reduce bundle size. Tree-shake aggressively. Avoid bundling huge libraries you only need conditionally. Vercel and Lambda both report bundle sizes; aim for under 10MB unzipped.

  2. Lazy-load expensive imports. Don’t import a giant SDK at module scope if only one route uses it. Import inside the handler.

  3. Use edge runtime where possible. V8 isolates start in ~5ms vs Node containers in 200–800ms. Trade-off: limited APIs.

  4. Warm the function. A cron job that pings critical endpoints every minute keeps them warm. Hacky but works.

  5. Use Vercel Fluid Compute / Lambda SnapStart. Newer compute models that reduce cold starts dramatically. Fluid Compute is the default on newer Vercel projects; SnapStart is opt-in and limited to certain runtimes.

  6. Provisioned concurrency. Pay to keep N instances always warm. Lambda, Vercel “Always Allocated” memory. Costs add up.
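Strategy 2 (lazy-loading) can be sketched with a small memoizing helper. This is one possible pattern, not a framework feature: lazy, getPdfRenderer, and the two handlers are hypothetical names, and the "renderer" is a stand-in for a heavy SDK. In real code the factory would more likely wrap a dynamic import() of the heavy package and return a promise.

```typescript
// Wraps an expensive factory so it runs at most once per instance, and only
// when a request actually needs it (instead of at module load).
function lazy<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: factory() };
    }
    return cached.value;
  };
}

let initCount = 0;

// Stand-in for a heavy SDK client (hypothetical); counts how often it is built.
const getPdfRenderer = lazy(() => {
  initCount += 1;
  return { render: (title: string) => `PDF(${title})` };
});

// Hypothetical handlers: only the report route pays the initialization cost.
function handleHealthCheck(): string {
  return "ok";
}
function handleReport(title: string): string {
  return getPdfRenderer().render(title);
}

handleHealthCheck();                // renderer not built yet
const first = handleReport("Q1");   // builds the renderer
const second = handleReport("Q2");  // reuses it on the warm instance

console.log(first, second, initCount); // PDF(Q1) PDF(Q2) 1
```

The cost moves from every cold start to the first request that needs the dependency, so routes that never touch it start faster.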


Common gotchas

  • Module-level code runs on every cold start. Heavy imports, file reads, network calls at module scope all add to cold-start latency. Defer to inside the handler when possible.

  • In-memory state doesn’t persist across instances. A const cache = new Map() at module scope works for the lifetime of one instance — but a parallel instance has its own empty map. Use Redis, KV, or the database for shared state.

  • Module-level state CAN leak between different users’ requests. On Fluid Compute or Lambda warm starts, one instance may handle requests for User A then User B in sequence. Don’t cache per-user data in module scope.

  • Filesystem is ephemeral. /tmp is usually writable but disappears between cold starts. Don’t store anything long-term locally.

  • Each instance opens its own DB connection. Without a pooler, you’ll exhaust the database’s connection limit at the worst moment (a traffic spike).

  • Time limits are real. A long-running LLM call, video processing, or batch job can exceed the limit. Plan: break into smaller steps, queue background jobs, or use a different runtime.

  • Logs need to be structured to be useful. console.log works but produces unsearchable text. Use a structured logger (pino) and aggregate logs in a tool (Vercel Logs, Datadog, Better Stack).

  • Errors should be caught, not crashed. An unhandled exception terminates the function and returns a generic 500. Wrap handlers in try/catch and surface meaningful errors.

  • process.env.X is undefined silently if you forgot to set it. Always validate critical env vars at startup. Crash loudly rather than serving misbehaving requests.

  • Outbound HTTP requests have no timeout by default (or one far longer than you want). A fetch call to a hung upstream service can run until the function’s time limit. Always pass a signal, for example fetch(url, { signal: AbortSignal.timeout(5000) }).

  • The function billed time is wall-clock, not CPU time. An await waiting for a slow API call costs you. Some platforms (Vercel Fluid) bill differently when the function is waiting; check the docs.

  • Streaming responses need careful framework support. Returning a stream from a handler works in some runtimes but not all. Server-Sent Events (SSE) work well on Vercel; raw HTTP/2 streams need more care.

  • Local dev != production environment. npm run dev runs your code in your laptop’s Node; production runs in a container with different system libs, networking, env vars. Always test against a preview deploy before relying on production behavior.

  • Some npm packages don’t work serverless. Headless Chrome, FFmpeg with custom codecs, anything needing root filesystem write. Either work around (use a serverless-compatible variant) or move that workload to a long-running runtime.

  • Background jobs need a separate system. A serverless handler that returns 200 after starting an async task gets killed mid-task. Use a queue (Inngest, Trigger.dev, Vercel Queues, Supabase pg_cron + a poller).

  • Concurrency limits exist. Vercel: per-team and per-region limits. Lambda: per-account concurrency reservations. A burst of 10,000 requests might be throttled even if your code can handle it.

  • Cold-start time isn’t just a UX concern. A slow cold start can push your function past a downstream timeout (e.g. Stripe’s webhook timeout). Optimize aggressively.

  • Bundle splitting per-route happens by default. Next.js splits your routes into separate bundles. Each route’s cold start only includes its own deps. Don’t accidentally import a huge global util that pulls in everything.

  • API Gateway / proxy layers can add their own timeouts. Vercel’s edge proxy times out at 60 seconds even if your function allows longer. Match your function timeout to the proxy timeout to avoid mysterious cutoffs.

  • Cost surprises are easy. A misconfigured loop or an attacker hammering your endpoint can rack up millions of invocations. Set spending caps where the provider supports them; rate-limit at the edge.

  • Vendor lock-in is real. Vercel functions, Lambda, Cloudflare Workers all have different APIs, different bindings. Code written for one rarely runs unmodified on another. Use abstractions (Hono, Web Standard APIs) to reduce lock-in if portability matters.
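Two of the gotchas above (silent undefined env vars and fetch calls without a timeout) lend themselves to small helpers. The sketch below is a minimal illustration, assuming Node 18+ or another runtime with global fetch and AbortSignal.timeout. The helper names requireEnv and fetchWithTimeout, and the variable names DATABASE_URL and STRIPE_SECRET_KEY, are examples rather than anything a platform requires.

```typescript
type EnvSource = Record<string, string | undefined>;

// Returns the requested variables, or throws one error naming every missing
// key. Call it at module scope so a misconfigured deploy fails on cold start.
function requireEnv<K extends string>(
  source: EnvSource,
  keys: readonly K[],
): Record<K, string> {
  const missing = keys.filter((key) => !source[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const result = {} as Record<K, string>;
  for (const key of keys) {
    result[key] = source[key] as string;
  }
  return result;
}

// Fetch with a hard deadline, so a hung upstream fails fast instead of
// running until the function's own time limit.
async function fetchWithTimeout(url: string, timeoutMs: number): Promise<Response> {
  return fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
}

// In a real handler you would pass process.env; a literal keeps this runnable.
const config = requireEnv(
  { DATABASE_URL: "postgres://example", STRIPE_SECRET_KEY: "sk_test_placeholder" },
  ["DATABASE_URL", "STRIPE_SECRET_KEY"] as const,
);

// The same check with one variable missing fails loudly and names the key.
let startupError = "";
try {
  requireEnv(
    { DATABASE_URL: "postgres://example" },
    ["DATABASE_URL", "STRIPE_SECRET_KEY"] as const,
  );
} catch (err) {
  startupError = err instanceof Error ? err.message : String(err);
}

console.log(config.DATABASE_URL); // postgres://example
console.log(startupError);        // Missing required env vars: STRIPE_SECRET_KEY
```

A call such as fetchWithTimeout(url, 5000) rejects once the deadline passes, so wrap it in try/catch and return a clear upstream-timeout error rather than letting the function run to its limit.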


When to use serverless

  • Any modern webapp with bursty or moderate traffic
  • Side projects, prototypes, MVPs (free tiers cover it)
  • API endpoints where each request finishes in under ~30s
  • Webhook receivers
  • Cron-style scheduled tasks
  • Anything event-driven

When NOT to use serverless

  • Persistent connections (websockets at scale, MQTT brokers)
  • Heavy CPU work over long periods (video transcoding, batch ML)
  • Workloads that need predictable, low-latency cold-start guarantees beyond what providers offer
  • Workloads that consume more than ~10GB memory
  • Cost-sensitive massive-traffic workloads where dedicated servers would be cheaper at scale
