Webhooks — deep dive (protocol mechanics)

Status: 🟩 COMPLETE
Last updated: 2026-06-19
Plain-English tagline: The protocol layer behind webhooks: HMAC signature math, replay protection, retry semantics, the spectrum of signing schemes, and how providers solve the at-least-once delivery problem on top of plain HTTP.


In plain English

Webhooks at the API-design level: “service POSTs to your URL when something happens.” Simple concept. Easy to use. What this entry covers is what’s UNDERNEATH that simple concept — the protocol-level mechanisms that make webhooks trustworthy and reliable in the real world:

  • HMAC signatures — the cryptographic math that proves “this POST really came from the service, not an attacker”
  • Replay protection — preventing an attacker from re-sending an old valid webhook
  • Retry semantics — what happens when your endpoint fails; how providers decide to retry, when to give up
  • Delivery guarantees — why “at least once” is the typical guarantee, never “exactly once”
  • Different signing schemes — Stripe’s, GitHub’s, Standard Webhooks; the math is similar but details differ

If you’re consuming webhooks (Stripe payment events, GitHub PR notifications, Calendly bookings), the application-level webhooks.md is what you need for HOW to write a handler. This entry is for understanding the cryptography and reliability mechanisms so you can:

  • Implement webhooks AS A PROVIDER (e.g., you publish events for OTHER apps to consume)
  • Debug subtle webhook security issues
  • Evaluate webhook providers’ security claims
  • Understand why webhook errors have particular shapes

Why it matters

Three concrete reasons protocol-level webhook knowledge pays off:

  1. Security depends on the signing scheme. A webhook endpoint without signature verification is an OPEN DOOR — anyone who knows the URL can POST fake events. Bad signing implementations (timing-attack vulnerable comparisons, ignoring timestamps) leave subtle holes.

  2. Reliability has hard edges. “At-least-once” delivery means your handler will receive duplicates. Idempotency is mandatory. Retries can flood your server; signing key rotation needs careful staging.

  3. If you ever publish webhooks, you need all of this. Building a service that sends webhooks to customers is non-trivial — signatures, retries, dashboards, monitoring, key rotation, dead-letter handling. Knowing the patterns helps you make informed choices.

The trade-off: this is “deep dive” content. Most webhook consumers can get away with using the provider’s SDK + reading webhooks.md. Reach for this entry when subtle things break or when you’re on the provider side.


The HMAC signature math

The standard scheme for signing webhooks is HMAC-SHA256 — a keyed hash function. The recipe:

signature = HMAC-SHA256(secret_key, payload)
  • secret_key is a long random string shared between the provider and your endpoint (a “signing secret”)
  • payload is the EXACT bytes of the webhook body
  • HMAC-SHA256 is a one-way function that produces a 256-bit (32-byte) digest

The provider sends:

POST /api/webhooks/example HTTP/1.1
X-Signature: sha256=a3b2c1...   (hex-encoded signature)
Content-Type: application/json
 
{"event":"payment.succeeded","amount":1000,"customer":"cus_abc"}

Your handler:

  1. Reads the raw body bytes
  2. Computes HMAC-SHA256(YOUR_SECRET, body)
  3. Compares to the X-Signature header
  4. If they match, the webhook is GENUINE — only someone with the secret could have generated this signature
  5. If they don’t, REJECT — return 400 or 401

The security property: an attacker who doesn’t know the secret CANNOT forge a valid signature for arbitrary payloads, no matter how much sample data they have. HMAC is cryptographically secure.
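
As a minimal sketch of the recipe above, the following uses Node's built-in crypto module to produce and check a header in the sha256=<hex> shape shown earlier. The function names signBody and isGenuine are illustrative, not part of any provider's SDK.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Provider side: compute the header value for a given raw body.
function signBody(secret: string, rawBody: string): string {
  const digest = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  return `sha256=${digest}`;
}

// Consumer side: recompute the header from the raw body and compare.
function isGenuine(secret: string, rawBody: string, header: string): boolean {
  const expected = Buffer.from(signBody(secret, rawBody), "utf8");
  const provided = Buffer.from(header, "utf8");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return expected.length === provided.length && timingSafeEqual(expected, provided);
}
```

Changing a single byte of the body, or using a different secret, produces a different digest, so isGenuine returns false.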


Why use HMAC instead of just hashing?

A naive scheme: signature = SHA256(secret + payload).

Vulnerable to length extension attacks. SHA-256 (and SHA-1, SHA-512) have an internal state — an attacker can take a known hash + length and APPEND data to compute a new valid hash WITHOUT knowing the secret.

HMAC was designed to defeat this. The construction:

HMAC(K, m) = H((K XOR opad) || H((K XOR ipad) || m))

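Here K is first brought to the hash's block size (zero-padded if shorter, hashed first if longer). The double-hashing with inner/outer keys (ipad = 0x36 repeated, opad = 0x5C repeated) immunizes against length extension: the attacker only ever sees the output of the OUTER hash, and cannot extend it without knowing the key.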

Use a library. Don’t implement HMAC by hand. Node’s crypto.createHmac(), Python’s hmac.new(), Go’s crypto/hmac all do it correctly.
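
Purely to illustrate the construction (not for production use), the sketch below builds HMAC-SHA256 by hand from plain SHA-256 and the ipad/opad constants, so it can be compared against the library result. The function name hmacSha256ByHand is made up for this example.

```typescript
import { createHash, createHmac } from "node:crypto";

const BLOCK_SIZE = 64; // SHA-256 block size in bytes

function hmacSha256ByHand(key: Buffer, message: Buffer): Buffer {
  // Keys longer than one block are hashed first; shorter keys are zero-padded.
  const k: Buffer = key.length > BLOCK_SIZE ? createHash("sha256").update(key).digest() : key;
  const padded = Buffer.alloc(BLOCK_SIZE); // zero-filled
  k.copy(padded);

  // K XOR ipad and K XOR opad
  const ipad = padded.map((byte) => byte ^ 0x36);
  const opad = padded.map((byte) => byte ^ 0x5c);

  // H((K XOR opad) || H((K XOR ipad) || m))
  const inner = createHash("sha256").update(ipad).update(message).digest();
  return createHash("sha256").update(opad).update(inner).digest();
}

// The hand-rolled result should equal the library's output for the same inputs.
const sampleKey = Buffer.from("demo-signing-secret");
const sampleMessage = Buffer.from('{"event":"payment.succeeded"}');
const sameAsLibrary =
  hmacSha256ByHand(sampleKey, sampleMessage).toString("hex") ===
  createHmac("sha256", sampleKey).update(sampleMessage).digest("hex");
```

In real handlers, keep calling createHmac; the point here is only to make the formula concrete.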


Timing-safe comparison — the subtle trap

After computing the expected signature, you compare it to the provided one. The naive way:

const expected = computeHmac(secret, body);
if (expected === provided) { /* accept */ }

This is VULNERABLE to a timing attack. A plain string comparison such as JavaScript's === can return false as soon as it hits the first mismatching character. An attacker can measure HOW LONG the comparison takes for various guesses and slowly learn the correct prefix one character at a time.

In practice, this attack is hard over the internet (network jitter dominates timing differences). But it’s theoretically possible and tools EXIST.

The fix: use a constant-time comparison:

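import { timingSafeEqual } from "node:crypto";
 
const expected = Buffer.from(computeHmac(secret, body), "hex");
const received = Buffer.from(provided, "hex");
// timingSafeEqual throws if the two buffers differ in length,
// so reject mismatched lengths before calling it
const valid =
  expected.length === received.length && timingSafeEqual(expected, received);
if (valid) { /* accept */ }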

timingSafeEqual takes the same time regardless of WHERE the comparison fails. Always use it for signature comparison. Always.

Library-provided verification (stripe.webhooks.constructEvent) does this internally. Roll-your-own should explicitly call timing-safe comparison.


Replay attacks and timestamps

Even with a valid signature, an attacker who captures a webhook (e.g., on a compromised network, in old logs) can REPLAY it. The signature is still valid; your handler thinks it’s legitimate.

Defenses:

1. Include a timestamp in the signed payload

Most modern signing schemes (Stripe, Standard Webhooks) prepend a timestamp to the payload before signing:

signed_string = timestamp + "." + payload
signature = HMAC(secret, signed_string)

The provider sends:

X-Signature: t=1719842400,v1=abc123...

Your handler:

  1. Extracts the timestamp t from the header
  2. Checks the timestamp is within a TOLERANCE WINDOW (typically 5 minutes from now)
  3. Reconstructs signed_string = t + "." + body
  4. Verifies HMAC

If the timestamp is too old, REJECT — even if the signature is mathematically valid. An attacker can’t replay a 1-hour-old webhook because the timestamp is past the tolerance.
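
The four handler steps above can be sketched as follows, assuming the t=<unix seconds>,v1=<hex> header layout shown earlier. The function names and the 300-second constant are illustrative choices; the current time is passed in as a parameter so the check is easy to test.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TOLERANCE_SECONDS = 300; // 5-minute window

// Provider side: sign "timestamp.payload" and build the header.
function signWithTimestamp(secret: string, timestamp: number, rawBody: string): string {
  const mac = createHmac("sha256", secret).update(`${timestamp}.${rawBody}`, "utf8").digest("hex");
  return `t=${timestamp},v1=${mac}`;
}

// Consumer side: parse the header, check freshness, then check the HMAC.
function verifyWithTimestamp(secret: string, header: string, rawBody: string, nowSeconds: number): boolean {
  const fields = new Map<string, string>();
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx > 0) fields.set(part.slice(0, idx).trim(), part.slice(idx + 1).trim());
  }
  const t = Number(fields.get("t"));
  const v1 = fields.get("v1");
  if (!Number.isInteger(t) || v1 === undefined) return false;

  // Reject stale (or far-future) timestamps even if the signature is valid.
  if (Math.abs(nowSeconds - t) > TOLERANCE_SECONDS) return false;

  const expectedHex = createHmac("sha256", secret).update(`${t}.${rawBody}`, "utf8").digest("hex");
  const expected = Buffer.from(expectedHex, "utf8");
  const provided = Buffer.from(v1, "utf8");
  return expected.length === provided.length && timingSafeEqual(expected, provided);
}
```

Because the timestamp is part of the signed string, an attacker cannot simply rewrite t in the header to make an old capture look fresh: the HMAC would no longer match.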

2. Track received event IDs

Each event has a unique ID. Persist seen IDs in a database. Reject duplicates:

const eventId = req.body.id;
const seen = await db.processedEvents.findById(eventId);
if (seen) return { received: true, deduped: true };
 
await processEvent(req.body);
await db.processedEvents.create({ id: eventId, processedAt: new Date() });

This also handles legitimate retries (see below).

3. Both, in practice

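Stripe's scheme gives you BOTH pieces: a signed timestamp (checked against the tolerance window during verification) and a unique event ID (which your handler de-duplicates on). Use both for any webhook system you build.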


Retry semantics — what providers do

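Every webhook provider documents its own delivery and retry policy. A common shape (details vary; some providers retry on ANY non-2xx response, so check the docs):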

1. Send webhook
2. If response is 2xx → success, done
3. If response is 4xx (not 408 or 429) → permanent failure, don't retry
4. If response is 5xx, 408, 429, or timeout → schedule retry
5. Exponential backoff between retries
6. Give up after N retries (typically 3-72 hours of trying)
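
From the provider's side, the decision logic in that list might look like the sketch below. The Outcome type, the function names, and the one-minute base and twelve-hour cap are all assumptions chosen for illustration, not any specific provider's schedule.

```typescript
type Outcome = { kind: "response"; status: number } | { kind: "timeout" };

// Decide whether a delivery attempt should be retried.
function shouldRetry(outcome: Outcome): boolean {
  if (outcome.kind === "timeout") return true;
  const s = outcome.status;
  if (s >= 200 && s < 300) return false;  // success, done
  if (s === 408 || s === 429) return true; // transient 4xx
  if (s >= 400 && s < 500) return false;  // permanent failure
  return s >= 500;                        // server errors retry; other codes do not in this sketch
}

// Exponential backoff: the delay doubles with each attempt, up to a cap.
function backoffDelayMs(attempt: number, baseMs = 60000, capMs = 12 * 60 * 60 * 1000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}
```

Production systems usually add random jitter to the delay so that many failed deliveries do not all retry at the same instant.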

Concrete schedules:

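Provider   Retry policy
Stripe     Up to 3 days, exponential backoff (1m, 5m, 30m, 1h, 6h, 12h, etc.)
GitHub     No automatic retries; failed deliveries can be redelivered manually or via the API
Shopify    Up to 19 attempts over ~48 hours
Slack      Up to 3 attempts within ~30 minutes
Twilio     None by default (you opt in); up to 11 attempts over several hours

These schedules change over time; confirm against each provider's current documentation.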

The implications for YOUR endpoint:

  • It WILL be called multiple times for the same event. Plan for this.
  • It must be idempotent. Same event delivered twice produces the same outcome as delivering once.
  • Permanent failures are 4xx (except 408, 429). A 401 means “wrong signature, don’t retry”; the provider correctly gives up.
  • Retry-After header from your endpoint is sometimes honored. Most providers ignore it; some respect it for rate limiting.

Delivery guarantees — at-least-once, NEVER exactly-once

In any distributed system, you can have:

  • At-most-once — may be lost; never duplicated
  • At-least-once — may be duplicated; never lost (unless retries are exhausted)
  • Exactly-once — never lost, never duplicated (the holy grail)

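Exactly-once delivery is impossible over an unreliable network. Period. When an acknowledgment goes missing, the sender cannot tell whether the message or the ACK was lost, so it must either resend (risking a duplicate) or give up (risking a loss). Two-phase commit and other protocols can SIMULATE exactly-once, but only by combining at-least-once delivery with idempotent consumers, which is the same split webhooks rely on.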

Every webhook provider gives you at-least-once. Your handler must be IDEMPOTENT to convert that into exactly-once-EFFECTS.

The idempotency mechanism:

  1. Each event has a stable unique ID (provider-assigned)
  2. Your handler records “I’ve processed event X”
  3. On a duplicate of event X, you skip processing (but return 200)
  4. The provider sees 200 and stops retrying

Without steps 2 and 3, retries cause duplicate work: duplicate orders, duplicate emails, duplicate user creations.
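
A minimal sketch of that mechanism, using an in-memory Set as a stand-in for the database table of processed IDs (an assumption for illustration; a real handler would use durable storage with a unique constraint on the event ID). The names WebhookEvent and makeIdempotentReceiver are hypothetical.

```typescript
type WebhookEvent = { id: string; type: string };
type Ack = { status: number; deduped: boolean };

function makeIdempotentReceiver(handle: (event: WebhookEvent) => void) {
  const processed = new Set<string>(); // stand-in for a durable store

  return function receive(event: WebhookEvent): Ack {
    // Duplicate: skip the work but still ACK so the provider stops retrying.
    if (processed.has(event.id)) return { status: 200, deduped: true };

    handle(event);
    // Record the ID only after the handler succeeded, so a thrown error
    // leaves the event unrecorded and a later retry can process it.
    processed.add(event.id);
    return { status: 200, deduped: false };
  };
}

// Usage: the same event delivered twice triggers the side effect once.
let emailsSent = 0;
const receive = makeIdempotentReceiver(() => { emailsSent += 1; });
const first = receive({ id: "evt_123", type: "payment.succeeded" });
const second = receive({ id: "evt_123", type: "payment.succeeded" });
```

With concurrent deliveries, the check and the record need to be one atomic operation (for example, an insert that fails on a duplicate key); the single-threaded sketch above sidesteps that.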


A real signing scheme: Stripe’s

Stripe’s webhook signature scheme is the most-imitated. The header:

Stripe-Signature: t=1719842400,v1=abc123def456...,v0=xyz789...

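t is the timestamp (Unix seconds). v1 is the current signing scheme and the only one to verify against; any other scheme entry in the header, such as v0 (which Stripe's documentation describes as a fake scheme sent with test-mode events), should be ignored.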

Verification:

  1. Extract t and v1 from the header
  2. Compute signed_payload = "${t}.${raw_body}"
  3. Compute expected_signature = HMAC-SHA256(webhook_secret, signed_payload)
  4. Compare to v1 using constant-time equality
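  5. Verify timestamp is within ±5 minutes

In practice, the official library does all five steps for you:

import Stripe from "stripe";
import { NextRequest } from "next/server";
 
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
 
export async function POST(req: NextRequest) {
  const body = await req.text();          // raw body, exactly as sent
  const sig = req.headers.get("stripe-signature");
  if (!sig) {
    return new Response("Missing signature", { status: 400 });
  }
 
  let event: Stripe.Event;
  try {
    // Throws if the signature or the timestamp check fails
    event = stripe.webhooks.constructEvent(
      body,
      sig,
      process.env.STRIPE_WEBHOOK_SECRET!
    );
  } catch (err) {
    return new Response("Invalid signature", { status: 400 });
  }
 
  // Now `event` is verified and typed
  // ... handle event ...
  return new Response("OK", { status: 200 });
}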

stripe.webhooks.constructEvent does timing-safe comparison and timestamp validation internally. Don’t roll your own.


Standard Webhooks — an emerging convention

The webhook ecosystem has long suffered from EVERY provider doing things differently. standardwebhooks.com is a community attempt to standardize:

  • HMAC-SHA256 with timestamps + IDs
  • Standard headers: webhook-id, webhook-timestamp, webhook-signature
  • Versioned signatures (v1,base64-sig)
  • Documented retry / error semantics

Adopting providers in 2026: Svix, Zapier, Hookdeck, several mid-tier services. Stripe and GitHub haven’t switched (legacy compatibility), but new APIs increasingly start with Standard Webhooks.

If you’re BUILDING a webhook system, follow Standard Webhooks. Your consumers will thank you — they can use any compliant library to verify signatures.
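
As a rough sketch of what verification could look like under this convention: the three header names and the v1,<base64> format come from the list above, while the signed-string layout (id.timestamp.payload), the base64 secret with a whsec_ prefix, and the space-separated signature list are assumptions based on my reading of the specification and should be checked against it. In practice, prefer one of the official libraries.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the secret is base64, optionally prefixed with "whsec_".
function standardWebhookSignature(secret: string, id: string, timestamp: number, payload: string): string {
  const key = Buffer.from(secret.startsWith("whsec_") ? secret.slice(6) : secret, "base64");
  // Assumption: the signed content is "id.timestamp.payload".
  const mac = createHmac("sha256", key).update(`${id}.${timestamp}.${payload}`, "utf8").digest("base64");
  return `v1,${mac}`;
}

function verifyStandardWebhook(
  secret: string,
  headers: Record<string, string | undefined>,
  payload: string,
  nowSeconds: number,
  toleranceSeconds = 300
): boolean {
  const id = headers["webhook-id"];
  const timestamp = Number(headers["webhook-timestamp"]);
  const signatureHeader = headers["webhook-signature"];
  if (!id || !signatureHeader || !Number.isInteger(timestamp)) return false;
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  const expected = Buffer.from(standardWebhookSignature(secret, id, timestamp, payload), "utf8");
  // Assumption: several signatures may be listed, separated by spaces.
  return signatureHeader.split(" ").some((candidate) => {
    const provided = Buffer.from(candidate, "utf8");
    return provided.length === expected.length && timingSafeEqual(expected, provided);
  });
}
```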


Signing key rotation — the lifecycle question

If your secret leaks, you need to rotate it. Providers handle this by:

  1. Letting you create a SECOND active key
  2. Both keys verify signatures during a transition window
  3. New webhooks are signed with the NEW key
  4. After all in-flight webhooks have been delivered, the OLD key is retired

Without this two-key approach, rotation would drop in-flight webhooks that were signed with the old key.

For Stripe: rotating involves the dashboard + a brief verification period. For consumers (you), it means: be ready to accept signatures from EITHER of two keys during a rotation.

If you’re a CONSUMER of webhooks: you rotate the signing secret (the one the provider stores and signs with) by:

  1. Generating a new secret at the provider
  2. Updating your environment variables: set STRIPE_WEBHOOK_SECRET to the new secret, and keep the old one in a second variable if your code accepts both during the transition
  3. Deploying
  4. Removing the old secret from the provider
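
During the transition, a roll-your-own verifier can simply try each active secret. The sketch below assumes a plain hex HMAC-SHA256 signature over the raw body; the function name matchesAnySecret is hypothetical, and how the old and new secrets are stored (for example, in two separate environment variables) is left to the caller.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Accept the signature if it matches ANY currently active secret.
function matchesAnySecret(secrets: readonly string[], rawBody: string, providedHex: string): boolean {
  const provided = Buffer.from(providedHex, "utf8");
  let matched = false;
  for (const secret of secrets) {
    const expectedHex = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
    const expected = Buffer.from(expectedHex, "utf8");
    // Check every secret rather than returning early, so timing does not
    // reveal which key matched.
    if (expected.length === provided.length && timingSafeEqual(expected, provided)) {
      matched = true;
    }
  }
  return matched;
}
```

Once the old secret is retired at the provider, drop it from the list so it can no longer validate anything.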

For Bible Quest: keep webhook secrets in Vercel env vars. Rotate annually or on suspicion of compromise.


The “respond fast” requirement

Webhook providers have timeouts. Stripe: 10 seconds. GitHub: 10 seconds. Shopify: 5 seconds. If your endpoint doesn’t respond in time, the provider records a timeout and schedules a retry.

This forces you to defer heavy work:

export async function POST(req: NextRequest) {
  const event = await verifyWebhook(req);
 
  // Quick database write to record receipt
  await db.webhookEvents.create({ id: event.id, type: event.type, payload: event });
 
  // Defer actual processing
  await jobQueue.add({ eventId: event.id });
 
  return new Response("OK", { status: 200 });  // Within 100-500ms
}

The background queue (Inngest, Trigger.dev, Vercel Queues, custom postgres-backed) processes the event later. The webhook handler’s job is to ACCEPT and ENQUEUE quickly.

Without this pattern, slow business logic causes timeouts → retries → more timeouts → eventual giving up. Don’t let your business logic block the ACK.


What makes a webhook system reliable

If you’re designing a webhook PROVIDER, the components:

  1. Event publishing — when something happens, you record an event
  2. Event store — durable storage with each event’s ID, type, payload, timestamp
  3. Delivery queue — async system that handles delivery + retries
  4. Signing layer — HMAC with a per-subscriber secret
  5. Retry policy — exponential backoff with caps
  6. Dead-letter handling — events that fail all retries land somewhere visible (a dashboard, an alert)
  7. Observability — per-event status (delivered, failed, retrying), per-endpoint health
  8. Replay — let users re-trigger a specific event if they need to
  9. Test mode — fake events for development without affecting production

For Bible Quest-scale projects: you’re almost always a CONSUMER of webhooks, not a provider. Building all this is hundreds of engineering hours; using Stripe / GitHub / Supabase / etc. is the answer.

If you ever DO need to publish webhooks: use a managed service (Svix, Hookdeck) rather than building it yourself.


Common gotchas

  • Always read the body as RAW BYTES before parsing. The signature is computed over exact bytes. Parsing first (req.json()) and then re-serializing can reorder keys, normalize whitespace, alter encoding. Use await req.text(), verify the signature against that string, and only then JSON-parse it.

  • timingSafeEqual is non-negotiable for signature comparison. Naive === is theoretically vulnerable to timing attacks.

  • Reject signature mismatches with 4xx, not 5xx. A 401 or 400 signals “permanent failure” — provider stops retrying. 5xx triggers retries, flooding your endpoint with the same bad request.

  • Check timestamps to prevent replay. A valid signature on a 6-hour-old payload from a leaked log is a security hole.

  • Different providers use different header names. Stripe-Signature, X-Hub-Signature-256 (GitHub), X-Shopify-Hmac-Sha256, webhook-signature (Standard Webhooks). Read the docs for each.

  • HMAC needs the ENTIRE body, not parsed JSON. Whitespace, key order, escaped characters — all matter. The provider signed exact bytes.

  • Frameworks may consume the body before you see it. In Express, for example, the express.json() middleware parses the body before your handler runs, so req.body is already an object and the raw bytes are gone. Use a raw body extractor (such as express.raw()) on the webhook route. In the Next.js App Router you read the body yourself with req.text(), so this issue does not arise.

  • Content-Type matters. Some providers send application/x-www-form-urlencoded; some send JSON. Reading raw bytes works regardless, but parsing logic differs.

  • Idempotency tokens vs. event IDs. Some providers use a separate Idempotency-Key header; some bake it into the event payload as id. Both work; pick one and track.

  • Don’t mix dev and prod webhook secrets. Each environment (production, preview, staging) has its OWN secret. Set in Vercel per-environment.

  • Local testing requires a tunnel. Localhost isn’t reachable. Stripe CLI, ngrok, Cloudflare Tunnel. See webhooks.md.

  • A 100 Continue response is HTTP-protocol, not a webhook ack. Providers wait for the full response — usually 2xx — before considering delivery successful.

  • Webhook handlers MUST be idempotent. A retry will arrive with the same event ID. Process once; ignore (return 200) on duplicates.

  • HMAC alone doesn’t authenticate the SOURCE; it authenticates POSSESSION of the secret. If your secret leaks, attackers can sign arbitrary payloads. Treat the secret like a password.

  • Don’t log signatures + payloads together. If a log file leaks, an attacker could replay signed events. Log event IDs and outcomes; redact signatures.

  • The Stripe-Signature header can carry more than one scheme version. The library handles it; if you roll your own, verify only against the scheme you expect (v1 today), ignore the others, and watch for future schemes.

  • The provider’s IP can change. Some providers (Slack, Twilio) publish IP allowlists. Most modern providers (Stripe, GitHub) use rotating cloud IPs and rely on signature verification instead.

  • Replay-attack windows tighter than 5 minutes are risky. Clock skew between provider and your server can exceed 30 seconds. 5 minutes is a reasonable default; tighter requires NTP sync.

  • HMAC-SHA1 (used by some legacy GitHub-style schemes) is technically still safe but discouraged. Newer schemes use SHA-256 or stronger.

  • A 404 from your endpoint is a configuration mistake. It means the URL registered with the provider is wrong or the endpoint doesn’t exist. Verify the URL configured in the provider’s dashboard.

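  • Some providers send a delivery ID in addition to the event ID. Where the delivery ID is per-DELIVERY (changes on retry) and the event ID is per-EVENT (stays the same on retry), use the event ID for idempotency. Check each provider's docs for which identifier is stable: in Standard Webhooks, for instance, the webhook-id header identifies the message and stays the same across retries.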

  • Content-Length mismatch breaks signature verification. If your body parser strips trailing whitespace or normalizes encoding, the bytes don’t match what was signed. Be careful.

  • Don’t process events synchronously across many subscribers. If you have a fan-out (one event → 5 webhooks → 5 different services), each subscriber should get the event in PARALLEL, not sequentially. Otherwise one slow subscriber blocks the rest.

  • Webhook test events from providers’ dashboards SHOULD have valid signatures. If they don’t (Stripe’s “Send test webhook” did this in 2023), report to the provider.

  • Replay protection alone doesn’t prevent ACTIVE-network attacks. An attacker who can modify packets in real time can intercept and alter a webhook before it reaches you. TLS prevents this — always require HTTPS for webhook endpoints.

  • Open redirect endpoints are a webhook risk. If your webhook handler responds with a redirect, providers may follow it, leaking signatures or attempting auth elsewhere. Always return JSON or empty bodies.

  • Customer-supplied webhook URLs are an SSRF risk. If you let users register webhook URLs (e.g., “Slack-style integrations”), validate that the URL doesn’t resolve to internal addresses (10.x.x.x, 172.16.x.x through 172.31.x.x, 192.168.x.x, 127.0.0.1, and link-local 169.254.x.x, where cloud metadata services live). Otherwise an attacker can use your service to probe internal networks.

  • Stripe’s webhook documentation is the gold standard. Read it once even if you don’t use Stripe. Patterns transfer to other providers.

  • Webhook payloads can be huge. Stripe events can be 100KB+. Plan storage accordingly. Some providers truncate or paginate.

  • AI-generated handlers often skip timing-safe comparison. Always review AI output for === on signatures. Replace with timingSafeEqual.

  • Don’t use a single global webhook secret for many subscribers. If you’re publishing webhooks to many consumers, give each consumer their own secret. If one leaks, only one consumer is affected.

  • HMAC keys should be ≥ 32 bytes of entropy. Don’t use a memorable string. Generate via crypto.randomBytes(32).toString("hex").

  • Don’t trust webhooks for security-critical state changes alone. “User paid” should be verified via a follow-up API call (fetch the latest charge status from Stripe), not just trusting the webhook. Belt and suspenders.

  • Webhook misdelivery (event sent to wrong endpoint) is rare but exists. When debugging mysterious payloads, check what URL the provider thinks it’s sending to.
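
For the SSRF gotcha above, a first-pass address check might look like the sketch below. It only classifies an IPv4 address; a real validator would also resolve the hostname, check every resolved address (IPv6 included), and re-check at delivery time. The function name isPrivateIPv4 is hypothetical.

```typescript
import { isIPv4 } from "node:net";

// Returns true for IPv4 addresses that should never be webhook targets.
function isPrivateIPv4(ip: string): boolean {
  if (!isIPv4(ip)) return false;
  const parts = ip.split(".").map(Number);
  const a = parts[0] ?? 0;
  const b = parts[1] ?? 0;

  if (a === 10) return true;                        // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
  if (a === 192 && b === 168) return true;          // 192.168.0.0/16
  if (a === 127) return true;                       // loopback
  if (a === 169 && b === 254) return true;          // link-local (cloud metadata)
  if (a === 0) return true;                         // "this network"
  return false;
}
```

Note that this function returns false for anything that is not a valid IPv4 literal, so callers must treat unparseable or unresolved hosts as a separate rejection case rather than as "safe".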


See also


Sources