🇺🇸 USA · Cloudflare Workers AI

Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: Run AI inference at the edge, anywhere Cloudflare’s global network reaches. Llama, Mistral, and Whisper models are served from data centres close to your users worldwide, tightly integrated with the Cloudflare developer platform.


Front-matter facts

Field | Value
Vendor | Cloudflare Inc (San Francisco, USA)
Country / origin | 🇺🇸 USA
Recommended for Australian users? | ✅ Yes: Cloudflare has multiple AUS edge locations; very low latency for AUS users
Privacy summary | No training on customer data; data routed through Cloudflare global network; per-model privacy posture
Free tier | Yes (generous free tier per day)
Paid tiers | Pay-per-request beyond free tier; bundled with Workers Paid plan US$5/month
First released | September 2023
Last reviewed | 2026-06-26
Official site | https://developers.cloudflare.com/workers-ai

What it is

Cloudflare Workers AI is AI inference deployed on Cloudflare’s global edge network. Where AWS Bedrock, Azure OpenAI, and Vertex AI run models in specific cloud regions (Sydney, US-East, etc.), Workers AI serves requests from Cloudflare’s network of 300+ edge locations worldwide, including multiple AUS locations.

Why edge inference matters:

  • Low latency: requests are routed to a nearby Cloudflare data centre that can serve the chosen model, rather than to a fixed cloud region
  • Tight integration with Cloudflare Workers (serverless functions at edge), R2 (object storage), D1 (SQLite-at-edge), Vectorize (vector DB)
  • No region selection needed — Cloudflare auto-routes
  • Free / cheap for development

Models supported (curated, not full Bedrock-scale):

  • Llama (4 / 5 family)
  • Mistral (various)
  • Whisper (speech-to-text)
  • Stable Diffusion (image gen)
  • BGE / various embedding models
  • Plus 50+ others (browse at developers.cloudflare.com/workers-ai/models)
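
Not every model in the catalog is a chat model, and the input and output shapes differ by model type. As a minimal sketch, assuming an embedding model ID of `@cf/baai/bge-base-en-v1.5` (check the catalog for the current name) and that `ai` is the Workers AI binding (`env.AI` inside a Worker), embeddings could be requested and compared like this. The helper names are illustrative, not part of any Cloudflare API.

```javascript
// Assumed model ID; confirm it against the Workers AI model catalog.
const EMBEDDING_MODEL = '@cf/baai/bge-base-en-v1.5';

// `ai` is the Workers AI binding (env.AI inside a Worker).
// Embedding models take { text: string[] } and return { shape, data },
// where data holds one vector per input string.
async function embedTexts(ai, texts) {
  const result = await ai.run(EMBEDDING_MODEL, { text: texts });
  return result.data;
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Inside a Worker, `embedTexts(env.AI, ['first text', 'second text'])` would resolve to two vectors that can be passed to `cosineSimilarity` or stored in Vectorize.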

What you’d use it for

  • Build apps with Cloudflare Workers + AI — natural integration
  • Edge-deployed AI features — chat, transcription, image gen running globally
  • AI features that need lowest latency — voice agents, real-time interactions
  • Cost-effective AI for global apps — generous free tier
  • Privacy-friendly: Cloudflare commits to not training on customer data
  • Replace OpenAI / Anthropic for some workloads if open-weight quality is sufficient
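
For the real-time use cases above, text-generation models can stream their output when `stream: true` is passed in the input, which returns a server-sent-events stream instead of a single JSON object. The sketch below is a hypothetical helper that extracts the generated text from such a payload. It assumes each event line has the form `data: {"response":"..."}`, that the stream ends with `data: [DONE]`, and that the whole payload has already been read into one string (a production client would need to handle events split across network chunks).

```javascript
// Hypothetical helper: collect generated text from a server-sent-events payload.
// Assumes lines of the form `data: {"response":"..."}` ending with `data: [DONE]`.
function collectSseText(ssePayload) {
  let text = '';
  for (const line of ssePayload.split('\n')) {
    if (!line.startsWith('data: ')) continue; // skip blank separator lines
    const body = line.slice('data: '.length).trim();
    if (body === '[DONE]') break; // end-of-stream marker
    text += JSON.parse(body).response ?? '';
  }
  return text;
}

// Inside a Worker, streaming would be requested roughly like this (not run here):
//   const stream = await env.AI.run(model, { messages, stream: true });
//   return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
```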

When NOT to use Workers AI:

  • For closed frontier models (Claude Opus, GPT-5, Gemini Pro): use those vendors directly
  • For the broadest open-weight catalog (Together / Fireworks have more)
  • For AUS-only data residency (Workers AI is global; data may route via non-AUS edges; for AUS-strict use AWS Bedrock Sydney)

How to use from Australia

  1. Cloudflare account (free at cloudflare.com)
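  2. Enable Workers AI in the dashboard (often on by default) and add an AI binding named AI to the Worker's Wrangler configuration, so that env.AI is available in code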
  3. Call via Workers (server-side) or REST API:
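    // Inside a Cloudflare Worker; env.AI is the Workers AI binding
    export default {
      async fetch(request, env) {
        // Model ID as given in this guide; confirm it against the model catalog
        const response = await env.AI.run('@cf/meta/llama-4-70b', {
          messages: [{ role: 'user', content: 'Hello' }]
        });
        // Return the model output as JSON with a matching content type
        return new Response(JSON.stringify(response), {
          headers: { 'content-type': 'application/json' }
        });
      }
    };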
  4. AUS Cloudflare edge locations (Sydney, Melbourne, Brisbane, Perth, Adelaide) handle routing automatically
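
Step 3 also mentions the REST API, which is useful when calling Workers AI from outside a Worker. The following is a minimal sketch, assuming the standard Cloudflare API endpoint pattern `accounts/{account_id}/ai/run/{model}` and an API token with Workers AI permission. `accountId` and `apiToken` are placeholders, and the function names are illustrative.

```javascript
// Build the REST request for a Workers AI model run.
// accountId and apiToken are placeholders supplied from your own account.
function buildRunRequest(accountId, apiToken, model, input) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(input),
    },
  };
}

// Send the request and return the parsed JSON body.
// The REST API wraps the model output in an envelope with a `result` field.
async function runModel(accountId, apiToken, model, input) {
  const { url, init } = buildRunRequest(accountId, apiToken, model, input);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Workers AI request failed: ${res.status}`);
  return res.json();
}
```

Keeping the request builder separate from the network call makes the URL and headers easy to inspect before any token is sent anywhere.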

What it costs

Free tier

  • 10,000 neurons / day (Cloudflare’s compute unit for AI)
  • Sufficient for development and small projects

Workers Paid plan — US$5/month

  • 10M requests / month included for Workers
  • 30 million additional neurons / month for AI (verify current)
  • Plus pay-per-additional usage

Per-model pay-per-token

  • Varies by model
  • Generally cheaper than direct OpenAI / Anthropic for comparable open-weight tasks
  • Llama 4 70B: ~US$0.30-0.40 per million tokens (verify current)
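
Because billing is expressed in neurons rather than tokens, a small calculator can help when modelling cost. This is a sketch under stated assumptions: the 10,000 neurons/day free allocation comes from the figures above, while the price per 1,000 neurons is a parameter you must take from Cloudflare's current pricing page (the 0.01 used in the usage note is a placeholder, not a quoted rate).

```javascript
// Free daily allocation as stated in this guide.
const FREE_NEURONS_PER_DAY = 10000;

// Estimate one day's billable neurons and cost.
// pricePerThousandNeurons is supplied by the caller; look up the real rate.
function estimateDailyOverage(neuronsUsed, pricePerThousandNeurons) {
  const billableNeurons = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return {
    billableNeurons,
    costUsd: (billableNeurons / 1000) * pricePerThousandNeurons,
  };
}
```

For example, `estimateDailyOverage(25000, 0.01)` reports 15,000 billable neurons, and any usage at or below 10,000 neurons reports a cost of zero.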

How it compares to alternatives

Aspect | Cloudflare Workers AI | AWS Bedrock | Vertex AI | Together AI
Edge / global inference | Best (300+ locations) | Per-region | Per-region | Per-region
Free tier | Most generous | Limited | Limited | Sign-up credit
Cloudflare Workers integration | Native | Manual | Manual | Manual
Frontier closed models | None | Claude | Gemini | None
Open-weight catalog | Curated (~50) | Curated | Broad (Model Garden) | Broadest
AUS data residency | Global edge incl AUS | Yes (Sydney) | Yes (Sydney+Melbourne) | Limited
Best for | Cloudflare-stack + global edge + cheap | AWS shops + AUS residency | GCP shops + AUS residency | Open-weight production

For developers in the Cloudflare ecosystem building global apps, Workers AI is the natural choice.


Privacy / data handling

  • Cloudflare commits to not training on customer data
  • Data is routed through Cloudflare’s network and may be processed at whichever edge location is nearest, inside or outside Australia
  • Cloudflare has a strong privacy reputation overall
  • For strict AUS-only data residency, AWS Bedrock Sydney is the stronger choice
  • Cloudflare’s AI Gateway (separate product) can route to multiple AI providers with central observability
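
As a rough illustration of the AI Gateway point, the Workers AI binding accepts an options argument that can name a gateway, so that calls are logged through it. This sketch assumes the option shape `{ gateway: { id } }` and a gateway you have already created. `gatewayId` is a placeholder and both function names are hypothetical.

```javascript
// Build the options object that asks the binding to route via AI Gateway.
// Assumes the binding accepts { gateway: { id } } as its third argument.
function gatewayOptions(gatewayId) {
  return { gateway: { id: gatewayId } };
}

// `ai` is the Workers AI binding (env.AI inside a Worker).
async function runViaGateway(ai, gatewayId, model, input) {
  return ai.run(model, input, gatewayOptions(gatewayId));
}
```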

Recent changes

  • 2026: Llama 5 family + expanded multimodal models
  • 2025: AI Gateway matured (companion product)
  • 2024: Model catalog expanded; Vectorize integration
  • September 2023: Workers AI launched

Gotchas

  • Neuron pricing is unique to Cloudflare and differs from the per-token pricing used elsewhere; modelling cost requires understanding how neurons map to your workload
  • A global edge means no single-region guarantee: for strict AUS data residency, use Bedrock Sydney instead
  • Model catalog smaller than AWS / Together — for niche models, check availability
  • Best paired with full Cloudflare developer platform (Workers + R2 + D1 + Vectorize + Pages) for tight integration
  • For high-volume production at frontier-model quality, going direct to Anthropic / OpenAI is often still preferred
