🇺🇸 USA · Cloudflare Workers AI

Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: Run AI inference at the edge, anywhere Cloudflare’s global network reaches. Llama / Mistral / Whisper served from data centres within milliseconds of your users worldwide. Tightly integrated with the Cloudflare developer platform.

Front-matter facts

Field	Value
Vendor	Cloudflare Inc (San Francisco, USA)
Country / origin	🇺🇸 USA
Recommended for Australian users?	✅ Yes — Cloudflare has multiple AUS edge locations; very low-latency for AUS users
Privacy summary	No training on customer data; data routed through Cloudflare global network; per-model privacy posture
Free tier	Yes: 10,000 neurons per day
Paid tiers	Usage-based (metered in neurons) beyond the free tier; available with the Workers Paid plan at US$5/month
First released	September 2023
Last reviewed	2026-06-26
Official site	https://developers.cloudflare.com/workers-ai

What it is

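Cloudflare Workers AI is a serverless inference service that runs on Cloudflare’s global network. Where AWS Bedrock / Azure OpenAI / Vertex AI run models in specific cloud regions (Sydney, US-East, etc.), Workers AI serves requests from Cloudflare’s network of 300+ edge locations worldwide, including multiple AUS locations. Not every location hosts every model, so a request may be served from a nearby site, not necessarily the very closest one.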

Why edge inference matters:

Low network latency: inference runs at a data centre close to the user instead of in a distant cloud region
Tight integration with Cloudflare Workers (serverless functions at edge), R2 (object storage), D1 (SQLite-at-edge), Vectorize (vector DB)
No region selection needed — Cloudflare auto-routes
Free / cheap for development

Models supported (a curated catalog, smaller than Bedrock’s):

Llama (4 / 5 family)
Mistral (various)
Whisper (speech-to-text)
Stable Diffusion (image gen)
BGE / various embedding models
Plus 50+ others (browse at developers.cloudflare.com/workers-ai/models)
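
As a minimal sketch of how the embedding models in this list might be used, the snippet below embeds two strings in one call and compares them by cosine similarity. It assumes `ai` is the Workers AI binding (env.AI inside a Worker), that the `@cf/baai/bge-base-en-v1.5` model ID is still listed in the catalog, and that the response has the shape `{ data: number[][] }`. The helper names are illustrative only.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed two texts in one call and return their similarity.
// `ai` is assumed to be the Workers AI binding (env.AI inside a Worker).
async function textSimilarity(ai, textA, textB) {
  const result = await ai.run('@cf/baai/bge-base-en-v1.5', {
    text: [textA, textB],
  });
  const [vecA, vecB] = result.data; // assumed shape: one vector per input text
  return cosineSimilarity(vecA, vecB);
}
```

Inside a Worker this would be called as `await textSimilarity(env.AI, 'first text', 'second text')`; scores closer to 1 indicate more similar texts.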

What you’d use it for

Build apps with Cloudflare Workers + AI — natural integration
Edge-deployed AI features — chat, transcription, image gen running globally
AI features that need lowest latency — voice agents, real-time interactions
Cost-effective AI for global apps — generous free tier
Privacy-conscious workloads: Cloudflare commits to not training on customer data
Replace OpenAI / Anthropic for some workloads if open-weight quality is sufficient
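
For the low-latency use cases in this list (voice agents, real-time interactions), streaming tokens as they are generated usually matters more than total completion time. A rough sketch, assuming the AI binding accepts `stream: true` and then returns a readable stream of server-sent events, and reusing the model ID from this page's own example (confirm it against the catalog). `buildChatInput` and `streamingWorker` are hypothetical names.

```javascript
// Build the input for a streamed chat completion.
function buildChatInput(prompt) {
  return {
    messages: [{ role: 'user', content: prompt }],
    stream: true, // ask for tokens as they are generated
  };
}

// Worker-style handler object; in a real Worker this would be the default export.
const streamingWorker = {
  async fetch(request, env) {
    const prompt = new URL(request.url).searchParams.get('q') ?? 'Hello';
    // Model ID copied from this page; confirm it against the model catalog.
    const stream = await env.AI.run('@cf/meta/llama-4-70b', buildChatInput(prompt));
    // Pass the stream straight through so the client can render partial output.
    return new Response(stream, {
      headers: { 'content-type': 'text/event-stream' },
    });
  },
};
```

The client can then read the response body incrementally instead of waiting for the full completion.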

When NOT to use Workers AI:

For closed frontier models (Claude Opus, GPT-5, Gemini Pro): go to those vendors directly
For the broadest open-weight catalog: Together / Fireworks offer more models
For AUS-only data residency: Workers AI is global and data may route via non-AUS edges, so for strict AUS residency use AWS Bedrock Sydney

How to use from Australia

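Create a Cloudflare account (free at cloudflare.com)
Enable Workers AI in the dashboard (often on by default)
Add a Workers AI binding to the Worker’s configuration so the code can reach it as env.AI

Call it from a Worker (server-side) or via the REST API: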

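// Inside a Cloudflare Worker. Assumes a Workers AI binding named "AI"
// is configured for this Worker, which is what exposes env.AI.
export default {
  async fetch(request, env) {
    // Model ID as given on this page; confirm the exact identifier in the model catalog.
    const response = await env.AI.run('@cf/meta/llama-4-70b', {
      messages: [{ role: 'user', content: 'Hello' }]
    });
    // Response.json serialises the result and sets the JSON content type.
    return Response.json(response);
  }
};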

Cloudflare’s AUS edge locations (Sydney, Melbourne, Brisbane, Perth, Adelaide) are chosen automatically; there is no region to configure
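
Outside a Worker, the same model can be reached over the REST API. The sketch below assumes the endpoint pattern `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}` with a bearer API token. `accountId` and `apiToken` are placeholders for your own credentials, and the function names are illustrative.

```javascript
// Build the URL and fetch options for a Workers AI REST call.
// accountId and apiToken are placeholders for your own credentials.
function buildRestRequest(accountId, apiToken, model, prompt) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ messages: [{ role: 'user', content: prompt }] }),
    },
  };
}

// Send the request and return the parsed JSON body.
async function runViaRest(accountId, apiToken, model, prompt) {
  const { url, init } = buildRestRequest(accountId, apiToken, model, prompt);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Workers AI request failed: ${res.status}`);
  return res.json();
}
```

Keeping the request construction separate from the network call makes it easy to inspect or log what will be sent before any tokens are spent.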

What it costs

Free tier

10,000 neurons / day (Cloudflare’s compute unit for AI)
Sufficient for development and small projects

Workers Paid plan — US$5/month

10M requests / month included for Workers
30 million additional neurons / month for AI (verify current against Cloudflare’s pricing page)
Plus usage-based charges beyond the included amounts

Per-model pay-per-token

Varies by model
Generally cheaper than going direct to OpenAI / Anthropic for tasks where open-weight quality is sufficient
Llama 4 70B: ~US$0.30-0.40 per million tokens (verify current)
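
To make the per-token figure concrete, here is a small worked estimate. It uses only the approximate US$0.30 to US$0.40 per million tokens quoted above for Llama 4 70B, which should be verified against current pricing. The 50 million token monthly volume is an arbitrary example, and `estimateTokenCost` is an illustrative helper.

```javascript
// Estimate a monthly cost range from a token volume and a price range
// expressed in US dollars per million tokens.
function estimateTokenCost(tokens, lowPerMillion, highPerMillion) {
  const millions = tokens / 1e6;
  const round2 = (x) => Math.round(x * 100) / 100; // round to whole cents
  return {
    low: round2(millions * lowPerMillion),
    high: round2(millions * highPerMillion),
  };
}

// 50 million tokens per month at the quoted US$0.30 to US$0.40 per million.
const monthly = estimateTokenCost(50e6, 0.30, 0.40);
console.log(`US$${monthly.low} to US$${monthly.high} per month`);
```

At that volume the estimate comes to US$15 to US$20 per month, before any free allowance is taken into account.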

How it compares to alternatives

Aspect	Cloudflare Workers AI	AWS Bedrock	Vertex AI	Together AI
Edge / global inference	Best (300+ locations)	Per-region	Per-region	Per-region
Free tier	10,000 neurons/day	Limited	Limited	Sign-up credit
Cloudflare Workers integration	Native	Manual	Manual	Manual
Frontier closed models	None	Claude	Gemini	None
Open-weight catalog	Curated (~50)	Curated	Broad (Model Garden)	Broadest
AUS data residency	No guarantee (global edge, incl. AUS locations)	Yes (Sydney)	Yes (Sydney+Melbourne)	Limited
Best for	Cloudflare-stack + global edge + cheap	AWS shops + AUS residency	GCP shops + AUS residency	Open-weight production

For developers in the Cloudflare ecosystem building global apps, Workers AI is the natural choice.

Privacy / data handling

Cloudflare commits to not training on customer data
Data is routed through Cloudflare’s network and may be processed at whichever edge location serves the request, which is not guaranteed to be in Australia
Cloudflare has a strong privacy reputation overall
For strict AUS-only data residency, AWS Bedrock Sydney is the stronger choice
Cloudflare’s AI Gateway (separate product) can route to multiple AI providers with central observability
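
A sketch of what routing a Workers AI call through AI Gateway can look like over HTTP. It assumes the gateway URL pattern `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/{model}` described in Cloudflare's AI Gateway documentation. `gatewayId` is a hypothetical gateway name you would create first, and the function names are illustrative.

```javascript
// Build the AI Gateway URL that fronts a Workers AI model.
function gatewayUrl(accountId, gatewayId, model) {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/workers-ai/${model}`;
}

// Bearer-token POST sent via the gateway, so the call shows up in the
// gateway's central observability alongside other providers.
async function runViaGateway(accountId, gatewayId, apiToken, model, prompt) {
  const res = await fetch(gatewayUrl(accountId, gatewayId, model), {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages: [{ role: 'user', content: prompt }] }),
  });
  if (!res.ok) throw new Error(`Gateway request failed: ${res.status}`);
  return res.json();
}
```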

Recent changes

2026: Llama 5 family + expanded multimodal models
2025: AI Gateway matured (companion product)
2024: Model catalog expanded; Vectorize integration
September 2023: Workers AI launched

Gotchas

Neuron pricing is unique to Cloudflare and differs from the per-token pricing used elsewhere; estimating cost means first working out how many neurons your chosen model consumes
Global edge means no single-region guarantee: for strict AUS data residency, use Bedrock Sydney instead
Model catalog smaller than AWS / Together — for niche models, check availability
Best paired with full Cloudflare developer platform (Workers + R2 + D1 + Vectorize + Pages) for tight integration
For high-volume production at frontier-model quality, going direct to Anthropic / OpenAI is often still preferred
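
On the neuron pricing gotcha: a quick budget check is often the first thing to model. The sketch below uses the 10,000 neurons/day free allowance quoted on this page. The price per 1,000 neurons is deliberately a parameter, and the US$0.01 used in the example is a placeholder, not a quoted rate.

```javascript
const FREE_NEURONS_PER_DAY = 10000; // free allowance quoted on this page

// Neurons billed on a given day after the free allowance is used up.
function billableNeurons(usedToday, freePerDay = FREE_NEURONS_PER_DAY) {
  return Math.max(0, usedToday - freePerDay);
}

// Daily overage cost, given a price per 1,000 neurons taken from
// Cloudflare's current pricing page (passed in, not hard-coded).
function dailyOverageCost(usedToday, pricePer1000Neurons) {
  return (billableNeurons(usedToday) / 1000) * pricePer1000Neurons;
}

// Example with a placeholder price of US$0.01 per 1,000 neurons.
console.log(billableNeurons(8000));         // usage within the free allowance
console.log(dailyOverageCost(25000, 0.01)); // cost of 15,000 billable neurons
```

Swapping in the real per-neuron rate and your measured daily usage turns this into a rough monthly forecast.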
