🇺🇸 USA · Cloudflare Workers AI
Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: Run AI inference at the edge, anywhere Cloudflare’s global network reaches. Llama, Mistral and Whisper models are served from data centres within milliseconds of your users worldwide, tightly integrated with the Cloudflare developer platform.
Front-matter facts
| Field | Value |
|---|---|
| Vendor | Cloudflare Inc (San Francisco, USA) |
| Country / origin | 🇺🇸 USA |
| Recommended for Australian users? | ✅ Yes: Cloudflare has multiple AUS edge locations, so latency for AUS users is very low (not suited to strict AUS-only data residency) |
| Privacy summary | No training on customer data; data routed through Cloudflare global network; per-model privacy posture |
| Free tier | Yes: 10,000 neurons per day |
| Paid tiers | Billed per neuron beyond the free allocation; included with the Workers Paid plan (US$5/month) |
| First released | September 2023 |
| Last reviewed | 2026-06-26 |
| Official site | https://developers.cloudflare.com/workers-ai |
What it is
Cloudflare Workers AI is edge-deployed AI inference on Cloudflare’s global network. Whereas AWS Bedrock, Azure OpenAI and Vertex AI run models in specific cloud regions (Sydney, US-East, etc.), Workers AI runs inference on GPUs deployed across Cloudflare’s network of 300+ cities, including multiple AUS locations, and routes each request to a nearby location with available capacity.
Why edge inference matters:
- Low latency: inference runs in a data centre near the user rather than in one distant cloud region
- Tight integration with Cloudflare Workers (serverless functions at edge), R2 (object storage), D1 (SQLite-at-edge), Vectorize (vector DB)
- No region selection needed — Cloudflare auto-routes
- Free / cheap for development
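The Workers + Vectorize + AI integration listed above is what makes retrieval-augmented generation (RAG) short to write inside a Worker. The sketch below is a minimal, hedged illustration, not Cloudflare’s reference implementation. It assumes bindings named `AI` and `VECTORIZE` (names you would configure yourself), a Vectorize v2 index built with 768-dimension `@cf/baai/bge-base-en-v1.5` embeddings whose vectors carry a `text` metadata field (a hypothetical convention), and `@cf/meta/llama-3.1-8b-instruct` as the chat model. Check both model IDs against the current catalog.

```javascript
// Minimal RAG sketch for a Cloudflare Worker.
// Assumptions: env.AI and env.VECTORIZE bindings exist; the index holds
// 768-dim BGE vectors with a hypothetical `text` metadata field.
const EMBED_MODEL = "@cf/baai/bge-base-en-v1.5"; // verify in current catalog
const CHAT_MODEL = "@cf/meta/llama-3.1-8b-instruct"; // verify in current catalog

async function answerWithContext(env, question, topK = 3) {
  // 1. Embed the question; embedding models return { shape, data: [[...]] }
  const embedding = await env.AI.run(EMBED_MODEL, { text: [question] });
  const queryVector = embedding.data[0];

  // 2. Fetch the nearest stored chunks (returnMetadata: "all" is a Vectorize v2 option)
  const results = await env.VECTORIZE.query(queryVector, {
    topK,
    returnMetadata: "all",
  });
  const context = results.matches
    .map((m) => m.metadata?.text ?? "") // matches without text contribute nothing
    .filter((t) => t.length > 0)
    .join("\n---\n");

  // 3. Ask the chat model, grounding it in the retrieved context
  const reply = await env.AI.run(CHAT_MODEL, {
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return reply.response; // text-generation models return { response: "..." }
}
```

Keeping embedding, vector search and generation in one Worker means a request can be served without leaving Cloudflare’s network.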
Supported models (a curated catalog, smaller than Bedrock’s):
- Llama (4 / 5 family)
- Mistral (various)
- Whisper (speech-to-text)
- Stable Diffusion (image gen)
- BGE / various embedding models
- Plus 50+ others (browse at developers.cloudflare.com/workers-ai/models)
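As one example from this list, a Worker can hand uploaded audio straight to Whisper for transcription. This is a minimal sketch, assuming an `AI` binding and the `@cf/openai/whisper` model ID (verify it in the catalog); the handler name `transcribe` is hypothetical, and the input shape (audio as an array of byte values) follows Cloudflare’s Whisper examples at the time of writing.

```javascript
// Sketch: transcribe raw audio bytes POSTed to a Worker, using Whisper on Workers AI.
// Assumes an AI binding exposed as env.AI; verify the model ID before use.
const WHISPER_MODEL = "@cf/openai/whisper";

async function transcribe(request, env) {
  if (request.method !== "POST") {
    return new Response("POST raw audio bytes", { status: 405 });
  }
  // Whisper on Workers AI takes the audio as a plain array of byte values
  const bytes = new Uint8Array(await request.arrayBuffer());
  const result = await env.AI.run(WHISPER_MODEL, { audio: [...bytes] });
  // The result carries the transcript in `text`
  return new Response(JSON.stringify({ text: result.text }), {
    headers: { "Content-Type": "application/json" },
  });
}
```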
What you’d use it for
- Build apps with Cloudflare Workers + AI — natural integration
- Edge-deployed AI features — chat, transcription, image gen running globally
- AI features that need lowest latency — voice agents, real-time interactions
- Cost-effective AI for global apps — generous free tier
- Privacy-friendly: Cloudflare commits to not training on customer data
- Replace OpenAI / Anthropic for some workloads if open-weight quality is sufficient
When NOT to use Workers AI:
- For closed frontier models (Claude Opus, GPT-5, Gemini Pro): use those vendors
- For the broadest open-weight catalog: Together / Fireworks offer more models
- For AUS-only data residency (Workers AI is global; data may route via non-AUS edges; for AUS-strict use AWS Bedrock Sydney)
How to use from Australia
- Cloudflare account (free at cloudflare.com)
- Enable Workers AI in the dashboard (often on by default); to call it from a Worker, add an AI binding to your Wrangler config
- Call via Workers (server-side) or REST API:
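```javascript
// Inside a Cloudflare Worker. Needs an AI binding in wrangler.toml: [ai] binding = "AI"
export default {
  async fetch(request, env) {
    // Example model ID; confirm it in the current Workers AI catalog
    const response = await env.AI.run('@cf/meta/llama-4-scout-17b-16e-instruct', {
      messages: [{ role: 'user', content: 'Hello' }],
    });
    // Text-generation models return an object such as { response: '...' }
    return Response.json(response);
  },
};
```
- AUS Cloudflare edge locations (Sydney, Melbourne, Brisbane, Perth, Adelaide) handle routing automatically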
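For calls from outside a Worker (a backend, a script, another cloud), the REST API route is `POST https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}` authenticated with a bearer API token. The sketch below only builds the request so it can be inspected or handed to `fetch`; the function name `buildWorkersAiRequest` and the account ID / token values are hypothetical placeholders.

```javascript
// Build (but do not send) a Workers AI REST request.
// accountId and apiToken are placeholders; use a token scoped for Workers AI.
function buildWorkersAiRequest(accountId, apiToken, model, messages) {
  // Model IDs such as "@cf/meta/..." contain slashes and stay as path segments
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;
  const init = {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages }),
  };
  return { url, init };
}

// Usage (any runtime with fetch, e.g. Node 18+):
// const { url, init } = buildWorkersAiRequest(ACCOUNT_ID, API_TOKEN,
//   "@cf/meta/llama-3.1-8b-instruct", [{ role: "user", content: "G'day" }]);
// const data = await (await fetch(url, init)).json();
// console.log(data.result.response); // REST responses wrap model output in `result`
```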
What it costs
Free tier
- 10,000 neurons / day (Cloudflare’s compute unit for AI)
- Sufficient for development and small projects
Workers Paid plan — US$5/month
- 10M requests / month included for Workers
- 30 million additional neurons / month for AI (verify current)
- Usage beyond the included allocation is billed per neuron
Per-model pricing (quoted per token, billed in neurons)
- Varies by model
- Generally cheaper than calling OpenAI / Anthropic directly, for tasks where an open-weight model is good enough
- Llama 4 70B: ~US$0.30-0.40 per million tokens (verify current)
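Because billing is in neurons, a monthly estimate is simple arithmetic once you know your daily neuron usage. A minimal sketch: the 10,000 neurons/day free allocation comes from the free tier above; any extra monthly allocation and the overage price are parameters you fill in from Cloudflare’s current pricing page. The assumption that the daily allocation resets each day without rolling over is labelled in the code.

```javascript
// Estimate one month of Workers AI spend from per-day neuron counts.
// freePerDay: daily free allocation (10,000 neurons, per the free tier above).
// includedPerMonth: any extra monthly allocation your plan includes (verify).
// usdPer1000: overage price per 1,000 neurons, from Cloudflare's pricing page.
function estimateMonthlyCostUsd(
  dailyNeurons,
  { freePerDay = 10_000, includedPerMonth = 0, usdPer1000 } = {}
) {
  if (typeof usdPer1000 !== "number") {
    throw new Error("usdPer1000 is required: copy it from the current pricing page");
  }
  // Assumption: the daily free allocation resets each day and does not roll over
  let overage = 0;
  for (const used of dailyNeurons) {
    overage += Math.max(0, used - freePerDay);
  }
  const billable = Math.max(0, overage - includedPerMonth);
  return (billable / 1000) * usdPer1000;
}
```

For example, 30 days at 50,000 neurons/day leaves 40,000 billable neurons per day (1.2 million for the month); at an assumed US$0.011 per 1,000 neurons and no extra monthly allocation, that is about US$13.20.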
How it compares to alternatives
| Aspect | Cloudflare Workers AI | AWS Bedrock | Vertex AI | Together AI |
|---|---|---|---|---|
| Edge / global inference | Best (300+ locations) | Per-region | Per-region | Per-region |
| Free tier | Most generous | Limited | Limited | Sign-up credit |
| Cloudflare Workers integration | Native | Manual | Manual | Manual |
| Frontier closed models | None | Claude | Gemini (plus Claude) | None |
| Open-weight catalog | Curated (50+) | Curated | Broad (Model Garden) | Broadest |
| AUS data residency | Not guaranteed (global edge, incl AUS) | Yes (Sydney) | Yes (Sydney+Melbourne) | Limited |
| Best for | Cloudflare-stack + global edge + cheap | AWS shops + AUS residency | GCP shops + AUS residency | Open-weight production |
For developers in the Cloudflare ecosystem building global apps, Workers AI is the natural choice.
Privacy / data handling
- No training on customer data — committed
- Data is routed through Cloudflare’s network and may be processed at whichever edge location serves the request, not necessarily in AUS
- Cloudflare has a strong privacy reputation overall
- For strict AUS-only data residency, AWS Bedrock Sydney is the stronger choice
- Cloudflare’s AI Gateway (separate product) can route to multiple AI providers with central observability
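Workers AI calls can be sent through AI Gateway from the same binding by passing a per-call options object. This is a minimal sketch, assuming a gateway you have already created with the hypothetical ID `my-gateway` and the `gateway: { id }` option as documented at the time of writing; check the AI Gateway docs for the caching and logging options your version supports.

```javascript
// Sketch: route a Workers AI call through AI Gateway for central logging/caching.
// "my-gateway" is a hypothetical gateway ID created in the Cloudflare dashboard.
async function chatViaGateway(env, prompt, gatewayId = "my-gateway") {
  const result = await env.AI.run(
    "@cf/meta/llama-3.1-8b-instruct", // verify the model ID in the catalog
    { messages: [{ role: "user", content: prompt }] },
    { gateway: { id: gatewayId } } // third argument: per-call options
  );
  return result.response; // text-generation models return { response: "..." }
}
```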
Recent changes
- 2026: Llama 5 family + expanded multimodal models
- 2025: AI Gateway matured (companion product)
- 2024: Model catalog expanded; Vectorize integration
- September 2023: Workers AI launched
Gotchas
- Neuron pricing model is unique to Cloudflare — different from per-token pricing elsewhere; modelling cost requires understanding their unit
- Global edge means no single-region guarantee: for strict AUS data residency, use Bedrock Sydney instead
- Model catalog smaller than AWS / Together — for niche models, check availability
- Best paired with full Cloudflare developer platform (Workers + R2 + D1 + Vectorize + Pages) for tight integration
- For high-volume production at frontier-model quality, Anthropic / OpenAI direct often still preferred
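The neuron-modelling gotcha comes down to one conversion: Cloudflare publishes, per model, how many neurons a million input and output tokens cost. A minimal sketch of that conversion; the `example-model` rates below are placeholders, not Cloudflare’s figures, so copy the real per-model values from the pricing page before relying on the result.

```javascript
// Convert token usage to neurons using per-model conversion rates.
// PLACEHOLDER rates (hypothetical); replace with the per-model figures
// Cloudflare publishes (neurons per million input / output tokens).
const NEURON_RATES = {
  "example-model": { inputPerMTokens: 1000, outputPerMTokens: 10000 },
};

function tokensToNeurons(model, inputTokens, outputTokens, rates = NEURON_RATES) {
  const r = rates[model];
  if (!r) throw new Error(`No neuron rate recorded for ${model}`);
  // Input and output tokens are usually priced differently, so convert separately
  return (
    (inputTokens / 1e6) * r.inputPerMTokens +
    (outputTokens / 1e6) * r.outputPerMTokens
  );
}
```

Feeding the resulting daily neuron totals into a cost calculation then gives a spend estimate you can compare against per-token vendors.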
See also
- Cloudflare 🟩 🟦 — the broader platform
- Cloudflare AI Gateway 🟥 — companion product for routing to multiple AI providers
- Vercel 🟩 🟦 — alternative serverless platform
- Vercel AI SDK 🟥
- Vercel AI Gateway 🟥
- AWS Bedrock 🟩 🟦
- Together AI 🟩 🟦
- Groq 🟩 🟦
- Llama 🟩 🟦
- What is the cloud? 🟩 🟦