Date: 2025-01-01
The AI Inference platform
Workers AI lets you run AI inference globally with one API call. No GPUs to manage, no capacity planning. Just intelligent machine learning models running where they're needed, on Cloudflare's global network.
Test, prototype, and evaluate the latest LLMs with the speed and reliability of a production environment, accessible in seconds.
Llama 4 Scout
Balanced generalist for everyday tasks
deepseek-r1-qwen-distill
Reasoning-first model for logic and math
GPT-OSS 120B
Open-weight powerhouse for enterprise-scale chat
Qwen 3 Coder
Specialized for coding and debugging
Inference is hard to predict and spiky in nature, unlike training. GPU utilization is, on average, only 20-40% — with one-third of organizations utilizing less than 15%. Workers AI allows customers to save by only paying for usage. No guessing or committing to hardware that goes unused.
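To make the utilization argument concrete, the sketch below computes what an hour of useful GPU work effectively costs when reserved hardware sits partly idle. The $2.00 hourly rate is a purely hypothetical figure chosen for illustration. Only the 20-40% and 15% utilization figures come from the text above.

```typescript
// Effective cost of one *utilized* GPU hour when paying for reserved hardware.
// hourlyRateUsd is a hypothetical price, not a quoted rate.
function effectiveCostPerUtilizedHour(hourlyRateUsd: number, utilization: number): number {
  // Written as a negated range check so that NaN is rejected too.
  if (!(utilization > 0 && utilization <= 1)) {
    throw new RangeError("utilization must be in the range (0, 1]");
  }
  return hourlyRateUsd / utilization;
}

const HYPOTHETICAL_HOURLY_RATE_USD = 2.0; // assumption, for illustration only

for (const utilization of [0.4, 0.2, 0.15]) {
  const cost = effectiveCostPerUtilizedHour(HYPOTHETICAL_HOURLY_RATE_USD, utilization);
  console.log(`${(utilization * 100).toFixed(0)}% utilized: $${cost.toFixed(2)} per useful hour`);
}
```

Under that assumed rate, hardware that is busy 20% of the time costs five times its list price per hour of useful work. Usage-based pricing is meant to close that gap.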
Call any model directly from your code using a single endpoint. Workers AI handles provisioning, scaling, and latency optimization automatically.
const response = await env.AI.run("@cf/meta/llama-4-scout-17b-16e-instruct", {
  messages: [
    { role: "system", content: "You are a friendly assistant" },
    { role: "user", content: "What is the origin of the phrase Hello, World" },
  ],
});
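The call above needs a Worker around it to be reachable over HTTP. The following is a minimal sketch of one possible wrapper, not an official template. It assumes an AI binding named `AI` is configured for the Worker. The `AiBinding` interface, the `handleRequest` helper, and the `q` query parameter are hypothetical names introduced here for illustration, and the real binding type ships with Cloudflare's own type definitions.

```typescript
// Local stand-in for the AI binding type so the sketch is self-contained.
// (Hypothetical, simplified shape; the real type comes from Cloudflare's typings.)
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AiBinding {
  run(model: string, input: { messages: ChatMessage[] }): Promise<unknown>;
}

interface Env {
  AI: AiBinding; // assumes the binding is named "AI" in the Worker configuration
}

const MODEL = "@cf/meta/llama-4-scout-17b-16e-instruct";
const DEFAULT_PROMPT = "What is the origin of the phrase Hello, World";

// Reads the prompt from a hypothetical ?q= parameter, forwards it to the model,
// and returns the model's result as JSON.
async function handleRequest(request: Request, env: Env): Promise<Response> {
  const prompt = new URL(request.url).searchParams.get("q") ?? DEFAULT_PROMPT;
  try {
    const result = await env.AI.run(MODEL, {
      messages: [
        { role: "system", content: "You are a friendly assistant" },
        { role: "user", content: prompt },
      ],
    });
    return new Response(JSON.stringify(result), {
      headers: { "content-type": "application/json" },
    });
  } catch (err) {
    // Surface inference failures as a 502 instead of an unhandled exception.
    return new Response(JSON.stringify({ error: String(err) }), {
      status: 502,
      headers: { "content-type": "application/json" },
    });
  }
}

// Workers invoke the `fetch` method of the default export for each request.
export default { fetch: handleRequest };
```

Keeping the logic in a plain function that takes `env` as an argument makes it straightforward to exercise with a stubbed binding, without deploying anything.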
Run real-world AI workloads directly on Cloudflare’s global network — from LLMs to image generation and embeddings. No GPU clusters, no orchestration layers — just fast, scalable inference wherever your users are.
Workers AI Pricing
50+ models running at the edge. View AI pricing details
$0.011 / thousand neurons
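As a rough illustration of how the rate above translates into a bill, the sketch below multiplies a neuron count by the listed price. The function name and the 250,000-neuron workload are hypothetical. How many neurons a given model consumes per request is not stated here, and the sketch ignores any plan-specific allowances or discounts.

```typescript
// Listed rate from the pricing line above: $0.011 per thousand neurons.
const USD_PER_THOUSAND_NEURONS = 0.011;

// Estimates the usage charge in USD for a given number of neurons.
// (Hypothetical helper; ignores any allowances, minimums, or rounding rules.)
function estimateCostUsd(neurons: number): number {
  if (!Number.isFinite(neurons) || neurons < 0) {
    throw new RangeError("neurons must be a non-negative finite number");
  }
  return (neurons / 1000) * USD_PER_THOUSAND_NEURONS;
}

// Hypothetical workload of 250,000 neurons.
const exampleNeurons = 250_000;
console.log(`Estimated cost: $${estimateCostUsd(exampleNeurons).toFixed(2)}`);
```

At the listed rate, that hypothetical 250,000-neuron workload comes to about $2.75.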
“For Shopify, the real challenge is not about how many different pieces of complex technology we can use but the opposite. Cloudflare helps us find a simple way to achieve something very complex that we can scale and maintain.”
Built on systems powering 20% of the Internet, Workers AI runs on the same infrastructure Cloudflare uses to build Cloudflare. Enterprise-grade reliability, security, and performance are standard.
