The AI Inference platform
Workers AI lets you run AI inference globally with one API call. No GPUs to manage, no capacity planning. Just intelligent machine learning models running where they're needed, on Cloudflare's global network.
Serverless pricing
Rich model catalog
Widely compatible
AI models easily accessible via code, OpenAI SDK or API
Test, prototype, and evaluate the latest LLMs with the speed and reliability of a production environment, accessible in seconds.
Llama 4 Scout
Balanced generalist for everyday tasks
deepseek-r1-qwen-distill
Reasoning-first model for logic and math
GPT-OSS 120B
Open-weight powerhouse for enterprise-scale chat
Qwen 3 Coder
Specialized for coding and debugging
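As a rough illustration of the "OpenAI SDK or API" access path, the sketch below prepares an OpenAI-style chat request for one of the models above without sending it. The endpoint path, the ACCOUNT_ID and API_TOKEN placeholders, and the helper name buildChatRequest are assumptions made for this example, so check them against the Workers AI REST documentation before relying on them.

```typescript
// Sketch only: the endpoint path below is an assumption, and ACCOUNT_ID /
// API_TOKEN are placeholders, not real credentials.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the request description but does not send it.
function buildChatRequest(
  accountId: string,
  apiToken: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    // OpenAI-style body: a model identifier plus the chat messages.
    body: JSON.stringify({ model, messages }),
  };
}

const prepared = buildChatRequest(
  "ACCOUNT_ID",
  "API_TOKEN",
  "@cf/meta/llama-4-scout-17b-16e-instruct",
  [{ role: "user", content: "What is the origin of the phrase Hello, World" }],
);
console.log(prepared.url);
```

Keeping request construction separate from sending makes the URL, headers, and body easy to inspect or unit-test before any network call is made.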
Scale up, and down
Unlike training, inference demand is spiky and hard to predict. Average GPU utilization is only 20-40%, and one-third of organizations use less than 15%. With Workers AI, customers pay only for what they use, with no guessing and no commitment to hardware that sits idle.
Run any AI model with one API call
Call any model directly from your code using a single endpoint. Workers AI handles provisioning, scaling, and latency optimization automatically.

const response = await env.AI.run("@cf/meta/llama-4-scout-17b-16e-instruct", {
  messages: [
    { role: "system", content: "You are a friendly assistant" },
    { role: "user", content: "What is the origin of the phrase Hello, World" },
  ],
});
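The call above assumes it runs inside a Worker that has an AI binding available as env.AI. A minimal sketch of that surrounding handler might look like the following. The AiBinding and Env interfaces, the askModel helper, and the q query parameter are hand-written stand-ins for illustration, not the platform's published types.

```typescript
// Minimal stand-in for the AI binding; real projects would normally rely on
// generated Worker types rather than this hand-written interface.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding; // assumed binding name, matching env.AI in the snippet above
}

const MODEL = "@cf/meta/llama-4-scout-17b-16e-instruct";

// Hypothetical helper wrapping the single env.AI.run call.
async function askModel(env: Env, question: string): Promise<unknown> {
  return env.AI.run(MODEL, {
    messages: [
      { role: "system", content: "You are a friendly assistant" },
      { role: "user", content: question },
    ],
  });
}

// In a Worker module, this object would be the default export.
const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // "q" is an illustrative query parameter, not part of any API.
    const question =
      new URL(request.url).searchParams.get("q") ??
      "What is the origin of the phrase Hello, World";
    const headers = { "Content-Type": "application/json" };
    try {
      const result = await askModel(env, question);
      return new Response(JSON.stringify(result), { headers });
    } catch (err) {
      // Surface inference failures as a JSON error instead of an unhandled exception.
      const message = err instanceof Error ? err.message : String(err);
      return new Response(JSON.stringify({ error: message }), { status: 502, headers });
    }
  },
};
```

Typing the binding as a small interface also lets the handler be exercised locally with a fake env.AI object, without deploying.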
Practical AI at the Edge
Run real-world AI workloads directly on Cloudflare’s global network — from LLMs to image generation and embeddings. No GPU clusters, no orchestration layers — just fast, scalable inference wherever your users are.
Image generation
Speech-to-text, in real time
Embeddings
LLMs
Workers AI Pricing
50+ models running at the edge. View AI pricing details
Neurons: $0.011 / thousand neurons
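To make the usage-based rate concrete, here is a small arithmetic sketch that converts a neuron count into an estimated charge. It uses only the $0.011 per thousand neurons figure quoted above. The function name is hypothetical, and any free allocation or plan-specific terms are deliberately ignored, so treat the result as a rough estimate rather than a bill.

```typescript
// Rate taken from the pricing line above; free allocations are not modeled.
const USD_PER_THOUSAND_NEURONS = 0.011;

// Hypothetical helper: estimated cost in USD for a given neuron count.
function estimateCostUsd(neurons: number): number {
  if (!Number.isFinite(neurons) || neurons < 0) {
    throw new RangeError("neurons must be a non-negative finite number");
  }
  return (neurons / 1000) * USD_PER_THOUSAND_NEURONS;
}

// 250,000 neurons -> 250 thousand-neuron units at $0.011 each.
console.log(estimateCostUsd(250000).toFixed(2)); // prints "2.75"
```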
Shopify
“For Shopify, the real challenge is not about how many different pieces of complex technology we can use but the opposite. Cloudflare helps us find a simple way to achieve something very complex that we can scale and maintain.”
Powerful primitives, seamlessly integrated
Built on systems powering 20% of the Internet, Workers AI runs on the same infrastructure Cloudflare uses to build Cloudflare. Enterprise-grade reliability, security, and performance are standard.
Compute
Storage
AI
Media
Network
Build without boundaries
Join thousands of developers who've eliminated infrastructure complexity and deployed globally with Cloudflare. Start building for free — no credit card required.