The AI Inference platform

Workers AI lets you run AI inference globally with one API call. No GPUs to manage, no capacity planning. Just intelligent machine learning models running where they're needed, on Cloudflare's global network.

Serverless pricing

Pay-per-inference pricing with no idle costs. No guessing how much capacity you will need.

Rich model catalog

50+ models running close to users in 200+ cities

Widely compatible

One API call, works with any OpenAI SDK or task type
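
As a rough illustration of the OpenAI compatibility mentioned above, the sketch below builds a chat-completions request by hand. The endpoint path, the `accountId` and `apiToken` parameters, and the helper name `buildChatRequest` are assumptions made for this example and do not come from this page; check the Workers AI documentation for the exact URL before relying on it.

```typescript
// Sketch only: the URL shape below is an assumption about the
// OpenAI-compatible endpoint, not something stated on this page.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: returns the URL and options for an OpenAI-style request.
function buildChatRequest(
  accountId: string,
  apiToken: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      // Same body shape an OpenAI SDK would send: a model name plus messages.
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`. An OpenAI SDK would typically be pointed at the same base URL instead of building the request manually.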

AI models easily accessible via code, OpenAI SDK or API

Test, prototype, and evaluate the latest LLMs with the speed and reliability of a production environment, accessible in seconds.

Meta
Llama 4 Scout

Balanced generalist for everyday tasks

deepseek-r1-qwen-distill

Reasoning-first model for logic and math

GPT-OSS 120B

Open-weight powerhouse for enterprise-scale chat

Qwen 3 Coder

Specialized for coding and debugging

Scale up, and down

Unlike training, inference is spiky and hard to predict. Average GPU utilization is only 20-40%, and one-third of organizations use less than 15% of their capacity. With Workers AI you pay only for what you use: no guessing, and no committing to hardware that sits idle.
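
To make the utilization figures concrete, here is a small sketch of the arithmetic. It assumes a simple model in which dedicated hardware is billed for every hour whether or not it is busy, so the cost per useful hour scales with the inverse of utilization. The function name `idleCostMultiplier` is hypothetical.

```typescript
// How much more you pay per *useful* GPU-hour when dedicated hardware
// sits partly idle. Assumes a flat hourly rate regardless of load.
function idleCostMultiplier(utilization: number): number {
  if (!(utilization > 0 && utilization <= 1)) {
    throw new RangeError("utilization must be in the range (0, 1]");
  }
  return 1 / utilization;
}

// The utilization levels quoted in the text: 40%, 20%, and 15%.
for (const u of [0.4, 0.2, 0.15]) {
  const percent = (u * 100).toFixed(0);
  console.log(`${percent}% utilized: ${idleCostMultiplier(u).toFixed(2)}x cost per useful hour`);
}
```

Under this model, pay-per-use pricing corresponds to a multiplier of 1, because nothing is billed while no requests are running.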

Run any AI model with one API call

Call any model directly from your code using a single endpoint. Workers AI handles provisioning, scaling, and latency optimization automatically.

// `env.AI` is the Workers AI binding available inside a Worker.
const response = await env.AI.run("@cf/meta/llama-4-scout-17b-16e-instruct", {
  messages: [
    { role: "system", content: "You are a friendly assistant" },
    { role: "user", content: "What is the origin of the phrase Hello, World" },
  ],
});
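
To show where the call above might live, here is a minimal sketch of a helper that takes the AI binding as a parameter, which also makes it easy to test with a stand-in object. The `AiBinding` interface, the helper name `askModel`, and the assumption that a text model answers with an object containing a `response` string are illustrative; the actual return shape depends on the model.

```typescript
// Minimal structural type for the binding; only the one method used here.
interface AiBinding {
  run(model: string, inputs: unknown): Promise<unknown>;
}

// Hypothetical helper: sends one user question and returns the reply text.
async function askModel(ai: AiBinding, question: string): Promise<string> {
  const result = await ai.run("@cf/meta/llama-4-scout-17b-16e-instruct", {
    messages: [
      { role: "system", content: "You are a friendly assistant" },
      { role: "user", content: question },
    ],
  });

  // Assumption: text models reply with { response: string }.
  if (typeof result === "object" && result !== null && "response" in result) {
    const text = (result as { response: unknown }).response;
    if (typeof text === "string") return text;
  }
  // Otherwise return the raw result as JSON so nothing is silently dropped.
  return JSON.stringify(result);
}
```

Inside a Worker's fetch handler this could be called as `await askModel(env.AI, "What is the origin of the phrase Hello, World")`.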

Practical AI at the Edge

Run real-world AI workloads directly on Cloudflare’s global network — from LLMs to image generation and embeddings. No GPU clusters, no orchestration layers — just fast, scalable inference wherever your users are.

Explore a Rich Catalog of 50+ Ready-to-Use Models

Real-world examples in action

Image generation

Execute image generation, manipulation, and creative workflows without managing GPU infrastructure. Perfect for content platforms, social apps, and creative tools.

Speech-to-text, in real-time

Transcribe, analyze, and generate audio content without specialized infrastructure. Built for voice agents, note-taking apps, and media processing.

Embeddings

Create intelligent search, recommendations, and context-aware features using vector embeddings. Integrates with Vectorize and AI Search for complete AI workflows.
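
As a rough sketch of how embeddings support search, the example below ranks documents by cosine similarity to a query vector. It is not tied to any Workers AI model or to Vectorize: the vectors are plain number arrays, and the names `cosineSimilarity` and `rankBySimilarity` are hypothetical. In practice the vectors would come from an embedding model and the ranking would usually be done by a vector database.

```typescript
// Cosine similarity between two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0; // lengths match, so the fallback is never used
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedDoc {
  id: string;
  vector: number[];
}

// Returns documents sorted from most to least similar to the query.
function rankBySimilarity(query: number[], docs: EmbeddedDoc[]): { id: string; score: number }[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.vector) }))
    .sort((x, y) => y.score - x.score);
}
```

Cosine similarity is a common default here because it compares direction rather than magnitude, so vectors of different lengths in the geometric sense can still be compared fairly.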

LLMs

Perform a wide range of natural language tasks. Use large language models for text generation, classification, question answering, and other complex language-based operations through a simple API.

Workers AI Pricing

50+ models running at the edge.

Neurons

Free

Paid: $0.011 per 1,000 neurons
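
The paid rate can be turned into a quick estimate, sketched below. The rate of $0.011 per thousand neurons comes from the pricing above; the `freeNeurons` parameter is hypothetical and defaults to zero, because this page does not state the size of the free allowance.

```typescript
// Paid rate quoted on this page.
const USD_PER_THOUSAND_NEURONS = 0.011;

// Estimated charge in US dollars for a given number of neurons.
// `freeNeurons` is an assumed allowance subtracted before billing.
function estimateCostUsd(neuronsUsed: number, freeNeurons = 0): number {
  if (neuronsUsed < 0 || freeNeurons < 0) {
    throw new RangeError("neuron counts must be non-negative");
  }
  const billable = Math.max(0, neuronsUsed - freeNeurons);
  return (billable / 1000) * USD_PER_THOUSAND_NEURONS;
}

// Example: one million neurons with no free allowance applied.
console.log(estimateCostUsd(1_000_000).toFixed(2));
```

How many neurons a request consumes depends on the model and the size of the input and output, so the neuron count has to come from actual usage data.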

Shopify

“For Shopify, the real challenge is not about how many different pieces of complex technology we can use but the opposite. Cloudflare helps us find a simple way to achieve something very complex that we can scale and maintain.”

Duncan Davidson, VP of Developer Productivity

Powerful primitives, seamlessly integrated

Built on systems powering 20% of the Internet, Workers AI runs on the same infrastructure Cloudflare uses to build Cloudflare. Enterprise-grade reliability, security, and performance are standard.

Build without boundaries

Join thousands of developers who've eliminated infrastructure complexity and deployed globally with Cloudflare. Start building for free — no credit card required.