AI Search
Create AI-powered search for your data
Connect your data and deliver natural language search in your applications.
Deploy RAG architecture in minutes
Always Up-to-Date
Built to Run at the Edge
Core Capabilities

Continuously Updated Indexes
Your data, always fresh. AI Search automatically tracks and updates content changes without manual intervention — keeping LLM responses aligned with your latest data.

Build full AI applications with AI Search and Workers
Search from your Worker. Call your AI Search instance directly from your Workers apps using standard JavaScript or TypeScript.

Metadata Filtering
Powerful multi-tenant search. Use metadata filters to build user-specific search contexts, enabling secure multi-user experiences with a single AI Search instance.
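As a sketch of how per-tenant filtering might look, assuming documents are organized under per-tenant folder prefixes (a hypothetical layout) and a comparison-style filter object passed alongside the query:

```javascript
// Build a filter that scopes search results to one tenant's folder.
// The "tenants/<id>/" prefix is an assumed storage layout, not a requirement.
function buildTenantFilter(tenantId) {
  return { type: "eq", key: "folder", value: `tenants/${tenantId}/` };
}

// Inside a Worker, the filter would be passed with the query
// (binding and instance names are illustrative):
//
//   const answer = await env.AI.autorag("my-rag").aiSearch({
//     query: "What is my plan's storage limit?",
//     filters: buildTenantFilter("customer-42"),
//   });

console.log(buildTenantFilter("customer-42").value); // "tenants/customer-42/"
```

Scoping each request to a tenant-specific prefix is what lets a single AI Search instance serve many users without leaking data across them.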

Web Parsing Source
Bring your website as a source. Generate RAG pipelines directly from your website, and keep them current whenever content is updated.

Edge-Based Inference
Fast, local AI responses. Uses Workers AI to create embeddings and run inference at the edge, closer to users — reducing latency and improving responsiveness.
NLWeb and AI Search
Product Chatbot
Multi-Tenant or Personalized AI Assistants
How it Works

Receive query from AI Search API
The query workflow begins when you send a request to your instance's AI Search endpoint (retrieval plus generation) or its Search endpoint (retrieval only).

Query rewriting (optional)
AI Search provides the option to rewrite the input query using one of Workers AI's LLMs to improve retrieval quality by transforming the original query into a more effective search query.
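Since rewriting is opt-in, it would be enabled per request. A minimal sketch of building such a request, treating the exact option name (`rewrite_query`) as an assumption:

```javascript
// Build the options object for a search call, opting into query rewriting.
// The rewrite_query flag name is assumed for illustration.
function buildSearchRequest(query, rewrite = true) {
  return {
    query,
    rewrite_query: rewrite, // ask AI Search to rewrite the query with an LLM first
  };
}

// e.g. await env.AI.autorag("my-rag").aiSearch(buildSearchRequest("reset my password"));
```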

Embedding the query
The rewritten (or original) query is transformed into a vector via the same embedding model used to embed your data so that it can be compared against your vectorized data to find the most relevant matches.
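Conceptually, comparing the query vector against your vectorized data is a similarity computation. This is a plain-JavaScript illustration of cosine similarity, not AI Search internals (Vectorize performs this server-side):

```javascript
// Cosine similarity: dot product of two vectors divided by the product
// of their magnitudes. Scores near 1 mean "pointing the same way".
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```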

Vector search in Vectorize
The query vector is searched against stored vectors in the associated Vectorize database for your AI Search.
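The top-K lookup can be pictured with this tiny in-memory stand-in (not the Vectorize API itself; ids, fields, and metadata shown are illustrative):

```javascript
// Score every stored vector against the query vector (dot product here),
// then return the k best matches with their metadata.
function topK(queryVector, index, k) {
  const score = (v) => v.reduce((sum, x, i) => sum + x * queryVector[i], 0);
  return [...index]
    .sort((a, b) => score(b.values) - score(a.values))
    .slice(0, k)
    .map(({ id, metadata }) => ({ id, metadata }));
}

const index = [
  { id: "chunk-1", values: [0.9, 0.1], metadata: { source: "docs/intro.md" } },
  { id: "chunk-2", values: [0.1, 0.9], metadata: { source: "docs/api.md" } },
];
console.log(topK([1, 0], index, 1)[0].id); // "chunk-1"
```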

Metadata + content retrieval
Vectorize returns the most relevant chunks and their metadata, and the original content is retrieved from the R2 bucket. Together, these are passed to a text-generation model.

Response generation
A text-generation model from Workers AI is used to generate a response using the retrieved content and the user's original query.
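How the retrieved content and the query might be combined into a generation prompt can be sketched like this (a simplified stand-in for what AI Search does internally; the prompt wording is illustrative):

```javascript
// Assemble numbered context chunks and the user's question into one prompt
// for a text-generation model.
function buildPrompt(chunks, query) {
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}`;
}

console.log(buildPrompt(["Alpha chunk", "Beta chunk"], "What is alpha?"));
```

Grounding the model in only the retrieved chunks is what keeps responses aligned with your data rather than the model's prior knowledge.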
RAG in action
Examples showing how to query, filter, and integrate AI Search into your applications.

export default {
  async fetch(request, env) {
    const { searchParams } = new URL(request.url);
    const query = searchParams.get('q');

    if (!query) {
      return new Response('Please provide a query parameter', { status: 400 });
    }

    // Search your AI Search instance
    const answer = await env.AI.autorag("my-rag").aiSearch({
      query: query,
    });

    return new Response(JSON.stringify(answer), {
      headers: { 'Content-Type': 'application/json' }
    });
  }
};
Zendesk
“Like Zendesk, innovation is in Cloudflare’s DNA — it mirrors our beautifully simple development ethos with the connectivity cloud, a powerful, yet simple-to-implement, end-to-end solution that does all the heavy lifting, so we don’t need to.”
Powerful primitives, seamlessly integrated
Built on systems powering 20% of the Internet, AI Search runs on the same infrastructure Cloudflare uses to build Cloudflare. Enterprise-grade reliability, security, and performance are standard.
Compute
Storage
AI
Media
Network
Build without boundaries
Join thousands of developers who've eliminated infrastructure complexity and deployed globally with Cloudflare. Start building for free — no credit card required.