Microsoft has launched the public preview of agentic retrieval in Azure AI Search, a query engine that autonomously plans and executes retrieval strategies for complex questions. According to the company, it improves answer relevance in conversational AI by up to 40% compared to traditional RAG. The system uses the conversation history and Azure OpenAI to break a complex query down into focused subqueries, which are executed in parallel across both text and vector content.
This new capability is exposed programmatically through a new Knowledge Agents object in the 2025-05-01-preview data plane REST API and in prerelease Azure SDK packages. It builds on three pieces: an existing Azure AI Search index, a dedicated agent resource that links to an Azure OpenAI deployment, and the retrieval engine that orchestrates the process. Microsoft positions agentic retrieval as a crucial step toward more sophisticated knowledge retrieval systems, one explicitly designed for intelligent agents and providing high-quality grounding data for downstream consumption.
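As a rough illustration, creating a knowledge agent amounts to a PUT against the preview `agents` endpoint with a definition that names the target index and the Azure OpenAI deployment used for query planning. The helper below only assembles that request; the payload field names are an assumption based on the preview documentation and should be verified against the current REST reference before use.

```python
import json

# Hypothetical helper: builds (does not send) the PUT request for the
# 2025-05-01-preview "agents" endpoint. Field names such as
# "targetIndexes" and "azureOpenAIParameters" are assumptions drawn from
# preview docs and may change before general availability.
def build_knowledge_agent_request(service: str, agent_name: str,
                                  index_name: str, aoai_resource_uri: str,
                                  deployment: str, model: str) -> tuple[str, str]:
    url = (f"https://{service}.search.windows.net/agents/{agent_name}"
           f"?api-version=2025-05-01-preview")
    payload = {
        "name": agent_name,
        # Index (or indexes) the agent is allowed to query.
        "targetIndexes": [{"indexName": index_name}],
        # Azure OpenAI deployment used for LLM query planning.
        "models": [{
            "kind": "azureOpenAI",
            "azureOpenAIParameters": {
                "resourceUri": aoai_resource_uri,
                "deploymentId": deployment,
                "modelName": model,
            },
        }],
    }
    return url, json.dumps(payload)

# Example with placeholder resource names.
url, body = build_knowledge_agent_request(
    "my-search", "earth-agent", "earth-at-night",
    "https://my-aoai.openai.azure.com", "gpt-4o-mini", "gpt-4o-mini")
```

The same definition can also be supplied through the prerelease Azure SDK packages rather than raw REST calls.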
According to the documentation, the agentic retrieval process involves the following stages: first, an LLM analyzes the entire chat thread to identify the underlying information need. It then plans a retrieval strategy, incorporating the chat history and the original query, and decomposes the question into focused subqueries. Next, the subqueries run in parallel, each leveraging both the keyword and semantic search capabilities of Azure AI Search. In a Microsoft Build session, Matthew Gotteiner explained:
It’s important to note that the overall speed of agentic retrieval is directly related to the number of subqueries generated. While running subqueries in parallel aims to accelerate the process, a more complex query requiring numerous subqueries will naturally take longer to complete. Counterintuitively, a “mini” query planner that generates fewer, broader subqueries might return results faster than a “full-size” planner designed to create a larger number of highly focused subqueries.
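The fan-out step described above can be sketched in a few lines: because the subqueries run concurrently, wall-clock latency tracks the slowest subquery, and a plan with more subqueries has more chances to contain a slow one. Here `run_subquery` is a hypothetical stand-in for a hybrid keyword-plus-vector search call, not the actual Azure AI Search API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subquery(subquery: str) -> list[str]:
    # Stand-in for a hybrid (keyword + vector) search against the index.
    return [f"hit for: {subquery}"]

def execute_plan(subqueries: list[str]) -> list[str]:
    # Fan out: all subqueries run in parallel, so total time is roughly
    # the slowest subquery, not the sum of all of them.
    with ThreadPoolExecutor(max_workers=len(subqueries)) as pool:
        result_lists = pool.map(run_subquery, subqueries)
    # Flatten results from every subquery for downstream reranking.
    return [hit for hits in result_lists for hit in hits]

hits = execute_plan(["suez canal night lights", "phosphorescent seas"])
```

This also makes the planner trade-off concrete: a "mini" planner emitting two broad subqueries finishes as soon as both return, while a "full-size" planner emitting ten focused ones waits on the slowest of ten.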
The results of all subqueries are then merged and reranked by the platform’s semantic ranker into a unified grounding payload containing the top hits and structured metadata. Finally, the API also returns a detailed activity log of the retrieval process.
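A caller therefore receives two things: the grounding payload to feed an LLM, and an activity array describing each step the engine took. The sketch below parses a response of that shape; the field names (`response`, `content`, `activity`) and the activity `type` values are illustrative assumptions based on the preview documentation, not a guaranteed contract.

```python
# Hypothetical response parsing; field names and activity "type" values
# are assumptions based on the preview docs and may differ in practice.
def summarize_retrieval(response: dict) -> dict:
    # The grounding payload: reranked top hits serialized for the LLM.
    grounding = response["response"][0]["content"][0]["text"]
    # The activity log: one entry per step (planning, search, reranking).
    steps = [entry.get("type", "unknown") for entry in response.get("activity", [])]
    return {"grounding": grounding, "activity_steps": steps}

sample = {
    "response": [{"content": [{"type": "text", "text": "top hits as JSON"}]}],
    "activity": [
        {"type": "modelQueryPlanning", "inputTokens": 1200},
        {"type": "searchIndex", "count": 2},
        {"type": "semanticReranker", "inputTokens": 900},
    ],
}
out = summarize_retrieval(sample)
```

The token counts in the activity entries are what make the per-token billing discussed later auditable per request.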
(Source: Microsoft Tech community blog post)
Akshay Kokane, a Software Engineer at Microsoft, concluded in a Medium blog post:
Traditional RAG systems are a great starting point for enhancing LLMs with domain-specific knowledge — especially when using tools like Semantic Kernel and Azure AI Search, which simplify embedding and retrieval. However, as enterprise use cases become more complex, the limitations of static, linear workflows become apparent.
Agentic RAG (ARAG) addresses this gap by introducing dynamic reasoning, intelligent tool selection, and iterative refinement. Agents can adapt their search strategies, evaluate results, and construct more precise, context-aware answers — making them ideal for evolving business needs, compliance workflows, or multi-source data environments.
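The loop Kokane describes, search, evaluate, refine, can be sketched generically. Everything here is a hypothetical stand-in: `search`, `good_enough`, and `refine_query` represent an agent's tools, not any specific Semantic Kernel or Azure API.

```python
# Minimal sketch of an agentic RAG loop, assuming three caller-supplied
# tools: search, a result-quality check, and a query refiner.
def agentic_retrieve(question: str, search, good_enough, refine_query,
                     max_rounds: int = 3) -> list[str]:
    query = question
    results: list[str] = []
    for _ in range(max_rounds):
        results = search(query)
        if good_enough(question, results):
            break  # evaluation passed; stop iterating
        # Adapt the search strategy based on what came back.
        query = refine_query(question, query, results)
    return results

# Toy usage: the first query misses, the refined one hits.
def search(q): return ["doc"] if "refined" in q else []
def good_enough(q, results): return bool(results)
def refine_query(q, current, results): return current + " refined"

found = agentic_retrieve("ambiguous question", search, good_enough, refine_query)
```

Unlike a static, linear RAG pipeline, the loop only terminates early when the evaluation step is satisfied, which is what makes the behavior adaptive.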
Lastly, the public preview is currently available in select regions. Agentic retrieval pricing includes per-token billing for Azure OpenAI query planning and for Azure AI Search semantic ranking, both of which are free during the initial preview. Documentation, a cookbook, and integration guidance for Azure AI Agent Service are available for developers.