Agentic Retrieval in Azure AI Search
Agentic retrieval is currently in public preview. Features and capabilities may change.
What is Agentic Retrieval?
Agentic retrieval transforms how AI agents interact with your data by:

- Query decomposition: Using an LLM to break down complex queries into focused subqueries
- Parallel execution: Running multiple subqueries simultaneously for better coverage
- Semantic reranking: Promoting the most relevant matches from each subquery
- Unified response: Combining results into a single, comprehensive answer
Key Capabilities
LLM Query Planning
Automatically breaks complex questions into targeted subqueries using chat history and context
Parallel Retrieval
Executes multiple searches simultaneously across indexed and remote knowledge sources
Semantic Reranking
Applies machine learning to surface the most relevant results for each subquery
Agent-Optimized Response
Returns structured output designed for agent consumption with grounding data and references
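As a rough sketch, the structured response carries the three kinds of data named above: merged grounding content, references, and an activity log. The field names in this example are illustrative assumptions, not the service's actual contract; consult the retrieval API reference for the real shape.

```python
# Hypothetical shape of an agentic retrieval response. The three parts
# mirror the structure described in this article; the exact field names
# are illustrative assumptions, not the service contract.
sample_response = {
    "response": [  # merged grounding content for the answering LLM
        {"content": [{"type": "text", "text": "Hotel A is near the beach."}]}
    ],
    "references": [  # source documents for citation
        {"id": "0", "sourceData": {"title": "Hotel A"}}
    ],
    "activity": [  # execution log for debugging
        {"type": "ModelQueryPlanning", "inputTokens": 1200, "outputTokens": 270},
        {"type": "AzureSearchQuery", "query": {"search": "hotels near beach"}},
    ],
}

def grounding_text(resp: dict) -> str:
    """Concatenate the merged text parts into one grounding string."""
    return "\n".join(
        part["text"]
        for message in resp["response"]
        for part in message["content"]
        if part["type"] == "text"
    )
```

An agent would typically pass `grounding_text(...)` to its answering LLM and keep `references` for citations.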
How It Works
Retrieval Process
Query Planning
The knowledge base sends the query and conversation history to an LLM, which analyzes the context and breaks the question into focused subqueries
Query Execution
The knowledge base sends the subqueries to its knowledge sources. All subqueries execute simultaneously using keyword, vector, or hybrid search
Semantic Reranking
Each subquery’s results undergo semantic reranking to identify the most relevant matches
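The three steps above can be sketched as a toy pipeline. The planner, search, and reranker below are stand-in functions, not the actual service; the point is the flow: plan once, fan out in parallel, rerank each subquery's results, then merge.

```python
from concurrent.futures import ThreadPoolExecutor

def plan_subqueries(query: str, history: list[str]) -> list[str]:
    # Stand-in for LLM query planning: the real knowledge base sends
    # the query plus chat history to an LLM.
    return [part.strip() for part in query.split(" and ")]

def run_search(subquery: str) -> list[dict]:
    # Stand-in for keyword/vector/hybrid search against a knowledge source.
    return [{"doc": f"result for {subquery!r}", "score": 1.0}]

def rerank(results: list[dict]) -> list[dict]:
    # Stand-in for semantic (L2) reranking of each subquery's results.
    return sorted(results, key=lambda r: r["score"], reverse=True)

def agentic_retrieve(query: str, history: list[str]) -> list[dict]:
    subqueries = plan_subqueries(query, history)        # 1. query planning
    with ThreadPoolExecutor() as pool:                  # 2. parallel execution
        per_subquery = list(pool.map(run_search, subqueries))
    # 3. rerank each subquery's results, then merge
    return [hit for results in per_subquery for hit in rerank(results)]

hits = agentic_retrieve("hotels near beach and airport shuttle", [])
# two subqueries -> two merged hits
```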
Why Use Agentic Retrieval?
Complex Query Handling
Traditional search struggles with queries like:

- “Find me a hotel near the beach, with airport transportation, and that’s within walking distance of vegetarian restaurants”

Agentic retrieval decomposes this into focused subqueries:

- Hotels near beaches
- Hotels with airport shuttle service
- Hotels near vegetarian dining options
Query Expansion
Benefits:

- Corrects spelling mistakes automatically
- Adds synonyms and paraphrasing
- Includes chat history context
- Handles compound questions
Multi-Source Retrieval
Query across different knowledge sources simultaneously:

- Azure AI Search indexes
- Remote SharePoint sites
- Public web data (Bing)
- Microsoft OneLake
- Azure Blob Storage
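A multi-source setup amounts to registering several knowledge sources of different kinds and deciding which ones every retrieval should hit. The sketch below is illustrative: the `kind` values echo the source types listed above, but treat the exact schema (including the `alwaysQuery` flag) as an assumption, not the service contract.

```python
# Illustrative knowledge source definitions for a multi-source knowledge
# base. The "kind" values echo the source types listed above; treat the
# exact schema as an assumption, not the service contract.
knowledge_sources = [
    {"name": "hotels-index",  "kind": "searchIndex", "alwaysQuery": True},
    {"name": "policies",      "kind": "sharePoint",  "alwaysQuery": False},
    {"name": "industry-news", "kind": "web",         "alwaysQuery": False},
]

def sources_to_query(always_only: bool) -> list[str]:
    """Pick which sources a retrieval fans out to."""
    return [
        s["name"]
        for s in knowledge_sources
        if s["alwaysQuery"] or not always_only
    ]
```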
Architecture Components
Knowledge Base
Orchestrates the entire retrieval pipeline.

Knowledge Sources

Represent searchable content:

- Indexed sources: Search indexes on Azure AI Search
- Remote sources: Live data from SharePoint, Bing, or web APIs
Required Components
| Component | Service | Purpose |
|---|---|---|
| LLM | Azure OpenAI | Query planning and context analysis |
| Knowledge Base | Azure AI Search | Orchestration and parameter management |
| Knowledge Source | Azure AI Search | Wrapper for search indexes or remote data |
| Search Index | Azure AI Search | Stores searchable text and vectors |
| Semantic Ranker | Azure AI Search | L2 reranking for relevance |
Response Structure
Agentic retrieval returns a three-part response:

1. Merged Content: Grounding data for LLM answer generation
2. Source References: Original documents for citation
3. Activity Log: Execution details for debugging

Retrieval Reasoning Effort
Control LLM usage with reasoning effort levels:

- Minimal
- Low
- Medium
Minimal Effort

Characteristics:

- No LLM query planning
- Direct keyword and vector search
- All knowledge sources queried
- Fastest execution
- Lowest cost

Use when:

- The query is already well-formed
- Speed is critical
- Cost optimization is important
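Pinning the effort level amounts to setting one parameter on the retrieval request. `retrievalReasoningEffort` is the parameter named in this article; the rest of the payload shape below is an illustrative assumption.

```python
def build_retrieval_request(query: str, effort: str = "minimal") -> dict:
    """Sketch of a retrieval request that pins the reasoning effort.

    `retrievalReasoningEffort` is the parameter named in this article;
    the surrounding payload shape is illustrative, not the API contract.
    """
    allowed = {"minimal", "low", "medium"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "messages": [{"role": "user", "content": query}],
        # "minimal" skips LLM query planning entirely
        "retrievalReasoningEffort": effort,
    }

req = build_retrieval_request("pet-friendly hotels San Diego")
```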
Example: Complex Query Decomposition
User Query
Query Plan
The LLM generates subqueries:

Subquery 1: “pet-friendly hotels San Diego”

- Targets: Hotels allowing dogs
- Filter: PetsAllowed eq true

Subquery 2: “hotels with swimming pool San Diego”

- Targets: Pool amenities
- Filter: PoolAvailable eq true

Subquery 3: “quiet hotels San Diego NOT Marina Inn”

- Targets: Peaceful locations
- Excludes: Previous stay
- Context: User preference for quiet
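Each planned subquery maps naturally onto a search request carrying a search string and an OData filter. The sketch below builds basic request payloads from the plan above; the `search`/`filter`/`top` body shape follows Azure AI Search query conventions, but treat the exact payload as an assumption.

```python
# Sketch: turning the planned subqueries into search request payloads.
# The filters come from the example above; the payload shape follows
# basic Azure AI Search query conventions (assumption, not a contract).
query_plan = [
    {"search": "pet-friendly hotels San Diego",       "filter": "PetsAllowed eq true"},
    {"search": "hotels with swimming pool San Diego", "filter": "PoolAvailable eq true"},
    {"search": "quiet hotels San Diego NOT Marina Inn", "filter": None},
]

def to_request(subquery: dict, top: int = 50) -> dict:
    body = {"search": subquery["search"], "top": top}
    if subquery["filter"]:  # only include a filter when the plan has one
        body["filter"] = subquery["filter"]
    return body

request_bodies = [to_request(q) for q in query_plan]
```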
Execution
Multi-Source Example
Knowledge Base Configuration
Query Execution
User Query: “What is our vacation policy and industry best practices?”

Routing Logic:

- sharepoint-policies (always queried): Company policies
- internal-docs (conditionally queried): HR documentation
- web-resources (conditionally queried): Industry standards
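The routing above can be sketched as a toy selector: sources marked as always-queried are hit unconditionally, while the rest are included only when the query looks relevant. In the real service an LLM makes the relevance call; the keyword matching below is a deliberate simplification, and the source names come from the example above.

```python
# Toy routing logic for the multi-source example. Sources flagged
# alwaysQuery are hit unconditionally; the others only when relevant.
# (The real service uses an LLM for the relevance decision; keyword
# matching here is a deliberate simplification.)
SOURCES = {
    "sharepoint-policies": {"alwaysQuery": True,  "topic": "policy"},
    "internal-docs":       {"alwaysQuery": False, "topic": "hr"},
    "web-resources":       {"alwaysQuery": False, "topic": "industry"},
}

def route(query: str) -> list[str]:
    selected = [name for name, s in SOURCES.items() if s["alwaysQuery"]]
    lowered = query.lower()
    for name, s in SOURCES.items():
        if not s["alwaysQuery"] and s["topic"] in lowered:
            selected.append(name)
    return selected

targets = route("What is our HR vacation policy and industry best practices?")
```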
Integration with Foundry Agent Service
Connect agentic retrieval to Microsoft Foundry agents.

Performance Considerations
Latency Factors
Agentic retrieval adds latency due to:

- LLM query planning (1-3 seconds)
- Parallel subquery execution
- Semantic reranking
Optimization Strategies
Use Faster Models

- Use gpt-4o-mini for query planning
- Reduces planning latency by 50-70%
- Sufficient for most query decomposition tasks
Minimize LLM Processing

- Set retrievalReasoningEffort to minimal when possible
- Exclude LLM processing for simple queries
- Use direct search for known patterns
Optimize Knowledge Sources

- Consolidate indexes to reduce fan-out
- Use alwaysQuery: false for optional sources
- Provide clear descriptions for source selection
Summarize Message Threads

- Limit chat history to recent messages
- Summarize long conversations before processing
- Reduces input token count
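A minimal version of the thread-trimming strategy caps the history sent to the query planner, collapsing older turns into a one-line stub. A production system might summarize the dropped turns with an LLM instead; this sketch just truncates.

```python
# Sketch: cap the chat history sent to the query planner. Older turns
# are collapsed into a one-line stub (a real system might summarize
# them with an LLM instead), keeping the input token count down.
def trim_history(messages: list[dict], keep_last: int = 4) -> list[dict]:
    if len(messages) <= keep_last:
        return messages
    dropped = len(messages) - keep_last
    stub = {"role": "system", "content": f"[{dropped} earlier messages summarized]"}
    return [stub] + messages[-keep_last:]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
trimmed = trim_history(history)
# 10 turns -> 1 stub + last 4 turns = 5 messages
```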
Cost Estimation
Billing Components
Azure OpenAI (query planning):

- Input tokens: Chat history + query
- Output tokens: Subqueries generated
- Model-specific pricing (e.g., gpt-4o, gpt-4o-mini)

Azure AI Search (agentic retrieval):

- Token-based billing: 1 million tokens per unit
- Free tier: 50 million tokens/month
- Pay-as-you-go after free quota
Example Cost Calculation
Scenario: 2,000 agentic retrievals per month

Assumptions:

- 3 subqueries per retrieval
- 50 chunks reranked per subquery
- 500 tokens per chunk
- 2,000 input tokens per retrieval (chat history)
- 350 output tokens per retrieval (query plan)

Azure AI Search (semantic ranking tokens):

- Total queries: 2,000 × 3 = 6,000
- Total chunks: 6,000 × 50 = 300,000
- Total tokens: 300,000 × 500 = 150 million
- Cost: ~$3.30 USD

Azure OpenAI (query planning):

- Input: 2,000 × 2,000 = 4M tokens → ~$0.60
- Output: 2,000 × 350 = 700K tokens → ~$0.42
- Cost: ~$1.02 USD
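The arithmetic above can be checked end to end. Note that the per-million-token rates in this sketch are back-solved from the article's ~$3.30 and ~$1.02 totals (assuming the 50M-token free tier applies to the search side); verify them against current Azure pricing before relying on them.

```python
# Reproduce the example cost arithmetic. The per-1M-token rates are
# back-solved from the article's totals (assuming the 50M free tier
# applies); check current Azure pricing before relying on them.
retrievals = 2_000
subqueries_per_retrieval = 3
chunks_per_subquery = 50
tokens_per_chunk = 500
input_tokens_per_retrieval = 2_000
output_tokens_per_retrieval = 350

# Azure AI Search: semantic-ranking tokens, first 50M free per month.
search_tokens = (retrievals * subqueries_per_retrieval
                 * chunks_per_subquery * tokens_per_chunk)
billable_tokens = max(0, search_tokens - 50_000_000)
search_cost = billable_tokens / 1_000_000 * 0.033   # assumed $/1M tokens

# Azure OpenAI query planning (assumed $0.15/$0.60 per 1M in/out tokens).
llm_cost = (retrievals * input_tokens_per_retrieval / 1_000_000 * 0.15
            + retrievals * output_tokens_per_retrieval / 1_000_000 * 0.60)
```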
Availability and Pricing
Agentic retrieval is available in selected regions during preview.

Supported Regions

Check region support documentation for current availability.

Pricing Plans
| Plan | Description | Monthly Quota |
|---|---|---|
| Free | Default on all tiers | 50M tokens |
| Standard | Pay-as-you-go after free quota | Unlimited |
You are not notified when transitioning from free to paid quota. Monitor usage through Azure portal metrics.
When to Use Agentic Retrieval
Agentic retrieval is ideal for:

Complex Questions
Multi-part questions requiring decomposition and parallel search
Agent Workflows
Agent-to-agent communication needing structured responses
RAG Applications
Chat applications requiring high-quality grounding data
Multi-Source Scenarios
Querying across indexed and remote data sources
Next Steps
Create Knowledge Base
Set up your first knowledge base
Knowledge Sources
Learn about different knowledge source types
Create Index
Build an index for agentic retrieval
Query API
Explore the retrieval API reference