Hello @nishant garg,
Welcome to Microsoft Q&A. Thank you for reaching out to us.
The behavior you observed aligns with the current architecture of Azure AI Foundry Agent Service, where conversation state management and retrieval metadata are handled through separate but complementary components rather than a unified configuration.
For conversation threading and chat history, the new Foundry experience provides built-in support through structured runtime components:
- Conversations act as the primary mechanism for maintaining multi-turn state. Reusing the same conversation ID preserves history across requests.
- Memory (preview) enables longer-term continuity by storing relevant context across sessions through configurable memory stores.
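To make the threading idea concrete, here is a rough in-memory sketch of why reusing the same conversation ID preserves history across requests. Note that `ConversationStore` and its methods are hypothetical stand-ins for illustration only, not Foundry SDK calls:

```python
import uuid

# Illustrative sketch only: ConversationStore is a hypothetical stand-in
# for the Foundry Conversations runtime, showing how reusing the same
# conversation ID accumulates multi-turn state.
class ConversationStore:
    def __init__(self):
        self._threads = {}  # conversation_id -> list of messages

    def create_conversation(self):
        cid = str(uuid.uuid4())
        self._threads[cid] = []
        return cid

    def add_message(self, cid, role, content):
        self._threads[cid].append({"role": role, "content": content})

    def history(self, cid):
        # Reusing the same conversation ID returns the full prior history.
        return list(self._threads[cid])

store = ConversationStore()
cid = store.create_conversation()
store.add_message(cid, "user", "What is RAG?")
store.add_message(cid, "assistant", "Retrieval-augmented generation ...")
store.add_message(cid, "user", "How do I add citations?")  # second turn, same ID
```

As long as the caller keeps presenting the same `cid`, every prior turn is available when the next request is built.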
In scenarios requiring higher control or durability, external storage can also be considered:
- Persist conversation history in a database (Cosmos DB / SQL / Redis)
- Maintain identifiers such as conversation_id / session_id
- Rehydrate context into each request when invoking the agent
This ensures predictable behavior across environments and avoids reliance on preview features if stability is a primary concern.
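A minimal sketch of this external-storage pattern, using SQLite purely as a stand-in for Cosmos DB / SQL / Redis (the table and column names are illustrative, not a prescribed schema):

```python
import json
import sqlite3

# SQLite stands in for Cosmos DB / SQL / Redis; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chat_history ("
    "conversation_id TEXT, seq INTEGER, message TEXT)"
)

def persist(conversation_id, seq, role, content):
    # Store each turn under a stable conversation_id.
    conn.execute(
        "INSERT INTO chat_history VALUES (?, ?, ?)",
        (conversation_id, seq, json.dumps({"role": role, "content": content})),
    )

def rehydrate(conversation_id):
    # Rebuild the ordered message list to prepend to the next agent request.
    rows = conn.execute(
        "SELECT message FROM chat_history "
        "WHERE conversation_id = ? ORDER BY seq",
        (conversation_id,),
    )
    return [json.loads(r[0]) for r in rows]

persist("conv-123", 1, "user", "What is RAG?")
persist("conv-123", 2, "assistant", "Retrieval-augmented generation ...")
context = rehydrate("conv-123")
```

Because the store is owned by the application, the same rehydration step works unchanged across environments and does not depend on any preview feature.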
For retrieval and source attribution (chunk IDs, relevance scores), the agent abstraction does not expose detailed retrieval outputs. These are produced by Azure AI Search and must be accessed directly.
The recommended approach is to:
- Use Azure AI Search for retrieval operations
- Configure the index with retrievable fields (e.g., chunk_id, content, metadata)
- Perform vector or hybrid queries to return relevant chunks along with ranking signals
- Pass selected chunks into the agent prompt for grounding
- Display metadata (IDs, scores, citations) at the application layer
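The last step above can be sketched as follows. The result dictionaries mirror the shape a hybrid query typically returns when `chunk_id` and `content` are marked retrievable in the index and `"@search.score"` carries the ranking signal; the stubbed list stands in for actual `SearchClient.search(...)` output, so treat those field names as assumptions about your index:

```python
# Application-layer step: turn Azure AI Search results into citations
# the UI can display alongside the agent's answer.
def format_citations(results):
    citations = []
    for doc in results:
        citations.append({
            "chunk_id": doc["chunk_id"],
            "score": doc["@search.score"],
            "snippet": doc["content"][:100],
        })
    # Highest-scoring chunks first, for grounding and display.
    return sorted(citations, key=lambda c: c["score"], reverse=True)

# Stubbed results standing in for a real hybrid-query response:
results = [
    {"chunk_id": "doc1-003", "@search.score": 2.1, "content": "Vector search ..."},
    {"chunk_id": "doc2-001", "@search.score": 3.4, "content": "Hybrid queries ..."},
]
citations = format_citations(results)
```

The same `citations` structure can feed both the agent prompt (for grounding) and the front end (for attribution).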
To meet both requirements together, the following practical architecture pattern is recommended:
- Conversation state (threading): use Foundry Conversations for multi-turn interactions; optionally enable Memory stores for cross-session continuity.
- Retrieval layer (attribution): use Azure AI Search for document retrieval and metadata, returning chunk details as part of the search response.
- Application orchestration:
- Retrieve context from Azure AI Search
- Fetch conversation history (from conversations or external store)
- Construct prompt (history + retrieved content)
- Invoke Foundry agent for response generation
- Store new messages and associated metadata
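The five orchestration steps above can be sketched end to end as below. Here `retrieve_chunks` and `invoke_agent` are stubs standing in for the Azure AI Search query and the Foundry agent call, and the in-memory `HISTORY` dict stands in for Conversations or an external store:

```python
# End-to-end orchestration sketch; the stubs below are placeholders for
# the real Azure AI Search and Foundry agent calls.
def retrieve_chunks(query):
    return [{"chunk_id": "doc1-001", "content": "Foundry supports conversations."}]

def invoke_agent(prompt):
    return "Grounded answer based on the provided sources."

HISTORY = {}  # conversation_id -> messages (or an external store)

def handle_turn(conversation_id, user_message):
    chunks = retrieve_chunks(user_message)                     # 1. retrieve context
    history = HISTORY.setdefault(conversation_id, [])          # 2. fetch history
    sources = "\n".join(f"[{c['chunk_id']}] {c['content']}" for c in chunks)
    prompt = (                                                 # 3. construct prompt
        f"Sources:\n{sources}\n\n"
        + "\n".join(f"{m['role']}: {m['content']}" for m in history)
        + f"\nuser: {user_message}"
    )
    answer = invoke_agent(prompt)                              # 4. generate response
    history += [{"role": "user", "content": user_message},     # 5. store messages
                {"role": "assistant", "content": answer}]
    return answer, [c["chunk_id"] for c in chunks]

answer, cited = handle_turn("conv-1", "How do I show citations?")
```

Returning the chunk IDs alongside the answer is what lets the application layer render citations while the agent itself stays focused on generation.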
Regarding hybrid usage:
- Combining old Foundry for memory and new Foundry for retrieval is not recommended
- Both experiences operate on separate architectures and do not share state
- A unified design within the new Foundry ecosystem provides better scalability and maintainability
In summary:
- Conversation continuity is achieved through Foundry conversations and optional memory stores, while retrieval metadata is obtained from Azure AI Search.
- Combining these layers explicitly at the application level provides a complete and production-ready solution for RAG scenarios requiring both threading and source attribution.
The following references might be helpful; please check them out:
- What's new in Foundry Agent Service? (classic) - Microsoft Foundry (classic) portal | Microsoft Learn
- Connect an Azure AI Search index to Foundry agents - Microsoft Foundry | Microsoft Learn
- Tutorial: Build an Agentic Retrieval Solution - Azure AI Search | Microsoft Learn
- Build with agents, conversations, and responses in Foundry Agent Service - Microsoft Foundry | Microsoft Learn
- Create and Use Memory - Microsoft Foundry | Microsoft Learn
Thank you