Azure AI Foundry Teams connector — conversation history accumulates unbounded, causing token explosion

SriramN 20 Reputation points
2026-05-03T20:47:09.9433333+00:00

When an Azure AI Foundry agent (from the new portal) is published to Microsoft Teams using the built-in "Publish to Teams and Microsoft 365" connector, the Teams conversation history is sent with every new message. After a single agent run that processes large tool call responses, the conversation context grows to 160,000–330,000 tokens. Every subsequent message in the same Teams chat starts with this accumulated history, causing slow responses and eventual timeouts.

Note:

  • This issue is not seen for the same agent when used from M365 chat in a web browser.
    • The agent's traces in AI Foundry show a new conversation ID for each M365 chat session.
  • Starting a new Teams conversation, uninstalling and reinstalling the app, etc. does not reset the context.
    • The agent's traces from Teams always use the same conversation ID, even after days.

Question: Is there any setting in the AI Foundry agent definition, Teams channel configuration, or Bot Service to:

  • Limit conversation history to the last N turns
  • Reset context after a session timeout
  • Prevent tool call responses from being stored in the conversation history

This keeps increasing the number of tokens in every conversation because the accumulated context is so large. Any help would be appreciated. Thanks.

Foundry Agent Service

A fully managed platform in Microsoft Foundry for hosting, scaling, and securing AI agents built with any supported framework or model


Answer accepted by question author

  1. Anshika Varshney 9,985 Reputation points Microsoft External Staff Moderator
    2026-05-04T09:28:49.3033333+00:00

    Hi SriramN

    Thanks for the detailed question. This is a common observation when using Azure AI Foundry agents with the Teams connector.

    There is currently no setting available in the AI Foundry agent definition, the Teams connector, or Bot Service to:

    • limit conversation history to the last N turns,
    • reset the conversation automatically after a timeout, or
    • prevent tool call responses from being stored in history.

    This behavior is by design with the current connector implementation. The full conversation history is maintained and sent with each message, and the size is controlled only by the model context limits. [ai.azure.com]

    Why this is happening:

    When your agent is used in Teams, it keeps reusing the same conversation thread. Every time you send a message, the entire history is sent again to the model. If your agent includes large tool responses or long chats, the total tokens keep increasing and eventually impact performance.
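
    One way to see how quickly this adds up is to estimate the token size of the accumulated history before each call. A minimal sketch using the tiktoken package (the cl100k_base encoding is an assumption; newer models may use a different encoding):

    import tiktoken

    def estimate_history_tokens(messages, encoding_name="cl100k_base"):
        # Rough estimate: encode the text content of every message in the history.
        enc = tiktoken.get_encoding(encoding_name)
        return sum(len(enc.encode(str(m.get("content", "")))) for m in messages)

    history = [{"role": "user", "content": "example turn"} for _ in range(10)]
    print(estimate_history_tokens(history))  # grows with every turn that is replayed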

    What you can do to manage this:

    Since there is no built-in control, the solution is to manage conversation history from your application or agent logic.

    Here are practical approaches:

    Keep only the last few messages instead of sending the full conversation

    def trim_messages(messages, max_turns=3):
        # Keep only the most recent messages instead of the whole history.
        return messages[-max_turns:]
    
    
    

    Use this before sending the request to the model.

    Summarize older conversation

    Replace the long history with a short summary

    def summarize_history(messages):
        # Placeholder summary; in practice, generate this with a model (see the sketch below).
        summary = "User discussed earlier context. Keep latest question."
        recent = messages[-2:]  # keep only the two most recent messages
        return [{"role": "system", "content": summary}] + recent
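
    If you want the summary to be produced by a model rather than hard-coded, one possible version uses the openai Python SDK against an Azure OpenAI deployment. This is only a sketch: the endpoint, key, API version, and deployment name are assumptions you would replace with your own values.

    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com",  # assumption
        api_key="<your-key>",                                       # assumption
        api_version="2024-06-01",                                   # assumption
    )

    def summarize_history_with_model(messages, deployment="<your-deployment>", keep_last=2):
        # Summarize everything except the most recent messages, then prepend the summary.
        older = messages[:-keep_last]
        text = "\n".join(f'{m["role"]}: {m["content"]}' for m in older)
        resp = client.chat.completions.create(
            model=deployment,
            messages=[
                {"role": "system", "content": "Summarize this conversation in three sentences."},
                {"role": "user", "content": text},
            ],
        )
        summary = resp.choices[0].message.content
        return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + messages[-keep_last:]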
    
    

    Split into new sessions

    If the conversation becomes large, reset it manually

    # Drop the accumulated history and start a fresh message list.
    messages = []
    messages.append({"role": "user", "content": "Start fresh question"})
    
    

    Avoid large tool outputs in history

    Instead of storing the full response

    # Record a short note about the tool call instead of its full output.
    messages.append({
        "role": "assistant",
        "content": "Tool executed successfully. Key result captured."
    })
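
    A slightly more concrete variant keeps only a short excerpt of the raw tool output. The names and the character budget below are illustrative assumptions, not part of any Foundry or Teams API:

    messages = []
    large_response_text = "{...}" * 2000  # stand-in for a big tool payload

    MAX_TOOL_CHARS = 500  # assumed budget for how much tool output to keep in history

    def compact_tool_result(tool_name, raw_output, limit=MAX_TOOL_CHARS):
        # Store only a short excerpt of the tool output instead of the full payload.
        excerpt = raw_output[:limit]
        suffix = " ... [truncated]" if len(raw_output) > limit else ""
        return {"role": "assistant",
                "content": f"{tool_name} returned (excerpt): {excerpt}{suffix}"}

    messages.append(compact_tool_result("search_tool", large_response_text))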
    
    

    Send only required context

    Prepare a minimal input before the model call

    # Always include the system prompt, plus only the most recent turns.
    final_messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        *trim_messages(messages, max_turns=3)
    ]
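
    Putting the pieces together, one possible pre-call step is to drop the oldest non-system messages until a rough token budget is met. The budget value and the four-characters-per-token heuristic below are assumptions:

    TOKEN_BUDGET = 8000  # assumed budget, well below the model's context window

    def rough_tokens(text):
        # Crude heuristic: roughly four characters per token for English text.
        return len(text) // 4

    def fit_to_budget(messages, budget=TOKEN_BUDGET):
        trimmed = list(messages)
        while len(trimmed) > 1 and sum(rough_tokens(str(m.get("content", ""))) for m in trimmed) > budget:
            # Drop the oldest non-system message first.
            for i, m in enumerate(trimmed):
                if m.get("role") != "system":
                    del trimmed[i]
                    break
            else:
                break  # only system messages left; nothing more to drop
        return trimmed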
    
    

    In short: the Teams connector keeps the full history by default and does not trim or reset it automatically, and there is no configuration today to control this behavior. Managing history on your side by trimming, summarizing, or resetting is the recommended approach.

    This will help reduce token usage, improve response time, and avoid timeout issues.

    I hope this helps. Do let me know if you have any further queries.

    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

    Thank you!

    1 person found this answer helpful.

1 additional answer

  1. Q&A Assist
    2026-05-03T20:47:25.5466667+00:00
    AI generated content. This answer was created with AI from Q&A Assist.

    There is no configurable setting in Azure AI Foundry agents, the Teams channel, or Bot Service (in the provided context) that allows:

    • Limiting conversation history to the last N turns,
    • Automatically resetting context after a session timeout, or
    • Preventing tool call responses from being stored in conversation history

    for agents published to Teams via the “Publish to Teams and Microsoft 365” connector.

    From the available documentation:

    1. Conversation and token limits are enforced only by the model’s context window. It is the application’s responsibility to manage or trim history before sending it to the model. For chat completions and responses APIs, this is typically done by:
      • Explicitly trimming older messages from the conversation list, or
      • Using the Responses API’s built-in truncation and conversation management.
    2. For long-running agent interactions, the recommended approach is to implement compaction strategies at the application/agent layer to keep token usage under control (a generic sketch follows this list). The compaction framework supports strategies such as:
      • SlidingWindowCompactionStrategy or TruncationCompactionStrategy to drop older turns or oldest groups when token budgets are exceeded.
      • SummarizationCompactionStrategy to replace older history with a summary.
      • ToolResultCompactionStrategy to reclaim space from verbose tool outputs.
    3. The documentation emphasizes that, without such compaction or truncation, conversation history will grow until it hits model limits, increasing cost and latency and eventually causing errors.
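
    As an illustration only, a generic version of the sliding-window and tool-result ideas above could look like the following. This is a standalone Python sketch, not the compaction framework's actual API, and the limits are assumptions:

    def compact_history(messages, max_turns=6, tool_excerpt_chars=200):
        # First reclaim space from verbose tool outputs, then keep a sliding window of turns.
        compacted = []
        for msg in messages:
            content = str(msg.get("content", ""))
            if msg.get("role") == "tool" and len(content) > tool_excerpt_chars:
                # Tool-result style compaction: keep only an excerpt.
                msg = {**msg, "content": content[:tool_excerpt_chars] + " ... [truncated]"}
            compacted.append(msg)
        # Sliding-window style compaction: keep system messages plus the most recent turns.
        system = [m for m in compacted if m.get("role") == "system"]
        rest = [m for m in compacted if m.get("role") != "system"]
        return system + rest[-max_turns:]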

    However, the context does not describe any Teams-specific or Bot Service–level switches that change how Teams supplies history to the agent, nor any per-channel configuration to:

    • Cap history to the last N turns,
    • Define a session timeout that resets the agent’s conversation state, or
    • Exclude tool call results from the stored conversation.

    Given the available information, the supported mitigation is to implement conversation management/compaction in the agent or application layer (for example, using the compaction strategies or Responses API) rather than relying on a Teams or Bot Service setting.


