Azure AI Foundry Teams connector — conversation history accumulates unbounded, causing token explosion

SriramN 20 Reputation points
2026-05-03T20:47:09.9433333+00:00

When an Azure AI Foundry agent (from the new portal) is published to Microsoft Teams using the built-in "Publish to Teams and Microsoft 365" connector, the Teams conversation history is sent with every new message. After a single agent run that processes large tool call responses, the conversation context grows to 160,000–330,000 tokens. Every subsequent message in the same Teams chat starts with this accumulated history, causing slow responses and eventual timeouts.

Note:

  • This issue is not seen for the same agent when used from M365 chat in a web browser.
    • The agent's traces in AI Foundry show a new conversation ID for each M365 chat session.
  • Starting a new Teams conversation, uninstalling and reinstalling the app, etc. does not reset the context.
    • The agent's traces from Teams always use the same conversation ID, even after days.

Question: Is there any setting in the AI Foundry agent definition, Teams channel configuration, or Bot Service to:

  • Limit conversation history to the last N turns
  • Reset context after a session timeout
  • Prevent tool call responses from being stored in the conversation history

This keeps increasing the number of tokens in every conversation because the accumulated context is so large. Any help would be appreciated. Thanks.

Foundry Agent Service

A fully managed platform in Microsoft Foundry for hosting, scaling, and securing AI agents built with any supported framework or model


Answer accepted by question author

  1. Anshika Varshney 9,985 Reputation points Microsoft External Staff Moderator
    2026-05-04T09:28:49.3033333+00:00

    Hi SriramN

    Thanks for the detailed question. This is a common observation when using Azure AI Foundry agents with the Teams connector.

    There is currently no setting available in the AI Foundry agent definition, the Teams connector, or Bot Service to:

    • limit conversation history to the last N turns,
    • reset the conversation automatically after a timeout, or
    • prevent tool call responses from being stored in history.

    This behavior is by design with the current connector implementation. The full conversation history is maintained and sent with each message, and the size is controlled only by the model context limits. [ai.azure.com]

    Why this is happening:

    When your agent is used in Teams, it keeps reusing the same conversation thread. Every time you send a message, the entire history is sent again to the model. If your agent includes large tool responses or long chats, the total tokens keep increasing and eventually impact performance.
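
    One way to see how quickly this adds up is to estimate the token size of the accumulated history before each call. A minimal sketch using the tiktoken package (the cl100k_base encoding is an assumption; newer models may use a different encoding):

    import tiktoken

    def estimate_history_tokens(messages, encoding_name="cl100k_base"):
        # Rough estimate: encode the text content of every message in the history.
        enc = tiktoken.get_encoding(encoding_name)
        return sum(len(enc.encode(str(m.get("content", "")))) for m in messages)

    history = [{"role": "user", "content": "example turn"} for _ in range(10)]
    print(estimate_history_tokens(history))  # grows with every turn that is replayed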

    What you can do to manage this:

    Since there is no built-in control, the solution is to manage conversation history from your application or agent logic.

    Here are practical approaches:

    Keep only the last few messages instead of sending the full conversation

    def trim_messages(messages, max_turns=3):
        # Keep only the most recent messages instead of the whole history.
        return messages[-max_turns:]
    
    
    

    Use this before sending the request to the model.

    Summarize older conversation

    Replace the long history with a short summary

    def summarize_history(messages):
        # Placeholder summary; in practice, generate this with a model (see the sketch below).
        summary = "User discussed earlier context. Keep latest question."
        recent = messages[-2:]  # keep only the two most recent messages
        return [{"role": "system", "content": summary}] + recent
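
    If you want the summary to be produced by a model rather than hard-coded, one possible version uses the openai Python SDK against an Azure OpenAI deployment. This is only a sketch: the endpoint, key, API version, and deployment name are assumptions you would replace with your own values.

    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com",  # assumption
        api_key="<your-key>",                                       # assumption
        api_version="2024-06-01",                                   # assumption
    )

    def summarize_history_with_model(messages, deployment="<your-deployment>", keep_last=2):
        # Summarize everything except the most recent messages, then prepend the summary.
        older = messages[:-keep_last]
        text = "\n".join(f'{m["role"]}: {m["content"]}' for m in older)
        resp = client.chat.completions.create(
            model=deployment,
            messages=[
                {"role": "system", "content": "Summarize this conversation in three sentences."},
                {"role": "user", "content": text},
            ],
        )
        summary = resp.choices[0].message.content
        return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + messages[-keep_last:]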
    
    

    Split into new sessions

    If the conversation becomes large, reset it manually

    # Drop the accumulated history and start a fresh message list.
    messages = []
    messages.append({"role": "user", "content": "Start fresh question"})
    
    

    Avoid large tool outputs in history

    Instead of storing the full response

    # Record a short note about the tool call instead of its full output.
    messages.append({
        "role": "assistant",
        "content": "Tool executed successfully. Key result captured."
    })
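
    A slightly more concrete variant keeps only a short excerpt of the raw tool output. The names and the character budget below are illustrative assumptions, not part of any Foundry or Teams API:

    messages = []
    large_response_text = "{...}" * 2000  # stand-in for a big tool payload

    MAX_TOOL_CHARS = 500  # assumed budget for how much tool output to keep in history

    def compact_tool_result(tool_name, raw_output, limit=MAX_TOOL_CHARS):
        # Store only a short excerpt of the tool output instead of the full payload.
        excerpt = raw_output[:limit]
        suffix = " ... [truncated]" if len(raw_output) > limit else ""
        return {"role": "assistant",
                "content": f"{tool_name} returned (excerpt): {excerpt}{suffix}"}

    messages.append(compact_tool_result("search_tool", large_response_text))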
    
    

    Send only required context

    Prepare a minimal input before the model call

    # Always include the system prompt, plus only the most recent turns.
    final_messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        *trim_messages(messages, max_turns=3)
    ]
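
    Putting the pieces together, one possible pre-call step is to drop the oldest non-system messages until a rough token budget is met. The budget value and the four-characters-per-token heuristic below are assumptions:

    TOKEN_BUDGET = 8000  # assumed budget, well below the model's context window

    def rough_tokens(text):
        # Crude heuristic: roughly four characters per token for English text.
        return len(text) // 4

    def fit_to_budget(messages, budget=TOKEN_BUDGET):
        trimmed = list(messages)
        while len(trimmed) > 1 and sum(rough_tokens(str(m.get("content", ""))) for m in trimmed) > budget:
            # Drop the oldest non-system message first.
            for i, m in enumerate(trimmed):
                if m.get("role") != "system":
                    del trimmed[i]
                    break
            else:
                break  # only system messages left; nothing more to drop
        return trimmed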
    
    

    In short: the Teams connector keeps the full history by default and does not trim or reset it automatically, and there is no configuration today to control this behavior. Managing history on your side by trimming, summarizing, or resetting is the recommended approach.

    This will help reduce token usage, improve response time, and avoid timeout issues.

    I hope this helps. Do let me know if you have any further queries.

    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

    Thank you!

    1 person found this answer helpful.

1 additional answer

  1. Q&A Assist
    2026-05-03T20:47:25.5466667+00:00
    AI generated content. This answer was created with AI from Q&A Assist.

    There is no configurable setting in Azure AI Foundry agents, the Teams channel, or Bot Service (in the provided context) that allows:

    • Limiting conversation history to the last N turns,
    • Automatically resetting context after a session timeout, or
    • Preventing tool call responses from being stored in conversation history

    for agents published to Teams via the “Publish to Teams and Microsoft 365” connector.

    From the available documentation:

    1. Conversation and token limits are enforced only by the model’s context window. It is the application’s responsibility to manage or trim history before sending it to the model. For chat completions and responses APIs, this is typically done by:
      • Explicitly trimming older messages from the conversation list, or
      • Using the Responses API’s built-in truncation and conversation management.
    2. For long-running agent interactions, the recommended approach is to implement compaction strategies at the application/agent layer to keep token usage under control (a generic sketch follows this list). The compaction framework supports strategies such as:
      • SlidingWindowCompactionStrategy or TruncationCompactionStrategy to drop older turns or oldest groups when token budgets are exceeded.
      • SummarizationCompactionStrategy to replace older history with a summary.
      • ToolResultCompactionStrategy to reclaim space from verbose tool outputs.
    3. The documentation emphasizes that, without such compaction or truncation, conversation history will grow until it hits model limits, increasing cost and latency and eventually causing errors.
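
    As an illustration only, a generic version of the sliding-window and tool-result ideas above could look like the following. This is a standalone Python sketch, not the compaction framework's actual API, and the limits are assumptions:

    def compact_history(messages, max_turns=6, tool_excerpt_chars=200):
        # First reclaim space from verbose tool outputs, then keep a sliding window of turns.
        compacted = []
        for msg in messages:
            content = str(msg.get("content", ""))
            if msg.get("role") == "tool" and len(content) > tool_excerpt_chars:
                # Tool-result style compaction: keep only an excerpt.
                msg = {**msg, "content": content[:tool_excerpt_chars] + " ... [truncated]"}
            compacted.append(msg)
        # Sliding-window style compaction: keep system messages plus the most recent turns.
        system = [m for m in compacted if m.get("role") == "system"]
        rest = [m for m in compacted if m.get("role") != "system"]
        return system + rest[-max_turns:]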

    However, the context does not describe any Teams-specific or Bot Service–level switches that change how Teams supplies history to the agent, nor any per-channel configuration to:

    • Cap history to the last N turns,
    • Define a session timeout that resets the agent’s conversation state, or
    • Exclude tool call results from the stored conversation.

    Given the available information, the supported mitigation is to implement conversation management/compaction in the agent or application layer (for example, using the compaction strategies or Responses API) rather than relying on a Teams or Bot Service setting.


