Measuring Token Usage Before Disabling AI Agent Tools

• AI coding agents often lose reasoning quality during long sessions due to context window saturation.
• Developers frequently misattribute performance drift to MCP tool overhead instead of accumulated conversation history.
• The author recommends measuring token usage directly, before disabling any tools, to identify the specific bottleneck.

AI coding agents often degrade during long sessions: they forget constraints, repeat themselves, and give vaguer answers. This behavior, often perceived as the model becoming 'dumber,' typically occurs without any technical errors or system crashes. Many developers initially attribute the drift to a context window overloaded by external tools, such as those connected through the Model Context Protocol (MCP, a standard for connecting AI models to external tools and data). However, measuring the actual token distribution often reveals that conversation history, not the overhead from connected tools, is the primary contributor to context exhaustion.

A breakdown of context window usage generally shows that conversation history occupies the largest proportion of available space, often reaching around a fifth of the total window in long sessions. Fixed startup overheads, including system prompts and memory files, remain stable, while connected MCP tool definitions frequently represent a smaller slice than anticipated. The impact of MCP tools depends heavily on the specific client implementation; some clients defer loading tool schemas until they are necessary, minimizing their idle impact on the context window. Conversely, clients that front-load all schemas can cause significant initial token consumption.
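The kind of breakdown described above can be approximated with a short script. This is a minimal sketch, not the author's tooling: the `ContextSnapshot` type, the `breakdown` function, and the four-characters-per-token estimate are all assumptions made for illustration. A real measurement should use the token counts reported by the client or the model's own tokenizer.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Crude estimate, assuming roughly 4 characters per token (not a real tokenizer)."""
    return (len(text) + 3) // 4


@dataclass
class ContextSnapshot:
    """Hypothetical container for the text currently loaded into the context window."""
    system_prompt: str
    memory_files: list[str]
    tool_schemas: list[str]
    conversation: list[str]


def breakdown(snapshot: ContextSnapshot, window_size: int) -> dict[str, float]:
    """Return each category's share of the context window as a fraction of window_size."""
    counts = {
        "system_prompt": estimate_tokens(snapshot.system_prompt),
        "memory_files": sum(estimate_tokens(t) for t in snapshot.memory_files),
        "tool_schemas": sum(estimate_tokens(t) for t in snapshot.tool_schemas),
        "conversation": sum(estimate_tokens(t) for t in snapshot.conversation),
    }
    # Whatever is not accounted for is treated as free space.
    counts["free"] = max(window_size - sum(counts.values()), 0)
    return {name: tokens / window_size for name, tokens in counts.items()}


if __name__ == "__main__":
    # Made-up sizes purely to exercise the function.
    snapshot = ContextSnapshot(
        system_prompt="a" * 400,
        memory_files=["b" * 800],
        tool_schemas=["c" * 400, "c" * 400],
        conversation=["d" * 4000, "d" * 4000],
    )
    for name, share in breakdown(snapshot, window_size=10_000).items():
        print(f"{name:>14}: {share:.1%}")
```

Comparing the `tool_schemas` share against the `conversation` share is the point of the exercise: it shows which category is worth trimming before any tool is disconnected.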

To address performance drift, developers should focus on managing session length rather than blindly disconnecting tools. Best practices include starting fresh sessions for distinct tasks rather than maintaining one continuous transcript and using the agent to summarize progress when continuity is required. This approach treats the context window like a desk surface that must be cleared of unnecessary paper, rather than an infinitely expandable filing cabinet. The author emphasizes that the specific source of token consumption varies by setup, and developers should verify token allocation through a breakdown analysis before making adjustments. By measuring rather than guessing, users can identify the true cause of reasoning degradation and apply targeted solutions, such as session summarization, to maintain model performance over time.
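The session-management advice above can be sketched as a simple handoff routine. Everything here is hypothetical: the function names, the 20% threshold (borrowed from the "around a fifth" figure as an arbitrary trigger), and the prompt wording are illustrative assumptions, and the summary text itself would come from asking the agent to summarize its progress.

```python
def should_start_fresh(conversation_tokens: int, window_size: int,
                       max_share: float = 0.2) -> bool:
    """Suggest a fresh session once conversation history exceeds max_share of the window.

    The 0.2 default is an illustrative assumption, not a recommended value.
    """
    return conversation_tokens / window_size >= max_share


def build_handoff_prompt(summary: str, constraints: list[str]) -> str:
    """Build the opening message for a new session from a summary and standing constraints."""
    lines = [
        "Continue from this summary of the previous session:",
        summary.strip(),
        "",
        "Constraints that still apply:",
    ]
    lines += [f"- {constraint}" for constraint in constraints]
    return "\n".join(lines)


if __name__ == "__main__":
    if should_start_fresh(conversation_tokens=25_000, window_size=100_000):
        print(build_handoff_prompt(
            summary="Refactored the parser; tests for the lexer still fail.",
            constraints=["Do not change the public API", "Use Python 3.11"],
        ))
```

Restating the constraints explicitly in the handoff matters because forgotten constraints are one of the symptoms of drift that the fresh session is meant to fix.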
