Mitigating MCP Context Bloating with Code Mode

• MCP servers often suffer from context bloating, consuming up to 400K tokens per tool load.
• Cloudflare's Code Mode MCP reduces input token usage by 99.9% by using execution-based tools.
• ZenStack adopted a schema-and-execute approach, enabling complex database queries with improved token efficiency.

The Model Context Protocol (MCP), a standard for connecting AI models to external tools and data, has drawn criticism for "context bloating": large tool schemas fill the model's context window before any real work begins. Users report that loading the tools of a complex MCP server can consume up to 400K tokens, which makes such servers impractical for many applications. The problem is particularly pronounced when a server exposes Object-Relational Mapping (ORM) query APIs, because their argument schemas must describe deeply nested database relations. To avoid this overhead, developers are increasingly turning to Agent Skills, which favor CLI-based workflows that are more token-efficient than traditional MCP tool schemas.
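To see why nested ORM relations inflate a tool schema, here is a minimal sketch. The data model (`RELATIONS`), the helper names, and the four-characters-per-token estimate are all assumptions made for illustration; they are not taken from any real MCP server or tokenizer.

```python
import json

# Hypothetical data model: each model maps its relation fields to target models.
RELATIONS = {
    "User": {"posts": "Post", "profile": "Profile"},
    "Post": {"author": "User", "comments": "Comment"},
    "Comment": {"post": "Post", "author": "User"},
    "Profile": {"user": "User"},
}


def include_schema(model, depth):
    """Describe a nested `include` argument as a JSON-Schema-like dict."""
    if depth == 0:
        return {"type": "boolean"}
    return {
        "type": "object",
        "properties": {
            field: include_schema(target, depth - 1)
            for field, target in RELATIONS[model].items()
        },
    }


def estimate_tokens(schema):
    """Crude estimate: roughly four characters per token (an assumption)."""
    return len(json.dumps(schema)) // 4


# Estimated schema size for nesting depths 1 through 5, starting at User.
sizes = [estimate_tokens(include_schema("User", depth)) for depth in range(1, 6)]
for depth, size in enumerate(sizes, start=1):
    print(f"depth {depth}: ~{size} tokens")
```

Even with four small models, every extra level of nesting repeats the definitions of the related models, so the schema grows quickly with depth. A real application with dozens of models and several operations per model multiplies that effect.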

Cloudflare recently introduced "Code Mode" for MCP, which simplifies API interactions by exposing only two tools: 'search' and 'execute'. Instead of calling one tool per endpoint, the LLM writes code that is then executed, so the tool-definition footprint stays fixed regardless of the number of API endpoints. With this method, Cloudflare reduced input token usage by 99.9%, bringing total consumption to approximately 1k tokens per request. Inspired by this, ZenStack developed a similar architecture for its database-focused MCP server to address its own scaling limitations.
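The two-tool idea can be sketched as follows. This is an illustration of the pattern only, not Cloudflare's implementation: the endpoint catalog, the `Api` class, and the convention of storing the answer in a variable named `result` are hypothetical, and `exec` is used here purely for demonstration (it is not a sandbox).

```python
import json

# Hypothetical endpoint catalog; a real API could have thousands of entries.
CATALOG = {
    f"/zones/{i}/settings": f"Read settings for zone {i}" for i in range(1000)
}

# Only these two definitions are sent to the model, whatever the catalog size.
TOOL_DEFINITIONS = [
    {"name": "search", "description": "Find endpoints matching a keyword.",
     "parameters": {"query": "string"}},
    {"name": "execute", "description": "Run code against the `api` object.",
     "parameters": {"code": "string"}},
]


def search(query, limit=5):
    """Return up to `limit` endpoint paths whose description contains the query."""
    needle = query.lower()
    hits = [path for path, text in CATALOG.items() if needle in text.lower()]
    return hits[:limit]


class Api:
    """Hypothetical client object that model-written code is allowed to call."""

    def get(self, path):
        return {"path": path, "description": CATALOG[path]}


def execute(code):
    """Run model-written code with only `api` in scope and return `result`.

    Demonstration only: exec is NOT a security boundary. A real server
    would run the code in an isolated sandbox.
    """
    scope = {"api": Api(), "__builtins__": {}}
    exec(code, scope)
    return scope.get("result")


# Footprint comparison: two tools versus one tool definition per endpoint.
two_tool_chars = len(json.dumps(TOOL_DEFINITIONS))
per_endpoint_chars = len(json.dumps(
    [{"name": path, "description": text} for path, text in CATALOG.items()]
))
print(two_tool_chars, per_endpoint_chars)
```

A model would first call `search("zone 7")` to discover candidate paths, then call `execute` with a snippet such as `result = api.get("/zones/7/settings")`. Adding endpoints to the catalog changes what `search` can return but leaves `TOOL_DEFINITIONS` untouched, which is what keeps the footprint fixed.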

ZenStack's implementation introduces three specialized tools: 'schema', 'execute', and 'check'. The 'schema' tool transmits the entire application schema to the LLM for context, while 'execute' handles function calls like 'findMany' or 'createMany'. The 'check' tool validates query parameters before execution, catching invalid queries before they cause runtime errors. In testing with a complex application containing over 50 models, the system successfully handled queries spanning more than 10 nested models in Claude Desktop using Sonnet 4.6. Because the schema is sent once rather than repeated in every tool definition, the context window is not overwhelmed by redundant schema definitions.
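The division of labor between the three tools can be sketched with an in-memory stand-in. This is not ZenStack's actual API: the schema layout, the argument shapes (`where`, `data`), and the in-memory `DB` are assumptions chosen to keep the example self-contained.

```python
# Hypothetical application schema: model name -> field name -> type.
SCHEMA = {
    "User": {"id": "Int", "email": "String"},
    "Post": {"id": "Int", "title": "String", "authorId": "Int"},
}
OPERATIONS = {"findMany", "createMany"}
DB = {model: [] for model in SCHEMA}  # in-memory stand-in for a database


def schema():
    """'schema' tool: hand the whole application schema to the model once."""
    return SCHEMA


def check(model, operation, args):
    """'check' tool: return a list of problems; an empty list means valid."""
    if model not in SCHEMA:
        return [f"unknown model: {model}"]
    if operation not in OPERATIONS:
        return [f"unsupported operation: {operation}"]
    fields = SCHEMA[model]
    if operation == "findMany":
        names = list(args.get("where", {}))
    else:
        names = [name for row in args.get("data", []) for name in row]
    return [f"unknown field {model}.{name}" for name in names if name not in fields]


def execute(model, operation, args):
    """'execute' tool: validate first, then run the requested operation."""
    errors = check(model, operation, args)
    if errors:
        raise ValueError("; ".join(errors))
    if operation == "createMany":
        rows = args.get("data", [])
        DB[model].extend(dict(row) for row in rows)
        return {"count": len(rows)}
    where = args.get("where", {})
    return [row for row in DB[model]
            if all(row.get(key) == value for key, value in where.items())]
```

For example, `execute("User", "createMany", {"data": [{"id": 1, "email": "a@example.com"}]})` inserts a row, while `check("User", "findMany", {"where": {"name": "x"}})` reports the unknown field without touching the data. Letting the model call a cheap validation step first means a malformed query costs one short error message rather than a failed database call.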
