Why Fewer MCP Tools (and More CLI) Makes Your LLM Smarter
Every week, I see people installing dozens of Model Context Protocol (MCP) servers into their coding assistants and then wondering why the model feels sluggish, inconsistent, or forgetful. The culprit isn’t the model — it’s context bloat.
Context Is Finite
An LLM’s context window is like scratch paper. Even if the vendor advertises 200k tokens, after system prompts and harness overhead you may only have ~176k tokens left. Every MCP tool eats into that space: its name, schema, and description all get injected into the model’s prompt. Install 50 tools? You’ve just burned tens of thousands of tokens before you’ve even started your task.
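A back-of-envelope calculation makes the cost concrete. All the numbers below are assumptions for illustration (real per-tool cost varies widely), but the shape of the math holds:

```python
# Back-of-envelope sketch; every number here is an assumption, not a measurement.
advertised_window = 200_000
harness_overhead = 24_000   # system prompt + scaffolding -> ~176k actually usable
tokens_per_tool = 600       # name + description + JSON schema, per tool
installed_tools = 50

tool_cost = installed_tools * tokens_per_tool
usable = advertised_window - harness_overhead - tool_cost
print(f"Tool definitions: {tool_cost:,} tokens")   # 30,000
print(f"Left for the task: {usable:,} tokens")     # 146,000
```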
That means less room for what actually matters: your codebase, logs, or documents — the real “needle in the haystack.” The more hay you add in the form of tool descriptions, the harder it is for the model to find the needle.
The Billboard Effect
Each MCP tool acts like a billboard shouting "use me!" Its name, description, and schema aren't just stored somewhere; they are injected directly into the harness's system prompt every time the model runs. Every tool you add eats into your context window on every turn, even if you never call it.
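To see why, here is an illustrative sketch of roughly what one tool definition looks like when serialized into the prompt. MCP describes tool inputs with JSON Schema; the tool name, description text, and fields below are invented for illustration:

```python
# Illustrative only: roughly the shape of one MCP tool definition as it is
# serialized into the model's prompt. Names and wording here are made up.
tool_definition = {
    "name": "search_files",
    "description": (
        "Search the workspace for files matching a glob pattern and return "
        "their paths. Supports recursive search and result limits."
    ),
    "inputSchema": {  # MCP uses JSON Schema to describe tool inputs
        "type": "object",
        "properties": {
            "pattern": {"type": "string", "description": "Glob pattern to match"},
            "max_results": {"type": "integer", "description": "Cap on results"},
        },
        "required": ["pattern"],
    },
}
# Even this toy definition costs on the order of a hundred tokens.
# Real-world tools are usually larger, and servers ship dozens of them.
```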
The more billboards you load, the more the model's attention is diluted. Overlapping tools add confusion: should the model call list_files, get_files, or search_files? That ambiguity reduces determinism and output quality.
Why the Command Line (CLI) Wins
Here's the trick: many tools don't need an MCP wrapper at all. Models have already seen git, Docker, gh, and countless shell transcripts during training; they "speak" CLI natively. Prompt a command directly:

```bash
gh issue list --repo myorg/myrepo --label bug
```

and you save thousands of tokens of schema overhead. Skipping a GitHub MCP server alone can free up roughly 55,000 tokens (Huntley, 2025).
When to Use MCP (and When Not To)
- Use CLI directly if the model already knows the tool.
- Use MCP only when the model can't possibly know the interface (e.g., a proprietary internal API) or when you need structured outputs. A minimal sketch of that case follows this list.
- Don’t load every shiny MCP server you see on Reddit. More tools ≠ more power. Usually, it’s the opposite.
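When a custom server really is justified, keep it small. Below is a minimal sketch of the proprietary-API case using the FastMCP helper from the official MCP Python SDK (the `mcp` package); the server name, endpoint URL, and tool are hypothetical:

```python
# Minimal sketch of the "proprietary API" case: one server, one tightly
# scoped tool, one short description. Assumes the official MCP Python SDK;
# the internal billing endpoint below is hypothetical.
import json
import urllib.request

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-billing")

@mcp.tool()
def get_invoice(invoice_id: str) -> str:
    """Fetch a single invoice from the internal billing API as JSON."""
    url = f"https://billing.internal.example.com/invoices/{invoice_id}"
    with urllib.request.urlopen(url) as resp:  # auth omitted; sketch only
        return json.dumps(json.load(resp))

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```

One tool with a one-line description costs the model a few dozen tokens, not tens of thousands.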
Less Is More
Think of context as a limited budget. Spend it on your problem, not on redundant tool descriptions. Fewer MCP servers, shorter descriptions, and more CLI give your LLM the breathing room it needs to stay sharp, reliable, and useful.
A Question to Leave You With
Context bloat in LLMs is a lot like human distraction. Imagine walking into a city square plastered with flashing billboards, each shouting for your attention — you might still find the sign you need, but your focus scatters and your recall falters. Humans deal with this by filtering, relying on long-term memory for the things we’ve practiced (like riding a bike or using git) and only leaning on instructions when we encounter something new. LLMs face the same tension: should they bake tool usage into their “muscle memory,” or should we keep teaching them on the fly through prompts? Context bloat, in this sense, is less a bug than a mirror of our own cognitive limits.
So the real question is: how do we design tools in a way that helps models focus like humans do, filtering noise and locking in the essentials?
References
- Huntley, Geoffrey (2025). Too Many Model Context Protocol Servers and LLM Allocations on the Dance Floor. ghuntley.com/allocations
- Hong, Kelly; Troynikov, Anton; Huber, Jeff (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance. Chroma Technical Report. research.trychroma.com/context-rot