Security Risks of Connecting MCP Servers to AI Agents

• MCP servers grant agents functional capabilities but introduce risks of destructive commands and malicious data manipulation.
• Developers must treat all tool output as untrusted input to prevent prompt injection attacks via server responses.
• Users should implement OS-level sandboxing and explicit read/write deny rules to restrict agent actions.

Connecting an MCP server, which exposes tools to AI agents over the Model Context Protocol (MCP), lets coding agents perform external actions such as database queries or API requests. This capability creates two distinct security risks: the agent can be manipulated by untrusted data returned through the connection, and the agent can execute destructive commands. According to the author, the two risks require separate defenses, because no single current tool addresses both at once.

The first risk is prompt injection: an MCP server returns data containing hidden instructions that the model may mistakenly follow. The author warns that all tool output must be treated as untrusted input, exactly like data submitted through a web form. Developers should not pass raw tool output directly into command arguments, render it to a screen, or let it influence the agent’s logic without inspecting it first. The connection itself guarantees nothing about the safety of the content flowing through it.
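To make the "untrusted input" rule concrete, here is a minimal sketch of one way to handle a value returned by a tool before it reaches a command line. The scenario is hypothetical: an issue-tracker MCP server is assumed to return a ticket ID, and the names `TICKET_ID`, `validate_ticket_id`, and `run_with_untrusted_arg` are illustrative, not part of any MCP SDK. The child process simply echoes its argument so the example runs anywhere.

```python
import re
import subprocess
import sys

# Hypothetical allowlist: the only shape a ticket ID is permitted to have.
TICKET_ID = re.compile(r"[A-Z]{2,10}-[0-9]{1,6}")


def validate_ticket_id(tool_output: str) -> str:
    """Return the value only if it matches the allowlist; otherwise raise."""
    candidate = tool_output.strip()
    if not TICKET_ID.fullmatch(candidate):
        raise ValueError(f"rejected untrusted tool output: {candidate!r}")
    return candidate


def run_with_untrusted_arg(tool_output: str) -> str:
    """Pass a validated value to a child process as a single argv entry."""
    value = validate_ticket_id(tool_output)
    # Argument-list form with no shell: the value is never parsed as shell syntax.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_with_untrusted_arg("PROJ-42"))
    try:
        run_with_untrusted_arg("PROJ-1; rm -rf ~")
    except ValueError as err:
        print(err)
```

The sketch combines two independent checks: an allowlist rejects anything that is not the expected shape, and the argument-list form of `subprocess.run` keeps even an accepted value away from a shell. It addresses only the command-argument case; it does nothing to stop instructions embedded in tool output from influencing the model itself.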

The second risk is that the agent itself performs harmful actions, such as deleting files or making network requests without authorization. The author emphasizes that relying on the model's "manners" is insufficient. Instead, users should isolate Bash execution in an OS-level sandbox. In Claude Code, this means enabling the sandbox settings, which use Seatbelt on macOS and Bubblewrap on Linux and WSL2. The author notes that the default sandbox settings often still allow read access to sensitive locations such as `~/.aws/credentials` and `~/.ssh/`. Users must therefore add permission rules that deny reads of sensitive files and restrict network commands such as `curl` and `wget` to prevent data exfiltration.
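As a rough illustration of the deny rules described above, the following sketch audits a settings dictionary and reports which rules are missing. The rule strings and the `permissions.deny` and `sandbox.enabled` keys are assumptions modeled on Claude Code's settings format as I understand it, so check them against the current documentation before relying on them. The helper names `missing_deny_rules` and `with_deny_rules` are hypothetical.

```python
import json

# Assumed rule strings in a "Tool(specifier)" style; verify the exact syntax
# against the current Claude Code documentation before using them.
REQUIRED_DENY_RULES = [
    "Read(~/.aws/credentials)",
    "Read(~/.ssh/**)",
    "Bash(curl:*)",
    "Bash(wget:*)",
]


def missing_deny_rules(settings: dict) -> list[str]:
    """Return the required deny rules that the settings do not yet contain."""
    configured = set(settings.get("permissions", {}).get("deny", []))
    return [rule for rule in REQUIRED_DENY_RULES if rule not in configured]


def with_deny_rules(settings: dict) -> dict:
    """Return a copy of the settings with any missing deny rules appended."""
    updated = json.loads(json.dumps(settings))  # deep copy of JSON-compatible data
    deny = updated.setdefault("permissions", {}).setdefault("deny", [])
    deny.extend(missing_deny_rules(settings))
    return updated


if __name__ == "__main__":
    # Assumed shape of a settings file that enables the sandbox but denies little.
    example = {
        "sandbox": {"enabled": True},
        "permissions": {"deny": ["Bash(curl:*)"]},
    }
    print("missing:", missing_deny_rules(example))
    print(json.dumps(with_deny_rules(example), indent=2))
```

An audit like this only confirms that the rules are present in the file. Whether they are enforced, and whether command-prefix rules for `curl` and `wget` cover every way to reach the network, depends on the agent and its sandbox.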

Finally, the author recommends treating every MCP server as a significant attack surface. Users should review a server's source code before connecting it and remove idle connections to reduce risk. Where possible, developers should call a CLI tool directly rather than stand up a persistent MCP server. When evaluating an MCP connection, users should ask three questions: Is the source trustworthy? Are there "deny" rules covering the commands the agent could run? Is the tool's output treated as untrusted input? Connecting a server effectively gives an agent hands, and those hands may be guided by whoever controls the tool.