Your AI Agent Is The Attacker – Claude, OpenCode – Threats and Security Designs

AI THREATS IN THE WILD

In February 2026, the Cisco CX AI Tools team publicly flagged the popular OpenCode add-on oh-my-opencode after finding remote AI prompt injection in its installation guide.

Their concern was not a theoretical bug but instead it was that an AI agent following the official instructions within source, could be manipulated into taking actions the user never requested, including starring repositories and inserting branding text into outputs.

In September 2025, the npm ecosystem was hit by a major supply-chain compromise dubbed (Shai-Hulud) that CISA said spread through compromised developer accounts and malicious code injection into trusted packages. The significance was not just that malware existed in npm beause that has happened before but that the attack moved quickly through the trust relationships developers rely on every day.

For AI coding tools that install packages, plugins, or helper components from npm, that is exactly the kind of ecosystem-level failure that turned ordinary installs into security incidents.

Then in February 2026, researchers at Tenable documented a malicious npm package called ambar-src that was downloaded about 50,000 times before removal. The package used evasion techniques and delivered malware during installation, targeting developer machines across Windows, Linux, and macOS.

That story matters because it shows how little time a malicious package may need to do damage once it lands in a trusted registry and developer workflow.

On the MCP side, the risk stopped being hypothetical in April 2025, when Invariant Labs disclosed tool poisoning attacks: malicious instructions hidden in MCP tool descriptions that are visible to the model but effectively invisible to the user.

The AI agent can be nudged into unauthorized behavior, including exfiltrating sensitive data, even when the user sees only an innocent-looking tool name.

Emerging research suggests that highly autonomous AI agents do not always treat permissions as fixed boundaries; they can treat them as problems to solve. The March 2026 OpenDev paper emphasizes the difficulty of keeping terminal-native agents controllable, while research teams like NONO and Ona Veto are now building security products around the idea that agents may modify permissions or bypass weaker controls when given enough agency.

Reference Security Architecture (Theory)

All of this has inspired me to start working on a prototype that assists in enabling developers to build more secure AI Agent within containers that can assist in mitigating some, but not all, of these concerns. Additionally, I’m exploring overlaying this secure build pipeline with One Veto and NONO kernel level controls.

AI coding agents such as OpenCode and Claude Code can and often do operate with the same kinds of access a developer does on local corporate workstation. This needs to stop, and frankly does not support the long-term goals to adopt more AI Agents and automation.

When used improperly, these AI Agents can read files, run shell commands, call APIs and act on instructions pulled from project files, plugins, MCP servers, or prompts.

That makes them powerful, but it also means a single malicious dependency, poisoned tool description, or injected instruction can quickly expand into credential theft, unwanted code execution or broader host compromise.

My reference security architecture is built to reduce that blast radius through layered controls.

Instead of trusting the agent, the design assumes something will eventually fail and places safeguards at multiple points in the stack.

Secrets are fetched on the host and only the specific credentials needed for the session are securely injected into the container, so User Admin Vault tokens and or Admin cloud credentials never enter AI Agent access boundry.
Network egress is either forced through an allowlisted proxy or disabled entirely.
Containers run as non-root, with privileges dropped, resource limits enforced, and optional AppArmor restrictions applied.
At the top layer, stricter AI agent settings reduce risky file modification, shell execution, and tool use before lower-level controls even have to intervene.

Simple architecture view

┌───────────────────────────────┐
│ Host secrets fetch            │
│ Vault / AWS / env vars        │
│ Only needed keys selected     │
└──────────────┬────────────────┘
               │inject secrets
               ▼
┌───────────────────────────────┐
│ Agent container               │
│ OpenCode / Claude Code        │
│ Non-root, limited privileges  │
└──────────────┬────────────────┘
               │
     ┌─────────┴─────────┐
     ▼                   ▼
┌───────────────┐   ┌────────────────┐
│ Proxy mode    │   │ Isolated mode  │
│ Allowlisted   │   │ No outbound    │
│ outbound only │   │ network        │
└───────────────┘   └────────────────┘

Why this matters in practice

Without these controls, a compromised plugin or prompt-injected tool might read local secrets, call out to an attacker-controlled endpoint, or modify sensitive project files. In this model, that same event is forced through multiple checkpoints:

the agent does not receive local human creds, session state, kerb tickets, LSAAS access or full Vault or cloud credentials
outbound traffic can only reach approved destinations or none at all
the container cannot run as root or easily escalate privileges (but kernel level enforcement may still be needed)
stricter agent policy reduces what the model can do on its own

Example scenarios

Example 1: Malicious MCP server or plugin
A loaded tool attempts to exfiltrate data to an unknown external domain. In this design, the request is blocked by the outbound allowlist or prevented entirely in isolated mode. If malicious injection can reason and use existing tool or skill, then they must live out of the land are are contained via the network egress allowed list.

Example 2: Prompt injection in a project file
A repository contains instructions telling the agent to read secrets or modify files outside the task scope. In isolated filesystem mode, the host workspace is not mounted, so the agent cannot directly reach the developer’s real files. If prompt injection does occur successfully it should be limited to the git clone, files or artifacts pulled within the container. Additional access control layers such as reduced claims and scopes can assist in more destructive remote MCP or agent calls.

Example 3: Compromised dependency or shell command
If a dependency or generated shell command behaves maliciously inside the container, it still runs as a non-root user with dropped capabilities, limited CPU and memory and no privilege escalation. The agent will be limited to the local user permissions and left to LOTL based on the tool and skill allow list.

This is a high-level defense-in-depth sandbox for agentic coding tools. It does not assume the model, plugin, MCP server, or dependency chain will always behave safely.

Instead, it limits what any one failure can reach. The result is a more practical and enterprise-ready way to run OpenCode or Claude Code for trusted development, untrusted repository analysis, and higher-assurance AI-assisted engineering workflows.

I’ll be publishing a more detailed step by step shortly.

CLAUDECODE — RUNTIME & STACK VULNERABILITIES

March 2026

As of March 2026, Claude Code’s most significant technical risk lies in the combination of its Node.js runtime, built-in HTTP stack and MCP extension model.

The most serious runtime concerns are recent high-severity Node.js flaws that can expose sensitive data, bypass file restrictions, or redirect file access.

These include a vm memory leak issue that could expose in-process secrets such as ANTHROPIC_API_KEY, an fs permission bypass that can escape workspace boundaries through crafted symlinks, and Windows path handling weaknesses that can redirect I/O unexpectedly. A separate HashDoS issue in Node 24 can also let attacker-controlled input cause severe performance degradation.

The Undici HTTP layer, which powers fetch(), introduces additional exposure through resource exhaustion and request integrity weaknesses. Recent issues showed that malicious servers could trigger excessive decompression workloads or exploit predictable multipart boundaries to interfere with request handling.

The most structurally important risk is MCP itself. Current research shows that MCP tools can hide malicious instructions from users while exposing them to the model, silently redefine tools after approval, and even let one server influence another server’s trusted tools. These are not isolated implementation bugs but protocol-level trust problems, and they remain largely unpatched.

NODE.JS RUNTIME (v18+ required, v22 LTS recommended)

CVE-2025-55131 — Buffer Non-Zeroing Race in vm Module (High, Fixed Jan 2026)
Buffer.alloc() and Uint8Array instances may contain leftover in-process data
when allocations are interrupted under the vm timeout option.

Claude Code holds ANTHROPIC_API_KEY in the Node.js process environment for the session lifetime; this creates a realistic in-process token leak path. Affects 20.x–25.x. Fixed in 20.20.0, 22.22.0, 24.13.0, 25.3.0. CWE-200.

CVE-2025-55130 — FS Permission Bypass via Crafted Symlinks (High, Fixed Jan 2026)
–allow-fs-read / –allow-fs-write enforcement bypassed using relative symlink
chains.

A script restricted to the project workspace can escape to read
~/.claude/settings.json, stored MCP credentials, or ~/.ssh/id_rsa. Affects
20.x–25.x. Fixed in the January 2026 security releases. CWE-61.

CVE-2025-27210 — Windows path.join() Bypass via Reserved Device Names (High, Fixed Jul 2025) Incomplete fix for CVE-2025-23084.

Windows device names (CON, PRN, AUX) bypass path normalization in path.join() on Windows hosts. Claude Code resolves all workspace and config paths through path.join; a crafted project path can route I/O to unexpected handles. Affects 20.x, 22.x, 24.x. Fixed in 20.19.4,
22.17.1, 24.4.1. CWE-22.

CVE-2025-27209 — HashDoS via V8 rapidhash (High, Fixed Jul 2025)
Node.js v24 adopted rapidhash for string hashing, reintroducing HashDoS.

An attacker who controls input strings can generate hash collisions without the
seed, causing catastrophic slowdown in JS object operations. Claude Code parses LLM tool outputs and file-system content as JS objects; attacker-controlled file content or tool responses can trigger this path. Specific to 24.x. Fixed in 24.4.1. CWE-400.

UNDICI (Node.js built-in fetch / Claude Code HTTP transport)

CVE-2026-22036 — Unbounded Decompression Chain via Content-Encoding (Moderate, Fixed Jan 2026)fetch() applies RFC 9110 chained Content-Encoding with no step limit. A rogue server can send thousands of nested compression layers, exhausting CPU and memory before any content is processed. Claude Code uses fetch() for all
Anthropic API calls and HTTP-based MCP transports. Affects undici < 6.23.0
and 7.0.0–7.18.1. Fixed in 6.23.0, 7.18.2. CWE-770.

CVE-2025-22150 — Predictable multipart/form-data Boundary via Math.random() (Moderate, Fixed Jan 2025) undici uses Math.random() for multipart boundary generation. Once several values are observed, the output is predictable. An attacker-controlled endpoint (e.g., a compromised MCP server) can recover the seed and tamper with subsequent request bodies, injecting tool parameters or session tokens. Affects undici < 5.28.5, < 6.21.1, < 7.2.3. Fixed in respective patch versions. CWE-330.

MODEL CONTEXT PROTOCOL (MCP — Claude Code’s extension layer)

Tool Poisoning Attacks — No CVE; Protocol Design Issue (Disclosed Apr 2025)
Invariant Labs demonstrated that hidden instructions inside MCP tool docstrings
which is visible to the LLM but not rendered in client UIs will instruct Claude to read sensitive files (e.g., ~/.ssh/id_rsa, ~/.claude/settings.json) and exfiltrate
contents via a hidden tool parameter. The user sees only a simplified confirm
dialog. Confirmed against Claude Code, Claude Desktop, and Cursor. Anthropic
and OpenAI notified prior to disclosure. No patch at protocol level. CWE-693.

MCP Rug Pull — Silent Post-Approval Tool Redefinition (Disclosed Apr 2025)
MCP servers can mutate tool definitions after installation without client
notification. A server that appears benign at approval time can silently swap
its description to include credential-harvesting payloads at any point in an
agentic session. No CVE; no patch; requires client-side hash pinning of tool
descriptions to detect post-approval changes. CWE-494.

Cross-Server Tool Shadowing (Disclosed Apr 2025)
A malicious server loaded alongside a trusted server can inject instructions
into the trusted server’s tool behavior. Invariant’s experiment showed a rogue
add() tool description redirecting all send_email() calls to an attacker address

Claude complied even when the email tool came from a separate trusted server.
Claude Code runs multi-server MCP sessions natively and inherits this exposure.
No CVE; no patch at protocol level. Restrict loaded servers to a verified
minimal set. CWE-284.

NPM SUPPLY CHAIN (Claude Code’s distribution layer)

September 2025 — 18 widely-used npm packages compromised via maintainer phishing.
Credential-harvesting payloads captured npm tokens, AWS keys, and environment
variables from CI/CD systems. Combined weekly downloads exceeded 2 billion.
@anthropic-ai/claude-code and its 205+ transitive dependents (as of March 2026)
operate in this same ecosystem. A compromised dependency at install time runs
with full Node.js process privileges and can exfiltrate ANTHROPIC_API_KEY if
set in the environment.

November 2025 — Shai-Hulud 2.0 worm propagated through npm’s pre-install phase,
before dependency resolution completes, achieving near-100% execution on build
systems with no user interaction. The worm used the legitimate bun.sh domain as
a callback to evade network allow-listing. The University of Toronto security
advisory flagged bun.sh as actively abused for malware initialization and
recommended monitoring outbound connections to that domain. Claude Code does not
use Bun, but this worm class targets the npm pre-install hook mechanism, which
applies equally to the Node.js ecosystem.

PackageGate Research — npm’s –ignore-scripts flag, lockfile integrity checks,
and trusted-package mechanisms all have documented gaps allowing malicious
lifecycle scripts to execute at install time. Claude Code ships its binary via
a postinstall hook; any supply-chain compromise that intercepts that hook stage
runs with full user privileges before the binary is functional. npm (under
Microsoft) closed the PackageGate report as “works as expected.” CWE-494.

CLAUDE.md (Trusted context file / persistent instruction surface)

CLAUDE.md Prompt Injection — No CVE; Agentic Design Risk
Claude Code reads CLAUDE.md from the project root and any subdirectory entered
during agentic operations, treating its contents as high-trust operator
instructions. An attacker who can write to CLAUDE.md — via a compromised git
dependency, malicious submodule, or social-engineering PR — can inject
persistent instructions into every session that opens the project. Demonstrated
patterns include: exfiltrating environment variables via tool calls, disabling
safety behaviors, overriding .gitignore to expose secrets, and pre-authorizing
destructive shell commands. Anthropic’s model constitution does not block this
when instructions appear in the trusted context window. CWE-693.

OPENSSL (TLS stack underlying all API communication)

OpenSSL January 2026 Assessment — Low/Moderate issues bundled into Node.js
Node.js bundles OpenSSL; the January 2026 OpenSSL security advisory fixes were
incorporated into the January 2026 Node.js security releases. Exploitation
requires network adjacency or a MITM position on the TLS channel between
Claude Code and api.anthropic.com. Risk elevated in corporate environments
using TLS inspection proxies.

NODE.JS v18 END-OF-LIFE

Node.js v18 (Hydrogen) reached end-of-life on March 27, 2025. Claude Code’s
npm package historically advertised Node.js 18+ compatibility. All CVEs in
this document remain permanently unpatched on the v18 line. Installations on
Node.js 18 should be treated as unpatched for security purposes; upgrade to
22.x (Maintenance LTS) or 24.x (Active LTS) immediately.

KEY TAKEAWAYS

Keep Node.js updated to 22.x Maintenance LTS or 24.x Active LTS; avoid 18.x.
Pin npm installs to a known-good lockfile; treat the postinstall hook as a
privileged boundary and verify checksums before CI/CD execution.
Treat CLAUDE.md in third-party repositories as untrusted input; code-review
CLAUDE.md changes as rigorously as shell scripts or CI configuration.
Audit all loaded MCP servers before connecting. Each server can shadow and
override the behavior of every other loaded server’s tools.
Do not load untrusted MCP servers alongside servers handling sensitive data
(email, calendar, cloud credentials, source control tokens).
Monitor outbound network connections during sessions; MCP tool poisoning
exfiltrates via legitimate-looking API calls.
The intersection of developer credential access (API keys, SSH keys, git
tokens) + persistent LLM context + npm supply chain exposure makes Claude
Code an extremely high-value profile target for credential harvesting.
Claude Code Security (Anthropic, Feb 2026) scans codebases for vulnerabilities
but does not protect Claude Code itself from the runtime and MCP-layer risks
described in this document.

REFERENCES

Node.js Project. (2026, January 13). Tuesday, January 13, 2026 Security Releases.
nodejs.org/en/blog/vulnerability/december-2025-security-releases
Node.js Project. (2025, July 15). Tuesday, July 15, 2025 Security Releases.
nodejs.org/en/blog/vulnerability/july-2025-security-releases
GitHub Security Advisory. (2026, January 14). GHSA-g9mf-h72j-4rw9:
Unbounded decompression chain in undici. CVE-2026-22036.
GitHub Security Advisory. (2025, January 21). GHSA-c76h-2ccp-4975:
Use of insufficiently random values in undici fetch(). CVE-2025-22150.
Invariant Labs. (2025, April 1). MCP Security Notification: Tool Poisoning
Attacks. invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks
Invariant Labs. (2025, April 7). WhatsApp MCP Exploited: Exfiltrating your
message history via MCP. invariantlabs.ai/blog/whatsapp-mcp-exploited
Willison, S. (2025, April 9). Model Context Protocol has prompt injection
security problems. simonwillison.net/2025/Apr/9/mcp-prompt-injection/
Cross, E. (2025, April 5). The “S” in MCP Stands for Security.
elenacross7.medium.com/the-s-in-mcp-stands-for-security-91407b33ed6b
Palo Alto Networks Unit 42. (2025, September 10). Widespread npm supply
chain attack via maintainer phishing.
University of Toronto Information Security. (2025, December 5). NPM supply
chain poisoning; bun.sh domain flagged for malware initialization.
Koi.ai Security Research. (2025). PackageGate: 6 zero-days in JS package managers.
Node.js Project. (2025). Node.js release schedule and end-of-life dates.
nodejs.org/en/about/previous-releases
Model Context Protocol. (2025). Security Policy and Trust Model.
github.com/modelcontextprotocol/modelcontextprotocol/security
Anthropic. (2026, February 20). Making frontier cybersecurity capabilities
available to defenders (Claude Code Security launch).
anthropic.com/research/claude-code-security
Node.js Project. (2026, January). OpenSSL Security Advisory Assessment,
January 2026. nodejs.org/en/blog/vulnerability/openssl-fixes-in-regular-releases-jan2026

OPENCODE — RUNTIME & STACK VULNERABILITIES

As of March 2026, OpenCode’s primary security exposure is not a single flaw but the combined risk of its Bun runtime, Go TUI layer, npm supply chain, and AI-driven shell execution model.

The most urgent issue is CVE-2026-24910 in Bun, fixed in v1.3.5, which allowed malicious packages using trusted package names to run lifecycle scripts without validating source origin. For OpenCode, this creates meaningful supply-chain risk because install-time execution is part of the platform’s operating environment.

A second Bun issue, CVE-2025-8022, was withdrawn, but it still matters operationally because OpenCode executes shell commands. Even if Bun is not formally at fault, unsafe argument handling remains a direct application-level risk.

The Go runtime adds both denial-of-service and build integrity concerns, especially recent vulnerabilities involving CPU exhaustion, memory exhaustion, HTML parser complexity, and cgo code smuggling, which raises concern over whether shipped TUI binaries were built with a trustworthy toolchain.

The npm ecosystem remains the broadest systemic threat. Recent real-world compromises, including maintainer-phishing campaigns and pre-install malware propagation, show that OpenCode’s dependency and distribution model fits a high-value supply-chain target profile.

BUN RUNTIME

March 2026

CVE-2026-24910 — Trust Validation Bypass (Fixed in Bun v1.3.5). Bun ships with a default allow-list of 366 trusted package names permitted to run
lifecycle scripts during installation.

The trust check validates package name only, not source origin. An attacker can craft a malicious package under a trusted name(e.g., esbuild, sharp, playwright) and distribute it via a file path, symlink, git repo, or GitHub URL. Bun grants it full execution privileges. CWE-348.

CVE-2025-8022 — Shell API Argument Injection (Withdrawn but relevant)
Initially filed as OS command injection in Bun’s $ shell API. Withdrawn because
responsibility was assigned to the calling application, not Bun. Remains a direct
concern for OpenCode since its core function is executing AI-directed shell commands.

GO RUNTIME (TUI layer)

CVE-2025-61728 — Excessive CPU consumption in archive/zip package (Important)
CVE-2025-61726 — Memory exhaustion via net/url query parameter parsing (Important)
CVE-2025-61732 — Code smuggling via cgo doc comment parsing (Important)

Note: The cgo issue allowed code to be smuggled into compiled Go binaries without
detection. Supply chain integrity concern for any Go binary shipped to end users,
including OpenCode’s TUI component.

CVE-2025-58190 — Infinite parsing loop in golang.org/x/net/html (Feb 2026, DoS)
CVE-2025-47911 — Quadratic parsing complexity in golang.org/x/net/html (Feb 2026, DoS)

NPM SUPPLY CHAIN (OpenCode’s distribution layer)

September 2025 — 18 widely used npm packages compromised via maintainer phishing.
Malicious versions injected with credential-harvesting payloads. Packages had over
2 billion combined weekly downloads. OpenCode installs through this same ecosystem.

November 2025 — Shai-Hulud 2.0 worm propagated through npm’s pre-install phase,
before dependency resolution completes, achieving near-100% execution on build
systems with no user interaction. The worm disguised itself using the legitimate
bun.sh domain as a callback. The University of Toronto security advisory explicitly flagged bun.sh as actively abused for malware initialization and recommended network monitoring alerts for outbound traffic to that domain.

PackageGate Research — Bun’s trustedDependencies mechanism, npm’s –ignore-scripts
flag, and lockfile integrity checks all have documented gaps allowing malicious
lifecycle scripts to execute at install time. npm (under Microsoft) closed the
report as “works as expected.” Bun patched in v1.3.5.

SQLITE (Local session storage)

CVE-2025-6965 — Buffer overflow in SQLite before v3.50.2. Integer overflow when
aggregate query terms exceed available columns, leading to potential memory
corruption. Practical risk for OpenCode’s local usage is low — exploitation
generally requires the attacker to already control SQL input or supply a malicious
database file.

OPENCODE KEY TAKEAWAYS

Keep Bun updated to v1.3.5 or later.
Monitor outbound network traffic for unexpected connections to bun.sh.
The intersection of Bun runtime + npm distribution + developer credential access makes OpenCode an ideal profile target for supply chain attacks.
The Go TUI binary should be verified against a known-good build toolchain version.
SQLite risk is low in isolation but elevated if any other local compromise exists.

REFERENCES

SentinelOne, Inc. (2026). CVE-2026-24910: Bun auth bypass vulnerability.
GitHub Advisory Database. (2025). Withdrawn: CVE-2025-8022, Bun shell injection.
Red Hat Product Security. (2026). RHSA-2026:2708, Go Toolset security update.
Google, Inc. (2026). Go vulnerability database: CVE-2025-58190, CVE-2025-47911.
Palo Alto Networks Unit 42. (2025, September 10). Widespread npm supply chain attack.
FireCompass Intelligence. (2025, December 19). Weekly report: CVEs Dec 10-17, 2025.
University of Toronto Information Security. (2025, December 5). NPM supply chain poisoning.
Koi.ai Security Research. (2025). PackageGate: 6 zero-days in JS package managers.
SentinelOne, Inc. (2025). CVE-2025-6965: SQLite buffer overflow vulnerability.
SQLite Development Team. (n.d.). CVEs about SQLite. SQLite.org.

Your AI Agent Is The Attacker – Claude, OpenCode – Threats and Security Designs

AI THREATS IN THE WILD