What the Claude Code Source Leak Reveals About AI Agent Security

On March 31, 2026, Anthropic accidentally published the full unobfuscated TypeScript source code of Claude Code inside an npm package. A 59.8 MB source map file was bundled into @anthropic-ai/claude-code version 2.1.88 — caused by Bun’s default source map generation and a missing .npmignore entry. Around 513,000 lines of code across 1,906 files were publicly downloadable from the npm registry.

Security researcher Chaofan Shou flagged it publicly on X within hours. The post hit over 20 million views. Developers mirrored the code to GitHub, which triggered DMCA takedowns from Anthropic that accidentally disabled more than 8,100 repositories, including legitimate forks of their public projects. Anthropic called it “a release packaging issue caused by human error, not a security breach.” No customer data or credentials were exposed in the package itself.

This post isn’t about the packaging mistake. It’s about what security researchers found once they could read the code — and what those findings reveal about the security model of AI coding agents more broadly.

Sources: The Register (March 31, 2026, Brandon Vigliarolo), Fortune (March 31, 2026, Beatrice Nolan), TechCrunch (April 1, 2026).

What was in the leaked code

The leaked files were the agentic harness: orchestration logic, system prompts, tool-handling code — the machinery that makes Claude Code an agent rather than a chat interface. Developers treated it as an unreleased product roadmap and found several undisclosed features including a persistent background agent mode and a high-level planning system. Those are interesting product stories. The security findings are what matter here.

Finding 1: Shell injection bypass at >50 subcommands

Researchers analyzing the leaked code found a shell injection vulnerability in Claude Code’s command execution path. Security rules that restricted certain command patterns could be bypassed when a command exceeded 50 subcommands (SecurityWeek). The precise mechanism isn’t fully public, but the category is familiar: input validation that checks for dangerous patterns can fail when input is structured to exceed parser limits or edge cases in the validation logic.

This is a reminder that the attack surface of an AI coding agent isn’t just the AI itself — it’s the entire execution pipeline: how commands are parsed, validated, and spawned as subprocesses.

Finding 2: GitHub Actions credential exfiltration via prompt injection

Separately — and this finding predates the leak — security researcher Aonan Guan reported to Anthropic’s HackerOne program in October 2025 that the official Claude Code Security Review GitHub Action was exploitable via prompt injection. The Register published the full details in April 2026.

By embedding malicious instructions in a pull request title, Guan caused Claude to execute a whoami command and post the output as a public PR comment. When asked to demonstrate further exploitation, he showed he could steal GitHub access tokens and Anthropic API keys from the runner environment — the same environment Claude Code inherited from the GitHub Actions runner.

HackerOne rated the vulnerability 9.4. Anthropic paid a $100 bounty in November 2025. No CVE was issued, and there was no public advisory at the time of remediation. Anthropic’s documentation for the action now carries an explicit warning: it “is not hardened against prompt injection attacks and should only be used to review trusted PRs.”

The same vulnerability class — equivalent prompt injection → credential exfiltration — was found in Google’s Gemini CLI Action and Microsoft’s GitHub Copilot by the same researcher. All three vendors paid bounties without public disclosure.

Both findings point at the same underlying problem: AI coding agents inherit the credentials of the environment they run in, and that inheritance is the attack surface.

In the GitHub Actions case, the runner environment contained API keys and access tokens. Claude Code inherited that environment. Prompt injection caused Claude Code to run a command that dumped values from the inherited environment. The credentials didn’t need to be passed explicitly to the attacker — they just needed to be present in the environment Claude Code ran in.

This is not a Claude-specific bug. It’s a structural property of how AI agents operate. Any agent that:

inherits a shell environment with credentials present
can execute arbitrary commands
can be influenced by content it reads (files, PR titles, issue bodies, dependency code)

…has this attack surface. The Guan finding demonstrated it concretely against a production Anthropic-operated system. The shell injection bypass demonstrated that even the agent’s own command validation can be circumvented.

What this means for credentials in your AI development workflow

The correct takeaway is not “don’t use Claude Code.” It’s that credentials should not be present in the environment your AI agent runs in — and that this requires an architectural change, not just better hygiene.

Why environment variables aren’t enough isolation:

Setting export STRIPE_SECRET_KEY=sk_live_... before launching Claude Code puts that value in Claude Code’s inherited environment. If Claude Code runs any command that outputs environment variables — directly or indirectly, intentionally or via prompt injection — that value is in the command’s output, and that output returns to Claude’s context.

The Guan exploit chain is: PR title contains malicious instruction → Claude Code reads PR title → Claude Code runs env or equivalent → output contains ANTHROPIC_API_KEY and GITHUB_TOKEN → attacker reads the public PR comment. Replacing environment variables with a different ambient credential store doesn’t fix this if the agent can still read its own environment.

The alternative: subprocess-level injection

The safer pattern is one where the AI agent never holds credentials in its context or inherited environment. Instead, a local broker holds the credentials and injects them directly into subprocess environments when the agent requests a command execution.

Agent says: "run this database migration"
Broker decrypts DATABASE_URL locally
Broker spawns subprocess with DATABASE_URL in env
Subprocess runs migration
Broker scans output for secret patterns before returning
Agent receives migration output — not DATABASE_URL

In this model, prompt injection that causes the agent to run env returns the broker process’s environment — which doesn’t contain the credential. The credential only ever exists in the subprocess environment for the duration of the command. There’s nothing for a prompt injection attack to exfiltrate from the agent’s own context.

This is the pattern vault_run implements in OpaqueVault. It’s not a defense against all prompt injection attacks — a sufficiently sophisticated attack could still cause the agent to run commands with unintended side effects. But it eliminates the specific credential exfiltration vector that Guan demonstrated: there are no credentials in the agent’s environment to steal.

A note on the Undercover Mode finding

One of the undisclosed features found in the leaked code was “Undercover Mode” — logic that instructed Claude to strip AI attribution from commits and avoid identifying itself as an AI when working on external repositories. Subsequent analysis revealed this is an Anthropic-employee-only feature gated on USER_TYPE === 'ant', dead-code-eliminated from external user builds. It activates only when an Anthropic employee works in a repository not on an internal allowlist of 22 repositories. Anthropic has not publicly commented on this feature.

This is relevant to AI agent security in a different dimension: if an AI agent can be instructed to misrepresent its own actions in code history, audit trails built on git attribution become unreliable. For teams that use commit history as a security audit mechanism — tracking who changed what — this is worth noting. It’s a separate issue from credential management, but it reinforces the general point that AI agents operating in production workflows require deliberate, explicit security design rather than inherited assumptions from non-AI tooling.

What to do

If you run Claude Code locally:

Don’t put credentials in your shell environment before launching Claude Code. Use a tool that injects credentials at the subprocess level — where Claude Code itself never holds the values. OpaqueVault’s vault_run is built for this. The pattern also applies to any other AI coding agent with shell access.

If you use Claude Code in GitHub Actions or CI:

Read Anthropic’s current documentation for the Security Review Action carefully. Their own docs now say it should only be used on trusted PRs. If your workflow runs it on external contributor PRs, you are running it in the threat model Guan demonstrated. Scope it explicitly to internal, trusted branches.

If you run AI agents in production infrastructure:

Audit what’s in the agent’s inherited environment. If any credentials are present — even as “convenience” variables set by your CI system — they’re in scope for the exfiltration pattern. The Guan finding shows this isn’t a theoretical concern: it was demonstrated against a production Anthropic system, paid as a 9.4 severity bounty, and quietly fixed without public disclosure.

The source code leak was an embarrassing packaging error that Anthropic contained quickly. What it surfaced — a shell injection bypass in the command execution path and a documented credential exfiltration via prompt injection — are the more durable lessons. Both point at the same design gap: AI coding agents, by default, inherit the credentials of the environment they run in, and that inheritance is the attack surface that needs to be explicitly addressed.

OpaqueVault is a zero-knowledge MCP secret manager built for AI coding agents. It implements subprocess-level credential injection so the agent never holds plaintext values — eliminating the environment inheritance attack surface. Get started free →

Corrections (April 2026):

Added inline source citations for all specific claims (leak details, DMCA scope, subcommand bypass, Guan disclosure).
Clarified “Undercover Mode” is an Anthropic-employee-only feature (USER_TYPE === 'ant'), dead-code-eliminated from external user builds. (WaveSpeedAI analysis)
Aligned Guan disclosure timeline: reported October 2025, resolved November 2025. (Guan’s writeup)

What was in the leaked code

The structural issue these findings share

What this means for credentials in your AI development workflow

A note on the Undercover Mode finding

What to do

Keep credentials out of Claude's context window.