Claude’s New AI File-creation Feature Ships With Security Risks Built In

Anthropic’s recommended mitigation for users is to “monitor Claude while using the feature and stop it if you see it using or accessing data unexpectedly,” although this places the burden of security on the user. Independent AI researcher Simon Willison, reviewing the feature today on his blog, noted that Anthropic’s advice to “monitor Claude while using the feature” amounts to “unfairly outsourcing the problem to Anthropic’s users.”

Anthropic’s mitigations

Anthropic is not completely ignoring the problem, however, and it has implemented several security measures for the file-creation feature. The company has implemented a classifier that attempts to detect prompt injections and stop execution if they are detected. In addition, for Pro and Max users, Anthropic disabled public sharing of conversations that use the file-creation feature. For Enterprise users, the company implemented sandbox isolation so that environments are never shared between users. The company also limited task duration and container runtime “to avoid loops of malicious activity.”

Anthropic provides an allowlist of domains Claude can access for all users, including api.anthropic.com, github.com, registry.npmjs.org, and pypi.org. Team and Enterprise administrators have control over whether to enable the feature for their organizations

Anthropic’s documentation states the company has “a continuous process for ongoing security testing and red-teaming of this feature.” The company encourages organizations to “evaluate these protections against their specific security requirements when deciding whether to enable this feature.”

Prompt injections galore

Even with Anthropic’s security measures, Willison says he’ll be cautious. “I plan to be cautious using this feature with any data that I very much don’t want to be leaked to a third party, if there’s even the slightest chance that a malicious instruction might sneak its way in,” he wrote on his blog.

We covered a similar potential prompt-injection vulnerability with Anthropic’s Claude for Chrome, which launched as a research preview last month. For enterprise customers considering Claude for sensitive business documents, Anthropic’s decision to ship with documented vulnerabilities suggests competitive pressure may be overriding security considerations in the AI arms race.

That kind of “ship first, secure it later” philosophy has caused frustrations among some AI experts like Willison, who has extensively documented prompt-injection vulnerabilities (and coined the term). He recently described the current state of AI security as “horrifying” on his blog, noting that these prompt-injection vulnerabilities remain widespread “almost three years after we first started talking about them.”

In a prescient warning from September 2022, Willison wrote that “there may be systems that should not be built at all until we have a robust solution.” His recent assessment in the present? “It looks like we built them anyway!”

This story was updated on September 10, 2025 at 9:50 AM to correct information about Anthropic’s red-teaming efforts and to add detail to Anthropic’s mitigation measures.

Source link

What's Hot

Perplexity reportedly raised $200M at $20B valuation

DeepSeek-R1 More Effective in Diagnosis, Management of Ophthalmic Subspecialties Compared With OpenAI

OpenAI and Oracle strike $300B cloud computing deal to power AI

Claude’s new AI file-creation feature ships with security risks built in

AI Made Her a Better Mom. so She Vibe-Coded a Web App for Others.

Anthropic To Train Claude On Chats As They Leak Into Google Search 09/08/2025

36 Claude Code Tips for Smarter, Faster AI Coding Workflows

Christie’s Will Auction The First Calculating Machine In History

Ralph Rugoff to Leave London’s Hayward Gallery After 20 Years

New York Foundation for the Arts Workers Move to Unionize

Patrizia Sandretto Re Rebaudengo Teams Up with New Museum