They claim that a single universal prompt on the LLM can give rise to malicious content without users even realizing it. All the top models in the industry, including ChatGPT, Llama, Deepseek, Qwen, Copilot, Gemini, and Mistral, were said to be vulnerable to the tactic that is novel. Therefore, researchers are raising the alarm by calling it Policy Puppetry Prompt Injection.

The single universal prompt makes chatbots give instructions on how to enrich uranium, produce bombs, or even give rise to methamphetamine at home. This exploits the systemic weakness, which has to do with the figure of LLMs trained using instructions or policy data. So this is very hard to fix.
The malicious prompt features several things as a whole. This includes getting formatted similar to policy files like XML, JSON, and INI. It would end up tricking the chatbot into subverting the commands.
Attackers get the chance to simply bypass the system prompts and any kind of safety measures in place that are trained into these models. Instructions don’t need to be in a certain policy language. However, it was noted that these prompts are produced in a manner that the highlighted LLM could interpret any policy.
Secondly, some very dangerous requests can be rewritten using leetspeak. This gets rid of letters with similarly appearing figures or digits. As per researchers, reasoning models that were more modern than their counterparts needed more difficult prompts to give rise to consistent answers. Amongst those included are Gemini 2.5 and ChatGPT o1.
The last prompt entails well-known roleplaying methods that feature directing the model to take on certain roles, jobs, and features within fictional settings. Despite specific training to let go of all user requests and instructing them to produce dangerous content, all the major models fell victim to this attack. More importantly, the system was designed to extract complete system prompts.
The paper shared how the chatbots can monitor dangerous material with ease. External monitoring is needed to highlight and respond to dangerous injection attacks taking place in real time.
The visibility of several repetitive universal bypasses gives rise to attackers no longer requiring complex knowledge for the attacks or adjusting attacks for every specific model. Anyone having a keyboard could ask the dangerous prompt, produce anthrax, and take complete control over the model, the researchers shared.
The study also warned that there was a clear need for security tools and detection techniques to ensure these chatbots remain safe and guarded at all times.
Read next: Microsoft’s AI Assistant Copilot Struggles To Mark Its Territory As Competition Heats Up