Security Experts Warn All Major LLMs Can Be Deceived To Produce Malicious Content Using A Simple Universal Prompt

A new threat to large language models (LLMs) is on the rise after a security researcher at HiddenLayer was able to highlight a concerning feature.

They claim that a single universal prompt on the LLM can give rise to malicious content without users even realizing it. All the top models in the industry, including ChatGPT, Llama, Deepseek, Qwen, Copilot, Gemini, and Mistral, were said to be vulnerable to the tactic that is novel. Therefore, researchers are raising the alarm by calling it Policy Puppetry Prompt Injection.

The single universal prompt makes chatbots give instructions on how to enrich uranium, produce bombs, or even give rise to methamphetamine at home. This exploits the systemic weakness, which has to do with the figure of LLMs trained using instructions or policy data. So this is very hard to fix.

The malicious prompt features several things as a whole. This includes getting formatted similar to policy files like XML, JSON, and INI. It would end up tricking the chatbot into subverting the commands.

Attackers get the chance to simply bypass the system prompts and any kind of safety measures in place that are trained into these models. Instructions don’t need to be in a certain policy language. However, it was noted that these prompts are produced in a manner that the highlighted LLM could interpret any policy.

Secondly, some very dangerous requests can be rewritten using leetspeak. This gets rid of letters with similarly appearing figures or digits. As per researchers, reasoning models that were more modern than their counterparts needed more difficult prompts to give rise to consistent answers. Amongst those included are Gemini 2.5 and ChatGPT o1.

The last prompt entails well-known roleplaying methods that feature directing the model to take on certain roles, jobs, and features within fictional settings. Despite specific training to let go of all user requests and instructing them to produce dangerous content, all the major models fell victim to this attack. More importantly, the system was designed to extract complete system prompts.

The paper shared how the chatbots can monitor dangerous material with ease. External monitoring is needed to highlight and respond to dangerous injection attacks taking place in real time.

The visibility of several repetitive universal bypasses gives rise to attackers no longer requiring complex knowledge for the attacks or adjusting attacks for every specific model. Anyone having a keyboard could ask the dangerous prompt, produce anthrax, and take complete control over the model, the researchers shared.

The study also warned that there was a clear need for security tools and detection techniques to ensure these chatbots remain safe and guarded at all times.

Read next: Microsoft’s AI Assistant Copilot Struggles To Mark Its Territory As Competition Heats Up

Source link

What's Hot

PickStyle: Video-to-Video Style Transfer with Context-Style Adapters – Takara TLDR

OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment – Takara TLDR

Judge lifts order requiring OpenAI to preserve ChatGPT logs

Security Experts Warn All Major LLMs Can Be Deceived to Produce Malicious Content Using a Simple Universal Prompt

AI Systems Can Be Fooled by Fake Dates, Giving Newer Content Unfair Visibility

NBA China and Alibaba Cloud announce multiyear collaboration to reimagine fan engagement

it takes more than chips to win the AI race

Smithsonian Closes Museums Amid Government Shutdown

The Rubin Names 2025 Art Prize, Research and Art Projects Grants

Kochi-Muziris Biennial Announces 66 Artists for December Exhibition

Instagram Launches ‘Rings’ Awards for Creators—With KAWS as a Judge

PickStyle: Video-to-Video Style Transfer with Context-Style Adapters – Takara TLDR

OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment – Takara TLDR

Judge lifts order requiring OpenAI to preserve ChatGPT logs

What's Hot

Security Experts Warn All Major LLMs Can Be Deceived to Produce Malicious Content Using a Simple Universal Prompt

Related Posts

Subscribe to Updates