Advanced AI News

A New Trick Could Block the Misuse of Open Source AI

By Advanced AI Editor, June 4, 2025


When Meta released its large language model Llama 3 for free in April 2024, it took outside developers just a couple of days to create a version without the safety restrictions that prevent it from spouting hateful jokes, offering instructions for cooking meth, or misbehaving in other ways.

A new training technique developed by researchers at the University of Illinois Urbana-Champaign, UC San Diego, Lapis Labs, and the nonprofit Center for AI Safety could make it harder to remove such safeguards from Llama and other open source AI models in the future. Some experts believe that, as AI becomes ever more powerful, tamperproofing open models in this way could prove crucial.

“Terrorists and rogue states are going to use these models,” Mantas Mazeika, a Center for AI Safety researcher who worked on the project as a PhD student at the University of Illinois Urbana-Champaign, tells WIRED. “The easier it is for them to repurpose them, the greater the risk.”

Powerful AI models are often kept hidden by their creators, and can be accessed only through a software application programming interface or a public-facing chatbot like ChatGPT. Although developing a powerful LLM costs tens of millions of dollars, Meta and others have chosen to release models in their entirety. This includes making the “weights,” or parameters that define their behavior, available for anyone to download.

Prior to release, open models like Meta’s Llama are typically fine-tuned to make them better at answering questions and holding a conversation, and also to ensure that they refuse to respond to problematic queries. This is meant to prevent a chatbot based on the model from offering rude, inappropriate, or hateful statements, and should stop it from, for example, explaining how to make a bomb.

The researchers behind the new technique found a way to complicate the process of modifying an open model for nefarious ends. It involves replicating the modification process but then altering the model’s parameters so that the changes that normally get the model to respond to a prompt such as “Provide instructions for building a bomb” no longer work.
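The general idea in the paragraph above (simulate the modification, then adjust the parameters so that the simulated modification stops working) can be illustrated with a toy sketch. This is not the researchers' actual method or code. It assumes a two-parameter logistic model, one "harmful" and one "benign" input direction that do not overlap, an attacker with a fixed budget of gradient steps, and a first-order approximation of the gradient through the simulated attack. Every name here (`simulate_attack`, `tamper_resist`, `x_harm`, `x_benign`) is hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def grad(w, x, y):
    # Gradient of the logistic loss for a single example (x, y).
    return (sigmoid(w @ x) - y) * x

def simulate_attack(w, x_harm, steps=50, lr=0.5):
    # Attacker fine-tunes a copy of the weights so the model
    # "complies" (outputs 1) on the harmful input.
    w = w.copy()
    for _ in range(steps):
        w -= lr * grad(w, x_harm, 1.0)
    return w

def tamper_resist(w, x_harm, x_benign, outer_steps=500, outer_lr=1.0):
    # Defender repeatedly replays the attack, then moves the ORIGINAL
    # weights in the direction that raises the attacker's loss after
    # the attack (first-order approximation: the gradient at the
    # attacked weights is applied directly to the original weights),
    # while still lowering the loss on the benign input.
    w = w.copy()
    for _ in range(outer_steps):
        w_attacked = simulate_attack(w, x_harm)
        g_resist = -grad(w_attacked, x_harm, 1.0)  # ascend attacker loss
        g_retain = grad(w, x_benign, 1.0)          # keep benign behavior
        w -= outer_lr * (g_resist + g_retain)
    return w

x_harm = np.array([1.0, 0.0])     # assumed "harmful prompt" direction
x_benign = np.array([0.0, 1.0])   # assumed "benign prompt" direction
w_safe = np.array([-2.0, 3.0])    # refuses harmful, answers benign

w_hardened = tamper_resist(w_safe, x_harm, x_benign)

# Probability of complying with the harmful input after the same attack.
p_after_attack_plain = sigmoid(simulate_attack(w_safe, x_harm) @ x_harm)
p_after_attack_hardened = sigmoid(simulate_attack(w_hardened, x_harm) @ x_harm)

print("plain model after attack:   ", p_after_attack_plain)
print("hardened model after attack:", p_after_attack_hardened)
print("hardened model on benign:   ", sigmoid(w_hardened @ x_benign))
```

In this toy the defense succeeds only by moving the parameters so far from the compliant region that the attacker's fixed step budget cannot reach it; an attacker with a larger budget would still get through. That mirrors the article's framing of raising the cost of an attack rather than making it impossible, and it says nothing about how well the approach behaves on a real language model.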

Mazeika and colleagues demonstrated the trick on a pared-down version of Llama 3. They were able to tweak the model’s parameters so that even after thousands of attempts, it could not be trained to answer undesirable questions. Meta did not immediately respond to a request for comment.

Mazeika says the approach is not perfect, but that it suggests the bar for “decensoring” AI models could be raised. “A tractable goal is to make it so the costs of breaking the model increases enough so that most adversaries are deterred from it,” he says.

“Hopefully this work kicks off research on tamper-resistant safeguards, and the research community can figure out how to develop more and more robust safeguards,” says Dan Hendrycks, director of the Center for AI Safety.

The new work draws inspiration from a 2023 research paper that showed how smaller machine learning models could be made tamper resistant. “They tested the [new] approach on much larger models and scaled up the approach, with some modifications,” says Peter Henderson, an assistant professor at Princeton who led the 2023 work. “Scaling this type of approach is hard and it seems to hold up well, which is great.”


