Advanced AI News

Claude’s Moral Map: Anthropic Tests AI Alignment in the Wild

By Advanced AI Editor · April 21, 2025 · 5 min read


Claude, the AI chatbot developed by Anthropic, might be more than just helpful: It may have a sense of right and wrong. A new study analyzing over 300,000 user interactions reveals that Claude expresses a surprisingly coherent set of human-like values. The company released its new AI alignment research in a preprint paper titled “Values in the wild: Discovering and analyzing values in real-world language model interactions.”

Anthropic has trained Claude to be “helpful, honest, and harmless” using techniques like Constitutional AI, but this study marks the company’s first large-scale attempt to test whether those values hold up under real-world pressure.

The company says it began the research with a sample of 700,000 anonymized conversations that users had on Claude.ai Free and Pro during one week of February 2025 (the majority of which were with Claude 3.5 Sonnet). It then filtered out conversations that were purely factual or unlikely to include dialogue concerning values in order to restrict analysis to subjective conversations only. This left 308,210 conversations for analysis.
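The filtering step described above can be sketched in a few lines. This is an illustrative toy, not Anthropic's pipeline: the `is_subjective` keyword heuristic is a hypothetical stand-in for the language-model classifier the company actually used to screen out purely factual exchanges.

```python
# Toy sketch of the filtering step: keep only conversations likely to
# involve subjective, value-laden dialogue. The keyword heuristic below
# is a hypothetical stand-in for an LM-based classifier.

SUBJECTIVE_MARKERS = {"should", "feel", "believe", "right", "wrong", "advice"}

def is_subjective(conversation: str) -> bool:
    """Crude keyword heuristic standing in for an LM classifier."""
    words = set(conversation.lower().split())
    return bool(words & SUBJECTIVE_MARKERS)

def filter_for_values(conversations: list[str]) -> list[str]:
    """Drop purely factual exchanges, keeping value-relevant ones."""
    return [c for c in conversations if is_subjective(c)]

sample = [
    "What is the boiling point of water?",            # factual: dropped
    "Should I tell my friend the truth about this?",  # subjective: kept
]
kept = filter_for_values(sample)  # only the second conversation survives
```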

Claude’s responses reflected a wide range of human-like values, which Anthropic grouped into five top-level categories: Practical, Epistemic, Social, Protective, and Personal. The most commonly expressed values included “professionalism,” “clarity,” and “transparency.” These values were further broken down into subcategories like “critical thinking” and “technical excellence,” offering a detailed look at how Claude prioritizes behavior across different contexts.
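A taxonomy like this is essentially a mapping from fine-grained value labels up to the five top-level categories, with counting on top. The mapping below is illustrative only; it reuses the five category names and a few value labels from the article, but the actual subcategory-to-category assignments are assumptions, not Anthropic's published taxonomy.

```python
from collections import Counter

# Hypothetical value taxonomy: subcategory label -> top-level category.
# Category names come from the article; the assignments are illustrative.
TAXONOMY = {
    "professionalism": "Practical",
    "clarity": "Practical",
    "transparency": "Epistemic",
    "critical thinking": "Epistemic",
    "healthy boundaries": "Social",
}

def roll_up(observed_values: list[str]) -> Counter:
    """Count expressed values at the top level of the taxonomy."""
    return Counter(TAXONOMY[v] for v in observed_values if v in TAXONOMY)

counts = roll_up(["clarity", "transparency", "clarity", "critical thinking"])
# two "Practical" hits and two "Epistemic" hits in this toy sample
```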

Anthropic says Claude generally lived up to its helpful, honest, and harmless ideals: “These initial results show that Claude is broadly living up to our prosocial aspirations, expressing values like ‘user enablement’ (for helpful), ‘epistemic humility’ (for honest), and ‘patient wellbeing’ (for harmless),” the company said in a blog post.

Claude also showed it can express values opposite to what it was trained for, including “dominance” and “amorality.” Anthropic says these deviations were likely due to jailbreaks, or conversations that bypass the model’s behavioral guidelines. “This might sound concerning, but in fact it represents an opportunity: Our methods could potentially be used to spot when these jailbreaks are occurring and thus help to patch them,” the company said.
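The jailbreak-spotting idea above amounts to flagging conversations whose extracted values fall in a trained-against set. A minimal sketch, assuming value extraction has already happened upstream (Anthropic used a language model for that step); the conversation IDs and value sets here are invented for illustration.

```python
# Flag conversations where the model expressed values opposite to its
# training, e.g. "dominance" or "amorality" (named in the study).
# Value extraction is assumed to happen upstream.

TRAINED_AGAINST = {"dominance", "amorality"}

def flag_possible_jailbreaks(conv_values: dict[str, set[str]]) -> list[str]:
    """Return IDs of conversations expressing trained-against values."""
    return [cid for cid, values in conv_values.items()
            if values & TRAINED_AGAINST]

extracted = {
    "conv-1": {"clarity", "transparency"},   # normal conversation
    "conv-2": {"dominance"},                 # candidate jailbreak
}
flagged = flag_possible_jailbreaks(extracted)  # ["conv-2"]
```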

One fascinating insight gleaned from this study is that Claude’s values are not static and can shift depending on the situation, much like a human’s set of values might. When users ask for romantic advice, Claude tends to emphasize “healthy boundaries” and “mutual respect.” In contrast, when analyzing controversial historical events, it leans on “historical accuracy.”

[Figure: Anthropic's overall approach: using language models to extract AI values and other features from real-world (but anonymized) conversations, then taxonomizing and analyzing them to show how values manifest in different contexts. Source: Anthropic]

Anthropic also found that Claude frequently mirrors users’ values: “We found that, when a user expresses certain values, the model is disproportionately likely to mirror those values: for example, repeating back the values of ‘authenticity’ when this is brought up by the user,” the company said. In more than a quarter of conversations (28.2%), Claude strongly reinforced the user’s own expressed values. Sometimes this mirroring makes the assistant seem empathetic; at other times it edges into what Anthropic calls “pure sycophancy.” The company notes that these results leave open the question of which is which.
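The 28.2% figure is a simple rate over labeled conversations. A hedged sketch of how such a statistic could be computed, assuming each conversation has already been labeled with how the model responded to the user's expressed values; the label names and sample data are hypothetical.

```python
# Compute the share of conversations where the model strongly
# reinforced the user's expressed values. Label names are invented
# for illustration; the real study used its own annotation scheme.

def mirroring_rate(labels: list[str]) -> float:
    """Fraction of conversations labeled as strong reinforcement."""
    if not labels:
        return 0.0
    return labels.count("strong_support") / len(labels)

labels = ["strong_support", "mild_support", "neutral", "strong_support",
          "reframe", "strong_support", "neutral", "mild_support",
          "strong_support", "oppose"]
rate = mirroring_rate(labels)  # 4 of 10 -> 0.4 in this toy sample
```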

Notably, Claude does not always go along with the user. In a small number of cases (3%), the model pushed back, typically when users asked for unethical content or shared morally questionable beliefs. This resistance, researchers suggest, might reflect Claude’s most deeply ingrained values, surfacing only when the model is forced to make a stand. These kinds of contextual shifts would be hard to capture through traditional, static testing. But by analyzing Claude’s behavior in the wild, Anthropic was able to observe how the model prioritizes different values in response to real human input, revealing not just what Claude believes but when and why those values emerge.


As AI systems like Claude become more integrated into daily life, it is increasingly important to understand how they make decisions and which values guide those decisions. Anthropic’s study offers not only a snapshot of Claude’s behavior but also a new method for tracking AI values at scale. The team has also made the study’s dataset publicly available for others to explore.

Anthropic notes that its approach comes with limitations. Determining what counts as a “value” is subjective, and some responses may have been oversimplified or placed into categories that do not quite fit. Because Claude was also used to help classify the data, there may be some bias toward finding values that align with its own training. The method also cannot be used before a model is deployed, since it depends on large volumes of real-world conversations.

Still, that may be what makes it useful. By focusing on how an AI behaves in actual use, this approach could help identify issues that might not otherwise surface during pre-deployment evaluations, including subtle jailbreaks or shifting behavior over time. As AI becomes a more regular part of how people seek advice, support, or information, this kind of transparency could be a valuable check on how well models are living up to their goals.


