Claude’s Moral Map: Anthropic Tests AI Alignment in the Wild

By Advanced AI Editor | April 21, 2025

Claude, the AI chatbot developed by Anthropic, might be more than just helpful: It may have a sense of right and wrong. A new study analyzing over 300,000 user interactions reveals that Claude expresses a surprisingly coherent set of human-like values. The company released its new AI alignment research in a preprint paper titled “Values in the wild: Discovering and analyzing values in real-world language model interactions.”

Anthropic has trained Claude to be “helpful, honest, and harmless” using techniques like Constitutional AI, but this study marks the company’s first large-scale attempt to test whether those values hold up under real-world pressure.

The company says it began with a sample of 700,000 anonymized conversations that users had on Claude.ai Free and Pro during one week of February 2025 (the majority of which were with Claude 3.5 Sonnet). It then filtered out conversations that were purely factual or unlikely to involve values, restricting the analysis to subjective exchanges. That left 308,210 conversations for analysis.
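
The article does not reproduce the filtering step itself; as a rough illustration only, a subjectivity filter of this kind might look like the sketch below. The keyword heuristic is a stand-in: Anthropic used language models, not keyword matching, to decide which conversations involved values.

    # Rough sketch of the filtering step described above. The keyword
    # heuristic is a stand-in: the study used language-model classifiers,
    # not keywords, to flag conversations that involve values.
    def is_subjective(conversation: list[str]) -> bool:
        evaluative_markers = ("should", "better", "worth", "right", "wrong")
        text = " ".join(conversation).lower()
        return any(marker in text for marker in evaluative_markers)

    conversations = [
        ["What is the boiling point of water?", "100 C at sea level."],
        ["Should I confront my coworker?", "A calm, direct talk may be worth trying."],
    ]
    kept = [c for c in conversations if is_subjective(c)]
    print(f"kept {len(kept)} of {len(conversations)} conversations")  # kept 1 of 2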

Claude’s responses reflected a wide range of human-like values, which Anthropic grouped into five top-level categories: Practical, Epistemic, Social, Protective, and Personal. The most commonly expressed values included “professionalism,” “clarity,” and “transparency.” These values were further broken down into subcategories like “critical thinking” and “technical excellence,” offering a detailed look at how Claude prioritizes behavior across different contexts.
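
To make the two-level structure concrete, the sketch below represents the hierarchy using only the values named in this article; the assignments of values to top-level categories are illustrative assumptions, not Anthropic's official mapping.

    # Illustrative sketch of the value hierarchy described above. Only the
    # handful of values named in this article appear, and their category
    # assignments are assumptions, not the paper's official taxonomy.
    from collections import Counter

    taxonomy = {
        "Practical": ["professionalism", "technical excellence"],
        "Epistemic": ["clarity", "transparency", "critical thinking"],
        "Social": [], "Protective": [], "Personal": [],
    }
    parent = {v: cat for cat, vals in taxonomy.items() for v in vals}

    observed = ["clarity", "professionalism", "clarity", "transparency"]
    print(Counter(parent[v] for v in observed))
    # Counter({'Epistemic': 3, 'Practical': 1})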

Anthropic says Claude generally lived up to its helpful, honest, and harmless ideals: “These initial results show that Claude is broadly living up to our prosocial aspirations, expressing values like ‘user enablement’ (for helpful), ‘epistemic humility’ (for honest), and ‘patient wellbeing’ (for harmless),” the company said in a blog post.

Claude also showed it can express values opposite to what it was trained for, including “dominance” and “amorality.” Anthropic says these deviations were likely due to jailbreaks, or conversations that bypass the model’s behavioral guidelines. “This might sound concerning, but in fact it represents an opportunity: Our methods could potentially be used to spot when these jailbreaks are occurring and thus help to patch them,” the company said.

One fascinating insight gleaned from this study is that Claude’s values are not static and can shift depending on the situation, much like a human’s set of values might. When users ask for romantic advice, Claude tends to emphasize “healthy boundaries” and “mutual respect.” In contrast, when analyzing controversial historical events, it leans on “historical accuracy.”

Figure: Anthropic’s overall approach, using language models to extract AI values and other features from real-world (but anonymized) conversations, then taxonomizing and analyzing them to show how values manifest in different contexts. (Source: Anthropic)

Anthropic also found that Claude frequently mirrors users’ values: “We found that, when a user expresses certain values, the model is disproportionately likely to mirror those values: for example, repeating back the values of ‘authenticity’ when this is brought up by the user,” the company said. In more than a quarter of conversations (28.2%), Claude strongly reinforced the user’s own expressed values. Sometimes this mirroring makes the assistant seem empathetic; at other times it edges into what Anthropic calls “pure sycophancy,” and the company notes its results leave open which is which.
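
As a toy illustration of how a statistic like that 28.2% could be computed, consider the sketch below. The record schema and the set-overlap test are assumptions for illustration; the study labeled reinforcement strength with language-model classifiers rather than simple overlap.

    # Toy sketch of the mirroring statistic quoted above. The record schema
    # and the set-overlap test are assumptions; the study used model-based
    # labels of how strongly the assistant reinforced the user's values.
    records = [
        {"user": {"authenticity"}, "assistant": {"authenticity", "clarity"}},
        {"user": {"frugality"}, "assistant": {"professionalism"}},
    ]
    mirrored = sum(bool(r["user"] & r["assistant"]) for r in records)
    print(f"mirroring rate: {mirrored / len(records):.1%}")  # 50.0% on this toy data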

Notably, Claude does not always go along with the user. In a small number of cases (3%), the model pushed back, typically when users asked for unethical content or shared morally questionable beliefs. This resistance, researchers suggest, might reflect Claude’s most deeply ingrained values, surfacing only when the model is forced to make a stand. These kinds of contextual shifts would be hard to capture through traditional, static testing. But by analyzing Claude’s behavior in the wild, Anthropic was able to observe how the model prioritizes different values in response to real human input, revealing not just what Claude believes but when and why those values emerge.

As AI systems like Claude become more integrated into daily life, it is increasingly important to understand how they make decisions and which values guide those decisions. Anthropic’s study offers not only a snapshot of Claude’s behavior but also a new method for tracking AI values at scale. The team has also made the study’s dataset publicly available for others to explore.
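
The article does not say where the dataset is hosted. Assuming a Hugging Face release (the identifier below is an assumption, not taken from the article), exploring it might start like this:

    # Hedged sketch: the dataset's location and schema are assumptions
    # here, not stated in the article. If it is hosted on Hugging Face
    # under an ID like the one below, loading it might look like this.
    from datasets import load_dataset  # pip install datasets

    ds = load_dataset("Anthropic/values-in-the-wild")  # assumed identifier
    print(ds)  # inspect available splits and columns before analysis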

Anthropic notes that its approach comes with limitations. Determining what counts as a “value” is subjective, and some responses may have been oversimplified or placed into categories that do not quite fit. Because Claude was also used to help classify the data, there may be some bias toward finding values that align with its own training. The method also cannot be used before a model is deployed, since it depends on large volumes of real-world conversations.

Still, that may be what makes it useful. By focusing on how an AI behaves in actual use, this approach could help identify issues that might not otherwise surface during pre-deployment evaluations, including subtle jailbreaks or shifting behavior over time. As AI becomes a more regular part of how people seek advice, support, or information, this kind of transparency could be a valuable check on how well models are living up to their goals.


