
Anthropic Study Maps Claude AI’s Real-World Values, Releases Dataset

By Advanced AI Bot · April 21, 2025 · 4 min read


Anthropic is offering a rare look into the operational values of its AI assistant, Claude, through new research published Monday. The study, “Values in the Wild,” attempts to empirically map the normative considerations Claude expresses across hundreds of thousands of real user interactions, employing a privacy-focused methodology and resulting in a publicly available dataset of AI values.

The core challenge addressed is understanding how AI assistants, which increasingly shape user decisions, actually apply values in practice. To investigate this, Anthropic analyzed a sample of 700,000 anonymized conversations from Claude.ai Free and Pro users, collected over one week (February 18–25, 2025). This dataset primarily featured interactions with the Claude 3.5 Sonnet model.

Filtering for “subjective” conversations – those requiring interpretation beyond mere facts – left 308,210 interactions for deeper analysis, as detailed in the research preprint.

Unpacking Claude’s Expressed Norms

Using its own language models within a privacy-preserving framework known as Clio (Claude insights and observations), Anthropic extracted instances where Claude demonstrated or stated values. Clio employs multiple safeguards, such as instructing the model to omit private details, setting minimum cluster sizes for aggregation (often requiring data from over 1,000 users per cluster), and having AI verify summaries before any human review.
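
In outline, the privacy safeguard amounts to aggregating extracted values into clusters and discarding any cluster too small to anonymize. The sketch below is a minimal illustration of that thresholding step, assuming hypothetical `extract_values` and `cluster_of` helpers; Clio's actual implementation has not been released.

```python
# Illustrative sketch of Clio-style privacy thresholding, not Anthropic's code.
# `extract_values` and `cluster_of` are hypothetical helpers standing in for
# the model-driven extraction and clustering stages described above.
from collections import defaultdict

MIN_USERS_PER_CLUSTER = 1000  # assumed floor, per the reported safeguard

def aggregate_value_clusters(conversations, extract_values, cluster_of):
    """Keep only value clusters backed by enough distinct users."""
    users = defaultdict(set)    # cluster -> set of user ids
    values = defaultdict(list)  # cluster -> extracted value instances
    for conv in conversations:
        # The extraction model is instructed to omit private details here.
        for value in extract_values(conv["text"]):
            cluster = cluster_of(value)
            users[cluster].add(conv["user_id"])
            values[cluster].append(value)
    # Drop any cluster below the minimum-user threshold before human review.
    return {c: v for c, v in values.items()
            if len(users[c]) >= MIN_USERS_PER_CLUSTER}
```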

This process identified 3,307 distinct AI values and, analyzing user inputs, 2,483 unique human values. Human validation confirmed the AI value extraction corresponded well with human judgment (98.8% agreement in sampled cases).

Anthropic organized the identified AI values into a four-level hierarchy topped by five main categories: Practical, Epistemic, Social, Protective, and Personal. Practical (efficiency, quality) and Epistemic (knowledge validation, logical consistency) values dominated, making up over half the observed instances.
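
To make the shape of that taxonomy concrete, a fragment might look like the sketch below; the five top-level categories come from the study, while the nested entries are illustrative stand-ins drawn from values named elsewhere in this article, not the released taxonomy.

```python
# Fragment of a four-level value hierarchy. Top-level categories are from
# the study; the lower levels here are illustrative examples only.
values_tree = {
    "Practical": {"Efficiency": {"Process quality": ["user enablement"]}},
    "Epistemic": {"Knowledge validation": {"Accuracy": ["historical accuracy",
                                                        "epistemic humility"]}},
    "Social": {"Interpersonal conduct": {"Boundaries": ["healthy boundaries"]}},
    "Protective": {"Safety": {"Care": ["patient wellbeing"]}},
    "Personal": {"Identity": {"Self-expression": ["authenticity"]}},
}

def leaf_values(tree):
    """Walk the hierarchy and yield the leaf-level values."""
    for subtree in tree.values():
        if isinstance(subtree, dict):
            yield from leaf_values(subtree)
        else:
            yield from subtree

print(sorted(leaf_values(values_tree)))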

Anthropic connects these findings to its HHH (Helpful, Honest, Harmless) design goals, which it operationalizes through its Constitutional AI approach and its work on Claude's character.

Observed values like “user enablement” (Helpful), “epistemic humility” (Honest), and “patient wellbeing” (Harmless) map to these principles. However, the analysis wasn’t entirely clean; rare clusters of undesirable values like “dominance” and “amorality” were also detected, which Anthropic suggests might correlate with user attempts to jailbreak the model, potentially offering a new signal for misuse detection.
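
If those rare clusters were used as a misuse signal, the simplest version would be a lookup against a blocklist of undesirable value labels. The fragment below is a hypothetical illustration of that idea, not a system Anthropic describes.

```python
# Hypothetical misuse flag built on extracted values; the study only
# suggests this as a possible signal, so treat this as a sketch.
UNDESIRABLE_VALUES = {"dominance", "amorality"}  # rare clusters from the study

def flag_possible_jailbreak(extracted_values: list[str]) -> bool:
    """Flag a conversation whose expressed values hit an undesirable cluster."""
    return any(v in UNDESIRABLE_VALUES for v in extracted_values)

assert flag_possible_jailbreak(["efficiency", "dominance"])
```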

Values in Context and Interaction

A central theme of the research is that Claude's value expression isn't static but highly situational. The AI assistant emphasizes different norms depending on the task – promoting "healthy boundaries" when giving relationship advice, or "historical accuracy" when discussing contentious historical events.

This context-dependent behavior highlights the dynamic nature of AI value application, moving beyond static evaluations.

The study also examined how Claude engages with values explicitly stated by users. The AI tends to respond supportively, reinforcing or working within the user’s framework in roughly 43% of relevant interactions.

Value mirroring, where Claude echoes the user's stated value (like "authenticity"), was common in these supportive exchanges, a pattern that raises questions about AI sycophancy.

In contrast, “reframing” user values occurred less often (6.6%), typically during discussions about personal wellbeing or interpersonal issues. Outright resistance to user values was infrequent (5.4%) but notable, usually happening when users requested unethical content or actions violating Anthropic’s usage policies.

The research indicates Claude is more likely to state its own values explicitly during these moments of resistance or reframing, potentially making its underlying principles more visible when challenged.

Transparency Efforts and Broader Picture

Anthropic has released the derived value taxonomy and frequency data via Hugging Face, including `values_frequencies.csv` and `values_tree.csv` files, though it notes the model-generated nature requires careful interpretation.
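
For readers who want to inspect the release, a minimal sketch of pulling and loading the two CSVs with `huggingface_hub` and `pandas` follows; the repository id is an assumption, so verify the exact dataset name on Anthropic's Hugging Face page.

```python
# Minimal sketch of loading the released CSVs. The repo id is an assumption;
# check Anthropic's Hugging Face page for the exact dataset name.
import pandas as pd
from huggingface_hub import hf_hub_download

REPO_ID = "Anthropic/values-in-the-wild"  # assumed dataset identifier

freq_path = hf_hub_download(REPO_ID, "values_frequencies.csv", repo_type="dataset")
tree_path = hf_hub_download(REPO_ID, "values_tree.csv", repo_type="dataset")

frequencies = pd.read_csv(freq_path)  # per-value occurrence counts
tree = pd.read_csv(tree_path)         # the value hierarchy, flattened

# Inspect the most frequently expressed values.
print(frequencies.head(10))
```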

The release aligns with Anthropic's stated focus on AI safety and transparency, following its March 2025 announcement of a separate interpretability framework designed to probe Claude's internal reasoning using methods such as dictionary learning.

These research efforts come as Anthropic navigates a competitive field, bolstered by significant investment including a $3.5 billion round announced in February 2025.

The company continues its public engagement on AI policy, having submitted recommendations to the White House in March 2025, although it also faced questions that same month for removing some previous voluntary safety pledges from its website.


