Advanced AI News
Hugging Face

Paper page – Multi-Domain Explainability of Preferences

By Advanced AI Editor | May 30, 2025


(based on a thread on Twitter)

Preferences drive modern LLM research and development: from model alignment to evaluation.
But how well do we understand them?

Excited to share our new preprint:
Multi-domain Explainability of Preferences

We propose a fully automated method for explaining the preferences of three mechanism types:
👥 Human preferences (used to train reward models and for evaluation)
🤖 LLM-as-a-Judge (de facto standard for automatic evaluation)
🏅 Reward models (used in RLHF/RLAIF for alignment)

Our four-stage method:

1⃣ Use an LLM to discover concepts that distinguish between chosen and rejected responses.
2⃣ Represent responses as concept vectors.
3⃣ Train a logistic regression model to predict preferences.
4⃣ Extract concept importance from model weights.
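
Stages 2 to 4 can be illustrated with a minimal sketch. It is not the authors' implementation: the concept names, the toy numbers, and the choice to encode each pair as the difference between the two responses' concept vectors are all assumptions made for illustration, and stage 1 (LLM concept discovery) is replaced by a hard-coded list.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss (no intercept)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            err = sigmoid(dot(w, x)) - label
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

# Hypothetical concepts standing in for the output of stage 1.
CONCEPTS = ["clarity", "verbosity", "confidence"]

# Stage 2 (assumed encoding): concept vector of response A minus that of B.
# Toy hand-made values; real ones would come from LLM concept annotations.
X = [
    [ 1.0,  0.0,  1.0],
    [ 1.0,  1.0,  0.0],
    [-1.0,  1.0,  0.0],
    [-1.0,  0.0, -1.0],
    [ 1.0, -1.0,  1.0],
    [-1.0, -1.0, -1.0],
]
y = [1, 1, 0, 0, 1, 0]  # 1 means response A was preferred

# Stage 3: fit the preference predictor.
w = train_logreg(X, y)
preds = [1 if sigmoid(dot(w, x)) > 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)

# Stage 4: rank concepts by the magnitude of their learned weight.
ranking = sorted(zip(CONCEPTS, w), key=lambda cw: abs(cw[1]), reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:+.3f}")
```

In this toy data the sign of "clarity" fully determines the label, so it receives the largest weight, while "verbosity" is uninformative and stays near zero.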

Our special focus is on multi-domain learning:
Concepts affect preference decisions differently across domains.
A concept that is important in one domain may be irrelevant in another.

To address this, we introduce a white-box Hierarchical Multi-Domain Regression (HMDR) model.

The HMDR model is optimized to:
• Make shared weights strongly predictive → improves OOD generalization.
• Encourage sparsity (L1 regularization) → simpler explanations.
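
One way to picture these two goals is the sketch below. The thread does not give the exact objective, so everything here is an assumption: each domain's weights are modeled as shared weights plus a domain-specific offset, a second loss term (weighted by a hypothetical `alpha`) scores the shared weights on their own, and an L1 penalty (weighted by a hypothetical `l1`) covers all weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_loss(p, label):
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def predict(shared, domain_w, x, d):
    """P(A preferred) using shared weights plus the offset for domain d."""
    combined = [s + o for s, o in zip(shared, domain_w[d])]
    return sigmoid(dot(combined, x))

def hmdr_objective(shared, domain_w, data, alpha=1.0, l1=0.01):
    """Assumed objective: full-model loss + shared-only loss + L1 penalty."""
    n = len(data)
    full = sum(log_loss(predict(shared, domain_w, x, d), y) for x, d, y in data) / n
    # Extra term that rewards shared weights that predict well by themselves.
    shared_only = sum(log_loss(sigmoid(dot(shared, x)), y) for x, d, y in data) / n
    # L1 penalty pushes weights to exactly zero, giving shorter explanations.
    penalty = sum(abs(s) for s in shared)
    penalty += sum(abs(o) for offsets in domain_w.values() for o in offsets)
    return full + alpha * shared_only + l1 * penalty

# Toy data: (concept vector, domain, label). Concept 0 helps in both
# domains; concept 1 helps in "code" and hurts in "travel".
data = [
    ([ 1.0,  1.0], "code",   1),
    ([-1.0, -1.0], "code",   0),
    ([ 1.0, -1.0], "travel", 1),
    ([-1.0,  1.0], "travel", 0),
    ([ 0.0,  1.0], "code",   1),
    ([ 0.0,  1.0], "travel", 0),
]

zeros = hmdr_objective([0.0, 0.0], {"code": [0.0, 0.0], "travel": [0.0, 0.0]}, data)
fitted = hmdr_objective([1.0, 0.0], {"code": [0.0, 1.0], "travel": [0.0, -1.0]}, data)
print(f"all-zero weights: {zeros:.3f}, hand-picked weights: {fitted:.3f}")
```

The hand-picked setting puts the concept that behaves the same everywhere into the shared weights and the concept that flips sign into the domain offsets, which is the kind of split the light and dark bars in the explanations are meant to expose.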

Finally, concept importance is reported as the lift in probability: the percentage change in the predicted preference probability when a concept's value increases by one unit.
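
As a rough worked example of such a lift, the sketch below assumes a logistic model, measures the change relative to the probability at a baseline concept vector, and uses made-up weights. The baseline and the relative (rather than absolute) change are assumptions, not details confirmed by the thread.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lift_percent(weights, baseline, j):
    """Percent change in P(preferred) when concept j increases by one unit."""
    z = sum(w * x for w, x in zip(weights, baseline))
    p_before = sigmoid(z)
    p_after = sigmoid(z + weights[j])  # adding 1 to x_j adds w_j to the logit
    return 100.0 * (p_after - p_before) / p_before

# Hypothetical concepts and weights, chosen so the arithmetic is easy to follow.
names = ["clarity", "verbosity", "rudeness"]
weights = [math.log(3), 0.0, -math.log(3)]
baseline = [0.0, 0.0, 0.0]  # logit 0, so the starting probability is 0.5

lifts = {n: lift_percent(weights, baseline, j) for j, n in enumerate(names)}
for name, value in lifts.items():
    print(f"{name}: {value:+.1f}%")
```

With a weight of ln 3 the probability moves from 0.5 to 0.75, a lift of about +50%; a weight of zero gives no lift, and a weight of -ln 3 moves the probability to 0.25, a lift of about -50%.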

The resulting explanations are quite interesting 🤩
Below is an example of human preferences across five domains 💬🧑‍💻👩‍⚖️🧑‍🍳🧳

How to read it?
◻️Light bars show the shared contribution to the score,
◼️while dark bars and arrows indicate domain-specific contributions.

How do we know our explanations are good? 🤔
✅ Human Evaluation: LLM concept annotations closely match human annotations.
✅ Preference Prediction: Our method is comparable to human preference models.
The HMDR model outperforms other white-box models both in-domain & OOD.

We assess explanations in two application-driven settings:

Can we “hack” the judge? 👩‍⚖️🤖
Using LLM-as-a-judge explanations, we guide another LLM’s responses (by asking it to follow the top concepts).
Result: Judges prefer the explanation-guided outputs over regular prompts.

Breaking ties in LLM-as-a-Judge 🤝
LLMs often produce inconsistent preferences when the order of responses is flipped (10–30% of the time!).

We guide LLM judges using top human-derived concepts to break ties.
Result: Clear gains in human preference alignment on tied cases.
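
The order-flip check and the tie-breaking step can be sketched as follows. The `judge` and `tie_breaker` callables are hypothetical stand-ins for LLM calls (the toy judges below are plain functions), and treating an order-dependent verdict as a tie is the reading assumed here.

```python
def consistent_verdict(judge, prompt, a, b, tie_breaker=None):
    """Query the judge in both orders; return 'A', 'B', or 'tie'.

    judge(prompt, first, second) must return 'first' or 'second'.
    tie_breaker(prompt, a, b) must return 'A' or 'B' (hypothetical hook,
    e.g. a judge prompted with the top human-derived concepts).
    """
    pass_1 = judge(prompt, a, b)  # response a shown first
    pass_2 = judge(prompt, b, a)  # response b shown first
    winner_1 = "A" if pass_1 == "first" else "B"
    winner_2 = "B" if pass_2 == "first" else "A"
    if winner_1 == winner_2:
        return winner_1
    if tie_breaker is None:
        return "tie"
    return tie_breaker(prompt, a, b)

# Toy judge with pure position bias: always picks whatever is shown first.
def position_biased_judge(prompt, first, second):
    return "first"

# Toy judge that is order-invariant: picks the longer response.
def length_judge(prompt, first, second):
    return "first" if len(first) >= len(second) else "second"

# Toy concept-guided tie-breaker: prefers the response that gives a reason.
def reason_tie_breaker(prompt, a, b):
    return "A" if "because" in a and "because" not in b else "B"

prompt = "Is it safe to eat raw cookie dough?"
a = "No, because raw flour and eggs can carry bacteria."
b = "Probably fine."

print(consistent_verdict(position_biased_judge, prompt, a, b))
print(consistent_verdict(length_judge, prompt, a, b))
print(consistent_verdict(position_biased_judge, prompt, a, b, reason_tie_breaker))
```

The position-biased judge flips its winner when the order changes, so the pair is flagged as a tie and handed to the tie-breaker; the length-based judge gives the same winner in both orders and needs no tie-breaking.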

Finally, we analyze our explanations by comparing our findings (auto-discovered concepts) with those from prior studies of manually curated concepts.

🔍 We reproduced many of their findings!
Humans prioritize clarity, authority, and confidence, while LLMs emphasize accuracy and helpfulness.

Importantly, we found that domain-specific concepts dominate many preference mechanisms.

Our two key contributions:
1⃣ Automatic concept discovery
2⃣ Multi-domain modeling
Together, they provide a scalable and generalizable approach to modeling NLP preferences.

https://arxiv.org/abs/2505.20088


