Advanced AI News
Hugging Face

Paper page – Multi-Domain Explainability of Preferences

By Advanced AI Bot | May 30, 2025
(based on a thread on Twitter)

Preferences drive modern LLM research and development: from model alignment to evaluation.
But how well do we understand them?

Excited to share our new preprint:
Multi-Domain Explainability of Preferences

We propose a fully automated method for explaining the preferences of three mechanism types:
👥 Human preferences (used to train reward models and for evaluation)
🤖 LLM-as-a-Judge (de facto standard for automatic evaluation)
🏅 Reward models (used in RLHF/RLAIF for alignment)

Our four-stage method:
1. Use an LLM to discover concepts that distinguish chosen responses from rejected ones.
2. Represent each response as a concept vector.
3. Train a logistic regression model to predict preferences.
4. Extract concept importance from the model weights.
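Stages 2 to 4 can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code. Stage 1 (LLM concept discovery and annotation) is replaced by a hypothetical, hand-written concept list and pre-scored responses. It also assumes a pairwise framing in which the logistic model sees the difference between the two responses' concept vectors, and that importance is read off as the magnitude of each learned weight.

```python
import math

# Hypothetical concepts; in the described method an LLM discovers these (stage 1).
CONCEPTS = ["clarity", "confidence", "accuracy"]

def concept_vector(scores: dict[str, float]) -> list[float]:
    """Stage 2: represent a response as a vector of concept scores, ordered by CONCEPTS."""
    return [scores.get(c, 0.0) for c in CONCEPTS]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(pairs, epochs: int = 500, lr: float = 0.1) -> list[float]:
    """Stage 3: fit w so that sigmoid(w . (x_a - x_b)) approximates P(a preferred over b).

    Each pair is (x_a, x_b, label), label = 1 if a was chosen, else 0.
    ASSUMPTION: pairwise difference features, no bias term, plain batch gradient descent on log-loss.
    """
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x_a, x_b, y in pairs:
            d = [a - b for a, b in zip(x_a, x_b)]
            p = sigmoid(sum(wi * di for wi, di in zip(w, d)))
            for i in range(dim):
                grad[i] += (p - y) * d[i]
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w

def concept_importance(w: list[float]) -> list[tuple[str, float]]:
    """Stage 4: rank concepts by the magnitude of their learned weight."""
    return sorted(zip(CONCEPTS, w), key=lambda cw: abs(cw[1]), reverse=True)

# Toy data (invented): in both pairs the preferred response is the clearer one.
pairs = [
    (concept_vector({"clarity": 1.0, "accuracy": 1.0}), concept_vector({"accuracy": 1.0}), 1),
    (concept_vector({"confidence": 1.0}), concept_vector({"clarity": 1.0, "confidence": 1.0}), 0),
]
w = train_logreg(pairs)
ranking = concept_importance(w)
print(ranking)
```

Because the toy pairs differ only in clarity, only the clarity weight moves away from zero, so it tops the ranking. With real data, correlated concepts would share credit, which is one reason sparsity is added below.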

Our special focus is on multi-domain learning:
Concepts affect preference decisions differently across domains.
A concept that is important in one domain may be irrelevant in another.

To address this, we introduce a white-box Hierarchical Multi-Domain Regression (HMDR) model, which is optimized to:
• Make shared weights strongly predictive → improves OOD generalization.
• Encourage sparsity (L1 regularization) → simpler explanations.
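The thread states HMDR's two goals but not its exact objective, so the following is a simplified two-level sketch under stated assumptions, not the paper's model. Each domain's logit uses shared weights plus domain-specific weights. An extra log-loss term on the shared weights alone (our assumption) pushes them to be predictive by themselves, and an L1 penalty, applied by proximal gradient descent (soft-thresholding), encourages sparsity. `train_hmdr_sketch` and `contributions` are hypothetical names.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t*|v|: shrinks v toward 0 and sets small values exactly to 0.
    return math.copysign(max(abs(v) - t, 0.0), v)

def dot(w, x) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def train_hmdr_sketch(data, dim, domains, alpha=1.0, l1=0.01, lr=0.1, epochs=300):
    """Logit for domain d is (shared + per_domain[d]) . x.

    ASSUMED objective (not given in the thread):
      log-loss(shared + domain weights)
      + alpha * log-loss(shared weights alone)        # keeps shared weights predictive
      + l1 * (|shared|_1 + sum_d |per_domain[d]|_1)   # sparsity
    data: list of (domain, x, y), with y = 1 if the first response was preferred.
    """
    shared = [0.0] * dim
    per_domain = {d: [0.0] * dim for d in domains}
    n = len(data)
    for _ in range(epochs):
        g_shared = [0.0] * dim
        g_dom = {d: [0.0] * dim for d in domains}
        for d, x, y in data:
            full = [s + t for s, t in zip(shared, per_domain[d])]
            err_full = sigmoid(dot(full, x)) - y
            err_shared = sigmoid(dot(shared, x)) - y
            for i in range(dim):
                g_dom[d][i] += err_full * x[i]
                g_shared[i] += (err_full + alpha * err_shared) * x[i]
        # Gradient step on the smooth part, then the L1 proximal step.
        shared = [soft_threshold(w - lr * g / n, lr * l1) for w, g in zip(shared, g_shared)]
        for d in domains:
            per_domain[d] = [soft_threshold(w - lr * g / n, lr * l1)
                             for w, g in zip(per_domain[d], g_dom[d])]
    return shared, per_domain

def contributions(shared, per_domain, domain, x):
    """Split each concept's logit contribution into (shared part, domain-specific part)."""
    return [(s * xi, t * xi) for s, t, xi in zip(shared, per_domain[domain], x)]

# Invented toy data: concept 0 matters in both domains; concept 1 carries signal only in "code".
data = [
    ("chat", [1.0, 0.0], 1), ("chat", [-1.0, 0.0], 0), ("chat", [0.0, 1.0], 1), ("chat", [0.0, 1.0], 0),
    ("code", [1.0, 0.0], 1), ("code", [-1.0, 0.0], 0), ("code", [0.0, 1.0], 1), ("code", [0.0, -1.0], 0),
]
shared, per_domain = train_hmdr_sketch(data, dim=2, domains=["chat", "code"])
print(shared, per_domain)
print(contributions(shared, per_domain, "code", [1.0, 1.0]))
```

The (shared, domain-specific) pairs returned by `contributions` correspond loosely to the light and dark bars in the figure described below.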

Finally, concept importance is measured as the lift in probability: the % change in the predicted preference probability when a concept's score increases by one unit.
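For a logistic model, this lift can be computed directly from a concept's weight. In the short sketch below, `lift_percent` is a hypothetical helper. The thread does not say at which baseline the lift is evaluated, so the baseline logit is an explicit argument. The same weight yields a smaller lift when the baseline probability is already high.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def lift_percent(baseline_logit: float, weight: float) -> float:
    """% change in predicted probability when one concept's score rises by one unit.

    ASSUMPTION: the lift is evaluated at a chosen baseline logit; the concept's
    weight is added to that logit (a one-unit increase in a linear model).
    """
    p0 = sigmoid(baseline_logit)
    p1 = sigmoid(baseline_logit + weight)
    return (p1 - p0) / p0 * 100.0

print(lift_percent(0.0, 1.0))  # about 46.21 (p rises from 0.50 to about 0.73)
print(lift_percent(2.0, 1.0))  # about 8.15 (same weight, higher baseline)
```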

The resulting explanations are quite interesting 🤩
Below is an example of human preferences across five domains 💬🧑‍💻👩‍⚖️🧑‍🍳🧳

How to read it?
◻️Light bars show the shared contribution to the score,
◼️while dark bars and arrows indicate domain-specific contributions.

How do we know our explanations are good? 🤔
✅ Human Evaluation: LLM concept annotations closely match human annotations.
✅ Preference Prediction: Our method is comparable to human preference models.
The HMDR model outperforms other white-box models both in-domain & OOD.

We assess explanations in two application-driven settings:

Can we “hack” the judge? 👩‍⚖️🤖
Using LLM-as-a-judge explanations, we guide another LLM’s responses (by asking it to follow the top concepts).
Result: Judges prefer the explanation-guided outputs over outputs from regular prompts.
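One way to turn a judge's explanation into guidance is to keep its top positively weighted concepts and append them to the generation prompt. The sketch below assumes a dict of concept importances like the one produced above. The prompt wording and the helper `guidance_prompt` are hypothetical and not the prompt used in the paper.

```python
def guidance_prompt(task: str, importances: dict[str, float], k: int = 3) -> str:
    """Append the k concepts that most increase the judge's preference to the task prompt.

    ASSUMPTION: only positively weighted concepts are used; the wording is illustrative.
    """
    ranked = sorted(importances.items(), key=lambda cw: cw[1], reverse=True)
    top = [concept for concept, weight in ranked if weight > 0][:k]
    bullets = "\n".join(f"- {concept}" for concept in top)
    return f"{task}\n\nWhen writing your response, make sure it clearly exhibits these qualities:\n{bullets}"

# Invented importances for a hypothetical judge.
importances = {"clarity": 0.9, "confidence": 0.4, "verbosity": -0.3, "structure": 0.6}
p = guidance_prompt("Explain recursion.", importances, k=2)
print(p)
```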

Breaking Ties in LLM-as-a-Judge 🤝
LLM judges often give inconsistent verdicts when the order of the two responses is flipped (10–30% of the time!).

We guide LLM judges using top human-derived concepts to break ties.
Result: Clear gains in human preference alignment on tied cases.
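The tie-breaking procedure can be sketched as follows. Query the judge in both orders, treat an order-dependent verdict as a tie, and only then re-query with concept guidance. The `judge` callable stands in for a real LLM-as-a-judge call. Its interface, the guidance wording, and the toy judge (pure position bias without guidance, an arbitrary length rule with it) are all hypothetical.

```python
from typing import Callable

# Hypothetical judge interface: judge(prompt, first, second, guidance) -> "first" or "second".
Judge = Callable[[str, str, str, str], str]

def judge_pair(judge: Judge, prompt: str, a: str, b: str, guidance: str = "") -> str:
    """Ask in both orders; return "a", "b", or "tie" if the verdict flips with order."""
    pick1 = "a" if judge(prompt, a, b, guidance) == "first" else "b"  # a shown first
    pick2 = "b" if judge(prompt, b, a, guidance) == "first" else "a"  # b shown first
    return pick1 if pick1 == pick2 else "tie"

def judge_with_tiebreak(judge: Judge, prompt: str, a: str, b: str, concepts: list[str]) -> str:
    """Re-judge only inconsistent (tied) pairs, adding human-derived concepts as guidance."""
    verdict = judge_pair(judge, prompt, a, b)
    if verdict != "tie":
        return verdict
    guidance = "Prefer the response that better exhibits: " + ", ".join(concepts)
    return judge_pair(judge, prompt, a, b, guidance)  # may still return "tie"

def toy_judge(prompt: str, first: str, second: str, guidance: str) -> str:
    if not guidance:
        return "first"  # pure position bias: always favors whichever response is shown first
    return "first" if len(first) >= len(second) else "second"  # arbitrary toy rule

print(judge_pair(toy_judge, "q", "short", "a longer answer"))
print(judge_with_tiebreak(toy_judge, "q", "short", "a longer answer", ["clarity"]))
```

Counting how often `judge_pair` returns "tie" over a dataset gives the order-flip inconsistency rate quoted above.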

Finally, we analyze our explanations by comparing our findings (auto-discovered concepts) with those from prior studies of manually curated concepts.

🔍 We reproduced many!
Humans prioritize clarity, authority, and confidence, while LLMs emphasize accuracy and helpfulness.

Importantly, we found that domain-specific concepts dominate many preference mechanisms.

Our two key contributions:
1⃣ Automatic concept discovery
2⃣ Multi-domain modeling
Together, they provide a scalable and generalizable approach to modeling NLP preferences.

https://arxiv.org/abs/2505.20088


