Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
Jinluan Yang and 12 other authors
Abstract: Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) is a cornerstone of responsible AI. Existing approaches such as data mixture strategies face limitations, including heavy reliance on expert knowledge and conflicting optimization signals. While model merging offers parameter-level conflict resolution by integrating the parameters of specialized models, its potential for 3H optimization remains underexplored. This paper presents the first systematic comparison of model merging and data mixture methods for constructing 3H-aligned LLMs, revealing previously overlooked collaborative and conflicting relationships among the 3H dimensions and discussing the respective advantages and drawbacks of data mixture (data-level) and model merging (parameter-level) methods in mitigating these conflicts for balanced 3H optimization. Specifically, we propose a novel Reweighting Enhanced task Singular Merging method, RESM, which uses outlier weighting and sparsity-aware rank selection to address the challenges of preference noise accumulation and layer sparsity adaptation inherent in merging 3H-aligned LLMs. Extensive evaluations verify the effectiveness and robustness of RESM, which outperforms previous data mixture methods by 2%-5% and previous model merging methods by 1%-3% in achieving balanced LLM alignment. We release our models at 3H_Merging (this https URL) for further investigation.
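The abstract describes RESM only at a high level. As a hypothetical illustration of the general parameter-level merging recipe it builds on (task-vector merging with per-layer low-rank selection and reweighted accumulation), here is a minimal PyTorch sketch. The function name `merge_layer`, the `energy` threshold, and the per-expert weighting scheme are assumptions for illustration, not the paper's actual algorithm.

```python
# Hypothetical sketch of parameter-level model merging with SVD-based
# rank selection; illustrative only, not the paper's actual RESM method.
import torch

def merge_layer(base, experts, weights, energy=0.9):
    """Merge one layer's weight matrix from several expert models.

    base    : torch.Tensor, pretrained weight matrix
    experts : list of torch.Tensor, fine-tuned weights
              (e.g. helpfulness / honesty / harmlessness experts)
    weights : list of float, per-expert mixing coefficients (assumed)
    energy  : fraction of singular-value energy kept per task vector
    """
    merged_delta = torch.zeros_like(base)
    for w, expert in zip(weights, experts):
        delta = expert - base  # task vector for this expert
        U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
        # rank selection: keep the smallest rank capturing `energy`
        # of the squared singular-value mass of this layer's delta
        cum = torch.cumsum(S**2, dim=0) / (S**2).sum()
        r = int(torch.searchsorted(cum, torch.tensor(energy)).item()) + 1
        low_rank = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]
        merged_delta += w * low_rank  # reweighted accumulation
    return base + merged_delta
```

In this sketch, merging three 3H experts would apply the function to each 2-D weight matrix of the network, with the entries of `weights` chosen per expert (for instance summing to roughly 1); how RESM actually sets these weights and ranks is specified in the paper itself.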
Submission history
From: Jinluan Yang
[v1] Sat, 8 Feb 2025 11:56:58 UTC (1,177 KB)
[v2] Thu, 13 Feb 2025 06:28:33 UTC (1,177 KB)
[v3] Fri, 16 May 2025 05:35:39 UTC (1,940 KB)