Fin-PRM: A Domain-Specialized Process Reward Model For Financial Reasoning In Large Language Models - Takara TLDR

Process Reward Models (PRMs) have emerged as a promising framework for
supervising intermediate reasoning in large language models (LLMs), yet
existing PRMs are primarily trained on general or Science, Technology,
Engineering, and Mathematics (STEM) domains and fall short in domain-specific
contexts such as finance, where reasoning is more structured, symbolic, and
sensitive to factual and regulatory correctness. We introduce Fin-PRM,
a domain-specialized, trajectory-aware PRM tailored to evaluate intermediate
reasoning steps in financial tasks. Fin-PRM integrates step-level and
trajectory-level reward supervision, enabling fine-grained evaluation of
reasoning traces aligned with financial logic. We apply Fin-PRM in both offline
and online reward learning settings, supporting three key applications: (i)
selecting high-quality reasoning trajectories for distillation-based supervised
fine-tuning, (ii) providing dense process-level rewards for reinforcement
learning, and (iii) guiding reward-informed Best-of-N inference at test time.
Experimental results on financial reasoning benchmarks, including CFLUE and
FinQA, demonstrate that Fin-PRM consistently outperforms general-purpose PRMs
and strong domain baselines in trajectory selection quality. Downstream models
trained with Fin-PRM yield substantial improvements over baselines, with gains
of 12.9% in supervised learning, 5.2% in reinforcement learning, and 5.1% in
test-time performance. These findings highlight the value of domain-specialized
reward modeling for aligning LLMs with expert-level financial reasoning. Our
project resources will be available at https://github.com/aliyun/qwen-dianjin.
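
To make the reward-informed Best-of-N idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a step-level scorer and a trajectory-level scorer are available as plain callables and blends them with a hypothetical weight `alpha`. The abstract does not specify how Fin-PRM aggregates its two reward signals, so the mean-of-steps plus weighted-sum rule, the `Trajectory` type, and the toy scorers below are all illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass
class Trajectory:
    """One candidate reasoning trace (hypothetical container type)."""
    steps: List[str]  # intermediate reasoning steps
    answer: str       # final answer produced by the trace


def combined_score(
    traj: Trajectory,
    step_reward: Callable[[str], float],
    trajectory_reward: Callable[[Trajectory], float],
    alpha: float = 0.5,
) -> float:
    """Blend step-level and trajectory-level rewards.

    Assumption: step rewards are averaged, then mixed with the
    trajectory reward by a weight alpha in [0, 1].
    """
    step_part = mean(step_reward(s) for s in traj.steps) if traj.steps else 0.0
    return alpha * step_part + (1.0 - alpha) * trajectory_reward(traj)


def best_of_n(
    candidates: Sequence[Trajectory],
    step_reward: Callable[[str], float],
    trajectory_reward: Callable[[Trajectory], float],
    alpha: float = 0.5,
) -> Trajectory:
    """Return the candidate with the highest combined reward."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    return max(
        candidates,
        key=lambda t: combined_score(t, step_reward, trajectory_reward, alpha),
    )


# Toy stand-ins for learned reward models (assumptions for illustration only).
def toy_step_reward(step: str) -> float:
    # Reward steps that cite at least one number.
    return 1.0 if any(ch.isdigit() for ch in step) else 0.0


def toy_trajectory_reward(traj: Trajectory) -> float:
    # Reward traces that end with a non-empty answer.
    return 1.0 if traj.answer.strip() else 0.0
```

In practice the two toy scorers would be replaced by calls to a trained reward model. The same `combined_score` could also rank trajectories offline when filtering data for distillation, under the same assumptions.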
