A novel pairwise-comparison framework, together with the CreataSet dataset, is used to train CrEval, an LLM-based evaluator that significantly improves the assessment of textual creativity and aligns closely with human judgments.
Creativity evaluation remains a challenging frontier for large language
models (LLMs). Current evaluations heavily rely on inefficient and costly human
judgments, hindering progress in enhancing machine creativity. While automated
methods exist, ranging from psychological testing to heuristic- or
prompting-based approaches, they often lack generalizability or alignment with
human judgment. To address these issues, in this paper, we propose a novel
pairwise-comparison framework for assessing textual creativity, leveraging
shared contextual instructions to improve evaluation consistency. We introduce
CreataSet, a large-scale dataset with 100K+ human-level and 1M+ synthetic
creative instruction-response pairs spanning diverse open-domain tasks. Through
training on CreataSet, we develop an LLM-based evaluator named CrEval, which
aligns with human judgments substantially more closely than existing methods.
Experimental results underscore the importance of combining human-generated
and synthetic data to train robust evaluators, and showcase the practical
utility of CrEval in boosting the creativity of LLMs. We will soon release all
data, code, and models publicly to support further research.
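
As a rough illustration of the pairwise-comparison setup described above, the sketch below shows how two responses to the same instruction might be compared by an LLM judge, with the shared instruction serving as common context. The prompt wording, the function names (pairwise_creativity_judgment, llm_judge), and the output labels are illustrative assumptions for this sketch, not the paper's actual protocol or CrEval's interface.

```python
from typing import Callable, Literal


def pairwise_creativity_judgment(
    instruction: str,
    response_a: str,
    response_b: str,
    llm_judge: Callable[[str], str],
) -> Literal["A", "B", "Tie"]:
    """Ask an LLM judge which of two responses to the SAME instruction is more creative.

    Sharing the instruction as common context lets the judge compare the two
    responses relative to each other rather than scoring each on an absolute scale.
    """
    prompt = (
        "You are evaluating textual creativity.\n\n"
        f"Instruction:\n{instruction}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Which response is more creative? Answer with exactly one of: A, B, Tie."
    )
    verdict = llm_judge(prompt).strip()
    # Fall back to "Tie" if the judge returns anything outside the expected labels.
    return verdict if verdict in ("A", "B", "Tie") else "Tie"


# Example usage with a stand-in judge (replace with a real LLM call).
if __name__ == "__main__":
    dummy_judge = lambda prompt: "A"
    print(pairwise_creativity_judgment(
        "Write a one-line slogan for a library that never closes.",
        "Books around the clock.",
        "Where stories never sleep and neither do we.",
        dummy_judge,
    ))
```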