A framework for evaluating and optimizing natural language prompts in large language models is proposed, revealing correlations between prompt properties and their impact on reasoning tasks.
As large language models (LLMs) have progressed towards more human-like and
human–AI communications have become prevalent, prompting has emerged as a
decisive component. However, there is limited conceptual consensus on what
exactly quantifies natural language prompts. We attempt to address this
question by conducting a meta-analysis surveying more than 150
prompting-related papers from leading NLP and AI conferences from 2022 to 2025
and blogs. We propose a property- and human-centric framework for evaluating
prompt quality, encompassing 21 properties categorized into six dimensions. We
then examine how existing studies assess their impact on LLMs, revealing their
imbalanced support across models and tasks, and substantial research gaps.
Further, we analyze correlations among properties in high-quality natural
language prompts, deriving prompting recommendations. We then empirically
explore multi-property prompt enhancements in reasoning tasks, observing that
single-property enhancements often have the greatest impact. Finally, we
discover that instruction-tuning on property-enhanced prompts can result in
better reasoning models. Our findings establish a foundation for
property-centric prompt evaluation and optimization, bridging the gaps between
human–AI communication and opening new prompting research directions.