Recent advancements in machine learning have spurred growing interest in
automated interpreting quality assessment. Nevertheless, existing research
suffers from insufficient examination of language use quality, unsatisfactory
modeling effectiveness due to data scarcity and imbalance, and a lack of
effort to explain model predictions. To address these gaps, we propose a
multi-dimensional modeling framework that integrates feature engineering, data
augmentation, and explainable machine learning. This approach prioritizes
explainability over “black box” predictions by utilizing only
construct-relevant, transparent features and conducting Shapley Additive
Explanations (SHAP) analysis. Our results demonstrate strong predictive
performance on a novel English-Chinese consecutive interpreting dataset,
identifying BLEURT and CometKiwi scores as the strongest predictive features for fidelity,
pause-related features for fluency, and Chinese-specific phraseological
diversity metrics for language use. Overall, by placing particular emphasis on
explainability, we present a scalable, reliable, and transparent alternative to
traditional human evaluation, facilitating detailed diagnostic feedback for
learners and supporting self-regulated learning, benefits not afforded by
automated scores in isolation.
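
To make the analysis pipeline concrete, the following minimal sketch shows the kind of SHAP analysis referred to above: a tree-based regressor is fit on a small set of construct-relevant features, and SHAP values are computed to rank feature contributions. The model choice, feature names (bleurt, cometkiwi, pause_rate, phrase_diversity), and synthetic data are illustrative assumptions, not the paper's actual feature set or implementation.

```python
# Minimal sketch (not the authors' code): SHAP analysis of construct-relevant
# features for one assessment dimension, assuming a tabular feature matrix.
# Feature names and the synthetic target are placeholders for illustration.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "bleurt": rng.normal(0.6, 0.1, n),           # semantic similarity to a reference
    "cometkiwi": rng.normal(0.7, 0.1, n),        # reference-free quality estimate
    "pause_rate": rng.normal(0.2, 0.05, n),      # pauses per second (fluency proxy)
    "phrase_diversity": rng.normal(0.5, 0.1, n), # phraseological diversity metric
})
# Synthetic "fidelity" score, used only to make the example runnable
y = 0.5 * X["bleurt"] + 0.4 * X["cometkiwi"] + rng.normal(0, 0.05, n)

model = GradientBoostingRegressor().fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles;
# the summary plot ranks features by mean absolute SHAP value.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X, show=False)
```

In this sketch the SHAP summary would (by construction of the synthetic target) rank bleurt and cometkiwi highest, mirroring the kind of feature-importance evidence reported for the fidelity dimension.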