NativQA: Multilingual Culturally-Aligned Natural Query for LLMs, by Md. Arid Hasan and 8 other authors
Abstract: Natural Question Answering (QA) datasets play a crucial role in evaluating the capabilities of large language models (LLMs) and ensuring their effectiveness in real-world applications. Although numerous QA datasets have been developed, along with some parallel efforts, there remains a notable lack of both a construction framework and large-scale, region-specific datasets built from queries posed by native users in their own languages. This gap hinders effective benchmarking and the development of models fine-tuned for regional and cultural specificities. In this study, we propose NativQA, a scalable, language-independent framework for seamlessly constructing culturally and regionally aligned QA datasets in native languages, for LLM evaluation and tuning. We demonstrate the efficacy of the proposed framework by building MultiNativQA, a multilingual natural QA dataset consisting of ~64k manually annotated QA pairs in seven languages, ranging from high- to extremely low-resource, based on queries from native speakers in 9 regions covering 18 topics. We benchmark open- and closed-source LLMs on the MultiNativQA dataset. We make the MultiNativQA dataset (this https URL) and the experimental scripts (this https URL) publicly available for the community.
Submission history
From: Firoj Alam
[v1] Sat, 13 Jul 2024 09:34:00 UTC (4,332 KB)
[v2] Sun, 6 Oct 2024 10:46:41 UTC (6,266 KB)
[v3] Fri, 30 May 2025 14:06:34 UTC (2,741 KB)