LLaSO: A Foundational Framework For Reproducible Research In Large Language And Speech Model - Takara TLDR

The development of Large Speech-Language Models (LSLMs) has been slowed by
fragmented architectures and a lack of transparency, hindering the systematic
comparison and reproducibility of research. Unlike in the vision-language
domain, the LSLM field suffers from the common practice of releasing model
weights without their corresponding training data and configurations. To
address these critical gaps, we introduce LLaSO, the first fully open,
end-to-end framework for large-scale speech-language modeling. LLaSO provides
the community with three essential resources: (1) LLaSO-Align, a 12M-instance
speech-text alignment corpus; (2) LLaSO-Instruct, a 13.5M-instance multi-task
instruction-tuning dataset; and (3) LLaSO-Eval, a reproducible benchmark for
standardized evaluation. To validate our framework, we build and release
LLaSO-Base, a 3.8B-parameter reference model trained exclusively on our public
data. It achieves a normalized score of 0.72, establishing a strong,
reproducible baseline that surpasses comparable models. Our analysis reveals
that while broader training coverage enhances performance, significant
generalization gaps persist on unseen tasks, particularly in pure audio
scenarios. By releasing the complete stack of data, benchmarks, and models,
LLaSO establishes a foundational open standard to unify research efforts and
accelerate community-driven progress in LSLMs. We release the code, dataset,
pretrained models, and results at https://github.com/EIT-NLP/LLaSO.
