We present MGM-Omni, a unified Omni LLM for omni-modal understanding and
expressive, long-horizon speech generation. Unlike cascaded pipelines that
isolate speech synthesis, MGM-Omni adopts a “brain-mouth” design with a
dual-track, token-based architecture that cleanly decouples multimodal
reasoning from real-time speech generation. This design enables efficient
cross-modal interaction and low-latency, streaming speech generation. For
understanding, a unified training strategy coupled with a dual audio encoder
design enables long-form audio perception across diverse acoustic conditions.
For generation, a chunk-based parallel decoding scheme narrows the token-rate
gap between text and speech, accelerating inference and supporting streaming zero-shot voice
cloning with stable timbre over extended durations. Compared to concurrent
work, MGM-Omni achieves these capabilities with markedly more data-efficient
training. Extensive experiments demonstrate that MGM-Omni outperforms existing
open-source models in preserving timbre identity across extended sequences,
producing natural and context-aware speech, and understanding long-form
audio and omni-modal inputs. MGM-Omni establishes an efficient,
end-to-end paradigm for omni-modal understanding and controllable, personalised
long-horizon speech generation.
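
To make the chunk-based parallel decoding idea concrete, the minimal sketch below shows a speech head that emits a fixed-size chunk of speech tokens for each step of the reasoning track, so one slow "brain" step covers several fast speech frames. This is an illustrative assumption, not the paper's actual implementation: the chunk size, vocabulary size, and module names (ChunkParallelSpeechHead, stream_speech) are hypothetical.

```python
import torch
import torch.nn as nn

class ChunkParallelSpeechHead(nn.Module):
    """Toy speech head predicting a chunk of K speech tokens per decoding step,
    narrowing the rate gap with the slower text stream (sizes are illustrative)."""

    def __init__(self, hidden_dim=512, speech_vocab=4096, chunk_size=8):
        super().__init__()
        self.chunk_size = chunk_size
        self.speech_vocab = speech_vocab
        # One projection covering all in-chunk positions, computed in parallel.
        self.proj = nn.Linear(hidden_dim, chunk_size * speech_vocab)

    def forward(self, hidden):                      # hidden: (batch, hidden_dim)
        logits = self.proj(hidden)                  # (batch, K * vocab)
        logits = logits.view(-1, self.chunk_size, self.speech_vocab)
        return logits.argmax(dim=-1)                # (batch, K) speech tokens


def stream_speech(head, hidden_states):
    """Yield speech-token chunks as hidden states arrive from the reasoning track,
    emulating low-latency streaming generation."""
    for h in hidden_states:                         # one step of the "brain" track
        yield head(h)                               # K speech tokens for the "mouth"


if __name__ == "__main__":
    head = ChunkParallelSpeechHead()
    steps = [torch.randn(1, 512) for _ in range(3)]  # stand-in decoder states
    for chunk in stream_speech(head, steps):
        print(chunk.shape)                           # torch.Size([1, 8]) per step
```

Under this assumption, emitting several speech tokens per reasoning step is what allows the speech stream to keep pace with text generation and to be flushed chunk by chunk for streaming playback.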