Advanced AI News
VentureBeat AI

A new, open source text-to-speech model called Dia has arrived to challenge ElevenLabs, OpenAI and more

By Advanced AI Bot | April 22, 2025 | 6 min read

A two-person startup named Nari Labs has introduced Dia, a 1.6-billion-parameter text-to-speech (TTS) model designed to produce naturalistic dialogue directly from text prompts. One of its creators claims it surpasses competing proprietary offerings from the likes of ElevenLabs and Google's hit NotebookLM AI podcast generation feature.

It could also threaten uptake of OpenAI’s recent gpt-4o-mini-tts.

“Dia rivals NotebookLM’s podcast feature while surpassing ElevenLabs Studio and Sesame’s open model in quality,” said Toby Kim, one of the co-creators of Nari and Dia, in a post on the social network X.

In a separate post, Kim noted that the model was built with “zero funding,” and added across a thread: “…we were not AI experts from the beginning. It all started when we fell in love with NotebookLM’s podcast feature when it was released last year. We wanted more—more control over the voices, more freedom in the script. We tried every TTS API on the market. None of them sounded like real human conversation.”

Kim further credited Google for giving him and his collaborator access to the company’s Tensor Processing Units (TPUs) for training Dia through the Google TPU Research Cloud.

Dia’s code and weights (the model’s trained parameters) are now available for download and local deployment by anyone from Hugging Face or GitHub. Individual users can try generating speech with it in a Hugging Face Space.

Advanced controls and more customizable features

Dia supports nuanced features like emotional tone, speaker tagging, and nonverbal audio cues—all from plain text.

Users can mark speaker turns with tags like [S1] and [S2], and include cues like (laughs), (coughs), or (clears throat) to enrich the resulting dialogue with nonverbal behaviors.

These tags are correctly interpreted by Dia during generation—something not reliably supported by other available models, according to the company’s examples page.
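The tag format described above can be assembled programmatically. The helper below is a minimal illustration of that format, not part of Nari Labs' library; the function name and script structure are assumptions for the sketch:

```python
def build_dia_script(turns):
    """Format (text, cue) turns into a Dia-style prompt string.

    Speakers alternate between the [S1] and [S2] tags the article
    describes; an optional nonverbal cue such as "(laughs)" is
    appended to the turn it belongs to.
    """
    lines = []
    for i, (text, cue) in enumerate(turns):
        tag = "[S1]" if i % 2 == 0 else "[S2]"
        line = f"{tag} {text}"
        if cue:
            line += f" {cue}"
        lines.append(line)
    return " ".join(lines)

script = build_dia_script([
    ("Did you hear the new model results?", None),
    ("I did, and I could not stop laughing.", "(laughs)"),
])
print(script)
# [S1] Did you hear the new model results? [S2] I did, and I could not stop laughing. (laughs)
```

Feeding a string like this to the model is what lets it distinguish speakers and render the parenthesized cues as actual sounds rather than spoken words.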

The model is currently English-only and is not tied to any single speaker’s voice; it produces different voices on each run unless users fix the generation seed or provide an audio prompt. Audio conditioning, or voice cloning, lets users guide speech tone and voice likeness by uploading a sample clip.
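The seed behavior is the usual one for sampling-based generators: the model draws its voice characteristics pseudo-randomly, so the same seed reproduces the same draw. A generic illustration of the principle (this is not Dia's API; the function here is a stand-in):

```python
import random

def sample_voice_id(seed=None):
    """Pick a pseudo-random 'voice' the way a sampling-based TTS model
    picks acoustic characteristics: a fresh draw each call unless the
    RNG is seeded. Generic illustration only, not Dia's actual API.
    """
    rng = random.Random(seed)  # seeded -> deterministic; None -> entropy
    return rng.randrange(10_000)

a = sample_voice_id(seed=42)
b = sample_voice_id(seed=42)
assert a == b  # same seed, same voice on every run
```

This is why fixing the seed (or anchoring generation to an audio prompt) is what makes a particular voice repeatable across runs.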

Nari Labs offers example code to facilitate this process and a Gradio-based demo so users can try it without setup.

Comparison with ElevenLabs and Sesame

Nari offers a host of example audio files generated by Dia on its Notion site, comparing it to leading text-to-speech rivals, specifically ElevenLabs Studio and Sesame CSM-1B, the latter a new text-to-speech model from Oculus VR headset co-creator Brendan Iribe that went somewhat viral on X earlier this year.

Side-by-side examples shared by Nari Labs show how Dia outperforms the competition in several areas:

In standard dialogue scenarios, Dia handles both natural timing and nonverbal expressions better. For example, in a script ending with (laughs), Dia interprets and delivers actual laughter, whereas ElevenLabs and Sesame output textual substitutions like “haha”.

In multi-turn conversations with emotional range, Dia demonstrates smoother transitions and tone shifts. One test included a dramatic, emotionally charged emergency scene: Dia rendered the urgency and speaker stress effectively, while competing models often flattened delivery or lost pacing.

Dia uniquely handles nonverbal-only scripts, such as a humorous exchange involving coughs, sniffs, and laughs. Competing models failed to recognize these tags or skipped them entirely.

Even with rhythmically complex content like rap lyrics, Dia generates fluid, performance-style speech that maintains tempo. This contrasts with more monotone or disjointed outputs from ElevenLabs and Sesame’s 1B model.

Using audio prompts, Dia can extend or continue a speaker’s voice style into new lines. An example using a conversational clip as a seed showed how Dia carried vocal traits from the sample through the rest of the scripted dialogue. This feature isn’t robustly supported in other models.

In one set of tests, Nari Labs noted that Sesame’s best website demo likely used an internal 8B version of the model rather than the public 1B checkpoint, resulting in a gap between advertised and actual performance.

Model access and tech specs

Developers can access Dia from Nari Labs’ GitHub repository and its Hugging Face model page.

The model runs on PyTorch 2.0+ with CUDA 12.6 and requires about 10GB of VRAM.

Inference on enterprise-grade GPUs like the NVIDIA A4000 delivers roughly 40 tokens per second.
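For a rough sense of what that throughput means in practice, a back-of-the-envelope calculation helps. Note that the tokens-per-second-of-audio figure below is a hypothetical placeholder for illustration, not a number published by Nari Labs:

```python
def realtime_factor(gen_tokens_per_sec, tokens_per_audio_sec):
    """Seconds of audio produced per wall-clock second.

    Values above 1.0 mean faster-than-real-time generation. The
    article reports roughly 40 tokens/s on an NVIDIA A4000; the
    codec rate (tokens per second of audio) is model-specific and
    assumed here purely for illustration.
    """
    return gen_tokens_per_sec / tokens_per_audio_sec

# With a hypothetical codec rate of 80 tokens per second of audio,
# 40 tokens/s of generation would yield half of real time:
print(realtime_factor(40, 80))  # 0.5
```

The same arithmetic shows why the planned quantized release matters: any speedup in tokens per second translates directly into a higher real-time factor.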

While the current version only runs on GPU, Nari plans to offer CPU support and a quantized release to improve accessibility.

The startup offers both a Python library and CLI tool to further streamline deployment.

Dia’s flexibility opens use cases from content creation to assistive technologies and synthetic voiceovers.

Nari Labs is also developing a consumer version of Dia aimed at casual users looking to remix or share generated conversations. Interested users can sign up by email for a waitlist granting early access.

Fully open source

The model is distributed under a fully open source Apache 2.0 license, which means it can be used for commercial purposes — something that will obviously appeal to enterprises or indie app developers.

Nari Labs explicitly prohibits usage that includes impersonating individuals, spreading misinformation, or engaging in illegal activities. The team encourages responsible experimentation and has taken a stance against unethical deployment.

Dia’s development credits support from the Google TPU Research Cloud, Hugging Face’s ZeroGPU grant program, and prior work on SoundStorm, Parakeet, and Descript Audio Codec.

Nari Labs itself comprises just two engineers, one full-time and one part-time, but the team actively invites community contributions through its Discord server and GitHub.

With a clear focus on expressive quality, reproducibility, and open access, Dia adds a distinctive new voice to the landscape of generative speech models.
