Advanced AI News
VentureBeat AI

Voice AI that actually converts: New TTS model boosts sales 15% for major brands

By Advanced AI Editor | June 6, 2025 | 7 Mins Read

Generating voices that are not only humanlike and nuanced but diverse continues to be a struggle in conversational AI. 

At the end of the day, people want to hear voices that sound like them or are at least natural, not just the 20th-century American broadcast standard. 

Startup Rime is tackling this challenge with Arcana text-to-speech (TTS), a new spoken language model that can quickly generate “infinite” new voices of varying genders, ages, demographics and languages just based on a simple text description of intended characteristics. 

The model has helped boost customer sales — for the likes of Domino’s and Wingstop — by 15%. 

“It’s one thing to have a really high-quality, life-like, real person-sounding model,” Lily Clifford, Rime CEO and co-founder, told VentureBeat. “It’s another to have a model that can not just create one voice, but infinite variability of voices along demographic lines.”

A voice model that ‘acts human’ 

Rime’s multimodal and autoregressive TTS model was trained on natural conversations with real people (as opposed to voice actors). Users simply type in a text prompt description of a voice with desired demographic characteristics and language. 

For instance: ‘I want a 30-year-old female who lives in California and is into software,’ or ‘Give me an Australian man’s voice.’ 

“Every time you do that, you’re going to get a different voice,” said Clifford. 
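To make the interaction pattern concrete, a description-driven synthesis request might be shaped like the sketch below. The field names, payload structure and sampling parameter are assumptions for illustration only, not Rime's actual API.

```python
# Hypothetical sketch of a description-driven TTS request payload.
# Field names and structure are illustrative, not Rime's actual API.
import json


def build_voice_request(description: str, text: str, language: str = "en") -> str:
    """Build a JSON payload asking the model to synthesize `text`
    in a brand-new voice matching the free-form `description`."""
    payload = {
        "voice_description": description,  # demographic/stylistic prompt
        "text": text,                      # what the generated voice should say
        "language": language,
        "sampling": {"temperature": 0.8},  # nonzero: each call yields a different voice
    }
    return json.dumps(payload)


req = build_voice_request(
    "a 30-year-old female who lives in California and is into software",
    "Thanks for calling! What can I get started for you?",
)
```

Because sampling is stochastic, repeating the same request would, as Clifford describes, produce a different voice each time.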

Rime’s Mist v2 TTS model was built for high-volume, business-critical applications, allowing enterprises to craft unique voices for their business needs. “The customer hears a voice that allows for a natural, dynamic conversation without needing a human agent,” said Clifford. 

For those looking for out-of-the-box options, meanwhile, Rime offers eight flagship speakers with unique characteristics: 

  • Luna (female, chill but excitable, Gen-Z optimist)
  • Celeste (female, warm, laid-back, fun-loving)
  • Orion (male, older, African-American, happy)
  • Ursa (male, 20 years old, encyclopedic knowledge of 2000s emo music)
  • Astra (female, young, wide-eyed)
  • Esther (female, older, Chinese American, loving)
  • Estelle (female, middle-aged, African-American, sounds so sweet)
  • Andromeda (female, young, breathy, yoga vibes)

The model can switch between languages, and can whisper, be sarcastic and even mocking. Arcana can also insert laughter into speech when given a dedicated laughter token, returning varied, realistic outputs from “a small chuckle to a big guffaw,” Rime says. The model interprets similar non-speech tokens correctly as well, even though it wasn’t explicitly trained to do so. 

“It infers emotion from context,” Rime writes in a technical paper. “It laughs, sighs, hums, audibly breathes and makes subtle mouth noises. It says ‘um’ and other disfluencies naturally. It has emergent behaviors we are still discovering. In short, it acts human.” 

Capturing natural conversations

Rime’s model generates audio tokens that are decoded into speech using a codec-based approach, which Rime says provides for “faster-than-real-time synthesis.” At launch, time to first audio was 250 milliseconds and public cloud latency was roughly 400 milliseconds. 
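The time-to-first-audio metric cited above can be illustrated with a toy streaming harness. Here `fake_stream_synthesis` is a stand-in generator, not Rime's codec, and the chunk sizes and delays are invented.

```python
# Toy illustration of measuring time-to-first-audio (TTFA) for a
# streaming TTS call. Not Rime's implementation; all numbers invented.
import time


def fake_stream_synthesis(text: str):
    """Stand-in for a streaming synthesis endpoint that yields decoded audio chunks."""
    for _ in range(3):
        time.sleep(0.01)       # pretend codec/decode work
        yield b"\x00" * 640    # one 20 ms chunk of 16-bit, 16 kHz mono audio


def time_to_first_audio(stream):
    """Return the first audio chunk and the elapsed milliseconds until it arrived."""
    start = time.perf_counter()
    first_chunk = next(stream)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return first_chunk, elapsed_ms


chunk, ttfa_ms = time_to_first_audio(fake_stream_synthesis("Your order is ready."))
```

In a codec-based pipeline the first chunk can be played while later chunks are still being decoded, which is what makes "faster-than-real-time synthesis" possible.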

Arcana was trained in three stages:

  1. Pre-training: Rime used open-source large language models (LLMs) as a backbone and pre-trained on a large corpus of text-audio pairs to help Arcana learn general linguistic and acoustic patterns.
  2. Supervised fine-tuning on a “massive” proprietary dataset.
  3. Speaker-specific fine-tuning: Rime identified the speakers it found “most exemplary” in its dataset, based on their conversations and reliability.
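The three-stage recipe can be sketched schematically as a pipeline. Every function name and data item below is an illustrative placeholder, not Rime's actual training code.

```python
# Schematic of the three-stage training recipe described in the article.
# All names and data are illustrative placeholders.

def pretrain(model: dict, text_audio_pairs: list) -> dict:
    # Stage 1: an open-source LLM backbone learns general
    # linguistic and acoustic patterns from text-audio pairs.
    model["stages"].append("pretrain")
    return model


def supervised_finetune(model: dict, proprietary_dataset: list) -> dict:
    # Stage 2: supervised fine-tuning on proprietary conversational speech.
    model["stages"].append("sft")
    return model


def speaker_finetune(model: dict, exemplary_speakers: list) -> dict:
    # Stage 3: further fine-tuning on the most exemplary speakers.
    model["stages"].append("speaker_ft")
    return model


model = {"backbone": "open-source-llm", "stages": []}
model = pretrain(model, [("hello there", b"\x00\x01")])
model = supervised_finetune(model, ["conversation_0001"])
model = speaker_finetune(model, ["speaker_A", "speaker_B"])
# model["stages"] now records the order: pretrain -> sft -> speaker_ft
```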

Rime’s data incorporates sociolinguistic conversation techniques (factoring in social context like class, gender, location), idiolect (individual speech habits) and paralinguistic nuances (non-verbal aspects of communication that go along with speech). 

 The model was also trained on accent subtleties, filler words (those subconscious ‘uhs’ and ‘ums’) as well as pauses, prosodic stress patterns (intonation, timing, stressing of certain syllables) and multilingual code-switching (when multilingual speakers switch back and forth between languages). 

The company has taken a unique approach to collecting all this data. Clifford explained that, typically, model builders will gather snippets from voice actors, then create a model to reproduce the characteristics of that person’s voice based on text input. Or, they’ll scrape audiobook data. 

“Our approach was very different,” she explained. “It was, ‘How do we create the world’s largest proprietary data set of conversational speech?’” 

To do so, Rime built its own recording studio in a basement in San Francisco and spent several months recruiting participants through Craigslist and word of mouth, or simply by casually gathering themselves, friends and family. Rather than scripted conversations, they recorded natural conversations and chitchat. 

They then annotated the voices with detailed metadata encoding gender, age, dialect, speech affect and language, achieving 98 to 100% accuracy on these labels. 
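The annotation scheme described above could be modeled with a simple record type. The field names and example values below are illustrative, not Rime's actual schema.

```python
# Illustrative speaker-annotation record; fields mirror the attributes
# the article says Rime encodes, but the schema itself is invented.
from dataclasses import dataclass, asdict


@dataclass
class SpeakerMetadata:
    gender: str
    age: int
    dialect: str
    speech_affect: str
    language: str


record = SpeakerMetadata(
    gender="female",
    age=34,
    dialect="California English",
    speech_affect="upbeat",
    language="en",
)
```

Structured records like this are what make it possible to later retrieve "a 30-year-old female who lives in California" style matches from the dataset.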

Clifford noted that they are constantly augmenting this dataset. 

“How do we get it to sound personal? You’re never going to get there if you’re just using voice actors,” she said. “We did the insanely hard thing of collecting really naturalistic data. The huge secret sauce of Rime is that these aren’t actors. These are real people.”

A ‘personalization harness’ that creates bespoke voices

Rime intends to give customers the ability to find voices that will work best for their application. They built a “personalization harness” tool to allow users to do A/B testing with various voices. After a given interaction, the API reports back to Rime, which provides an analytics dashboard identifying the best-performing voices based on success metrics. 
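The A/B loop described above can be sketched as a toy harness: assign each caller a candidate voice, report back a success metric per interaction, and surface the best performer. This is illustrative only; Rime's actual analytics layer is not public.

```python
# Toy sketch of an A/B "personalization harness" for voices.
# Illustrative only; not Rime's actual analytics implementation.
import random
from collections import defaultdict


class PersonalizationHarness:
    def __init__(self, voices):
        self.voices = list(voices)
        self.stats = defaultdict(lambda: {"calls": 0, "wins": 0})

    def assign(self, rng=random):
        # Uniform random assignment; a real system might use bandits.
        return rng.choice(self.voices)

    def report(self, voice: str, success: bool):
        # Mirrors the API reporting back to Rime after a given interaction.
        self.stats[voice]["calls"] += 1
        self.stats[voice]["wins"] += int(success)

    def best_voice(self) -> str:
        # Highest observed win rate among voices with at least one call.
        tried = [v for v in self.voices if self.stats[v]["calls"] > 0]
        return max(tried, key=lambda v: self.stats[v]["wins"] / self.stats[v]["calls"])


harness = PersonalizationHarness(["luna", "orion"])
harness.report("luna", True)
harness.report("luna", False)   # luna: 1/2 win rate
harness.report("orion", True)   # orion: 1/1 win rate
```

What counts as a "win" is pluggable, which matches the point below that success means different things to different customers.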

Of course, customers have different definitions of what constitutes a successful call. In food service, that might be upselling an order of fries or extra wings. 

“The goal for us is: how do we create an application that makes it easy for our customers to run those experiments themselves?” said Clifford. “Because our customers aren’t voice casting directors, and neither are we. The challenge becomes how to make that personalization analytics layer really intuitive.”

Another KPI customers are optimizing for is the caller’s willingness to talk to the AI. Customers have found that, after switching to Rime, callers are 4X more likely to talk to the bot. 

“For the first time ever, people are like, ‘No, you don’t need to transfer me. I’m perfectly willing to talk to you,’” said Clifford. “Or, when they’re transferred, they say ‘Thank you.’” (20%, in fact, are cordial when ending conversations with a bot). 

Powering 100 million calls a month

Rime counts among its customers Domino’s, Wingstop, ConverseNow and Ylopo. They do a lot of work with large contact centers, enterprise developers building interactive voice response (IVR) systems and telecoms, Clifford noted. 

“When we switched to Rime we saw an immediate double-digit improvement in the likelihood of our calls succeeding,” said Akshay Kayastha, director of engineering at ConverseNow. “Working with Rime means we solve a ton of the last-mile problems that come up in shipping a high-impact application.” 

Ylopo CPO Ge Juefeng noted that, for his company’s high-volume outbound application, they need to build immediate trust with the consumer. “We tested every model on the market and found that Rime’s voices converted customers at the highest rate,” he reported. 

Rime is already helping power close to 100 million phone calls a month, said Clifford. “If you call Domino’s or Wingstop, there’s an 80 to 90% chance that you hear a Rime voice,” she said. 

Looking ahead, Rime will push more into on-premises offerings to support low latency. In fact, they anticipate that, by the end of 2025, 90% of their volume will be on-prem. “The reason for that is you’re never going to be as fast if you’re running these models in the cloud,” said Clifford. 

Also, Rime continues to fine-tune its models to address other linguistic challenges, such as phrases the model has never encountered, like Domino’s tongue-tying “Meatza ExtravaganZZa.” As Clifford noted, even if a voice is personalized, natural and responds in real time, it will fail if it can’t handle a company’s unique needs. 

“There are still a lot of problems that our competitors see as last-mile problems, but that our customers see as first-mile problems,” said Clifford. 
