Advanced AI News
VentureBeat AI

In crowded voice AI market, OpenAI bets on instruction-following and expressive speech to win enterprise adoption

By Advanced AI Editor | August 29, 2025

OpenAI is adding to an increasingly competitive enterprise voice AI market with its new model, gpt-realtime, which follows complex instructions and offers voices “that sound more natural and expressive.”

As voice AI continues to grow, and customers find use cases such as customer service calls or real-time translation, the market for realistic-sounding AI voices that also offer enterprise-grade security is heating up. OpenAI claims its new model provides a more human-like voice, but it still needs to compete against companies like ElevenLabs.

The model will be available on the Realtime API, which the company also made generally available. Along with the gpt-realtime model, OpenAI also released new voices on the API, which it calls Cedar and Marin, and updated its other voices to work with the latest model.

OpenAI said in a livestream that it worked with its customers who are building voice applications to train gpt-realtime and “carefully aligned the model to evals that are built on real-world scenarios like customer support and academic tutoring.”

The company touted the model’s ability to create emotive, natural-sounding voices that also align with how developers build with the technology. 

Speech-to-speech models

The model operates within a speech-to-speech framework, enabling it to understand spoken prompts and respond vocally. Speech-to-speech models are ideally suited for real-time responses, where a person, typically a customer, interacts with an application. 

For example, a customer wants to return some products and calls a customer service platform. They could be talking to an AI voice assistant that responds to questions and requests as if they were speaking with a human. 

In the livestream, OpenAI customer T-Mobile showcased a voice-powered AI agent that helps people find new phones. Another customer, the real estate search platform Zillow, showcased an agent that helps someone narrow down a neighborhood to find the perfect place.

OpenAI said gpt-realtime is its “most advanced, production-ready voice model.” Like its other voice models, it can switch languages mid-sentence. OpenAI researchers noted that gpt-realtime can also follow more complex instructions, like “speak empathetically in a French accent.”

But gpt-realtime faces competition from other models that many brands already use. ElevenLabs released Conversational AI 2.0 in May. SoundHound partners with fast food franchises for an AI voice drive-thru. Empathic AI startup Hume has launched its EVI 3 model, which allows users to generate AI versions of their own voice.

As enterprises discover more use cases for voice AI, general-purpose model providers that offer multimodal LLMs are also making a case for themselves. Mistral released its new Voxtral model, stating it would work well for real-time translation. Google is enhancing its audio capabilities and gaining popularity with a NotebookLM audio feature that converts research notes into a podcast.

Better instruction following

OpenAI said gpt-realtime is smarter and understands native audio better, including the ability to catch non-verbal cues like laughs or sighs. 

Benchmarking using the Big Bench Audio eval showed the model scoring 82.8% in accuracy, compared to its previous model, which scored 65.6%. OpenAI did not provide numbers testing gpt-realtime against models from its competitors. 
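To put the two reported scores in perspective, the short sketch below works through the arithmetic. It uses only the two figures quoted above; the function name and the choice of summary metrics are illustrative assumptions, not something OpenAI published.

```python
# Scores reported in the article for the Big Bench Audio eval (percent accuracy).
PREVIOUS_MODEL = 65.6
GPT_REALTIME = 82.8


def compare(old: float, new: float) -> dict[str, float]:
    """Summarize the change between two accuracy scores given in percent."""
    old_error = 100.0 - old  # share of answers the older model got wrong
    new_error = 100.0 - new  # share of answers the newer model got wrong
    return {
        # Difference in percentage points.
        "absolute_gain_points": round(new - old, 1),
        # Gain relative to the older score.
        "relative_gain_pct": round((new - old) / old * 100, 1),
        # How much of the older model's error rate was eliminated.
        "error_rate_reduction_pct": round((old_error - new_error) / old_error * 100, 1),
    }


summary = compare(PREVIOUS_MODEL, GPT_REALTIME)
print(summary)
```

Read this way, the jump is 17.2 percentage points, which amounts to roughly halving the error rate on this one benchmark. It says nothing about how the model compares with competitors, since no such numbers were provided.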

OpenAI focused on improving the model’s instruction-following capabilities, ensuring the model would adhere to directions more effectively. The new model achieves a score of 30.5% on the MultiChallenge audio benchmark. The engineers also beefed up function calling so gpt-realtime can access the correct tools. 
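For readers unfamiliar with function calling, the sketch below shows the general pattern on the application side: the model emits a tool name plus JSON-encoded arguments, and the app routes that request to real code. This is a generic illustration, not the Realtime API itself. The `lookup_order` tool, the registry, and the shape of the `call` dictionary are all hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical registry mapping tool names to Python functions.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so a model-issued call can be routed to it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def lookup_order(order_id: str) -> dict:
    # Stand-in for a real order system; always returns the same canned record.
    return {"order_id": order_id, "status": "delivered", "returnable": True}


def dispatch(call: dict) -> str:
    """Run the requested tool and return a JSON string to hand back to the model.

    `call` is assumed to look like {"name": "...", "arguments": "<JSON string>"}.
    """
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(call["arguments"])
        result = TOOLS[name](**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


reply = dispatch({"name": "lookup_order", "arguments": '{"order_id": "A123"}'})
print(reply)
```

A model that picks the wrong tool or sends malformed arguments ends up in one of the error branches, which is why more reliable function calling matters for scenarios like the product-return call described earlier.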

Realtime API updates

To support the new model and enhance how enterprises integrate real-time AI capabilities into their applications, OpenAI has added several new features to the Realtime API. 

The API now supports the Model Context Protocol (MCP) and accepts image inputs, allowing the model to tell users what it sees in real time. Google heavily emphasized a similar capability during its Project Astra presentation last year.

The Realtime API can also handle Session Initiation Protocol (SIP). SIP lets apps connect to telephony endpoints such as the public phone network or desk phones, opening up more contact center use cases. Users can also save and reuse prompts on the API.

So far, early users are impressed with the model, although these are still initial tests of a recently released model.

Tbh, the MCP and SIP features are the real story here, not just another model.

The ability to connect to external tools and systems seamlessly is what will finally move these models from being impressive demos to being integrated into actual workflows.

The real time aspect…

— JK (@_junaidkhalid1) August 28, 2025

Testing out gpt-realtime

Initial review:
– Noticable audio improvement
– It’s a stickler for the instructions (very good)
– Feels fast pic.twitter.com/LtyCs0QLXV

— Jake Colling (@JacobColling) August 28, 2025

Well, GPT-realtime got a livestream not because most users are interested, but for strategic business reasons

Call centers are a major target for LLM providers and the first company to reach a real breakthrough will get massive revenue

— AnKo (@anko_979) August 28, 2025

Pros & Cons from @OpenAI real-time update from someone building in AI audio:

Pro: Better function calling, more emotion, 20% cheaper, better control, image is cool but won’t use

Con: no custom voices (creative experience MUST HAVE), still *expensive* vs TTS-LLM-STT pipelines

— Gavin Purcell (@gavinpurcell) August 28, 2025

OpenAI reduced prices for gpt-realtime by 20%, to $32 per million audio input tokens and $64 per million audio output tokens.
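As a rough illustration of what those rates mean in practice, the sketch below estimates the audio cost of a workload and backs out the prices implied before the cut. The two rates come from the article. The token counts are hypothetical, and the earlier prices are inferred by assuming the 20% reduction applied equally to both rates.

```python
# Prices stated in the article, in US dollars per million audio tokens.
AUDIO_INPUT_USD_PER_MILLION = 32.0
AUDIO_OUTPUT_USD_PER_MILLION = 64.0


def audio_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the audio token cost of a workload at the stated rates."""
    return (
        input_tokens / 1_000_000 * AUDIO_INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * AUDIO_OUTPUT_USD_PER_MILLION
    )


def price_before_cut(current_price: float, cut_fraction: float = 0.20) -> float:
    """Back out the earlier price, assuming a uniform percentage reduction."""
    return round(current_price / (1 - cut_fraction), 2)


# Hypothetical workload: 1M audio input tokens and 500K audio output tokens.
example_cost = audio_cost_usd(input_tokens=1_000_000, output_tokens=500_000)
print(f"Estimated audio cost: ${example_cost:.2f}")
print(f"Implied earlier input price: ${price_before_cut(AUDIO_INPUT_USD_PER_MILLION):.2f}")
print(f"Implied earlier output price: ${price_before_cut(AUDIO_OUTPUT_USD_PER_MILLION):.2f}")
```

The estimate covers audio tokens only. Any other charges are outside what the article reports and are not modeled here.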
