Advanced AI News

OpenAI unveils new audio models to redefine voice AI with real-time speech capabilities

By Advanced AI Editor, March 21, 2025


OpenAI has unveiled a new suite of audio models to power voice agents, and the models are now available to developers around the world. The release marks a major step in voice AI technology: the new tools and models are meant to let developers create voice agents, AI-driven systems capable of real-time speech interactions.

Although voice is a natural human interface, it remains largely underutilised in today's AI applications. With this slew of updates, OpenAI aims to change that by enabling businesses and developers to build more sophisticated voice agents. These systems can operate on their own, assisting users through spoken interactions in use cases ranging from customer care to language learning.

What’s new?

OpenAI has introduced three main advancements in audio AI: two state-of-the-art speech-to-text models, a new text-to-speech model, and enhancements to the Agents SDK. The new speech-to-text models outperform OpenAI’s previous Whisper models in almost all tested languages, with significant improvements in transcription accuracy and efficiency.


The new text-to-speech model, meanwhile, gives precise control over not just the words spoken but also how they are said, making AI-generated speech more expressive. The Agents SDK update makes it easier to convert text-based agents into voice-based AI assistants that offer seamless interactions.

What do voice agents do?

Voice agents function much like text-based AI assistants, but they operate through speech rather than text. Use cases include customer support, where AI answers calls and handles queries; language learning, where an AI-powered coach helps users with pronunciation and conversation practice; and accessibility tools, where voice-controlled assistants serve users with disabilities.

How to build voice AI?

When it comes to building voice AI, there are essentially two approaches: speech-to-speech (S2S) and speech-to-text-to-speech (S2T2S). S2S models take spoken input and produce spoken output without intermediate transcription; this approach reportedly preserves nuances such as intonation, emotion, and emphasis. S2T2S models first transcribe speech into text, process the text, and then convert the result back into speech. Although these are easier to implement, they often lose key details and may add latency. OpenAI’s latest updates emphasise the advantages of speech-to-speech processing, making AI interactions more natural and fluid.
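The structural difference between the two approaches can be sketched in a few lines of Python. This is an illustrative outline only: the stage functions (`transcribe`, `respond`, `synthesize`, `speech_to_speech`) and the toy `fake_*` stand-ins are hypothetical names invented for this sketch, not OpenAI API calls, and audio is represented as raw bytes.

```python
from typing import Callable

# Hypothetical stage signatures; a real system would call actual models here.
Transcriber = Callable[[bytes], str]    # speech -> text
Responder = Callable[[str], str]        # text -> text
Synthesizer = Callable[[str], bytes]    # text -> speech
SpeechModel = Callable[[bytes], bytes]  # speech -> speech


def chained_agent(audio_in: bytes, transcribe: Transcriber,
                  respond: Responder, synthesize: Synthesizer) -> bytes:
    """S2T2S: three sequential stages with text as the intermediate form."""
    text_in = transcribe(audio_in)  # only the words survive this step
    text_out = respond(text_in)
    return synthesize(text_out)


def direct_agent(audio_in: bytes, speech_to_speech: SpeechModel) -> bytes:
    """S2S: a single model maps input audio straight to output audio."""
    return speech_to_speech(audio_in)


# Toy stand-ins so the sketch runs without any model.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply = chained_agent(b"hello", fake_transcribe, fake_respond, fake_synthesize)
echo = direct_agent(b"hello", lambda audio: audio[::-1])
```

In the chained version every stage must finish before the next begins, which is where the extra latency can come from, and anything not captured in the transcript never reaches the later stages.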

GPT-4o Transcribe and GPT-4o Mini Transcribe

OpenAI has also introduced two new transcription models: GPT-4o Transcribe and GPT-4o Mini Transcribe. GPT-4o Transcribe is a large speech model trained on vast amounts of audio data to deliver highly accurate transcriptions, while GPT-4o Mini Transcribe is a smaller, more efficient model designed for faster, lower-cost transcription. OpenAI claims that both models deliver industry-leading word error rates, significantly improving on previous Whisper versions. On pricing, GPT-4o Transcribe is offered at $0.006 per minute, the same as Whisper, while GPT-4o Mini Transcribe costs $0.003 per minute.
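Per-minute pricing translates into a monthly bill with simple arithmetic. The sketch below is a rough estimate only: the helper `transcription_cost` is a hypothetical name, the 10,000-minute workload is an arbitrary figure chosen for illustration, and the rate used is the $0.006 per minute quoted above for GPT-4o Transcribe.

```python
from decimal import Decimal


def transcription_cost(minutes: int, rate_per_minute: str) -> Decimal:
    """Return the estimated cost in US dollars for `minutes` of audio."""
    # Decimal avoids the rounding noise of binary floats on small prices.
    return Decimal(minutes) * Decimal(rate_per_minute)


# 10,000 minutes at the $0.006 per minute quoted for GPT-4o Transcribe.
monthly_cost = transcription_cost(10_000, "0.006")
print(f"${monthly_cost:.2f}")  # prints $60.00
```

Passing the Mini model's per-minute rate to the same function gives a like-for-like comparison.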


The latest updates from OpenAI suggest that voice will be a key focus area for AI development. Given their affordability, these models are likely to encourage businesses and developers to build high-quality voice agents.


© IE Online Media Services Pvt Ltd


