While it is still early days, voice AI is one area where artificial intelligence promises both cost savings and service improvements.
Deepgram is a decade-old company that quickly saw AI’s potential in voice. It is developing voice AI for enterprise use cases like call centers and interactive voice response (IVR) systems that millions access every day. To date, Deepgram has processed more than 50,000 years of audio and transcribed more than one trillion words.
Deepgram offers speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) capabilities backed by an enterprise-grade runtime. More than 200,000 developers build on Deepgram’s voice-native foundational models, which are accessed through cloud APIs or as self-hosted, on-premises deployments.
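For developers, getting a transcript back is a single HTTPS request. The sketch below is a minimal pre-recorded transcription call against Deepgram’s hosted /v1/listen endpoint as publicly documented; the model name and parameters are illustrative, and the DEEPGRAM_API_KEY environment variable is an assumption of this sketch.

```python
# Minimal sketch: transcribe a local audio file with Deepgram's hosted
# speech-to-text API. Assumes DEEPGRAM_API_KEY is set in the environment;
# the model name and query parameters are illustrative.
import os
import requests

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"

def transcribe(path: str) -> str:
    """Send a local WAV file for transcription and return the transcript."""
    with open(path, "rb") as audio:
        response = requests.post(
            DEEPGRAM_URL,
            params={"model": "nova-2", "smart_format": "true"},
            headers={
                "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
                "Content-Type": "audio/wav",
            },
            data=audio,
        )
    response.raise_for_status()
    # The first channel's first alternative holds the best transcript.
    body = response.json()
    return body["results"]["channels"][0]["alternatives"][0]["transcript"]

if __name__ == "__main__":
    print(transcribe("call_recording.wav"))
```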
Voice AI is a massive opportunity
VP of Product Natalie Rutgers said more than 700 million customer service calls happen daily. Add more than 300 billion business calls, 75 million-plus drive-through orders, and north of 35 million medical appointments, and you have ample opportunities for voice AI to improve processes and free employees for higher-level tasks. Drive-throughs alone are a billion-dollar market.
“Why are customers coming to us?” Rutgers asked. “They’re often coming to us for the things that are the biggest efficiency burns on their business. In the drive-through space, the CTO of Jack in the Box said that integrating voice agents is going to be one of the most impactful initiatives for their business operations over the next five years.”
Yes, AI will take jobs away, but in some cases, they’re jobs that are hard to fill. Call centers have high turnover rates, which increases training and recruiting costs while sapping productivity. Introduce speech-to-speech AI, and those costs come down.
Ten years ago, contact centers generated massive volumes of recorded calls every day that had to be transcribed and analyzed. As staff turnover increased, institutional memory suffered from a lack of customer familiarity. Companies struggled to understand those conversations and to interact in real time.
Real-time interactions bring challenges and opportunities
Rutgers said Deepgram focuses on real-time interactions. That’s a key difference from many competitors, who focus on narrow, almost pre-determined use cases.
“(With podcasts, for example), an audio designer can sit for hours and make sure the end voice has exactly the personality and the expressiveness they want in their content,” Rutgers explained. “When you’re generating a voice on the fly to have a conversation (in real-time), you don’t get that time.”
Real-time voice AI conversations must handle several things that come naturally in successful human conversations. One is contending with accents. Deepgram works with partners to access the calls and accents they deal with, along with industry-specific jargon such as financial or medical terms.
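One common lever for domain jargon is keyword boosting at request time. The sketch below extends the earlier request with boosted finance terms; the keywords parameter and weight syntax follow Deepgram’s documented pre-recorded API, but treat the exact terms and weights as placeholders for a real deployment’s vocabulary.

```python
# Illustrative sketch: bias recognition toward domain jargon by passing
# boosted keywords with the transcription request. Terms and weights are
# placeholders for your own vocabulary.
import os
import requests

FINANCE_TERMS = ["KYC", "AML", "escrow", "amortization"]

def transcribe_with_jargon(path: str) -> str:
    params = [("model", "nova-2"), ("smart_format", "true")]
    # Each keyword can carry an intensifier, e.g. "escrow:2".
    params += [("keywords", f"{term}:2") for term in FINANCE_TERMS]
    with open(path, "rb") as audio:
        response = requests.post(
            "https://api.deepgram.com/v1/listen",
            params=params,
            headers={
                "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
                "Content-Type": "audio/wav",
            },
            data=audio,
        )
    response.raise_for_status()
    return response.json()["results"]["channels"][0]["alternatives"][0]["transcript"]
```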
Each company’s model is unique to it; no one else gets a license to it. Models are often deployed in a virtual private cloud or on-premises, so data doesn’t leave the environment and remains compliant. Deepgram also manages and scales customer deployments, a service that is becoming especially valuable in the United Kingdom as data privacy rules tighten.
Making voice AI conversations sound more natural
The voice AI industry is slowly chipping away at making AI-generated conversations sound more natural. Natural human conversations have 200-500 milliseconds of latency between turns; today’s industry-best solutions sit between 800 and 1,200 milliseconds. Once latency is addressed, conversation quality will receive more attention.
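A first-order way to see where a system falls in that range is to time how long the first chunk of a streamed response takes to arrive. The sketch below measures time-to-first-chunk against any streaming HTTP voice endpoint; the URL and request details are placeholders for your provider’s API.

```python
# Rough latency gauge: milliseconds from sending the request until the
# first body chunk of a streamed reply arrives. Endpoint-agnostic; pass
# whatever headers and payload your provider requires.
import time
import requests

def time_to_first_chunk(url: str, **request_kwargs) -> float:
    """Return time-to-first-chunk in milliseconds for a streaming POST."""
    start = time.perf_counter()
    with requests.post(url, stream=True, **request_kwargs) as response:
        response.raise_for_status()
        for _ in response.iter_content(chunk_size=1024):
            # First chunk received: stop the clock.
            return (time.perf_counter() - start) * 1000.0
    return float("inf")  # the stream closed without delivering anything
```

A number from this measurement is directly comparable to the 800-1,200 millisecond figure above.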
“We’re not only measuring the real-time latency, but also how often AI is tending to interrupt you and reducing the humanness of what you would take for granted in a conversation, because the AI is not giving you that right now,” Rutgers said.
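Interruption frequency can be measured from timestamps the same way latency can. Assuming a pipeline that emits diarized, timestamped turns (the Turn structure below is a stand-in for whatever your STT output provides), a simple metric counts how often the agent starts speaking before the user’s turn has ended.

```python
# Illustrative interruption metric over diarized, timestamped turns.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str   # "user" or "agent"
    start: float   # seconds
    end: float     # seconds

def interruption_rate(turns: list[Turn]) -> float:
    """Fraction of agent turns that begin before the prior user turn ends."""
    interruptions = 0
    agent_turns = 0
    for prev, cur in zip(turns, turns[1:]):
        if cur.speaker == "agent":
            agent_turns += 1
            if prev.speaker == "user" and cur.start < prev.end:
                interruptions += 1
    return interruptions / agent_turns if agent_turns else 0.0

# Example: the agent cuts in 0.3 s before the user stops talking.
turns = [Turn("user", 0.0, 2.5), Turn("agent", 2.2, 4.0),
         Turn("user", 4.5, 6.0), Turn("agent", 6.4, 8.0)]
print(f"{interruption_rate(turns):.0%} of agent turns interrupt the user")
```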
Any successful voice AI deployment in finance must address unique challenges. Systems can struggle with numbers, dollar amounts, and alphanumerics, reading “3:00 p.m.” as “300 p.m.” or “$5.7 million” as “five dollars and seven cents.”
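A lightweight guardrail against these misreads is a transcript post-processor that flags suspect numeric spans for re-verification. The patterns below are illustrative, covering only the two failure modes quoted above; they are not a substitute for model-side formatting.

```python
# Sanity checks for the failure modes described above: flag likely
# misrendered times and currency amounts in a transcript.
import re

# "300 p.m." style times that lost their colon (valid would be "3:00 p.m.").
BAD_TIME = re.compile(r"\b(\d{3,4})\s*(a\.m\.|p\.m\.)", re.IGNORECASE)
# Spelled-out dollar amounts that may have dropped a magnitude word.
SPELLED_DOLLARS = re.compile(r"\b(\w+) dollars and (\w+) cents\b", re.IGNORECASE)

def flag_suspect_numerics(transcript: str) -> list[str]:
    """Return human-readable warnings for spans worth re-verifying."""
    warnings = []
    for match in BAD_TIME.finditer(transcript):
        digits = match.group(1)
        warnings.append(f"possible colon-less time: {match.group(0)!r} "
                        f"(did the audio say {digits[:-2]}:{digits[-2:]}?)")
    for match in SPELLED_DOLLARS.finditer(transcript):
        warnings.append(f"verify currency magnitude: {match.group(0)!r}")
    return warnings

print(flag_suspect_numerics(
    "Your payment of five dollars and seven cents posts at 300 p.m."))
```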
“These are issues many voice AI companies don’t understand,” Rutgers said. “We deeply understand why a model might hallucinate in this way, what you need to overcome it, and how you can have a successful deployment.”
Preparing customers for voice AI
Companies can’t just thrust a voice AI system on their customers; they have to prepare them ahead of time. Rutgers said that begins with understanding who their customers are and what they expect. A pharmacy chain, for example, learned that many of its senior customers memorized its touchtone menu to expedite the process. It tweaked many of the questions and answers to offer a more natural flow.
“In the financial space, something similar can be done,” Rutgers said. “If someone’s used to calling in, what sorts of questions are they used to being asked, how might they answer them even a little bit more naturally, and have a couple more back-and-forth questions just to start. But as that gets adoption and the retention rates are good, then they can continue to evolve it.”
“It’s not just a technology shift; it’s a behavior shift with your end users as well. As the voices get more natural, and the conversations are much more fluid, the spaces where there is a need to be much more operationally efficient, and there’s a lot of scale and volume, that’s who’s being most successful.”
While OpenAI’s ChatGPT and Anthropic’s Claude have introduced many people to AI, and they have their place, Rutgers said they shouldn’t be the go-to for conversational solutions where context is important.
The voice AI industry is in its early stages, with industry chatter centering on perfecting the most obvious aspects of conversation: response times, natural reactions, and flexibility. Rutgers said getting those right will make the difference between customers asking for a human and sticking with an AI system.
“Over the last couple of years, we’ve also added the voice so that you can speak back, integrate those voices, but also have an end-to-end, speech-to-speech thinking system that allows you to listen, think and speak just as naturally as a human would,” Rutgers said.