Krisp Technologies Inc., a leading provider of real-time voice artificial intelligence solutions, today announced the launch of VIVA, its new voice isolation AI model and software development kit designed for voice AI agents.
The company also revealed that VIVA is now processing more than 1 billion minutes of voice audio per month globally.
VIVA, short for Voice Isolation for Voice Agents, integrates into an application’s audio path. It boosts voice AI agents’ ability to detect voice activity and improves their turn-taking behavior, helping prevent false interruptions and creating more natural, effective conversations.
Consumers are coming to expect more from AI voice interactions. In the past, conversations with online systems were rigid and scripted, often little more than glorified recordings. Those days are ending. Today, voice agents can hold actual conversations, respond dynamically, and adapt to context.
“The industry term for this is turn taking,” co-founder and Chief Executive Davit Baghdasaryan said in an exclusive interview with SiliconANGLE. “Turn-taking gets really messed up when there’s background noise — especially background voices. The AI gets very confused.”
Turn-taking refers to the back-and-forth flow of conversation, or knowing when to speak and when to listen. Humans do this naturally, relying on verbal and nonverbal cues such as pauses, intonation and body language.
In voice AI, turn-taking refers to detecting when a user has stopped speaking and when it is appropriate to respond without interrupting or leaving a long silence. Poor turn-taking leads to awkward or unnatural interactions.
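To make the idea concrete, here is a minimal sketch of silence-based end-of-turn detection. It is not Krisp's method or API: the `TurnDetector` class, the 20-millisecond frame length and the 600-millisecond silence threshold are all assumptions chosen for illustration. It takes one voice-activity decision per audio frame and reports the frame at which the user's turn appears to end.

```python
from dataclasses import dataclass

FRAME_MS = 20  # assumed frame length in milliseconds (illustrative)


@dataclass
class TurnDetector:
    """Hypothetical end-of-turn detector driven by per-frame speech flags."""
    silence_ms_to_end: int = 600  # assumed silence needed to end a turn
    _in_speech: bool = False      # True once the user has started talking
    _silence_ms: int = 0          # trailing silence accumulated so far

    def push(self, is_speech: bool) -> bool:
        """Feed one frame's speech flag; return True when the turn ends."""
        if is_speech:
            self._in_speech = True
            self._silence_ms = 0
            return False
        if not self._in_speech:
            return False  # silence before any speech is not a turn
        self._silence_ms += FRAME_MS
        if self._silence_ms >= self.silence_ms_to_end:
            self._in_speech = False
            self._silence_ms = 0
            return True
        return False


def end_of_turn_frames(speech_flags, detector=None):
    """Return the frame indices at which a turn is judged to have ended."""
    detector = detector or TurnDetector()
    return [i for i, flag in enumerate(speech_flags) if detector.push(flag)]


# Clean audio: 10 frames of speech, then silence. The turn ends after 600 ms.
clean = [True] * 10 + [False] * 30
# Background voices keep the speech flag raised, so no turn end is detected.
noisy = [True] * 40
```

In this toy setup, `end_of_turn_frames(clean)` finds a turn end while `end_of_turn_frames(noisy)` finds none, which mirrors the failure Baghdasaryan describes: when background voices are mistaken for the user, the agent cannot tell that the user has finished.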
Krisp’s VIVA model processes audio in under 20 milliseconds, significantly enhancing responsiveness. It can improve turn-taking accuracy by up to 3.5x, which contributes to a 50% reduction in dropped calls and helps boost customer satisfaction.
Unlike models that rely on power-hungry graphics processing units, VIVA runs efficiently on central processing units, making it ideal for deployment on a broad range of devices. This allows it to operate either embedded or alongside larger models without disrupting performance.
For businesses, isolating the primary speaker boosts transcription accuracy even in noisy environments and removes irrelevant audio, such as a background television or unrelated conversations, improving both automated understanding and the overall user experience.
Human communication relies on subtle audio and behavioral cues. While people navigate these cues naturally, voice agents still struggle with them. Background sounds, laughter or even pauses can cause interruptions or confusion in AI responses.
“There are five, six different cues that come solely from audio,” Baghdasaryan explained. “The AI must be aware of these cues if we want to have human-level conversational AI out there.”
Krisp designed VIVA to recognize and adapt to these signals. One VIVA model, for example, filters out laughter — especially helpful in environments with children — so that bots don’t misinterpret the sounds as part of a user’s speech.
“Laughter is a big, big thing,” Baghdasaryan said. “We have models that remove laughter so that the bot doesn’t get interrupted by it.”
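The effect of removing such sounds can be sketched as an interruption gate. This is a simplified illustration rather than Krisp's implementation: the per-frame labels (`"speech"`, `"laughter"`, `"noise"`), the `should_interrupt` function and the 10-frame threshold are hypothetical. The bot yields only to a sustained run of actual speech from the user.

```python
def should_interrupt(frame_labels, min_speech_frames=10):
    """Return True if the labels contain enough consecutive speech frames.

    frame_labels is an iterable of hypothetical per-frame labels such as
    "speech", "laughter" or "noise". Any non-speech label resets the run,
    so laughter alone never triggers an interruption.
    """
    run = 0
    for label in frame_labels:
        run = run + 1 if label == "speech" else 0
        if run >= min_speech_frames:
            return True
    return False


laughing = ["laughter"] * 50                               # bot keeps talking
talking = ["noise"] * 3 + ["speech"] * 10                  # bot yields
broken = ["speech"] * 5 + ["laughter"] + ["speech"] * 5    # run is reset
```

Here `should_interrupt(laughing)` and `should_interrupt(broken)` are false while `should_interrupt(talking)` is true. A production system would weigh more cues than this, but the gate shows why a front end that strips laughter before voice-activity detection keeps the bot from stopping mid-sentence.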
VIVA is already integrated into AI agent systems used by Decagon AI, Vodex.ai, Vapi Inc., Ultravox.ai (formerly Fixie.ai), LiveKit Inc., and some of the world’s largest AI labs, where it’s delivering measurable improvements.
“When our development team demonstrated Krisp’s capabilities, we were blown away,” said Kumar Saurav, chief technology officer of Vodex. “Seeing our bot continue uninterrupted, even amidst loud office noise, was a game-changer for us.”
With more than 1 billion minutes of audio now processed each month, Baghdasaryan said, VIVA is positioned to help developers build more responsive AI agents and to provide a foundation for better customer support and virtual companions.