New York-based AI startup Hume has unveiled its latest Empathic Voice Interface (EVI) conversational AI model, EVI 3 (pronounced "Eevee" Three, like the Pokémon character), targeting everything from customer support systems and health coaching to immersive storytelling and virtual companionship.
EVI 3 lets users create their own voices by talking to the model (it operates voice-to-voice, or speech-to-speech), and, according to Hume, aims to set a new standard for naturalness, expressiveness, and "empathy": that is, how well users perceive the model as understanding their emotions and mirroring or adjusting its own responses in tone and word choice.
Designed for businesses, developers, and creators, EVI 3 expands on Hume’s previous voice models by offering more sophisticated customization, faster responses, and enhanced emotional understanding.
Individual users can interact with it today through Hume's live demo on its website and iOS app, while developer access through Hume's proprietary application programming interface (API) is slated to arrive in "the coming weeks," according to a company blog post.
At that point, developers will be able to embed EVI 3 into their own customer service systems, creative projects, or virtual assistants — for a price (see below).
My own use of the demo let me create a new, custom synthetic voice in seconds based on qualities I described to it: a mix of warm and confident, with a masculine tone. Speaking with it felt more natural and effortless than with other AI models, and certainly more so than the stock voices from legacy tech leaders such as Apple (Siri) and Amazon (Alexa).
What developers and businesses should know about EVI 3
Hume’s EVI 3 is built for a range of uses—from customer service and in-app interactions to content creation in audiobooks and gaming.
It allows users to specify precise personality traits, vocal qualities, emotional tone, and conversation topics.
This means it can produce anything from a warm, empathetic guide to a quirky, mischievous narrator—down to requests like “a squeaky mouse whispering urgently in a French accent about its scheme to steal cheese from the kitchen.”
EVI 3’s core strength lies in its ability to integrate emotional intelligence directly into voice-based experiences.
Unlike traditional chatbots or voice assistants that rely heavily on scripted or text-based interactions, EVI 3 adapts to how people naturally speak — picking up on pitch, prosody, pauses, and vocal bursts to create more engaging, humanlike conversations.
However, one big feature Hume's models currently lack, and which rivals both open source and proprietary (such as ElevenLabs) offer, is voice cloning: the rapid replication of a specific voice, such as a user's own or a company CEO's.
Yet Hume has indicated it will add that capability to its Octave text-to-speech model (the feature is listed as "coming soon" on Hume's website), and prior reporting by yours truly found it will let users replicate a voice from as little as five seconds of audio.
Hume has stated it’s prioritizing safeguards and ethical considerations before making this feature broadly available. Currently, this cloning capability is not available in EVI itself, with Hume emphasizing flexible voice customization instead.
Internal benchmarks show users prefer EVI 3 to OpenAI’s GPT-4o voice model
According to Hume's own tests with 1,720 users, EVI 3 was preferred over OpenAI's GPT-4o voice in every category evaluated: naturalness, expressiveness, empathy, interruption handling, response speed, audio quality, voice emotion/style modulation on request, and emotion understanding on request (Hume groups the "on request" categories under "instruction following").
It also usually bested Google's Gemini model family and the open source model from Sesame, the new AI voice startup from Oculus co-founder Brendan Iribe.
It also boasts lower latency (~300 milliseconds), robust multilingual support (English and Spanish, with more languages coming), and effectively unlimited custom voices. Per Hume's website, key capabilities include:
Prosody generation and expressive text-to-speech with modulation.
Interruptibility, enabling dynamic conversational flow.
In-conversation voice customizability, so users can adjust speaking style in real time.
API-ready architecture (coming soon), so developers can integrate EVI 3 directly into apps and services.
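For developers planning ahead: Hume's current EVI API is WebSocket-based, and EVI 3 will presumably follow suit. Below is a minimal Python sketch of what a speech-to-speech session could look like. The endpoint URL, query-string authentication, and message types ("session_settings", "audio_input", "audio_output", "assistant_end") are illustrative assumptions, not Hume's documented EVI 3 schema; consult the official docs once the API ships.

# Minimal sketch of a speech-to-speech EVI session over WebSocket.
# ASSUMPTIONS: endpoint, api_key query parameter, and message schema
# are guesses for illustration, not the documented EVI 3 interface.
import asyncio
import base64
import json

import websockets  # pip install websockets

EVI_URL = "wss://api.hume.ai/v0/evi/chat"  # hypothetical endpoint


async def chat(api_key: str, wav_bytes: bytes) -> None:
    async with websockets.connect(f"{EVI_URL}?api_key={api_key}") as ws:
        # Describe the desired voice in plain language, as the demo does.
        await ws.send(json.dumps({
            "type": "session_settings",
            "voice_description": "warm, confident, with a masculine tone",
        }))
        # Send one chunk of user audio, base64-encoded.
        await ws.send(json.dumps({
            "type": "audio_input",
            "data": base64.b64encode(wav_bytes).decode("ascii"),
        }))
        # Read server events until the assistant finishes its turn.
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "audio_output":
                pass  # decode event["data"] and stream it to a speaker
            elif event.get("type") == "assistant_end":
                break


if __name__ == "__main__":
    with open("hello.wav", "rb") as f:
        asyncio.run(chat("YOUR_HUME_API_KEY", f.read()))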
Pricing and developer access
Hume offers flexible, usage-based pricing across its EVI, Octave TTS, and Expression Measurement APIs.
While EVI 3’s specific API pricing has not been announced yet (marked as TBA), the pattern suggests it will be usage-based, with enterprise discounts available for large deployments.
For reference, EVI 2 is priced at $0.072 per minute — 30% lower than its predecessor, EVI 1 ($0.102/minute).
For creators and developers working with text-to-speech projects, Hume’s Octave TTS plans range from a free tier (10,000 characters of speech, ~10 minutes of audio) to enterprise-level plans. Here’s the breakdown:
Free: 10,000 characters, unlimited custom voices, $0/month
Starter: 30,000 characters (~30 minutes), 20 projects, $3/month
Creator: 100,000 characters (~100 minutes), 1,000 projects, $0.20 per 1,000 extra characters, $10/month
Pro: 500,000 characters (~500 minutes), 3,000 projects, $0.15 per 1,000 extra characters, $50/month
Scale: 2,000,000 characters (~2,000 minutes), 10,000 projects, $0.13 per 1,000 extra characters, $150/month
Business: 10,000,000 characters (~10,000 minutes), 20,000 projects, $0.10 per 1,000 extra characters, $900/month
Enterprise: Custom pricing and unlimited usage
For developers working on real-time voice interactions or emotional analysis, Hume also offers a Pay as You Go plan with $20 in free credits and no upfront commitment. High-volume enterprise customers can opt for a dedicated Enterprise plan featuring dataset licenses, on-premises solutions, custom integrations, and advanced support.
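To make the tier arithmetic concrete, here is a small illustrative Python helper: monthly cost equals the base price plus overage beyond the included characters, billed per 1,000 extra characters. The numbers come straight from the published tiers above; the function and its plan names are a budgeting sketch, not part of Hume's API.

# Illustrative helper for the Octave TTS tier math listed above.
def octave_monthly_cost(chars_used: int, plan: str = "creator") -> float:
    # plan -> (included characters, base $/month, $ per 1,000 extra chars)
    plans = {
        "creator": (100_000, 10.00, 0.20),
        "pro": (500_000, 50.00, 0.15),
        "scale": (2_000_000, 150.00, 0.13),
        "business": (10_000_000, 900.00, 0.10),
    }
    included, base, per_thousand = plans[plan]
    extra = max(0, chars_used - included)
    return base + (extra / 1_000) * per_thousand


# Example: 150,000 characters on Creator ->
# $10 base + (50,000 / 1,000) * $0.20 = $20.00
print(octave_monthly_cost(150_000, "creator"))  # 20.0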
Hume’s history of emotive AI voice models
Founded in 2021 by Alan Cowen, a former researcher at Google DeepMind, Hume aims to bridge the gap between human emotional nuance and AI interaction.
The company trained its models on an expansive dataset drawn from hundreds of thousands of participants worldwide—capturing not just speech and text, but also vocal bursts and facial expressions.
“Emotional intelligence includes the ability to infer intentions and preferences from behavior. That’s the very core of what AI interfaces are trying to achieve,” Cowen told VentureBeat. Hume’s mission is to make AI interfaces more responsive, humanlike, and ultimately more useful—whether that’s helping a customer navigate an app or narrating a story with just the right blend of drama and humor.
In September 2024, the company launched EVI 2, which offered 40% lower latency and 30% lower pricing than EVI 1, alongside new features like dynamic voice customization and in-conversation style prompts.
February 2025 saw the debut of Octave, a text-to-speech engine for content creators capable of adjusting emotions at the sentence level with text prompts.
With EVI 3 now available for hands-on exploration and full API access just around the corner, Hume hopes to allow developers and creators to reimagine what’s possible with voice AI.