New York-based AI startup Hume has unveiled its latest Empathic Voice Interface (EVI) conversational AI model, EVI 3 (pronounced "Eevee" Three, like the Pokémon character), targeting everything from customer support systems and health coaching to immersive storytelling and virtual companionship.
EVI 3 lets users create their own voices by talking to the model (it operates voice-to-voice, or speech-to-speech), and, according to Hume, aims to set a new standard for naturalness, expressiveness, and "empathy": that is, how well users perceive the model as understanding their emotions and mirroring or adjusting its own responses in tone and word choice.
Designed for businesses, developers, and creators, EVI 3 expands on Hume’s previous voice models by offering more sophisticated customization, faster responses, and enhanced emotional understanding.
Individual users can interact with it today through Hume's live demo on its website and iOS app, while developer access through Hume's proprietary application programming interface (API) is slated to arrive in "the coming weeks," according to a company blog post.
At that point, developers will be able to embed EVI 3 into their own customer service systems, creative projects, or virtual assistants — for a price (see below).
My own use of the demo let me create a new, custom synthetic voice in seconds based on qualities I described to it: a mix of warm and confident, with a masculine tone. Speaking with it felt more natural and effortless than with other AI models, and certainly more so than the stock voices from legacy tech leaders such as Apple (Siri) and Amazon (Alexa).
What developers and businesses should know about EVI 3
Hume’s EVI 3 is built for a range of uses—from customer service and in-app interactions to content creation in audiobooks and gaming.
It allows users to specify precise personality traits, vocal qualities, emotional tone, and conversation topics.
This means it can produce anything from a warm, empathetic guide to a quirky, mischievous narrator—down to requests like “a squeaky mouse whispering urgently in a French accent about its scheme to steal cheese from the kitchen.”
EVI 3’s core strength lies in its ability to integrate emotional intelligence directly into voice-based experiences.
Unlike traditional chatbots or voice assistants that rely heavily on scripted or text-based interactions, EVI 3 adapts to how people naturally speak — picking up on pitch, prosody, pauses, and vocal bursts to create more engaging, humanlike conversations.
However, one big feature Hume's models currently lack, and which rivals both open source and proprietary (such as ElevenLabs) offer, is voice cloning: the rapid replication of a specific voice, such as a user's own or a company CEO's.
Yet Hume has indicated it will add that capability to its Octave text-to-speech model (the feature is listed as "coming soon" on Hume's website), and prior reporting by yours truly found it will let users replicate a voice from as little as five seconds of audio.
Hume has stated it’s prioritizing safeguards and ethical considerations before making this feature broadly available. Currently, this cloning capability is not available in EVI itself, with Hume emphasizing flexible voice customization instead.
Internal benchmarks show users prefer EVI 3 to OpenAI’s GPT-4o voice model
According to Hume's own tests with 1,720 users, EVI 3 was preferred over OpenAI's GPT-4o voice in every category evaluated: naturalness, expressiveness, empathy, interruption handling, response speed, audio quality, voice emotion/style modulation on request, and emotion understanding on request (Hume groups the "on request" categories under "instruction following").
It also usually bested Google's Gemini model family and the open source model from Sesame, the new AI voice startup from Oculus co-founder Brendan Iribe.
It also boasts lower latency (~300 milliseconds), robust multilingual support (English and Spanish, with more languages coming), and effectively unlimited custom voices. Per Hume's website, key capabilities include:
Prosody generation and expressive text-to-speech with modulation.
Interruptibility, enabling dynamic conversational flow.
In-conversation voice customizability, so users can adjust speaking style in real time.
API-ready architecture (coming soon), so developers can integrate EVI 3 directly into apps and services.
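For developers planning ahead: Hume's current EVI API is WebSocket-based, and EVI 3 will presumably follow suit. Below is a minimal Python sketch of what a speech-to-speech session could look like. The endpoint URL, query-string authentication, and message types ("session_settings", "audio_input", "audio_output", "assistant_end") are illustrative assumptions, not Hume's documented EVI 3 schema; consult the official docs once the API ships.

# Minimal sketch of a speech-to-speech EVI session over WebSocket.
# ASSUMPTIONS: endpoint, api_key query parameter, and message schema
# are guesses for illustration, not the documented EVI 3 interface.
import asyncio
import base64
import json

import websockets  # pip install websockets

EVI_URL = "wss://api.hume.ai/v0/evi/chat"  # hypothetical endpoint


async def chat(api_key: str, wav_bytes: bytes) -> None:
    async with websockets.connect(f"{EVI_URL}?api_key={api_key}") as ws:
        # Describe the desired voice in plain language, as the demo does.
        await ws.send(json.dumps({
            "type": "session_settings",
            "voice_description": "warm, confident, with a masculine tone",
        }))
        # Send one chunk of user audio, base64-encoded.
        await ws.send(json.dumps({
            "type": "audio_input",
            "data": base64.b64encode(wav_bytes).decode("ascii"),
        }))
        # Read server events until the assistant finishes its turn.
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "audio_output":
                pass  # decode event["data"] and stream it to a speaker
            elif event.get("type") == "assistant_end":
                break


if __name__ == "__main__":
    with open("hello.wav", "rb") as f:
        asyncio.run(chat("YOUR_HUME_API_KEY", f.read()))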
Pricing and developer access
Hume offers flexible, usage-based pricing across its EVI, Octave TTS, and Expression Measurement APIs.
While EVI 3’s specific API pricing has not been announced yet (marked as TBA), the pattern suggests it will be usage-based, with enterprise discounts available for large deployments.
For reference, EVI 2 is priced at $0.072 per minute — 30% lower than its predecessor, EVI 1 ($0.102/minute).
For creators and developers working with text-to-speech projects, Hume’s Octave TTS plans range from a free tier (10,000 characters of speech, ~10 minutes of audio) to enterprise-level plans. Here’s the breakdown:
Free: 10,000 characters, unlimited custom voices, $0/month
Starter: 30,000 characters (~30 minutes), 20 projects, $3/month
Creator: 100,000 characters (~100 minutes), 1,000 projects, $0.20 per 1,000 extra characters, $10/month
Pro: 500,000 characters (~500 minutes), 3,000 projects, $0.15 per 1,000 extra characters, $50/month
Scale: 2,000,000 characters (~2,000 minutes), 10,000 projects, $0.13 per 1,000 extra characters, $150/month
Business: 10,000,000 characters (~10,000 minutes), 20,000 projects, $0.10 per 1,000 extra characters, $900/month
Enterprise: Custom pricing and unlimited usage
For developers working on real-time voice interactions or emotional analysis, Hume also offers a Pay as You Go plan with $20 in free credits and no upfront commitment. High-volume enterprise customers can opt for a dedicated Enterprise plan featuring dataset licenses, on-premises solutions, custom integrations, and advanced support.
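To make the tier arithmetic concrete, here is a small illustrative Python helper: monthly cost equals the base price plus overage beyond the included characters, billed per 1,000 extra characters. The numbers come straight from the published tiers above; the function and its plan names are a budgeting sketch, not part of Hume's API.

# Illustrative helper for the Octave TTS tier math listed above.
def octave_monthly_cost(chars_used: int, plan: str = "creator") -> float:
    # plan -> (included characters, base $/month, $ per 1,000 extra chars)
    plans = {
        "creator": (100_000, 10.00, 0.20),
        "pro": (500_000, 50.00, 0.15),
        "scale": (2_000_000, 150.00, 0.13),
        "business": (10_000_000, 900.00, 0.10),
    }
    included, base, per_thousand = plans[plan]
    extra = max(0, chars_used - included)
    return base + (extra / 1_000) * per_thousand


# Example: 150,000 characters on Creator ->
# $10 base + (50,000 / 1,000) * $0.20 = $20.00
print(octave_monthly_cost(150_000, "creator"))  # 20.0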
Hume’s history of emotive AI voice models
Founded in 2021 by Alan Cowen, a former researcher at Google DeepMind, Hume aims to bridge the gap between human emotional nuance and AI interaction.
The company trained its models on an expansive dataset drawn from hundreds of thousands of participants worldwide—capturing not just speech and text, but also vocal bursts and facial expressions.
“Emotional intelligence includes the ability to infer intentions and preferences from behavior. That’s the very core of what AI interfaces are trying to achieve,” Cowen told VentureBeat. Hume’s mission is to make AI interfaces more responsive, humanlike, and ultimately more useful—whether that’s helping a customer navigate an app or narrating a story with just the right blend of drama and humor.
In September 2024, the company launched EVI 2, which offered 40% lower latency and 30% lower pricing than EVI 1, alongside new features like dynamic voice customization and in-conversation style prompts.
February 2025 saw the debut of Octave, a text-to-speech engine for content creators capable of adjusting emotions at the sentence level with text prompts.
With EVI 3 now available for hands-on exploration and full API access just around the corner, Hume hopes to allow developers and creators to reimagine what’s possible with voice AI.