Building intelligent AI voice agents with Pipecat and Amazon Bedrock – Part 2

By Advanced AI Editor, July 12, 2025


Voice AI is changing the way we use technology, allowing for more natural and intuitive conversations. Meanwhile, advanced AI agents can now understand complex questions and act autonomously on our behalf.

In Part 1 of this series, you learned how to combine Amazon Bedrock and Pipecat, an open source framework for voice and multimodal conversational AI agents, to build applications with human-like conversational AI. You learned about common use cases of voice agents and the cascaded models approach, where you orchestrate several components to build your voice AI agent.

In this post (Part 2), you explore how to use a speech-to-speech foundation model, Amazon Nova Sonic, and the benefits of using a unified model.

Architecture: Using Amazon Nova Sonic speech-to-speech

Amazon Nova Sonic is a speech-to-speech foundation model that delivers real-time, human-like voice conversations with industry-leading price performance and low latency. While the cascaded models approach outlined in Part 1 is flexible and modular, it requires orchestrating automatic speech recognition (ASR), natural language understanding (NLU), and text-to-speech (TTS) models. For conversational use cases, this can add latency, because the stages run in sequence, and can lose tone and prosody, because speech is reduced to text between stages. Nova Sonic combines these components into a unified model that processes audio in real time with a single forward pass, reducing latency while streamlining development.
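To make the trade-off concrete, here is a toy sketch. It is not Nova Sonic or Pipecat code. The stub stages, the `Utterance` type, and the latency numbers are hypothetical placeholders, not measurements. The sketch shows two things: in a cascaded pipeline the stage latencies add up, and the caller's prosody is discarded once speech becomes text. A single speech-to-speech model, by contrast, can condition its reply on that prosody.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Utterance:
    """Toy stand-in for audio: the spoken words plus a prosody label."""
    words: str
    prosody: str  # e.g. "neutral", "hesitant", "excited"


# Placeholder latencies in milliseconds, chosen for illustration only (not benchmarks).
ASR_MS, NLU_MS, TTS_MS = 150, 400, 200
UNIFIED_MS = 500


def cascaded(audio: Utterance) -> Tuple[Utterance, int]:
    transcript = audio.words                  # ASR: only the text survives; prosody is discarded
    reply_text = f"You said: {transcript}"    # NLU/LLM: reasons over text alone
    reply = Utterance(reply_text, "neutral")  # TTS: has no access to the caller's tone
    return reply, ASR_MS + NLU_MS + TTS_MS    # stages run in sequence, so latencies add up


def unified(audio: Utterance) -> Tuple[Utterance, int]:
    # One speech-to-speech model sees the audio directly and can match its delivery.
    reply = Utterance(f"You said: {audio.words}", audio.prosody)
    return reply, UNIFIED_MS


if __name__ == "__main__":
    caller = Utterance("I'd like to book an appointment", "hesitant")
    print(cascaded(caller))
    print(unified(caller))
```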

By unifying these capabilities, the model can dynamically adjust voice responses based on the acoustic characteristics and conversational context of the input, creating more fluid and contextually appropriate dialogue. The system recognizes conversational subtleties such as natural pauses, hesitations, and turn-taking cues, allowing it to respond at appropriate moments and seamlessly manage interruptions during conversation. Amazon Nova Sonic also supports tool use and agentic RAG with Amazon Bedrock Knowledge Bases, enabling your voice agents to retrieve information. Refer to the following figure to understand the end-to-end flow.
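Handling interruptions ("barge-in") ultimately means that the agent stops talking as soon as the user starts. The following is a generic `asyncio` sketch of that idea. It is not how Nova Sonic or Pipecat implement it internally, since the framework handles interruptions for you. The reply chunks, the delays, and the helper names are hypothetical, and the voice activity detector is simulated with a timer.

```python
import asyncio
from typing import List


async def play_reply(chunks: List[str], played: List[str]) -> None:
    """Simulate streaming the bot's reply; each await is a point where playback can be cancelled."""
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0.01)  # stand-in for sending one audio chunk


async def reply_with_barge_in(user_speech_after: float) -> List[str]:
    played: List[str] = []
    reply = ["Sure,", "the", "clinic", "opens", "at", "nine", "tomorrow."]
    playback = asyncio.create_task(play_reply(reply, played))
    # Hypothetical: a voice activity detector reports user speech after this delay.
    await asyncio.sleep(user_speech_after)
    if not playback.done():
        playback.cancel()  # barge-in: stop talking as soon as the user starts
        try:
            await playback
        except asyncio.CancelledError:
            pass
    return played


if __name__ == "__main__":
    print(asyncio.run(reply_with_barge_in(0.025)))
```

Cancelling the playback task, rather than letting it finish, is what keeps the agent from talking over the user. Only the chunks already sent remain in `played`.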

Figure: End-to-end architecture diagram of a voice-enabled AI agent orchestrated by Pipecat, featuring real-time processing and AWS services

The choice between the two approaches depends on your use case. While the capabilities of Amazon Nova Sonic are state-of-the-art, the cascaded models approach outlined in Part 1 might be suitable if you require additional flexibility or modularity for advanced use cases.

AWS collaboration with Pipecat

To make integration seamless, AWS collaborated with the Pipecat team to add support for Amazon Nova Sonic in Pipecat v0.0.67, making it straightforward to integrate state-of-the-art speech capabilities into your applications.
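Because Nova Sonic support starts at Pipecat v0.0.67, you might want to confirm your installed release before running the sample. The following is a minimal sketch using only the standard library. It assumes Pipecat was installed from PyPI under the distribution name `pipecat-ai`, and the helper names are hypothetical. Its version parsing is deliberately simplified and ignores pre-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Minimum Pipecat release with Amazon Nova Sonic support, per this post.
MIN_PIPECAT = (0, 0, 67)


def release_tuple(v: str) -> Tuple[int, ...]:
    """Leading numeric components of a version string, e.g. "0.0.67" -> (0, 0, 67).

    Simplified on purpose: pre-release suffixes such as "rc1" or ".dev1" are ignored.
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def supports_nova_sonic(v: str) -> bool:
    # Tuple comparison orders releases numerically, so (0, 0, 7) < (0, 0, 67).
    return release_tuple(v) >= MIN_PIPECAT


def installed_pipecat() -> Optional[str]:
    # Assumes the PyPI distribution name "pipecat-ai".
    try:
        return version("pipecat-ai")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_pipecat()
    print(v, supports_nova_sonic(v) if v else "pipecat-ai not installed")
```

Comparing integer tuples avoids the classic string-comparison bug where `"0.0.7"` would sort after `"0.0.67"`.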

Kwindla Hultman Kramer, Chief Executive Officer at Daily.co and Creator of Pipecat, shares his perspective on this collaboration:

“Amazon’s new Nova Sonic speech-to-speech model is a leap forward for real-time voice AI. The bidirectional streaming API, natural-sounding voices, and robust tool-calling capabilities open up exciting new possibilities for developers. Integrating Nova Sonic with Pipecat means we can build conversational agents that not only understand and respond in real time, but can also take meaningful actions, like scheduling appointments or fetching information, directly through natural conversation. This is the kind of technology that truly transforms how people interact with software, making voice interfaces faster, more human, and genuinely useful in everyday workflows.”

“Looking forward, we’re thrilled to collaborate with AWS on a roadmap that helps customers reimagine their contact centers with integration to Amazon Connect and harness the power of multi-agent workflows through the Strands agentic framework. Together, we’re enabling organizations to deliver more intelligent, efficient, and personalized customer experiences—whether it’s through real-time contact center transformation or orchestrating sophisticated agentic workflows across industries.”

Getting started with Amazon Nova Sonic and Pipecat

To guide your implementation, we provide a code example that demonstrates the basic functionality: it shows how to build a complete voice AI agent with Amazon Nova Sonic and Pipecat.

Prerequisites

Before using the provided code examples with Amazon Nova Sonic, make sure that you have the following:

Implementation steps

After you complete the prerequisites, you can start setting up your sample voice agent:

Clone the repository:

git clone https://github.com/aws-samples/build-intelligent-ai-voice-agents-with-pipecat-and-amazon-bedrock
cd build-intelligent-ai-voice-agents-with-pipecat-and-amazon-bedrock/part-2

Set up a virtual environment:

cd server
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Create a .env file with your credentials:

DAILY_API_KEY=your_daily_api_key
AWS_ACCESS_KEY_ID=your_aws_access_key_id
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
AWS_REGION=your_aws_region
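Before starting the server, you might want to confirm that .env is complete and no longer contains the `your_...` placeholders. The following is a minimal sketch with no third-party dependencies. The helper names are hypothetical, and the sample application may load the file differently (for example, with python-dotenv).

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Variables the sample expects in .env, as listed above.
REQUIRED_KEYS = ("DAILY_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")


def parse_env(text: str) -> Dict[str, str]:
    """Minimal KEY=value parser: skips blank lines and comments, strips optional quotes."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    # Unset, empty, and untouched "your_..." placeholders all count as missing.
    return [k for k in required if not env.get(k) or env[k].startswith("your_")]


def check_env_file(path: str = ".env") -> List[str]:
    return missing_keys(parse_env(Path(path).read_text()))


if __name__ == "__main__":
    gaps = check_env_file() if Path(".env").exists() else list(REQUIRED_KEYS)
    print("missing:", gaps or "none")
```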

Start the server:

Connect using a browser at http://localhost:7860 and grant microphone access.
Start the conversation with your AI voice agent.
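If the page at http://localhost:7860 doesn't load, a quick first check is whether anything is listening on that port. The following is a minimal sketch using only the standard library. The helper name is hypothetical, and the default port is taken from the step above.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 7860, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("server up" if server_is_listening() else "nothing listening on localhost:7860")
```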

Customize your voice AI agent

To customize your voice AI agent, start by:

Modifying bot.py to change conversation logic.
Adjusting model selection in bot.py for your latency and quality needs.

To learn more, see the README of our code sample on GitHub.

Clean up

The preceding instructions set up the application in your local environment. The local application uses AWS services and Daily through IAM and API credentials. For security and to avoid unanticipated costs, delete these credentials when you’re finished so that they can no longer be used.

Amazon Nova Sonic and Pipecat in action

The first demo showcases a scenario for an intelligent healthcare assistant. It was presented at the AWS Summit Sydney 2025 keynote by Rada Stanic, Chief Technologist, and Melanie Li, Senior Specialist Solutions Architect – Generative AI.

The second demo showcases a simple fun facts voice agent running in a local environment using SmallWebRTCTransport. As the user speaks, the voice agent transcribes the speech in real time, as displayed in the terminal.

Enhancing agentic capabilities with Strands Agents

A practical way to boost agentic capability and understanding is to implement a general tool call that delegates tool selection to an external agent, such as a Strands Agent. The delegated Strands Agent can then reason about your complex query, perform multi-step tasks with tool calls, and return a summarized response.

To illustrate, let’s review a simple example. If the user asks a question like: “What is the weather like near the Seattle Aquarium?”, the voice agent can delegate to a Strands agent through a general tool call such as handle_query.
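The voice agent's side of this design can be sketched as a minimal example, assuming a generic tool-calling interface. It consists of a single `handle_query` tool described with JSON Schema, plus a dispatcher that routes the model's tool call (a name plus JSON-encoded arguments) to a local handler. The field names (`name`, `description`, `input_schema`) and the `dispatch` helper are illustrative. Amazon Nova Sonic and Pipecat each define their own tool-specification format, so check their documentation for the exact shape.

```python
import json
from typing import Any, Callable, Dict

# Illustrative tool declaration; the exact wire format depends on the model and framework.
HANDLE_QUERY_TOOL: Dict[str, Any] = {
    "name": "handle_query",
    "description": "Delegate a complex or multi-step question to a reasoning agent "
                   "and return its summarized answer.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The user's question, verbatim."}
        },
        "required": ["query"],
    },
}


def dispatch(tool_name: str, raw_args: str, handlers: Dict[str, Callable[..., str]]) -> str:
    """Route one tool call to its handler and return a JSON string for the model."""
    handler = handlers.get(tool_name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    args = json.loads(raw_args)  # the model sends arguments as a JSON object
    return json.dumps({"result": handler(**args)})


if __name__ == "__main__":
    echo = {"handle_query": lambda query: f"(delegated) {query}"}
    question = json.dumps({"query": "What is the weather like near the Seattle Aquarium?"})
    print(dispatch("handle_query", question, echo))
```

Exposing one broad tool keeps the voice model's tool list short. The harder choice of which specific tools to call is left to the delegated agent.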

The Strands Agent handles the query and reasons about the task, for example:

I need to get the weather information for the Seattle Aquarium. To do this, I need the latitude and longitude of the Seattle Aquarium. I will first use the ‘search_places’ tool to find the coordinates of the Seattle Aquarium.

The Strands Agent then executes the search_places tool call, followed by a get_weather tool call, and returns a response to the parent agent as the result of the handle_query tool call. This is also known as the agents-as-tools pattern.
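The agents-as-tools flow can be sketched without any framework. In the following minimal sketch, `search_places` and `get_weather` are stubs that return placeholder values, not real coordinates or forecasts. The sub-agent's plan is also hard-coded, whereas a Strands Agent would have a model decide which tools to call and in what order. Only `handle_query` is exposed to the voice agent. The Strands Agents SDK's own API is not shown here.

```python
from typing import Any, Callable, Dict


def search_places(name: str) -> Dict[str, Any]:
    # Stub: a real tool would query a places or geocoding API.
    return {"name": name, "lat": 0.0, "lon": 0.0}  # placeholder coordinates


def get_weather(lat: float, lon: float) -> Dict[str, Any]:
    # Stub: a real tool would call a weather API with these coordinates.
    return {"summary": "light rain", "temp_c": 14}  # placeholder forecast


def weather_sub_agent(query: str) -> str:
    """Hard-coded plan standing in for a reasoning agent: find the place, then its weather."""
    place_name = query.split("near", 1)[-1].strip(" ?")  # naive extraction, illustration only
    place = search_places(place_name)                     # step 1: resolve coordinates
    weather = get_weather(place["lat"], place["lon"])     # step 2: weather at those coordinates
    return f"Near {place['name']}: {weather['summary']}, {weather['temp_c']} degrees Celsius."


def handle_query(query: str, agent: Callable[[str], str] = weather_sub_agent) -> str:
    # The only tool the voice agent sees; all multi-step work happens inside `agent`.
    return agent(query)


if __name__ == "__main__":
    print(handle_query("What is the weather like near the Seattle Aquarium?"))
```

Passing the sub-agent in as a parameter makes it easy to swap the hard-coded plan for a real reasoning agent without changing the tool that the voice model calls.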

To learn more, see the example in our hands-on workshop.

Conclusion

Building intelligent AI voice agents is more accessible than ever through the combination of open source frameworks, such as Pipecat, and powerful foundation models on Amazon Bedrock.

In this series, you learned about two common approaches for building AI voice agents. In Part 1, you learned about the cascaded models approach, diving into each component of a conversational AI system. In Part 2, you learned how Amazon Nova Sonic, a speech-to-speech foundation model, can simplify implementation by unifying these components into a single model architecture. Looking ahead, stay tuned for developments in multimodal foundation models, including the upcoming Nova any-to-any models, which can further improve your voice AI applications.

Resources

To learn more about voice AI agents, see the following resources:

To get started with your own voice AI project, contact your AWS account team to explore an engagement with AWS Generative AI Innovation Center (GAIIC).

About the Authors

Adithya Suresh is a Deep Learning Architect at AWS Generative AI Innovation Center based in Sydney, where he collaborates directly with enterprise customers to design and scale transformational generative AI solutions for complex business challenges. He leverages AWS generative AI services to build bespoke AI systems that drive measurable business value across diverse industries.

Daniel Wirjo is a Solutions Architect at AWS, with focus across AI and SaaS startups. As a former startup CTO, he enjoys collaborating with founders and engineering leaders to drive growth and innovation on AWS. Outside of work, Daniel enjoys taking walks with a coffee in hand, appreciating nature, and learning new ideas.

Karan Singh is a Generative AI Specialist at AWS, where he works with top-tier third-party foundation model and agentic frameworks providers to develop and execute joint go-to-market strategies, enabling customers to effectively deploy and scale solutions to solve enterprise generative AI challenges.

Melanie Li, PhD is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions leveraging state-of-the-art AI and machine learning tools. She has been actively involved in multiple Generative AI initiatives across APJ, harnessing the power of Large Language Models (LLMs). Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.

Osman Ipek is a seasoned Solutions Architect on Amazon’s Artificial General Intelligence team, specializing in Amazon Nova foundation models. With over 12 years of experience in software and machine learning, he has driven innovative Alexa product experiences reaching millions of users. His expertise spans voice AI, natural language processing, large language models and MLOps, with a passion for leveraging AI to create breakthrough products.

Xuefeng Liu leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific region. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.


