Improve conversational AI response times for enterprise applications with the Amazon Bedrock streaming API and AWS AppSync

By Advanced AI Editor | July 10, 2025 | 10 Mins Read


Many enterprises are using large language models (LLMs) in Amazon Bedrock to gain insights from their internal data sources. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Organizations implementing conversational AI systems often face a common challenge: although their APIs can quickly find answers to targeted questions, more complex queries requiring reasoning-actioning (ReAct) logic can take substantial time to process, negatively impacting user experience. This issue is particularly pronounced in regulated industries where security requirements add additional complexity. For instance, a global financial services organization with over $1.5 trillion in assets under management encountered this exact challenge. Despite successfully implementing a conversational AI system that integrated with multiple LLMs and data sources, they needed a solution that could maintain their rigorous security protocols—including AWS services operating within virtual private cloud (VPC) environments and enterprise OAuth integration—while improving response times for complex queries.

AWS AppSync is a fully managed service that enables developers to build serverless GraphQL APIs with real-time capabilities. This post demonstrates how to combine AWS AppSync subscriptions with Amazon Bedrock streaming endpoints to deliver LLM responses incrementally. We provide an enterprise-grade implementation blueprint that helps organizations in regulated industries maintain security compliance while optimizing user experience through immediate real-time response delivery.

Solution overview

The solution discussed in this post uses AWS AppSync to start the asynchronous conversational workflow. An AWS Lambda function does the heavy lifting of interacting with the Amazon Bedrock streaming API. As the LLM produces tokens, they are streamed to the frontend using AWS AppSync mutations and subscriptions.

A reference implementation of the Lambda function and AWS AppSync API is provided in the sample code in this post. The following diagram illustrates the reference architecture. It provides a high-level overview of how the various AWS services are integrated to achieve the desired outcome.

Solution Architecture

Let’s walk through how a user’s request is handled in the solution, and how the user receives real-time responses from an LLM in Amazon Bedrock:

1. When the user loads the UI application, the application subscribes to the GraphQL subscription onSendMessage(), which returns whether the WebSocket connection was successful.
2. When the user enters a query, the application invokes a GraphQL query (getLlmResponse) that triggers the Data Source Lambda function.
3. The Data Source Lambda function publishes an event to the Amazon Simple Notification Service (Amazon SNS) topic, and a 201 message is sent to the user, completing the synchronous flow.

These steps are better illustrated in the following sequence diagram.

Sequence Diagram 1

4. The Orchestrator Lambda function is triggered by the published SNS event and initiates the stream with the Amazon Bedrock API call InvokeModelWithResponseStream.
5. Amazon Bedrock receives the user query, initiates the stream, and starts sending stream tokens back to the Lambda function.
6. When the Orchestrator Lambda function receives a stream token from Amazon Bedrock, it invokes the GraphQL mutation sendMessage.
7. The mutation triggers the onSendMessage subscription containing the partial LLM response, and the UI renders the stream tokens as it receives them.

The following diagram illustrates these steps in more detail.

Sequence Diagram 2

In the following sections, we discuss the components that make up the solution in more detail.

Data and API design

The AppSync API GraphQL schema consists of query, subscription, and mutation operations.

The following code is the query operation:

input GetLlmResponseInput {
  sessionId: String!
  message: String!
  locale: String!
}

type Query {
  getLlmResponse(input: GetLlmResponseInput!): GetLlmResponse
    @aws_api_key
}

The query operation, getLlmResponse, is synchronous and accepts sessionId, locale, and a user-provided message.

The frontend must send a unique sessionId; this session ID uniquely identifies the user’s chat session and doesn’t change for the duration of an active conversation. If the user reloads the frontend, a new sessionId is generated and sent to the query operation.
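
As a minimal illustration, a client could derive the session ID from a UUID. The reference implementation may generate it differently, so treat this helper as an assumption:

import uuid

def new_session_id() -> str:
    """Generate a unique chat session ID (illustrative helper).

    The value stays constant for the life of the conversation; a page
    reload would call this again and start a new session.
    """
    return str(uuid.uuid4())

session_id = new_session_id()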

The frontend must also send locale, which indicates the user’s preferred language. For a list of supported locales, see Languages and locales supported by Amazon Lex V2. For example, we use en_US for North American English.

Finally, the user’s message (or query) is set in the message attribute. The value of the message attribute is passed to the LLM for analysis.

The following code is the subscription operation:

type Subscription {
  onSendMessage(sessionId: String!): SendMessageResponse
    @aws_subscribe(mutations: ["sendMessage"])
    @aws_api_key
}

The AWS AppSync subscription operation, onSendMessage, accepts sessionId as a parameter. The frontend calls the onSendMessage subscription operation to subscribe to a WebSocket connection using sessionId. This allows the frontend to receive messages from the AWS AppSync API whenever a mutation operation successfully executes for the given sessionId.

The following code is the mutation operation:

input SendMessageInput {
  sessionId: String!
  message: String!
  locale: String!
}

type Mutation {
  sendMessage(input: SendMessageInput!): SendMessageResponse
    @aws_api_key
    @aws_iam
}

The mutation operation, sendMessage, accepts a payload of type SendMessageInput. To successfully send a message to the frontend, the caller must provide all required attributes of the SendMessageInput type, indicated by the exclamation point (!) in the GraphQL schema excerpt.

The Orchestrator Lambda function calls the sendMessage mutation to send partially received LLM tokens to the frontend. We discuss the Orchestrator Lambda function in more detail later in this post.
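
For illustration, a mutation call from Python might look like the following sketch. The endpoint URL and API key variable names, and the SendMessageResponse selection set, are assumptions; the reference implementation may instead sign requests with IAM, since the schema authorizes both @aws_api_key and @aws_iam:

import json
import os
import urllib.request

# Hypothetical configuration names; adjust to your deployment.
APPSYNC_URL = os.environ["APPSYNC_GRAPHQL_URL"]
APPSYNC_API_KEY = os.environ["APPSYNC_API_KEY"]

SEND_MESSAGE = """
mutation SendMessage($input: SendMessageInput!) {
  sendMessage(input: $input) { sessionId message locale }
}
"""

def send_partial_tokens(session_id: str, message: str, locale: str = "en_US") -> dict:
    """Publish one batch of partial LLM tokens to subscribed clients."""
    payload = {
        "query": SEND_MESSAGE,
        "variables": {"input": {"sessionId": session_id, "message": message, "locale": locale}},
    }
    request = urllib.request.Request(
        APPSYNC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": APPSYNC_API_KEY},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())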

AWS AppSync Data Source Lambda function

AWS AppSync invokes the Data Source Lambda function when the frontend calls the GraphQL query operation, getLlmResponse. The GraphQL query is a synchronous operation.

A reference implementation of the AWS AppSync Data Source Lambda function, called bedrock-appsync-ds-lambda, is in the accompanying GitHub repo. This Lambda function extracts the user’s message from the incoming GraphQL query operation and sends the value to the SNS topic. The Lambda function then returns a success status code to the caller, indicating that the message has been submitted to the backend for processing.
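
A minimal sketch of that handler follows, assuming a direct Lambda resolver (arguments arrive under event["arguments"]) and a hypothetical SNS_TOPIC_ARN environment variable; see the repo for the actual implementation:

import json
import os

import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["SNS_TOPIC_ARN"]  # hypothetical variable name

def handler(event, context):
    """Data Source Lambda: enqueue the user's query and return immediately."""
    args = event["arguments"]["input"]  # sessionId, message, locale
    sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(args))
    # The synchronous GraphQL flow ends here; streaming continues
    # asynchronously in the Orchestrator Lambda function.
    return {"statusCode": 201, "sessionId": args["sessionId"]}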

AWS AppSync Orchestrator Lambda function

The AWS AppSync Orchestrator Lambda function runs whenever an event is published to the SNS topic. This function initiates the Amazon Bedrock streaming API using the converse_stream Boto3 API call.

The following code snippet shows how the Orchestrator Lambda function receives the SNS event, processes it, and then calls the Boto3 API:

import boto3

brt = boto3.client(service_name="bedrock-runtime", region_name="us-west-2")

messages = []
message = {
    "role": "user",
    "content": [{"text": parsed_event["message"]}],
}
messages.append(message)

response = brt.converse_stream(
    modelId=model_id,
    messages=messages,
)

The code first instantiates the Boto3 client using the bedrock-runtime service name. The Lambda function receives the SNS event and parses it with the Python JSON library; the parsed contents are stored in the parsed_event dictionary. The code then creates an Amazon Bedrock Messages API style prompt with role and content attributes:

message = {
    "role": "user",
    "content": [{"text": parsed_event["message"]}],
}

The content attribute’s value comes from the message attribute of the parsed SNS event (parsed_event["message"]). Refer to the converse_stream Boto3 API documentation for a list of valid role values.

The converse_stream API accepts modelId and messages parameters. The value of modelId comes from an environment variable set on the Lambda function. The messages parameter is a list of dictionaries, and it must only contain Amazon Bedrock Messages API style prompts.
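
For reference, the event parsing and environment lookup around the snippet above might look like this sketch (the MODEL_ID variable name is an assumption; SNS delivers the payload as a JSON string under Records[0].Sns.Message):

import json
import os

def parse_sns_event(event: dict) -> dict:
    """Extract the payload the Data Source Lambda published to SNS."""
    return json.loads(event["Records"][0]["Sns"]["Message"])

model_id = os.environ["MODEL_ID"]  # assumed variable name, set at deploy time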

When the converse_stream API successfully runs, it returns an object that the Lambda code further analyzes to send partial tokens to the frontend:

# converse_stream returns the event stream under the "stream" key
stream = response.get("stream")
if stream:
    self.appsync = AppSync(locale="en_US", session_id=session_id)
    self.appsync.invoke_mutation(DEFAULT_STREAM_START_TOKEN)
    event_count = 0
    buffer = ""
    for event in stream:
        if event and list(event)[0] == "contentBlockDelta":
            event_count += 1
            buffer += event["contentBlockDelta"]["delta"]["text"]
            if event_count > 5:
                self.appsync.invoke_mutation(buffer)
                event_count = 0
                buffer = ""
    if len(buffer) != 0:
        self.appsync.invoke_mutation(buffer)

As the LLM starts generating tokens in response to the prompt, the Lambda function first sends DEFAULT_STREAM_START_TOKEN to the frontend using the AWS AppSync mutation operation. This token alerts the frontend to start rendering tokens. As the Lambda function receives chunks from the converse_stream API, it calls the AWS AppSync mutation operation, sending partial tokens to the frontend to render.

To improve the user experience and reduce network overhead, the Lambda function doesn’t invoke the AWS AppSync mutation operation for every chunk it receives from the Amazon Bedrock converse_stream API. Instead, the Lambda code buffers partial tokens and invokes the AWS AppSync mutation operation after receiving five chunks. This avoids the overhead of AWS AppSync network calls, thereby reducing latency and improving the user experience.
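
The buffering pattern can be distilled into a few lines; in this sketch, send stands in for the AppSync mutation call:

def flush_in_batches(chunks, send, batch_size=5):
    """Forward streamed text chunks in batches to reduce mutation calls."""
    buffer, count = "", 0
    for chunk in chunks:
        buffer += chunk
        count += 1
        if count >= batch_size:
            send(buffer)
            buffer, count = "", 0
    if buffer:  # flush the remainder once the stream ends
        send(buffer)

# Example: 12 chunks yield 3 mutation calls (5 + 5 + 2 chunks).
flush_in_batches(iter("abcdefghijkl"), print)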

After the Lambda function has finished sending the tokens, it sends DEFAULT_STREAM_END_TOKEN:

self.appsync.invoke_mutation(DEFAULT_STREAM_END_TOKEN)

This token alerts the frontend that LLM streaming is complete.

For more details, refer to the GitHub repo. It contains a reference implementation of the Orchestrator Lambda function called bedrock-orchestrator-lambda.

Prerequisites

To deploy the solution, you must have the Terraform CLI installed in your environment. Complete all the steps in the Prerequisites section in the accompanying GitHub documentation.

Deploy the solution

Complete the following steps to deploy the solution:

1. Open a command line terminal window.
2. Change to the deployment folder.
3. Edit the sample.tfvars file, replacing the variable values to match your AWS environment:

region = "us-west-2"
lambda_s3_source_bucket_name = "YOUR_DEPLOYMENT_BUCKET"
lambda_s3_source_bucket_key = "PREFIX_WITHIN_THE_BUCKET"

4. Run the following commands to deploy the solution:

$ terraform init
$ terraform apply -var-file="sample.tfvars"

Detailed deployment steps are in the Deploy the solution section in the accompanying GitHub repository.

Test the solution

To test the solution, use the provided sample web UI and run it inside VS Code. For more information, refer to the accompanying README documentation.

Clean up

Use the following command to remove the resources deployed in the previous section from your AWS environment. You must use the same sample.tfvars file that you used to deploy the solution.

$ terraform destroy -var-file="sample.tfvars"

Conclusion

This post demonstrated how integrating an Amazon Bedrock streaming API with AWS AppSync subscriptions significantly enhances AI assistant responsiveness and user satisfaction. By implementing this streaming approach, the global financial services organization reduced initial response times for complex queries by approximately 75%, from 10 seconds to just 2–3 seconds, letting users view responses as they’re generated rather than waiting for complete answers. The business benefits are clear: reduced abandonment rates, improved user engagement, and a more responsive AI experience. Organizations can implement this solution using the provided Lambda and Terraform code, quickly bringing these improvements to their own environments.

For even greater flexibility, AWS AppSync Events offers an alternative implementation pattern that can further enhance real-time capabilities using a fully managed WebSocket API. By addressing the fundamental tension between comprehensive AI responses and speed, this streaming approach enables organizations to maintain high-quality interactions while delivering the responsive experience modern users expect.

About the authors

Salman Moghal, a Principal Consultant at AWS Professional Services Canada, specializes in crafting secure generative AI solutions for enterprises. With extensive experience in full-stack development, he excels in transforming complex technical challenges into practical business outcomes across banking, finance, and insurance sectors. In his downtime, he enjoys racquet sports and practicing Funakoshi Gichin’s teachings at his martial arts dojo.

Philippe Duplessis-Guindon is a cloud consultant at AWS, where he has worked on a wide range of generative AI projects. He has touched on most aspects of these projects, from infrastructure and DevOps to software development and AI/ML. After earning his bachelor’s degree in software engineering and a master’s in computer vision and machine learning from Polytechnique Montreal, Philippe joined AWS to put his expertise to work for customers. When he’s not at work, you’re likely to find Philippe outdoors—either rock climbing or going for a run.


