Advanced AI News
Supercharge generative AI workflows with NVIDIA DGX Cloud on AWS and Amazon Bedrock Custom Model Import

By Advanced AI Editor | July 15, 2025 | 12 min read


This post is co-written with Andrew Liu, Chelsea Isaac, Zoey Zhang, and Charlie Huang from NVIDIA.

DGX Cloud on Amazon Web Services (AWS) represents a significant leap forward in democratizing access to high-performance AI infrastructure. By combining NVIDIA GPU expertise with AWS scalable cloud services, organizations can accelerate their time-to-train, reduce operational complexity, and unlock new business opportunities. The platform’s performance, security, and flexibility position it as a foundational element for those seeking to stay at the forefront of AI innovation.

In this post, we explore a powerful end-to-end development workflow using NVIDIA DGX Cloud on AWS, Run:ai, and Amazon Bedrock Custom Model Import. We demonstrate how to fine-tune the open source Llama 3.1 70B model using NVIDIA DGX Cloud’s high-performance multi-GPU compute orchestrated with Run:ai, and we deploy the fine-tuned model using Custom Model Import in Amazon Bedrock for scalable serverless inference.

NVIDIA DGX Cloud on AWS

Organizations aim to deploy generative AI and agentic AI solutions rapidly so they can realize business value quickly. AWS and NVIDIA have partnered to provide AI infrastructure, software, and services, and the two companies have co-engineered NVIDIA DGX Cloud on AWS: a fully managed, high-performance AI training platform with flexible, short-term access to large GPU clusters. DGX Cloud on AWS is optimized at every layer of the full-stack platform for faster time to train, delivering productivity from day one. With DGX Cloud on AWS, organizations can use the latest NVIDIA architectures, including Amazon EC2 P6e-GB200 UltraServers accelerated by the NVIDIA Grace Blackwell GB200 Superchip (coming soon to DGX Cloud on AWS). DGX Cloud on AWS also includes access to NVIDIA AI and cloud experts and 24/7 support to help enterprises deliver maximum return on investment (ROI), and it is available in AWS Marketplace.

Amazon Bedrock Custom Model Import

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure. With Amazon Bedrock Custom Model Import, customers can access their imported custom models on demand in a serverless manner, freeing them from the complexities of deploying and scaling models themselves. They can accelerate generative AI application development by using built-in tools and features such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, Amazon Bedrock Agents, and more—all through a unified and consistent developer experience.

NVIDIA DGX Cloud on AWS architecture overview

DGX Cloud is a fully managed platform from NVIDIA, co-engineered with AWS, for customers that need to train or fine-tune a model. DGX Cloud on AWS uses p5.48xlarge instances, each with 8 H100 GPUs, 8 x 3.84 TB NVMe storage, and 32 network interfaces providing a total network bandwidth of 3,200 Gbps. DGX Cloud on AWS organizes node instances into optimal layouts for artificial intelligence and machine learning (AI/ML) cluster workloads, placing them in contiguous clusters for lower latency and faster results. DGX Cloud uses Amazon Elastic Kubernetes Service (Amazon EKS) and NVIDIA software such as NVIDIA NeMo and NVIDIA Run:ai to deploy and optimize Kubernetes clusters. Each cluster uses an Amazon FSx for Lustre file system for high-performance shared storage. DGX Cloud on AWS also uses Run:ai for workload orchestration, a software as a service (SaaS) offering that provides intelligent workload scheduling, prioritization, and preemption to maximize GPU utilization.

The application plane, which includes the p5 instances and FSx for Lustre file system, operates as a single tenant dedicated to each customer, providing complete isolation and dedicated performance for AI workloads. In addition, DGX Cloud also offers two private access connectivity options for customers who want a secure and direct connection from the cluster to their own AWS account: private access with AWS PrivateLink and private access with AWS Transit Gateway. With private access with AWS PrivateLink, private links are set up with endpoints into a customer’s AWS account to connect to the Kubernetes API, Run:ai control plane, and for cluster ingress. With private access with AWS Transit Gateway, traffic into and out of the DGX Cloud cluster will go through a customer’s transit gateway. The Run:ai control plane will still be connected through a PrivateLink endpoint.

The following diagram illustrates the solution architecture.

Setting up NVIDIA DGX Cloud on AWS

After you get access to your DGX Cloud cluster on AWS, you can start setting up your cluster to run workloads. A cluster admin first needs to create departments and projects for users to run their workloads in. A default department can be provided when you get initial access to your cluster. Projects allow for additional granular quota management beyond the quota set at the department level. After departments and projects are set up, users can then use their allocated quota to run workloads.

The following figure illustrates the Run:ai interface in DGX Cloud on AWS.

In this example, an interactive Jupyter notebook workspace running the nvcr.io/nvidia/nemo:25.02 image is used to preprocess data and manage the code. You’ll need 8 GPUs and at least 20 TB of mounted storage provided by Amazon FSx for Lustre, as shown in the following image. An Amazon Simple Storage Service (Amazon S3) bucket can also be mounted to connect your data directly to your AWS account. To learn more about how to create a notebook with NVIDIA NeMo, refer to Interactive NeMo Workload Job.

Fine-tuning the Llama3 model on NVIDIA DGX Cloud

After your Jupyter notebook is created, you can access it and upload our example notebook to download the dataset and Hugging Face model. Using the terminal function, copy the code from the NVIDIA NeMo Run repo into your PersistentVolumeClaim (PVC). To run the notebook, you’ll need a Hugging Face account and a Hugging Face token with access to the Llama 3.1 70B model on Hugging Face. To use the NVIDIA NeMo framework, convert the Hugging Face tensors to the .nemo format. We’re fine-tuning this model to follow user-generated instructions using the open source daring-anteater dataset, which is focused on instruction tuning and covers a wide range of tasks and scenarios. When your data and model finish downloading, you’re ready to train your model.
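As a rough sketch of the checkpoint download step, the following assumes the usual gated repo ID, a hypothetical PVC mount path, and a valid Hugging Face token; the helper names are illustrative, not from the original notebook:

```python
# Usual gated repo ID for this checkpoint; verify it matches your access grant.
REPO_ID = "meta-llama/Llama-3.1-70B"
# Hypothetical mount path for the FSx-backed PVC inside the workspace.
LOCAL_DIR = "/workspace/models/llama-3.1-70b"

def download_kwargs(repo_id: str, local_dir: str, token: str) -> dict:
    """Collect the arguments passed to huggingface_hub.snapshot_download."""
    return {"repo_id": repo_id, "local_dir": local_dir, "token": token}

def download_checkpoint(token: str) -> None:
    """Fetch the full checkpoint into the shared PVC (network call)."""
    from huggingface_hub import snapshot_download
    snapshot_download(**download_kwargs(REPO_ID, LOCAL_DIR, token))
```

Downloading roughly 140 GB of weights onto the shared FSx for Lustre volume once lets every node in the cluster read them without refetching.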

The following figure illustrates the sample notebook used to fine-tune the Llama model in DGX Cloud on AWS.

Use NeMo-Run to launch the training job in the cluster. Four H100 nodes (EC2 P5 instances) with 8 GPUs each were used to fine-tune our model in this example. To launch this training, you need to create an application token and secret. After your training is launched, you can click the launched workload to look at its event history, metrics, and logs. The metrics will show the GPU compute and memory utilization. The logs for the master node will show the progress of the fine-tuning job.

The following figure illustrates the sample metrics in DGX Cloud on AWS.

When the model has finished fine-tuning, return to your Jupyter notebook to convert the model back to Hugging Face safetensors and move it to Amazon S3. This requires your AWS access key and an S3 bucket. With the tensors and tokenizer files moved, you’re ready to import the model using Amazon Bedrock Custom Model Import.
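The move to Amazon S3 can be scripted. This is a minimal sketch, assuming placeholder bucket and prefix names and AWS credentials already configured in the environment:

```python
import os

BUCKET = "my-model-artifacts-bucket"   # placeholder bucket name
PREFIX = "llama-3.1-70b-finetuned"     # placeholder key prefix

def s3_key(prefix: str, local_root: str, path: str) -> str:
    """Map a local file under local_root to its S3 object key."""
    return f"{prefix}/{os.path.relpath(path, local_root)}"

def upload_dir(local_root: str) -> None:
    """Upload every file under local_root to s3://BUCKET/PREFIX/ (AWS call)."""
    import boto3  # imported lazily so the path helper stays dependency-free
    s3 = boto3.client("s3")
    for dirpath, _, filenames in os.walk(local_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            s3.upload_file(path, BUCKET, s3_key(PREFIX, local_root, path))
```

Keeping the relative directory layout in the object keys means the safetensors, config, and tokenizer files arrive in S3 exactly as Custom Model Import expects to browse them.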

The following figure illustrates the sample Amazon S3 bucket.

Import your custom model to Amazon Bedrock

To import your custom model, follow these steps:

On the Amazon Bedrock console, in the navigation pane, choose Foundation models and then choose Imported models.
In Model details, enter a name such as CustomModelName, as shown in the following screenshot.

For each custom model you import, you can supply an Import job name or use the identifier that is already supplied, which you can use to track and monitor the import of your model.
Scroll down to Model Import Settings, where you can create your custom model by importing the model weights from an S3 bucket or importing a model directly from Amazon SageMaker. For demonstration, you can import Meta’s Llama 3.1 70B model from Amazon S3 by choosing Browse S3 and navigating to your model files.

Verify your model, configuration, and tokenizer, and select any other files associated with your model.

The following figure illustrates the model import setting in Amazon Bedrock.

After you’ve selected your model files, you can choose to encrypt your model using a customer managed key by selecting Customize encryption settings and selecting your AWS Key Management Service (AWS KMS) key. By default, Amazon Bedrock encrypts custom models with AWS owned keys. You can’t view, manage, or use AWS owned keys or audit their use. However, you don’t have to take action or change programs to protect the keys that encrypt your data. Under Service access, you can choose to associate an AWS Identity and Access Management (IAM) role that you’ve created, or leave the default selection to have Amazon Bedrock create a role for you.
When your settings are complete, choose Import model.
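For teams that prefer scripting the import over the console, a minimal boto3 sketch of the same flow might look like the following. The job name, model name, role ARN, and S3 URI are placeholders; the call shapes follow the Bedrock CreateModelImportJob and GetModelImportJob APIs:

```python
def import_job_request(job_name: str, model_name: str,
                       role_arn: str, s3_uri: str) -> dict:
    """Build the arguments for bedrock.create_model_import_job."""
    return {
        "jobName": job_name,
        "importedModelName": model_name,
        "roleArn": role_arn,
        "modelDataSource": {"s3DataSource": {"s3Uri": s3_uri}},
    }

def run_import(job_name: str, model_name: str, role_arn: str,
               s3_uri: str, poll_seconds: int = 60) -> str:
    """Start the import job and poll until it completes or fails (AWS calls)."""
    import time
    import boto3
    bedrock = boto3.client("bedrock")
    job = bedrock.create_model_import_job(
        **import_job_request(job_name, model_name, role_arn, s3_uri))
    while True:
        status = bedrock.get_model_import_job(
            jobIdentifier=job["jobArn"])["status"]
        if status in ("Completed", "Failed"):
            return status
        time.sleep(poll_seconds)
```

Polling the job ARN is equivalent to watching the Jobs view in the console described next.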

To monitor the progress of your importing job, choose Jobs in the Imported models section, as shown in the following screenshot.

After your model has been imported, it should be listed on the Models tab, as shown in the following screenshot.

Model inference using Amazon Bedrock

The Amazon Bedrock playground is a tool in the AWS Management Console that provides a visual interface for experimenting with running inference on different models and configurations. You can use the playground to test different models and values before integrating them into your application. The following steps demonstrate how to use the custom model that you imported into Amazon Bedrock and submit a prompt in the playground:

In the Amazon Bedrock navigation pane, choose Chat/text and then choose the Mode you wish to test.
Choose Select model and under Custom & managed endpoints, choose your model to test and choose Apply, as shown in the following screenshot.

With the model loaded into the playground, you can begin by sending your first prompt. Enter a description to create the request and choose Run.

The following screenshot shows a sample prompt to write an email to a wine expert, requesting a guest article contribution for your wine blog.
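Outside the playground, the imported model can also be invoked programmatically through the Bedrock Runtime. This is a hedged sketch: the model ARN is a placeholder, and the request and response fields assume the Llama-style schema (prompt, max_gen_len, generation) that imported Llama models typically use:

```python
import json

def build_body(prompt: str, max_gen_len: int = 256,
               temperature: float = 0.5) -> str:
    """JSON request body in the Llama-style schema assumed for this model."""
    return json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
    })

def invoke(model_arn: str, prompt: str) -> str:
    """Invoke the imported model on demand via the Bedrock Runtime (AWS call)."""
    import boto3
    runtime = boto3.client("bedrock-runtime")
    response = runtime.invoke_model(modelId=model_arn, body=build_body(prompt))
    return json.loads(response["body"].read()).get("generation", "")
```

For imported models, the modelId is the model ARN shown on the Models tab, and the first invocation after idle time may take longer while the serverless endpoint warms up.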

Clean up

Use the following steps to clean up the infrastructure created for this post and avoid incurring ongoing costs.

Delete the imported model:

aws bedrock delete-imported-model --model-identifier Demo-Mode

Delete the AWS KMS key created:

aws kms schedule-key-deletion --key-id <your-key-id>

Conclusion

In this post, we discussed how NVIDIA DGX Cloud on AWS, combined with Amazon Bedrock Custom Model Import for scalable deployment, offers a powerful end-to-end solution for developing, fine-tuning, and operationalizing generative and agentic AI applications. This approach is particularly advantageous for organizations seeking to accelerate time to market, minimize operational overhead, and foster rapid innovation. Enterprise developers can start with NVIDIA DGX Cloud on AWS today. For more NVIDIA DGX Cloud recipes, check out the examples in the dgxc-benchmarking GitHub repo.


About the authors

Vara Bonthu is a Principal Open Source Specialist SA leading Data on EKS and AI on EKS at AWS, driving open source initiatives and helping AWS customers from diverse organizations. He specializes in open source technologies, data analytics, AI/ML, and Kubernetes, with extensive experience in development, DevOps, and architecture. Vara focuses on building highly scalable data and AI/ML solutions on Kubernetes, enabling customers to maximize cutting-edge technology for their data-driven initiatives.

Chad Elias is a Senior Solutions Architect for AWS. He’s passionate about helping organizations modernize their infrastructure and applications through AI/ML solutions. When not designing the next generation of cloud architectures, Chad enjoys contributing to open source projects, mentoring junior engineers, and exploring the latest technologies.

Brian Kreitzer is a Partner Solutions Architect at Amazon Web Services (AWS). He is responsible for working with partners to create accelerators and solutions for AWS customers, engages in technical co-sell opportunities, and evangelizes accelerator and solution adoption to the technical community.

Timothy Ma is a Principal Specialist in generative AI at AWS, where he collaborates with customers to design and deploy cutting-edge machine learning solutions. He also leads go-to-market strategies for generative AI services, helping organizations harness the potential of advanced AI technologies.

Andrew Liu is the manager of the DGX Cloud Technical Marketing Engineering team, focusing on showcasing the use cases and capabilities of DGX Cloud by creating technical assets and collateral. His goal is to demonstrate how DGX Cloud empowers NVIDIA and the ecosystem to create world-class AI solutions. In his free time, Andrew enjoys being outdoors and going mountain biking and skiing.

Chelsea Isaac is a Senior Solutions Architect for DGX Cloud at NVIDIA. She’s passionate about helping enterprise customers and partners deploy and scale AI solutions in the cloud. In her free time, she enjoys working out, traveling, and reading.

Zoey Zhang is a Technical Marketing Engineer on DGX Cloud at NVIDIA. She works on integrating machine learning models into large-scale compute clusters on the cloud and uses her technical expertise to bring NVIDIA products to market.

Charlie Huang is a senior product marketing manager for Cloud AI at NVIDIA. Charlie is responsible for taking NVIDIA DGX Cloud to market with cloud partners. He has vast experience in AI/ML, cloud and data center solutions, virtualization, and security.


