Introducing auto scaling on Amazon SageMaker HyperPod

By Advanced AI Editor | August 29, 2025


Today, we’re excited to announce that Amazon SageMaker HyperPod now supports managed node automatic scaling with Karpenter, so you can efficiently scale your SageMaker HyperPod clusters to meet your inference and training demands. Real-time inference workloads require automatic scaling to address unpredictable traffic patterns and maintain service level agreements (SLAs). As demand spikes, organizations must rapidly adapt their GPU compute without compromising response times or cost-efficiency. Unlike self-managed Karpenter deployments, this service-managed solution alleviates the operational overhead of installing, configuring, and maintaining Karpenter controllers, while providing tighter integration with the resilience capabilities of SageMaker HyperPod. This managed approach supports scale to zero, reducing the need for dedicated compute resources to run the Karpenter controller itself, improving cost-efficiency.

SageMaker HyperPod offers a resilient, high-performance infrastructure, observability, and tooling optimized for large-scale model training and deployment. Companies like Perplexity, HippocraticAI, H.AI, and Articul8 are already using SageMaker HyperPod for training and deploying models. As more customers transition from training foundation models (FMs) to running inference at scale, they require the ability to automatically scale their GPU nodes to handle real production traffic by scaling up during high demand and scaling down during periods of lower utilization. This capability necessitates a powerful cluster auto scaler. Karpenter, an open source Kubernetes node lifecycle manager created by AWS, is a popular choice among Kubernetes users for cluster auto scaling due to its powerful capabilities that optimize scaling times and reduce costs.

This launch provides a managed Karpenter-based solution for automatic scaling that is installed and maintained by SageMaker HyperPod, removing the undifferentiated heavy lifting of setup and management from customers. The feature is available for SageMaker HyperPod EKS clusters, and you can enable auto scaling to transform your SageMaker HyperPod cluster from static capacity to a dynamic, cost-optimized infrastructure that scales with demand. This combines Karpenter’s proven node lifecycle management with the purpose-built and resilient infrastructure of SageMaker HyperPod, designed for large-scale machine learning (ML) workloads. In this post, we dive into the benefits of Karpenter, and provide details on enabling and configuring Karpenter in your SageMaker HyperPod EKS clusters.

New features and benefits

Karpenter-based auto scaling in your SageMaker HyperPod clusters provides the following capabilities:

Service managed lifecycle – SageMaker HyperPod handles Karpenter installation, updates, and maintenance, alleviating operational overhead
Just-in-time provisioning – Karpenter observes your pending pods and provisions the required compute for your workloads from an on-demand pool
Scale to zero – You can scale down to zero nodes without maintaining dedicated controller infrastructure
Workload-aware node selection – Karpenter chooses optimal instance types based on pod requirements, Availability Zones, and pricing to minimize costs
Automatic node consolidation – Karpenter regularly evaluates clusters for optimization opportunities, shifting workloads to avoid underutilized nodes
Integrated resilience – Karpenter uses the built-in fault tolerance and node recovery mechanisms of SageMaker HyperPod

These capabilities are built on top of the recently launched continuous provisioning capability, which enables SageMaker HyperPod to automatically provision remaining capacity in the background while workloads start immediately on available instances. When node provisioning encounters failures due to capacity constraints or other issues, SageMaker HyperPod automatically retries in the background until clusters reach their desired scale, so your auto scaling operations remain resilient and non-blocking.

Solution overview

The following diagram illustrates the solution architecture.

Karpenter runs as a controller in the cluster and operates through the following steps:

Watching – Karpenter watches for unschedulable pods in the cluster through the Kubernetes API server. These could be pods that enter the Pending state when first deployed, or when a workload is automatically scaled to increase the replica count (the sketch after this list shows how to view such pods).
Evaluating – When Karpenter finds such pods, it computes the shape and size of a NodeClaim that fits the pods' requirements (GPU, CPU, memory) and topology constraints, and checks whether it can pair them with an existing NodePool. For each NodePool, it queries the SageMaker HyperPod APIs to get the instance types that the NodePool supports, and uses the instance type metadata (hardware, zone, capacity type) to find a matching NodePool.
Provisioning – If Karpenter finds a matching NodePool, it creates a NodeClaim and tries to provision a new instance to serve as the new node. Internally, Karpenter uses the sagemaker:UpdateCluster API to increase the capacity of the selected instance group.
Disrupting – Karpenter periodically checks whether each node is still needed. If not, Karpenter deletes it, which internally translates to a delete-node request to the SageMaker HyperPod cluster.
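To see the kind of pods Karpenter reacts to, you can list the Pending pods yourself. The following is a minimal sketch using the Kubernetes Python client (our choice for illustration; the rest of this post uses kubectl), run from a machine whose kubeconfig points at the EKS cluster:

# Minimal sketch: list the Pending (unschedulable) pods that Karpenter watches for.
# Assumes the kubernetes Python client is installed and kubeconfig is configured.
from kubernetes import client, config

config.load_kube_config()  # use the current kubeconfig context
v1 = client.CoreV1Api()

pending = v1.list_pod_for_all_namespaces(field_selector="status.phase=Pending")
for pod in pending.items:
    print(pod.metadata.namespace, pod.metadata.name)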

Prerequisites

Verify that you have the required quotas for the instances you will create in the SageMaker HyperPod cluster. To review your quotas, on the Service Quotas console, choose AWS services in the navigation pane, then choose SageMaker. For example, in our account the available quota for g5.12xlarge instances was three.
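If you prefer to check quotas programmatically, the following sketch lists SageMaker quotas through the Service Quotas API. The instance type filter is an assumption to adapt; quota names vary by resource:

# Sketch: print SageMaker service quotas whose names mention a given instance type.
# Assumes AWS credentials are configured for boto3.
import boto3

sq = boto3.client("service-quotas")
paginator = sq.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "g5.12xlarge" in quota["QuotaName"]:
            print(quota["QuotaName"], quota["Value"])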

To update the cluster, you must first create AWS Identity and Access Management (IAM) permissions for Karpenter. For instructions, see Create an IAM role for HyperPod autoscaling with Karpenter.

Create and configure a SageMaker HyperPod cluster

To begin, launch and configure your SageMaker HyperPod EKS cluster and verify that continuous provisioning mode is enabled on cluster creation. Complete the following steps:

On the SageMaker AI console, choose HyperPod clusters in the navigation pane.
Choose Create HyperPod cluster and Orchestrated on Amazon EKS.
For Setup options, select Custom setup.
For Name, enter a name.
For Instance recovery, select Automatic.
For Instance provisioning mode, select Use continuous provisioning.
Choose Submit.

This setup creates the necessary resources, such as the virtual private cloud (VPC), subnets, security groups, and EKS cluster, and installs operators in the cluster. You can also provide existing resources, such as an existing EKS cluster, if you want to use one instead of creating a new cluster. This setup takes around 20 minutes.

Verify that each InstanceGroup is limited to a single Availability Zone by opting for the OverrideVpcConfig and selecting only one subnet for each InstanceGroup.
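If you create the cluster programmatically rather than on the console, the instance group shape might look like the following boto3 sketch. The parameter names follow the SageMaker CreateCluster API, but all values are placeholders, and the NodeProvisioningMode field is our assumption for the API equivalent of the console's continuous provisioning option:

# Hedged sketch: create a HyperPod EKS cluster with one subnet per instance group.
# All values are placeholders to replace with your own resources.
import boto3

sm = boto3.client("sagemaker")
sm.create_cluster(
    ClusterName="<cluster-name>",
    Orchestrator={"Eks": {"ClusterArn": "<eks-cluster-arn>"}},
    NodeRecovery="Automatic",           # console: Instance recovery = Automatic
    NodeProvisioningMode="Continuous",  # assumption: console's continuous provisioning option
    InstanceGroups=[
        {
            "InstanceGroupName": "auto-g6-az1",
            "InstanceType": "ml.g6.xlarge",
            "InstanceCount": 1,
            "ExecutionRole": "<execution-role-arn>",
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://<bucket>/lifecycle/",
                "OnCreate": "on_create.sh",
            },
            # One subnet pins this instance group to a single Availability Zone
            "OverrideVpcConfig": {
                "SecurityGroupIds": ["<security-group-id>"],
                "Subnets": ["<subnet-in-az1>"],
            },
        },
    ],
)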

After you create the cluster, you must update it to enable Karpenter. You can do this with Boto3 or the AWS Command Line Interface (AWS CLI) by calling the UpdateCluster API (after configuring the AWS CLI to connect to your AWS account).

The following code uses Python Boto3:

import boto3

client = boto3.client("sagemaker")
response = client.update_cluster(
    ClusterName="<cluster-name>",
    AutoScaling={"Mode": "Enable", "AutoScalerType": "Karpenter"},
    ClusterRole="<cluster-role-arn>",
)

The following code uses the AWS CLI:

aws sagemaker update-cluster \
    --cluster-name <cluster-name> \
    --auto-scaling '{ "Mode": "Enable", "AutoScalerType": "Karpenter" }' \
    --cluster-role <cluster-role-arn>

After you run this command and update the cluster, you can verify that Karpenter has been enabled by running the DescribeCluster API.

The following code uses Python:

import boto3

client = boto3.client("sagemaker")
print(client.describe_cluster(ClusterName="<cluster-name>").get("AutoScaling"))

The following code uses the AWS CLI:

aws sagemaker describe-cluster --cluster-name <cluster-name> --query AutoScaling

The following code shows our output:

{'Mode': 'Enable',
 'AutoScalerType': 'Karpenter',
 'Status': 'Enabled'}

Now you have a working cluster. The next step is to set up some custom resources in your cluster for Karpenter.

Create HyperpodNodeClass

HyperpodNodeClass is a custom resource that maps to pre-created instance groups in SageMaker HyperPod, defining constraints on which instance types and Availability Zones Karpenter can use in its auto scaling decisions. To use HyperpodNodeClass, specify the names of the InstanceGroups in your SageMaker HyperPod cluster that should serve as the source of AWS compute for scaling up the pods in your NodePools.

The HyperpodNodeClass name that you use here is carried over to the NodePool in the next section where you reference it. This tells the NodePool which HyperpodNodeClass to draw resources from. To create a HyperpodNodeClass, complete the following steps:

Create a YAML file (for example, nodeclass.yaml) similar to the following code. Add the InstanceGroup names that you used when creating the SageMaker HyperPod cluster. You can also add new instance groups to an existing SageMaker HyperPod EKS cluster.
Reference the HyperPodNodeClass name in your NodePool configuration.

The following is a sample HyperpodNodeClass that uses ml.g6.xlarge and ml.g6.4xlarge instance types:

apiVersion: karpenter.sagemaker.amazonaws.com/v1
kind: HyperpodNodeClass
metadata:
  name: multiazg6
spec:
  instanceGroups:
    # Names of InstanceGroups in the HyperPod cluster. Each InstanceGroup must be
    # created before this step can be completed.
    # MaxItems: 10
    - auto-g6-az1
    - auto-g6-4xaz2

Apply the configuration to your EKS cluster using kubectl:

kubectl apply -f nodeclass.yaml

Monitor the HyperpodNodeClass status and verify that the Ready condition is set to True, which confirms it was successfully created:

kubectl get hyperpodnodeclass multiazg6 -oyaml

The SageMaker HyperPod cluster must have AutoScaling enabled and the AutoScaling status must change to InService before the HyperpodNodeClass can be applied.
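Because of that ordering requirement, a small polling sketch can wait for the transition before you apply the manifest. The target status values are assumptions based on the outputs discussed above (the sample DescribeCluster output showed Enabled; the note above mentions InService), so adjust them to what your cluster reports:

# Sketch: poll DescribeCluster until AutoScaling reports a ready status.
import time
import boto3

client = boto3.client("sagemaker")
while True:
    auto_scaling = client.describe_cluster(ClusterName="<cluster-name>").get("AutoScaling", {})
    status = auto_scaling.get("Status")
    print("AutoScaling status:", status)
    if status in ("Enabled", "InService"):  # assumed ready values; adjust as needed
        break
    time.sleep(30)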

For more information and key considerations, see Autoscaling on SageMaker HyperPod EKS.

Create NodePool

The NodePool sets constraints on the nodes that Karpenter can create and the pods that can run on those nodes. For example, a NodePool can:

Define labels and taints to limit the pods that can run on nodes Karpenter creates
Limit node creation to certain zones, instance types, compute architectures, and so on

For more information about NodePool, refer to NodePools. SageMaker HyperPod managed Karpenter supports a limited set of well-known Kubernetes and Karpenter requirements, which we explain in this post.

To create a NodePool, complete the following steps:

Create a YAML file named nodepool.yaml with your desired NodePool configuration.

The following code is a sample configuration for a NodePool. We configure the NodePool to include our ml.g6.xlarge SageMaker instance type, and we additionally restrict it to one zone. Refer to NodePools for more customizations.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpunodepool
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.sagemaker.amazonaws.com
        kind: HyperpodNodeClass
        name: multiazg6
      expireAfter: Never
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: Exists
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["ml.g6.xlarge"]
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-west-2a"]

Apply the NodePool to your cluster:

kubectl apply -f nodepool.yaml

Monitor the NodePool status to ensure the Ready condition in the status is set to True:

kubectl get nodepool gpunodepool -oyaml

This example shows how a NodePool can be used to specify the hardware (instance type) and placement (Availability Zone) for pods.

Launch a simple workload

The following workload runs a Kubernetes Deployment whose pods each request 1 CPU and 256 MB of memory. At this point, the pods have not been spun up yet.

kubectl apply -f https://raw.githubusercontent.com/aws/karpenter-provider-aws/refs/heads/main/examples/workloads/inflate.yaml

When we apply this, we can see a deployment and a single node launch in our cluster.

To scale this component, use the following command:

kubectl scale deployment inflate --replicas 10

Within a few minutes, we can see Karpenter add the requested nodes to the cluster.
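To confirm the scale-out programmatically, the following sketch lists the nodes Karpenter created for the NodePool. It assumes the Kubernetes Python client and that Karpenter-provisioned nodes carry the karpenter.sh/nodepool label (a Karpenter v1 convention):

# Sketch: list nodes provisioned for the gpunodepool NodePool.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

nodes = v1.list_node(label_selector="karpenter.sh/nodepool=gpunodepool")
for node in nodes.items:
    print(node.metadata.name, node.metadata.labels.get("node.kubernetes.io/instance-type"))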

Implement advanced auto scaling for inference with KEDA and Karpenter

To implement an end-to-end auto scaling solution on SageMaker HyperPod, you can set up Kubernetes Event-driven Autoscaling (KEDA) alongside Karpenter. KEDA enables pod-level auto scaling based on a wide range of metrics, including Amazon CloudWatch metrics, Amazon Simple Queue Service (Amazon SQS) queue lengths, Prometheus queries, and resource utilization patterns. By configuring KEDA ScaledObject resources to target your model deployments, KEDA can dynamically adjust the number of inference pods based on real-time demand signals.

Integrating KEDA with Karpenter creates a powerful two-tier auto scaling architecture. As KEDA scales your pods up or down based on workload metrics, Karpenter automatically provisions or deletes nodes in response to the changing resource requirements. This integration delivers optimal performance while controlling costs by making sure your cluster has precisely the right amount of compute available at all times. For an effective implementation, consider the following key factors:

Set appropriate buffer thresholds in KEDA to accommodate Karpenter’s node provisioning time
Configure cooldown periods carefully to prevent scaling oscillations
Define clear resource requests and limits to help Karpenter make optimal node selections
Create specialized NodePools tailored to specific workload characteristics

The following is a sample KEDA ScaledObject spec that scales the number of pods based on the CloudWatch request count metric of an Application Load Balancer (ALB):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nd-deepseek-llm-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: nd-deepseek-llm-r1-distill-qwen-1-5b
    apiVersion: apps/v1
    kind: Deployment
  minReplicaCount: 1
  maxReplicaCount: 3
  pollingInterval: 30     # seconds between checks
  cooldownPeriod: 300     # seconds before scaling down
  triggers:
    - type: aws-cloudwatch
      metadata:
        namespace: AWS/ApplicationELB         # or your metric namespace
        metricName: RequestCount              # or your metric name
        dimensionName: LoadBalancer           # or your dimension key
        dimensionValue: app/k8s-default-albnddee-cc02b67f20/0991dc457b6e8447
        statistic: Sum
        threshold: "3"                        # change to your desired threshold
        minMetricValue: "0"                   # optional floor
        region: us-east-2                     # your AWS region
        identityOwner: operator               # use the IRSA SA bound to keda-operator
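As with the other manifests in this post, you would apply the ScaledObject with kubectl (the file name here is an assumed example):

kubectl apply -f scaledobject.yaml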

Clean up

To avoid incurring further charges, delete your SageMaker HyperPod cluster.

Conclusion

With the launch of Karpenter node auto scaling on SageMaker HyperPod, ML workloads can automatically adapt to changing demand, optimize resource utilization, and help control costs by scaling precisely when needed. You can also integrate it with event-driven pod auto scalers such as KEDA to scale based on custom metrics.

To experience these benefits for your ML workloads, enable Karpenter in your SageMaker HyperPod clusters. For detailed implementation guidance and best practices, refer to Autoscaling on SageMaker HyperPod EKS.

About the authors

Vivek Gangasani is a Worldwide Lead GenAI Specialist Solutions Architect for SageMaker Inference. He drives Go-to-Market (GTM) and Outbound Product strategy for SageMaker Inference. He also helps enterprises and startups deploy, manage, and scale their GenAI models with SageMaker and GPUs. Currently, he is focused on developing strategies and content for optimizing inference performance and GPU efficiency for hosting Large Language Models. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.

Adam Stanley is a Solution Architect for Software, Internet and Model Provider customers at Amazon Web Services (AWS). He supports customers adopting all AWS services, but focuses primarily on Machine Learning training and inference infrastructure. Prior to AWS, Adam went to the University of New South Wales and graduated with degrees in Mathematics and Accounting. You can connect with him on LinkedIn.

Kunal Jha is a Principal Product Manager at AWS, where he focuses on building Amazon SageMaker HyperPod to enable scalable distributed training and fine-tuning of foundation models. In his spare time, Kunal enjoys skiing and exploring the Pacific Northwest. You can connect with him on LinkedIn.

Ty Bergstrom is a Software Engineer at Amazon Web Services. He works on the HyperPod Clusters platform for Amazon SageMaker.


