Implement a secure MLOps platform based on Terraform and GitHub

By Advanced AI Editor | October 8, 2025 | 11 min read


Machine learning operations (MLOps) is the combination of people, processes, and technology needed to productionize ML use cases efficiently. To achieve this, enterprise customers must develop MLOps platforms that support reproducibility, robustness, and end-to-end observability of the ML use case's lifecycle. Such platforms are typically built on a multi-account setup: they adopt strict security constraints and development best practices such as automated deployment using continuous integration and delivery (CI/CD) technologies, and they permit users to interact only by committing changes to code repositories. For more information about MLOps best practices, refer to the MLOps foundation roadmap for enterprises with Amazon SageMaker.

Terraform by HashiCorp has been embraced by many customers as the main infrastructure as code (IaC) approach to develop, build, deploy, and standardize AWS infrastructure for multi-cloud solutions. Furthermore, development repositories and CI/CD technologies such as GitHub and GitHub Actions, respectively, have been adopted widely by the DevOps and MLOps community across the world.

In this post, we show how to implement an MLOps platform based on Terraform using GitHub and GitHub Actions for the automatic deployment of ML use cases. Specifically, we deep dive on the necessary infrastructure and show you how to utilize custom Amazon SageMaker Projects templates, which contain example repositories that help data scientists and ML engineers deploy ML services (such as an Amazon SageMaker endpoint or batch transform job) using Terraform. You can find the source code in the following GitHub repository.

Solution overview

The MLOps architecture solution creates the resources necessary to build a comprehensive training pipeline, register models in the Amazon SageMaker Model Registry, and deploy them to preproduction and production environments. This foundational infrastructure enables a systematic approach to ML operations, providing a robust framework that streamlines the journey from model development to deployment.

End-users (data scientists or ML engineers) select the organization SageMaker project template that fits their use case. SageMaker Projects helps organizations set up and standardize developer environments for data scientists and CI/CD systems for MLOps engineers. The project deployment creates, from the GitHub templates, a private GitHub repository and CI/CD resources that data scientists can customize according to their use case. Depending on the chosen SageMaker project, other project-specific resources are also created.

[Figure: Complete MLOps workflow showing GitHub source, SageMaker pipeline stages, approval gates, and production deployment with monitoring]

Custom SageMaker Project template

SageMaker Projects deploys the associated AWS CloudFormation template of the AWS Service Catalog product to provision and manage the infrastructure and resources required for your project, including the integration with a source code repository.
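
To make this concrete, the following is a minimal Terraform sketch of how a custom template could be registered as a Service Catalog product that SageMaker Studio surfaces as an organization template. The product name, template URL, and artifact version are illustrative assumptions, not the solution's actual definitions.

resource "aws_servicecatalog_product" "mlops_template" {
  name  = "mlops-model-build-train"   # hypothetical product name
  owner = "platform-team"             # hypothetical owner
  type  = "CLOUD_FORMATION_TEMPLATE"

  provisioning_artifact_parameters {
    name         = "v1.0"
    type         = "CLOUD_FORMATION_TEMPLATE"
    template_url = "https://example-bucket.s3.amazonaws.com/templates/project.yaml"  # hypothetical template location
  }

  tags = {
    # This tag is what makes a Service Catalog product visible in SageMaker Projects
    "sagemaker:studio-visibility" = "true"
  }
}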

At the time of writing, four custom SageMaker Projects templates are available for this solution:

MLOps template for LLM training and evaluation – An MLOps pattern that shows a simple one-account Amazon SageMaker Pipelines setup for large language models (LLMs). This template supports fine-tuning and evaluation.
MLOps template for model building and training – An MLOps pattern that shows a simple one-account SageMaker Pipelines setup. This template supports model training and evaluation.
MLOps template for model building, training, and deployment – An MLOps pattern to train models using SageMaker Pipelines and deploy the trained model into preproduction and production accounts. This template supports real-time inference, batch inference pipelines, and bring-your-own-containers (BYOC).
MLOps template for promoting the full ML pipeline across environments – An MLOps pattern to show how to take the same SageMaker pipeline across environments from dev to prod. This template supports a pipeline for batch inference.

Each SageMaker project template has associated GitHub repository templates that are cloned to be used for your use case:

[Figure: SageMaker project creation UI displaying MLOps templates for model lifecycle automation, with associated Git repository types]

When a custom SageMaker project is deployed by a data scientist, the associated GitHub template repositories are cloned through an invocation of the AWS Lambda function _clone_repo_lambda, which creates a new GitHub repository for your project.
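
As a rough sketch of how such a function could be wired into the platform's Terraform (the function name, secret name, and packaging are assumptions for illustration, not the repository's actual definitions):

variable "github_org" { type = string }
variable "lambda_role_arn" { type = string }   # execution role, created elsewhere in the platform

# The GitHub PAT is assumed to be stored in AWS Secrets Manager
data "aws_secretsmanager_secret" "github_pat" {
  name = "github-pat"                          # hypothetical secret name
}

resource "aws_lambda_function" "clone_repo" {
  function_name = "clone_repo_lambda"          # exact name assumed
  runtime       = "python3.12"
  handler       = "main.handler"
  filename      = "clone_repo.zip"             # hypothetical packaged handler
  role          = var.lambda_role_arn

  environment {
    variables = {
      GITHUB_ORG      = var.github_org
      PAT_SECRET_NAME = data.aws_secretsmanager_secret.github_pat.name
    }
  }
}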

[Figure: Multi-project deployment architecture showing how shared GitHub templates propagate through AWS dev accounts to create standardized project structures]

Infrastructure Terraform modules

The Terraform code, found under base-infrastructure/terraform, is structured into reusable modules that are shared across deployment environments. Each environment instantiates them in base-infrastructure/terraform/<environment>/main.tf. There are seven key reusable modules.

There are also some environment-specific resources, which can be found directly under base-infrastructure/terraform/<environment>/.
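
For example, a dev environment's main.tf might wire the shared modules together along these lines; the module names, paths, and arguments are assumptions for illustration, since the repository defines its own set:

# base-infrastructure/terraform/dev/main.tf (illustrative)
module "networking" {
  source      = "../modules/networking"        # hypothetical shared module
  environment = "dev"
  vpc_cidr    = "10.0.0.0/16"
}

module "sagemaker_domain" {
  source      = "../modules/sagemaker_domain"  # hypothetical shared module
  environment = "dev"
  vpc_id      = module.networking.vpc_id
  subnet_ids  = module.networking.private_subnet_ids
}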

[Figure: Enterprise AWS ML platform architecture with segregated VPCs, role-based access controls, and service connections for Dev/Pre-Prod/Prod environments]

Prerequisites

Before you start the deployment process, complete the following three steps:

1. Prepare AWS accounts to deploy the platform. We recommend using three AWS accounts for the three typical MLOps environments: experimentation, preproduction, and production. However, you can deploy the infrastructure to just one account for testing purposes.
2. Create a GitHub organization.
3. Create a personal access token (PAT). It is recommended to create a service or platform account and use its PAT.

Bootstrap your AWS accounts for GitHub and Terraform

Before we can deploy the infrastructure, the AWS accounts you have vended need to be bootstrapped. This is required so that Terraform can manage the state of the resources deployed. Terraform backends enable secure, collaborative, and scalable infrastructure management by streamlining version control, locking, and centralized state storage. Therefore, we deploy an S3 bucket and Amazon DynamoDB table for storing states and locking consistency checking.
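
Once the bucket and table exist, each Terraform root module in the platform can point at them with a standard S3 backend block. Here is a minimal sketch; the bucket name, state key, and Region are assumptions:

terraform {
  backend "s3" {
    bucket         = "terraform-state-123456789012"   # hypothetical: prefix plus account ID
    key            = "base-infrastructure/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locks"          # matches the default lock table name
    encrypt        = true
  }
}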

Bootstrapping is also required so that GitHub can assume a deployment role in your account; for this, we deploy an IAM role and an OpenID Connect (OIDC) identity provider (IdP). As an alternative to long-lived IAM user access keys, organizations can implement an OIDC IdP within their AWS accounts. This configuration enables the use of IAM roles and short-term credentials, enhancing security and adherence to best practices.
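
In Terraform terms, what the bootstrap provisions is roughly equivalent to the following sketch (the organization name is a placeholder, and the solution's bootstrap actually uses CloudFormation or Bash rather than this code):

resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]  # GitHub's published thumbprint
}

data "aws_iam_policy_document" "github_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:your-github-org/*"]   # restrict to repositories in your organization
    }
  }
}

resource "aws_iam_role" "github_oidc" {
  name               = "aws-github-oidc-role"  # the default AWS_ASSUME_ROLE_NAME
  assume_role_policy = data.aws_iam_policy_document.github_assume.json
}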

You can choose from two options to bootstrap your account: a bootstrap.sh Bash script and a bootstrap.yaml CloudFormation template, both stored at the root of the repository.

Bootstrap using a CloudFormation template

Complete the following steps to use the CloudFormation template:

1. Make sure the AWS Command Line Interface (AWS CLI) is installed and credentials are loaded for the target account that you want to bootstrap.
2. Identify the following:
   • Environment type of the account: dev, preprod, or prod.
   • Name of your GitHub organization.
   • (Optional) Customize the S3 bucket name for Terraform state files by choosing a prefix.
   • (Optional) Customize the DynamoDB table name for state locking.
3. Run the following command, updating the details from Step 2:

# Update these values
export ENV=xxx
export GITHUB_ORG=xxx
# Optional
export TerraformStateBucketPrefix=terraform-state
export TerraformStateLockTableName=terraform-state-locks

aws cloudformation create-stack \
  --stack-name YourStackName \
  --template-body file://bootstrap.yaml \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
  --parameters ParameterKey=Environment,ParameterValue=$ENV \
    ParameterKey=GitHubOrg,ParameterValue=$GITHUB_ORG \
    ParameterKey=OIDCProviderArn,ParameterValue="" \
    ParameterKey=TerraformStateBucketPrefix,ParameterValue=$TerraformStateBucketPrefix \
    ParameterKey=TerraformStateLockTableName,ParameterValue=$TerraformStateLockTableName

Bootstrap using a Bash script

Complete the following steps to use the Bash script:

1. Make sure the AWS CLI is installed and credentials are loaded for the target account that you want to bootstrap.
2. Identify the following:
   • Environment type of the account: dev, preprod, or prod.
   • Name of your GitHub organization.
   • (Optional) Customize the S3 bucket name for Terraform state files by choosing a prefix.
   • (Optional) Customize the DynamoDB table name for state locking.
3. Run the script (bash ./bootstrap.sh) and input the details from Step 2 when prompted. You can leave most of these options as default.

If you change the TerraformStateBucketPrefix or TerraformStateLockTableName parameters, you must update the environment variables (S3_PREFIX and DYNAMODB_PREFIX) in the deploy.yml file to match.

Set up your GitHub organization

In the final step before infrastructure deployment, you must configure your GitHub organization by cloning code from this example into specific locations.

Base infrastructure

Create a new repository in your organization that will contain the base infrastructure Terraform code. Give your repository a unique name, and move the code from this example's base-infrastructure folder into your newly created repository. Make sure the .github folder, which stores the GitHub Actions workflow definitions, is also moved to the new repository. GitHub Actions makes it possible to automate, customize, and execute your software development workflows right in your repository. In this example, we use GitHub Actions as our preferred CI/CD tooling.

Next, set up some GitHub secrets in your repository. Secrets are variables that you create in an organization, repository, or repository environment. The secrets that you create are available to use in our GitHub Actions workflows. Complete the following steps to create your secrets:

1. Navigate to the base infrastructure repository.
2. Choose Settings, Secrets and variables, and Actions.
3. Create two secrets:
   • AWS_ASSUME_ROLE_NAME – This role is created in the bootstrap script with the default name aws-github-oidc-role; update the secret with whichever role name you chose.
   • PAT_GITHUB – This is your GitHub PAT, created in the prerequisite steps.

Template repositories

The template-repos folder of our example contains multiple folders with the seed code for our SageMaker Projects templates. Each folder should be added to your GitHub organization as a private template repository. Complete the following steps:

1. For every folder in the template-repos directory, create a repository with the same name as the folder.
2. Choose Settings in each newly created repository.
3. Select the Template repository option (keeping the repository private).

Make sure you move all the code from the example folder to your private template, including the .github folder.

Update the configuration file

At the root of the base infrastructure folder is a config.json file. This file enables the multi-account, multi-environment mechanism. The example JSON structure is as follows:

{
  "environment_name": {
    "region": "X",
    "dev_account_number": "XXXXXXXXXXXX",
    "preprod_account_number": "XXXXXXXXXXXX",
    "prod_account_number": "XXXXXXXXXXXX"
  }
}

For your MLOps environment, change environment_name to your desired name and update the AWS Region and account numbers accordingly; the account numbers correspond to the AWS accounts you bootstrapped. This config.json lets you vend as many MLOps platforms as you want: create a new JSON object in the file with the respective environment name, Region, and bootstrapped account numbers, then locate the GitHub Actions deployment workflow under .github/workflows/deploy.yaml and add the new environment name inside each list object in the matrix key. When the infrastructure is deployed using GitHub Actions, a matrix deployment deploys to all environments in parallel.
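
The same file can also be consumed from Terraform itself. One possible pattern, shown here as an assumption rather than the repository's actual wiring, is jsondecode:

locals {
  config = jsondecode(file("${path.module}/../config.json"))   # hypothetical relative path
  env    = local.config["environment_name"]                    # key matches the example above
}

output "dev_account_number" {
  value = local.env["dev_account_number"]
}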

Deploy the infrastructure

Now that you have set up your GitHub organization, you're ready to deploy the infrastructure into the AWS accounts. Infrastructure changes deploy automatically on pushes to the main branch, so committing a change to the config file should trigger a deployment. To launch your first deployment manually, complete the following steps:

1. Navigate to your base infrastructure repository.
2. Choose the Actions tab.
3. Choose Deploy Infrastructure.
4. Choose Run Workflow and choose your desired branch for deployment.

This will launch the GitHub Actions workflow for deploying the experimentation, preproduction, and production infrastructure in parallel. You can visualize these deployments on the Actions tab.

Now your AWS accounts will contain the necessary infrastructure for your MLOps platform.

End-user experience

The following demonstration illustrates the end-user experience.

Clean up

To delete the multi-account infrastructure created by this example and avoid further charges, complete the following steps:

1. In the development AWS account, manually delete the SageMaker projects, SageMaker domain, SageMaker user profiles, Amazon Elastic File System (Amazon EFS) storage, and AWS security groups created by SageMaker.
2. In the development AWS account, you might need to grant additional permissions to the launch_constraint_role IAM role. This role is used as a launch constraint; Service Catalog uses its permissions to delete the provisioned products.
3. In the development AWS account, manually delete resources such as Git repositories, pipelines, experiments, model groups, and endpoints created by SageMaker Projects.
4. In the preproduction and production AWS accounts, manually delete the S3 bucket whose name starts with ml-artifacts- and the model deployed through the pipeline.
5. After you complete these changes, trigger the GitHub workflow that destroys the infrastructure.
6. If any resources aren't deleted, manually delete the pending ones.
7. Delete the IAM user that you created for GitHub Actions.
8. Delete the secret in AWS Secrets Manager that stores the GitHub personal access token.

Conclusion

In this post, we walked through the process of deploying an MLOps platform based on Terraform, using GitHub and GitHub Actions for the automatic deployment of ML use cases. This solution integrates four custom SageMaker Projects templates for model building, training, evaluation, and deployment with dedicated SageMaker pipelines. In our scenario, we focused on deploying a multi-account, multi-environment MLOps platform. For a comprehensive understanding of the implementation details, visit the GitHub repository.

About the authors

Jordan Grubb is a DevOps Architect at AWS, specializing in MLOps. He enables AWS customers to achieve their business outcomes by delivering automated, scalable, and secure cloud architectures. Jordan is also an inventor, with two patents in software engineering. Outside of work, he enjoys playing most sports, traveling, and has a passion for health and wellness.

Irene Arroyo Delgado is an AI/ML and GenAI Specialist Solutions Architect at AWS. She focuses on bringing out the potential of generative AI for each use case and productionizing ML workloads to achieve customers' desired business outcomes by automating end-to-end ML lifecycles. In her free time, Irene enjoys traveling and hiking.


