Advanced AI News
Amazon AWS AI

GPT OSS models from OpenAI are now available on SageMaker JumpStart

By Advanced AI Editor | August 6, 2025 | 9 min read


Today, we are excited to announce the availability of OpenAI’s new open-weight GPT OSS models, gpt-oss-120b and gpt-oss-20b, in Amazon SageMaker JumpStart. With this launch, you can now deploy OpenAI’s newest reasoning models to build, experiment, and responsibly scale your generative AI ideas on AWS.

In this post, we demonstrate how to get started with these models on SageMaker JumpStart.

Solution overview

The OpenAI GPT OSS models (gpt-oss-120b and gpt-oss-20b) excel at coding, scientific analysis, and mathematical reasoning tasks. Both models feature a 128K context window and adjustable reasoning levels (low/medium/high) to match specific requirements. They support external tool integration and can be used in agentic workflows through frameworks like Strands Agents, an open source AI agent SDK. With full chain-of-thought output capabilities, you get detailed visibility into the model’s reasoning process. You can use the OpenAI SDK to call your SageMaker endpoint directly by simply updating the endpoint. The models give you the flexibility to modify and customize them for your specific business needs while benefiting from enterprise-grade security and seamless scaling.

SageMaker JumpStart is a fully managed service that offers state-of-the-art foundation models (FMs) for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy, accelerating the development and deployment of machine learning (ML) applications. One of the key components of SageMaker JumpStart is its model hubs, which offer a vast catalog of pre-trained models from providers such as OpenAI for a variety of tasks.

You can now discover and deploy OpenAI models in Amazon SageMaker Studio or programmatically through the Amazon SageMaker Python SDK, and manage model performance and MLOps controls with Amazon SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The models are deployed in a secure AWS environment and under your VPC controls, helping to support data security for enterprise needs.

GPT OSS models are available in the US East (Ohio), US East (N. Virginia), Asia Pacific (Mumbai), and Asia Pacific (Tokyo) AWS Regions.

Throughout this example, we use the gpt-oss-120b model. These steps can be replicated with the gpt-oss-20b model as well.

Prerequisites

To deploy the GPT OSS models, you must have the following prerequisites:

An AWS account that will contain your AWS resources.
An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, see AWS Identity and Access Management for Amazon SageMaker AI.
Access to SageMaker Studio, a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
To deploy GPT OSS models, make sure you have access to the recommended instance types based on the model size. You can find these instance recommendations on the SageMaker JumpStart model card. The default instance type for both these models is p5.48xlarge, but you can also use other P5 family instances where available. To verify you have the necessary service quotas, complete the following steps:

On the Service Quotas console, under AWS Services, choose Amazon SageMaker.
Check that you have sufficient quota for the required instance type for endpoint deployment.
Make sure at least one of these instance types is available in your target Region.
If needed, request a quota increase and contact your AWS account team for support.
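The quota check above can also be scripted. The sketch below uses boto3 to page through SageMaker service quotas and match the P5 endpoint quota by name; the exact quota-name wording is an assumption, so confirm it on the Service Quotas console.

```python
def is_p5_endpoint_quota(quota_name: str) -> bool:
    """Heuristic match for the ml.p5.48xlarge endpoint-usage quota.
    The exact quota name is an assumption; verify it in the console."""
    name = quota_name.lower()
    return "p5.48xlarge" in name and "endpoint" in name


def find_p5_endpoint_quota(region: str = "us-east-1"):
    """Return (name, value) for the first matching SageMaker quota, or None.
    Requires AWS credentials when called."""
    import boto3

    client = boto3.client("service-quotas", region_name=region)
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        for quota in page["Quotas"]:
            if is_p5_endpoint_quota(quota["QuotaName"]):
                return quota["QuotaName"], quota["Value"]
    return None
```

If the returned value is zero, request a quota increase before attempting deployment.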

Deploy gpt-oss-120b through the SageMaker JumpStart UI

Complete the following steps to deploy gpt-oss-120b through SageMaker JumpStart:

On the SageMaker console, choose Studio in the navigation pane.
If you're a first-time user, you will be prompted to create a domain; otherwise, choose Open Studio.
On the SageMaker Studio console, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
On the SageMaker JumpStart landing page, search for gpt-oss-120b using the search box.

Choose a model card to view details about the model, such as its license, the data used to train it, and how to use it. Before you deploy the model, review its configuration and details on the model card. The model details page includes the following information:

The model name and provider information.
A Deploy button to deploy the model.

Choose Deploy to proceed with deployment.

For Endpoint name, enter an endpoint name (up to 50 alphanumeric characters).
For Number of instances, enter a number between 1 and 100 (default: 1).
For Instance type, select your instance type. For optimal performance with gpt-oss-120b, a GPU-based instance type such as p5.48xlarge is recommended.

Choose Deploy to deploy the model and create an endpoint.

When deployment is complete, your endpoint status changes to InService, and the model is ready to accept inference requests through the endpoint. You can then invoke the model using a SageMaker runtime client and integrate it with your applications.
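As a minimal sketch of that runtime-client path (the endpoint name and Region are placeholders, and the request fields mirror the inference payload used with the predictor later in this post):

```python
import json


def build_chat_payload(messages, max_output_tokens=200, temperature=0.7):
    """Build a request body in the shape this post uses for the GPT OSS
    endpoints ("stream" is a string here, matching the predictor example)."""
    return {
        "model": "/opt/ml/model",
        "input": messages,
        "max_output_tokens": max_output_tokens,
        "stream": "false",
        "temperature": temperature,
        "top_p": 1,
    }


def invoke_gpt_oss(endpoint_name, payload, region="us-east-1"):
    """Call a deployed endpoint with the low-level SageMaker runtime client.
    Requires AWS credentials and a live endpoint."""
    import boto3

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())
```

This is the same request/response flow the SageMaker predictor wraps for you; the low-level client is useful when you integrate the endpoint into an application outside a notebook.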

Deploy gpt-oss-120b with the SageMaker Python SDK

To deploy using the SDK, start by selecting the gpt-oss-120b model, specified by the model_id with the value openai-reasoning-gpt-oss-120b. You can deploy your choice of model on SageMaker using the Python SDK examples in the following sections. Similarly, you can deploy gpt-oss-20b using its model ID.

Enable web search on your model with EXA

By default, models in SageMaker JumpStart run in network isolation. The GPT OSS models come with a built-in tool for web search using EXA, a meaning-based web search API powered by embeddings. To use this tool, OpenAI requires customers to obtain an API key from EXA and pass it as an environment variable to their JumpStartModel instance when deploying through the SageMaker Python SDK. The following code shows how to deploy the model on SageMaker with network isolation disabled and pass the EXA API key to the model:

from sagemaker.jumpstart.model import JumpStartModel

accept_eula = True
model = JumpStartModel(
    model_id="openai-reasoning-gpt-oss-120b",
    enable_network_isolation=False,
    env={
        "EXA_API_KEY": ""  # supply your EXA API key here
    }
)
predictor = model.deploy(
    accept_eula=accept_eula
)

You can change these configurations by specifying other non-default values in JumpStartModel. The end user license agreement (EULA) value must be explicitly set to True to accept the terms. Because network isolation is set at deployment time, re-enabling it requires creating a new endpoint.

Optionally, you can deploy your model with the JumpStart default values (with network isolation enabled) as follows:

from sagemaker.jumpstart.model import JumpStartModel

accept_eula = True
model = JumpStartModel(model_id="openai-reasoning-gpt-oss-120b")
predictor = model.deploy(accept_eula=accept_eula)

Run inference with the SageMaker predictor

After the model is deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {
    "model": "/opt/ml/model",
    "input": [
        {
            "role": "system",
            "content": "You are a good AI assistant"
        },
        {
            "role": "user",
            "content": "Hello, how is it going?"
        }
    ],
    "max_output_tokens": 200,
    "stream": "false",
    "temperature": 0.7,
    "top_p": 1
}

response = predictor.predict(payload)
print(response["output"][-1]["content"][0]["text"])

We get the following response:

Hey there! All good on my end—just ready to dive into whatever you need. How’s it going on your side?

Function calling

The GPT OSS models were trained on the harmony response format, which defines conversation structure, reasoning output, and function calls. The format is designed to mimic the OpenAI Responses API, so if you have used that API before, it should feel familiar. The models should not be used without the harmony format. The following example showcases tool use with this format:

payload = {
    "model": "/opt/ml/model",
    "input": "System: You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nCurrent date: 2024-08-05\n\nreasoning: medium\n\n# Valid channels: analysis, commentary, final. Channel must be included for every message.\nCalls to these tools must go to the commentary channel: 'functions'.\n\n# Tools\n\n## functions\n\nnamespace functions {\n\n// Gets the current weather for a specific location.\ntype get_current_weather = (_: {\n// The city and state/country, e.g. \"San Francisco, CA\" or \"London, UK\"\nlocation: string,\n// Temperature unit preference\nunit?: \"celsius\" | \"fahrenheit\", // default: celsius\n}) => any;\n\n} // namespace functions\n\nDeveloper: You are a helpful AI assistant. Provide clear, concise, and helpful responses.\n\nHuman: What's the weather like in Seattle?\n\nAssistant:",
    "instructions": "You are a helpful AI assistant. Provide clear, concise, and helpful responses.",
    "max_output_tokens": 2048,
    "stream": "false",
    "temperature": 0.7,
    "reasoning": {
        "effort": "medium"
    },
    "tools": [
        {
            "type": "function",
            "name": "get_current_weather",
            "description": "Gets the current weather for a specific location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state/country, e.g. 'San Francisco, CA' or 'London, UK'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "default": "celsius",
                        "description": "Temperature unit preference"
                    }
                },
                "required": ["location"]
            }
        }
    ],
}

We get the following response:

{'arguments': '{"location":"Seattle, WA"}', 'call_id': 'call_596a67599df2465495fd444772ff9539', 'name': 'get_current_weather', 'type': 'function_call', 'id': 'ft_596a67599df2465495fd444772ff9539', 'status': None}
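A typical next step in an agentic loop is to execute the requested function locally and feed the result back to the model. The sketch below parses a function_call item like the one above and dispatches it to a stub implementation; get_current_weather here is a hypothetical local function, not part of the SageMaker or OpenAI APIs.

```python
import json


def get_current_weather(location, unit="celsius"):
    """Hypothetical tool implementation; a real app would call a weather API."""
    return {"location": location, "temperature": 18, "unit": unit}


# Registry mapping tool names from the model to local callables.
TOOLS = {"get_current_weather": get_current_weather}


def dispatch_function_call(call):
    """Run the local function named in a harmony-format function_call item."""
    args = json.loads(call["arguments"])
    return TOOLS[call["name"]](**args)


call = {
    "arguments": '{"location":"Seattle, WA"}',
    "name": "get_current_weather",
    "type": "function_call",
}
result = dispatch_function_call(call)
```

The returned result would then be appended to the conversation as a tool message and sent back to the endpoint so the model can produce its final answer.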

Clean up

After you’re done running the notebook, make sure to delete the resources that you created in the process to avoid additional billing. For more details, see Delete Endpoints and Resources.

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we demonstrated how to deploy and get started with OpenAI’s GPT OSS models (gpt-oss-120b and gpt-oss-20b) on SageMaker JumpStart. These reasoning models bring advanced capabilities for coding, scientific analysis, and mathematical reasoning tasks directly to your AWS environment with enterprise-grade security and scalability.

Try out the new models, and share your feedback in the comments.

About the Authors

Pradyun Ramadorai, Senior Software Development Engineer
Malav Shastri, Software Development Engineer
Varun Morishetty, Software Development Engineer
Evan Kravitz, Software Development Engineer
Benjamin Crabtree, Software Development Engineer
Shen Teng, Software Development Engineer
Loki Ravi, Senior Software Development Engineer
Nithin Vijeaswaran, Specialist Solutions Architect
Breanne Warner, Enterprise Solutions Architect
Yotam Moss, Software Development Manager
Mike James, Software Development Manager
Sadaf Fardeen, Software Development Manager
Siddharth Shah, Principal Software Development Engineer
June Won, Principal Product Manager


