Advanced AI News
VentureBeat AI

You can now fine-tune your enterprise’s own version of OpenAI’s o4-mini reasoning model with reinforcement learning

By Advanced AI Bot, May 9, 2025 (6 min read)

OpenAI today announced on its developer-focused account on the social network X that third-party developers can now access reinforcement fine-tuning (RFT) for its new o4-mini language reasoning model, enabling them to build a private, customized version of it based on their enterprise’s unique products, internal terminology, goals, employees, processes, and more.

Essentially, this capability lets developers take the model available to the general public and tweak it to better fit their needs using OpenAI’s platform dashboard.

Then, they can deploy it through OpenAI’s application programming interface (API), another part of its developer platform, and connect it to their internal employee computers, databases, and applications.

Once deployed, employees and leaders at the company can query the RFT version of the model through a custom internal chatbot or custom OpenAI GPT to pull up private, proprietary company knowledge, answer specific questions about company products and policies, or generate new communications and collateral in the company’s voice.

However, one cautionary note: research has shown that fine-tuned models may be more prone to jailbreaks and hallucinations, so proceed cautiously!

This launch expands the company’s model optimization tools beyond supervised fine-tuning (SFT) and introduces more flexible control for complex, domain-specific tasks.

Additionally, OpenAI announced that supervised fine-tuning is now supported for its GPT-4.1 nano model, the company’s most affordable and fastest offering to date.

How does Reinforcement Fine-Tuning (RFT) help organizations and enterprises?

RFT creates a new version of OpenAI’s o4-mini reasoning model that is automatically adapted to the user’s goals, or those of their enterprise/organization.

It does so by applying a feedback loop during training, which developers at large enterprises (or even independent developers working on their own) can now initiate relatively easily and affordably through OpenAI’s online developer platform.

Instead of training on a set of questions with fixed correct answers, which is what traditional supervised learning does, RFT uses a grader (either a grading function or a grader model) to score multiple candidate responses per prompt.

The training algorithm then adjusts model weights so that high-scoring outputs become more likely.

This structure allows customers to align models with nuanced objectives such as an enterprise’s “house style” of communication and terminology, safety rules, factual accuracy, or internal policy compliance.
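To make the feedback loop above concrete, here is a toy sketch in Python. It is not OpenAI's training algorithm, whose details the announcement does not give. It assumes a tiny "policy" that is just a softmax over three fixed candidate responses, a hypothetical keyword-based `grade` function standing in for the grader, and a simple REINFORCE-style update with a mean-score baseline. All names and hyperparameters are illustrative.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grade(response: str) -> float:
    """Hypothetical grader: 1.0 if the reply uses the house-style phrase."""
    return 1.0 if "per our policy" in response.lower() else 0.0

def train(candidates, steps=200, samples_per_step=4, lr=0.5, seed=0):
    """Nudge the policy so that higher-scoring candidates become more likely."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample several candidate responses for the same prompt.
        picks = rng.choices(range(len(candidates)), weights=probs, k=samples_per_step)
        scores = [grade(candidates[i]) for i in picks]
        baseline = sum(scores) / len(scores)
        for i, score in zip(picks, scores):
            advantage = score - baseline
            # Gradient of log p(i) with respect to each logit is onehot(i) - probs.
            for j in range(len(logits)):
                grad = (1.0 if j == i else 0.0) - probs[j]
                logits[j] += lr * advantage * grad / samples_per_step
    return softmax(logits)

candidates = [
    "Refunds take five days.",
    "Per our policy, refunds are issued within five business days.",
    "No idea, sorry.",
]
final_probs = train(candidates)
for text, p in zip(candidates, final_probs):
    print(f"{p:.3f}  {text}")
```

In a real RFT job the policy is the language model itself and the update runs on OpenAI's infrastructure. The sketch only shows why a grader score is enough of a training signal: responses that score above the batch average are reinforced, and the rest are suppressed.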

To perform RFT, users need to:

Define a grading function or use OpenAI model-based graders.

Upload a dataset of prompts, split into training and validation sets.

Configure a training job via API or the fine-tuning dashboard.

Monitor progress, review checkpoints, and iterate on data or grading logic.
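The first two steps above can be prototyped locally before touching the platform. The sketch below is a minimal illustration, not OpenAI's required format: the row fields `prompt` and `reference_answer`, the exact-match grading rule, and the 80/20 split are all assumptions made for the example. Consult OpenAI's RFT documentation for the actual dataset schema and grader configuration.

```python
import json
import random

def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())

def exact_match_grader(model_output: str, reference: str) -> float:
    """Hypothetical code-based grader: 1.0 on a normalized exact match, else 0.0."""
    return 1.0 if _normalize(model_output) == _normalize(reference) else 0.0

def split_dataset(rows, validation_fraction=0.2, seed=0):
    """Shuffle deterministically and split rows into (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def to_jsonl(rows) -> str:
    """Serialize rows as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)

# Illustrative rows; the field names are assumptions, not a documented schema.
rows = [
    {"prompt": f"What is the code for item {i}?", "reference_answer": f"CODE-{i}"}
    for i in range(5)
]
train_rows, validation_rows = split_dataset(rows)
print(to_jsonl(train_rows))
```

Testing the grader on a handful of hand-written outputs before launching a job is a cheap way to catch scoring bugs, since training time is billed by the hour.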

RFT currently supports only o-series reasoning models and is available for the o4-mini model.

Early enterprise use cases

On its platform, OpenAI highlighted several early customers who have adopted RFT across diverse industries:

Accordance AI used RFT to fine-tune a model for complex tax analysis tasks, achieving a 39% improvement in accuracy and outperforming all leading models on tax reasoning benchmarks.

Ambience Healthcare applied RFT to ICD-10 medical code assignment, raising model performance by 12 points over physician baselines on a gold-panel dataset.

Harvey used RFT for legal document analysis, improving citation extraction F1 scores by 20% and matching GPT-4o in accuracy while achieving faster inference.

Runloop fine-tuned models for generating Stripe API code snippets, using syntax-aware graders and AST validation logic, achieving a 12% improvement.

Milo applied RFT to scheduling tasks, boosting correctness in high-complexity situations by 25 points.

SafetyKit used RFT to enforce nuanced content moderation policies and increased model F1 from 86% to 90% in production.

ChipStack, Thomson Reuters, and other partners also demonstrated performance gains in structured data generation, legal comparison tasks, and verification workflows.

These cases often shared characteristics: clear task definitions, structured output formats, and reliable evaluation criteria—all essential for effective reinforcement fine-tuning.
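As an illustration of what a "reliable evaluation criterion" for structured output can look like, here is a small syntax-aware grader in Python. It is only a sketch inspired by the description of AST validation above, not Runloop's actual grader. The scoring scheme (0.0, 0.5, 1.0) and the `required_call` parameter are assumptions chosen for the example. The code strings are parsed but never executed, so no third-party package is needed. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast

def syntax_aware_grade(code: str, required_call: str) -> float:
    """Score generated Python code using its abstract syntax tree.

    0.0: the code does not parse.
    0.5: the code parses but never calls `required_call`.
    1.0: the code parses and calls `required_call` at least once.
    """
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            # Rebuild the dotted name of the callee, e.g. "pkg.Thing.create".
            if ast.unparse(node.func) == required_call:
                return 1.0
    return 0.5

good = "customer = stripe.Customer.create(email='a@example.com')"
off_target = "print('hello')"
broken = "def ("

for snippet in (good, off_target, broken):
    print(syntax_aware_grade(snippet, "stripe.Customer.create"), repr(snippet))
```

A graded score like this gives the training loop partial credit for valid syntax, which is typically a more informative signal than a plain pass/fail.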

RFT is available now to verified organizations. OpenAI is offering a 50% discount to teams that choose to share their training datasets with OpenAI to help improve future models. Interested developers can get started using OpenAI’s RFT documentation and dashboard.

Pricing and billing structure

Unlike supervised or preference fine-tuning, which is billed per token, RFT is billed based on time spent actively training. Specifically:

$100 per hour of core training time (wall-clock time during model rollouts, grading, updates, and validation).

Time is prorated by the second, rounded to two decimal places (so 1.8 hours of training would cost the customer $180).

Charges apply only to work that modifies the model. Queues, safety checks, and idle setup phases are not billed.

If the user employs OpenAI models as graders (e.g., GPT-4.1), the inference tokens consumed during grading are billed separately at OpenAI’s standard API rates. Otherwise, the company can use outside models, including open source ones, as graders.

Here is an example cost breakdown:

Scenario                                          Billable Time   Cost
4 hours training                                  4 hours         $400
1.75 hours (prorated)                             1.75 hours      $175
2 hours training + 1 hour lost (due to failure)   2 hours         $200
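The arithmetic behind the cost breakdown above can be sketched in a few lines of Python. This is an estimate helper, not OpenAI's billing code. It assumes the two-decimal rounding applies to the hour count and uses half-up rounding, neither of which the article specifies. The function names and the `lost_seconds` parameter are hypothetical, and grader inference tokens (billed separately) are not included.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE_USD = Decimal("100")  # core training rate quoted in the article

def billable_hours(retained_seconds: int) -> Decimal:
    """Convert retained training seconds to hours, rounded to two decimals."""
    hours = Decimal(retained_seconds) / Decimal(3600)
    return hours.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def rft_training_cost(retained_seconds: int, lost_seconds: int = 0) -> Decimal:
    """Estimate the training charge in USD.

    `lost_seconds` (time spent on work lost to a failure) is accepted only to
    make explicit that it does not contribute to the bill.
    """
    return billable_hours(retained_seconds) * HOURLY_RATE_USD

scenarios = [
    ("4 hours training", 4 * 3600, 0),
    ("1.75 hours (prorated)", 6300, 0),
    ("2 hours training + 1 hour lost", 2 * 3600, 3600),
]
for label, retained, lost in scenarios:
    print(f"{label}: ${rft_training_cost(retained, lost)}")
```

Using `Decimal` rather than floats keeps the dollar amounts exact, which matters when an estimate is compared against an invoice.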

This pricing model provides transparency and rewards efficient job design. To control costs, OpenAI encourages teams to:

Use lightweight or efficient graders where possible.

Avoid overly frequent validation unless necessary.

Start with smaller datasets or shorter runs to calibrate expectations.

Monitor training with API or dashboard tools and pause as needed.

OpenAI uses a billing method called “captured forward progress,” meaning users are only billed for model training steps that were successfully completed and retained.

So should your organization invest in RFTing a custom version of OpenAI’s o4-mini or not?

Reinforcement fine-tuning introduces a more expressive and controllable method for adapting language models to real-world use cases.

With support for structured outputs, code-based and model-based graders, and full API control, RFT enables a new level of customization in model deployment. OpenAI’s rollout emphasizes thoughtful task design and robust evaluation as keys to success.

Developers interested in exploring this method can access documentation and examples via OpenAI’s fine-tuning dashboard.

For organizations with clearly defined problems and verifiable answers, RFT offers a compelling way to align models with operational or compliance goals — without building RL infrastructure from scratch.
