Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

ROSE: Remove Objects with Side Effects in Videos – Takara TLDR

DeepSeek Fuels Return to Profit for Chinese Tech Champion Huawei

Anthropic on using Claude user data for training AI: Privacy policy explained

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Business AI
    • Advanced AI News Features
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
LinkedIn Instagram YouTube Threads X (Twitter)
Advanced AI News
VentureBeat AI

Forget data labeling: Tencent’s R-Zero shows how LLMs can train themselves

By Advanced AI EditorAugust 29, 2025No Comments7 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now

A new training framework developed by researchers at Tencent AI Lab and Washington University in St. Louis enables large language models (LLMs) to improve themselves without requiring any human-labeled data. The technique, called R-Zero, uses reinforcement learning to generate its own training data from scratch, addressing one of the main bottlenecks in creating self-evolving AI systems. R-Zero works by having two independent models co-evolve by interacting with and challenging each other.

Experiments show that R-Zero substantially improves reasoning capabilities across different LLMs, which could lower the complexity and costs of training advanced AI. For enterprises, this approach could accelerate the development of specialized models for complex reasoning tasks without the massive expense of curating labeled datasets.

The challenge of self-evolving LLMs

The idea behind self-evolving LLMs is to create AI systems that can autonomously generate, refine, and learn from their own experiences. This offers a scalable path toward more intelligent and capable AI. However, a major challenge is that training these models requires large volumes of high-quality tasks and labels, which act as supervision signals for the AI to learn from.

Relying on human annotators to create this data is not only costly and slow but also creates a fundamental bottleneck. It effectively limits an AI’s potential capabilities to what humans can teach it. To address this, researchers have developed label-free methods that derive reward signals directly from a model’s own outputs, for example, by measuring its confidence in an answer. While these methods eliminate the need for explicit labels, they still rely on a pre-existing set of tasks, thereby limiting their applicability in truly self-evolving scenarios.

AI Scaling Hits Its Limits

Power caps, rising token costs, and inference delays are reshaping enterprise AI. Join our exclusive salon to discover how top teams are:

Turning energy into a strategic advantage

Architecting efficient inference for real throughput gains

Unlocking competitive ROI with sustainable AI systems

Secure your spot to stay ahead: https://bit.ly/4mwGngO

Other approaches involve having models generate their own tasks to learn from. However, in domains like open-ended reasoning, where there is no simple way to check for correctness (such as a code executor), ensuring the quality of this self-generated data is a significant hurdle.

How R-Zero works

R-Zero is a framework designed to train reasoning LLMs that can evolve from zero external data. The process begins with a single base model, which is split into two roles: a “Challenger” and a “Solver.” These two models are optimized independently but evolve together through a continuous cycle of interaction.

The Challenger’s goal is to create new tasks that are just at the threshold of the Solver’s current abilities, neither too easy nor impossible. The Solver, in turn, is rewarded for solving these increasingly complex tasks. In written comments to VentureBeat, Chengsong Huang, co-author of the paper and a doctoral student at Washington University in St. Louis, explained that this dynamic is crucial because generating high-quality questions is often more complicated than finding the answers.

“What we found in a practical setting is that the biggest challenge is not generating the answers… but rather generating high-quality, novel, and progressively more difficult questions,” Huang said. “We believe that good teachers are far rarer than good students. The co-evolutionary dynamic automates the creation of this ‘teacher,’ ensuring a steady and dynamic curriculum that pushes the Solver’s capabilities far beyond what a static, pre-existing dataset could achieve.”

Once the Challenger generates enough questions, they are filtered for diversity and compiled into a training dataset. In the Solver’s training phase, it is fine-tuned on these challenging questions. The “correct” answer for each question is determined by a majority vote from the Solver’s own previous attempts. 

This entire process repeats, creating a self-improving loop that operates without any human intervention, allowing the two models to push each other to become progressively more capable across each iteration.

R-Zero in action

The researchers tested R-Zero on several open-source LLMs, including models from the Qwen3 and OctoThinker families. They first trained the models on math problems and then tested whether the learned reasoning skills could generalize to other complex, general-domain benchmarks like MMLU-Pro (multi-language understanding and reasoning tasks) and SuperGPQA (science and reasoning tasks).

The results showed that R-Zero is a highly effective, model-agnostic framework. For instance, it boosted the Qwen3-4B-Base model’s score by +6.49 on average across math reasoning benchmarks. The training process consistently and substantially improved performance, with gains accumulating over several iterations. The larger Qwen3-8B-Base model saw its average math score climb by +5.51 points after three iterations.

A key finding was the immediate performance leap after the first iteration, which validated the effectiveness of the Challenger’s role in creating a high-quality learning curriculum. “This confirms that the intelligent curriculum generated by the RL-trained Challenger is significantly more effective than that of a non-trained generator,” the researchers write in their paper.

Notably, the skills learned from math problems were effectively transferred to general reasoning tasks, thereby enhancing the models’ underlying capabilities. For example, the same Qwen3-4B-Base model showed an improvement of +7.54 on general-domain reasoning benchmarks. Another interesting finding is that R-Zero can serve as a decisive pre-training step. Models first improved by R-Zero achieved even higher performance when later fine-tuned on traditional labeled data, suggesting the framework acts as a performance amplifier.

For enterprises, the “from zero data” approach could be a game-changer, especially in niche domains where high-quality data is scarce or non-existent. Huang highlights that R-Zero’s main advantage is its ability to sidestep the most expensive and time-consuming part of AI development: data curation.

“Our approach entirely bypasses the fundamental bottleneck of having to find, label, and curate high-quality datasets,” he said. “This is not just about a cost-saving measure; it’s a pathway toward creating AI that can surpass human capabilities, because it is no longer limited by the scope of human knowledge or data.”

However, the co-evolutionary process also revealed a critical challenge. As the Challenger successfully generates progressively more difficult problems, the Solver’s ability to produce reliable “correct” answers via majority vote begins to decline. The researchers found that the true accuracy of these self-generated labels dropped from 79% in the first iteration to 63% by the third, compared to a strong oracle LLM such as GPT -4. This decline in data quality is a key trade-off and a potential bottleneck for the system’s long-term performance.

Huang acknowledged that this is a fundamental problem for the self-evolving paradigm. “Our work is a proof of concept that demonstrates the potential of this approach, but we acknowledge that maintaining stable, long-term improvement without plateauing is a significant hurdle,” he said. “Solving this problem will be a crucial next step for the entire research community.”

The researchers also highlight a key limitation of the framework: the current mechanism is best suited for domains like math where correctness can be objectively determined. So, how could this powerful paradigm be extended to more subjective enterprise tasks like generating marketing copy or summarizing reports?

Huang suggests a potential path forward involves adding a third, co-evolving AI agent to the mix: a “Verifier” or “Critic.”

“Instead of evaluating for a simple ‘correct’ answer, this Verifier would be trained to evaluate the quality of the Solver’s output based on more nuanced criteria,” he explained. “The co-evolutionary dynamic would then involve the Challenger creating the prompt, the Solver generating the response, and the Verifier providing a quality signal, with all three models improving together.”

While this remains a direction for future research, it points toward a future where fully autonomous AI systems can master not just objective logic, but subjective reasoning as well.

Daily insights on business use cases with VB Daily

If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

Read our Privacy Policy

Thanks for subscribing. Check out more VB newsletters here.

An error occured.



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleVocal Image is using AI to help people communicate better
Next Article MIT startup Commonwealth Fusion Systems raises $863 million
Advanced AI Editor
  • Website

Related Posts

Nvidia’s strong Q2 results can’t mask the ASIC challenge in their future

August 29, 2025

Nous Research drops Hermes 4 AI models that outperform ChatGPT without content restrictions

August 29, 2025

In crowded voice AI market, OpenAI bets on instruction-following and expressive speech to win enterprise adoption

August 29, 2025

Comments are closed.

Latest Posts

London Museum Secures Banksy’s Piranhas

Egyptian Antiquities Trafficker Sentenced to Six Months in Prison

Sotheby’s to Launch First Series of Luxury Auctions in Abu Dhabi

Nazi-Looted Painting Turns Up in Argentinean Real Estate Listing

Latest Posts

ROSE: Remove Objects with Side Effects in Videos – Takara TLDR

August 29, 2025

DeepSeek Fuels Return to Profit for Chinese Tech Champion Huawei

August 29, 2025

Anthropic on using Claude user data for training AI: Privacy policy explained

August 29, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • ROSE: Remove Objects with Side Effects in Videos – Takara TLDR
  • DeepSeek Fuels Return to Profit for Chinese Tech Champion Huawei
  • Anthropic on using Claude user data for training AI: Privacy policy explained
  • MIT startup Commonwealth Fusion Systems raises $863 million
  • Forget data labeling: Tencent’s R-Zero shows how LLMs can train themselves

Recent Comments

  1. QilyrZobre on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  2. satewdiG on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  3. https://sites.google.com/view/pbnbuildingservice/home on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  4. JoshuaBib on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  5. MichaelStevy on C3 AI and Arcfield Announce Partnership to Accelerate AI Capabilities to Serve U.S. Defense and Intelligence Communities

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

LinkedIn Instagram YouTube Threads X (Twitter)
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.