Advanced AI News
VentureBeat AI

30 seconds vs. 3: The d1 reasoning framework that’s slashing AI response times

By Advanced AI Editor · April 29, 2025 · 6 min read

Researchers from UCLA and Meta AI have introduced d1, a novel framework using reinforcement learning (RL) to significantly enhance the reasoning capabilities of diffusion-based large language models (dLLMs). While most attention has focused on autoregressive models like GPT, dLLMs offer unique advantages. Giving them strong reasoning skills could unlock new efficiencies and applications for enterprises.

dLLMs take a distinct approach to generating text compared with standard autoregressive models, and that difference can yield efficiency gains, notably more parallel decoding, that matter for real-world applications.

Understanding diffusion language models

Most large language models (LLMs) like GPT-4o and Llama are autoregressive (AR). They generate text sequentially, predicting the next token based only on the tokens that came before it. 

Diffusion language models (dLLMs) work differently. Diffusion models were initially used in image generation models like DALL-E 2, Midjourney and Stable Diffusion. The core idea involves gradually adding noise to an image until it’s pure static, and then training a model to meticulously reverse this process, starting from noise and progressively refining it into a coherent picture.

Adapting this concept directly to language was tricky because text is made of discrete units (tokens), unlike the continuous pixel values in images. Researchers overcame this by developing masked diffusion language models. Instead of adding continuous noise, these models work by randomly masking out tokens in a sequence and training the model to predict the original tokens.

This leads to a different generation process compared to autoregressive models. dLLMs start with a heavily masked version of the input text and gradually “unmask” or refine it over several steps until the final, coherent output emerges. This “coarse-to-fine” generation enables dLLMs to consider the entire context simultaneously at each step, as opposed to focusing solely on the next token.
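The coarse-to-fine loop can be sketched in a few lines. This is a toy illustration, not LLaDA's actual sampler: `toy_model` stands in for a trained network, and the confidence-based unmasking schedule is one common choice among several.

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def toy_model(tokens):
    """Stand-in for a trained masked-diffusion LM: for every masked position,
    return a (guess, confidence) pair. A real dLLM would run a transformer
    over the full, partially masked sequence instead of guessing randomly."""
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def generate(length=6, steps=3):
    """Coarse-to-fine generation: start fully masked, then at each step
    commit the model's most confident predictions and re-predict the rest."""
    tokens = [MASK] * length
    for step in range(steps):
        preds = toy_model(tokens)
        if not preds:
            break
        k = max(1, len(preds) // (steps - step))  # unmask a growing fraction
        most_confident = sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]
        for i, (guess, _conf) in most_confident:
            tokens[i] = guess
    # Final pass: fill anything still masked.
    for i, (guess, _conf) in toy_model(tokens).items():
        tokens[i] = guess
    return tokens
```

Because every masked position is predicted in the same forward pass, many tokens can be committed per step, which is where the parallelism advantage over token-by-token decoding comes from.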

This difference gives dLLMs potential advantages, such as improved parallel processing during generation, which could lead to faster inference, especially for longer sequences. Examples of this model type include the open-source LLaDA and the closed-source Mercury model from Inception Labs. 

“While autoregressive LLMs can use reasoning to enhance quality, this improvement comes at a severe compute cost with frontier reasoning LLMs incurring 30+ seconds in latency to generate a single response,” Aditya Grover, assistant professor of computer science at UCLA and co-author of the d1 paper, told VentureBeat. “In contrast, one of the key benefits of dLLMs is their computational efficiency. For example, frontier dLLMs like Mercury can outperform the best speed-optimized autoregressive LLMs from frontier labs by 10x in user throughputs.”

Reinforcement learning for dLLMs

Despite their advantages, dLLMs still lag behind autoregressive models in reasoning abilities. Reinforcement learning has become crucial for teaching LLMs complex reasoning skills. By training models on reward signals (essentially rewarding them for correct reasoning steps or final answers), RL has pushed LLMs toward better instruction-following and reasoning. 

Algorithms such as Proximal Policy Optimization (PPO) and the more recent Group Relative Policy Optimization (GRPO) have been central to applying RL effectively to autoregressive models. These methods typically rely on calculating the probability (or log probability) of the generated text sequence under the model’s current policy to guide the learning process.
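For concreteness, a minimal GRPO-style update term can be written as follows. The sketch assumes scalar rewards and precomputed sequence log-probabilities; for an autoregressive model those log-probabilities are just the sum of per-token log-probs, which is exactly the quantity that is hard to obtain for a dLLM.

```python
import math

def grpo_advantages(rewards):
    """GRPO's key idea: score each sampled response relative to its own
    group (mean/std-normalized reward), so no learned critic is needed."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    std = std if std > 0 else 1.0
    return [(r - mean) / std for r in rewards]

def clipped_pg_terms(logprobs, old_logprobs, rewards, clip_eps=0.2):
    """PPO-style clipped objective terms using group-relative advantages.
    `logprobs`/`old_logprobs` are sequence log-probabilities under the
    current and behavior policies."""
    terms = []
    for lp, old_lp, adv in zip(logprobs, old_logprobs, grpo_advantages(rewards)):
        ratio = math.exp(lp - old_lp)       # importance ratio new/old policy
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    return terms
```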

This calculation is straightforward for autoregressive models due to their sequential, token-by-token generation. However, for dLLMs, with their iterative, non-sequential generation process, directly computing this sequence probability is difficult and computationally expensive. This has been a major roadblock to applying established RL techniques to improve dLLM reasoning.

The d1 framework tackles this challenge with a two-stage post-training process designed specifically for masked dLLMs:

Supervised fine-tuning (SFT): First, the pre-trained dLLM is fine-tuned on a dataset of high-quality reasoning examples. The paper uses the “s1k” dataset, which contains detailed step-by-step solutions to problems, including examples of self-correction and backtracking when errors occur. This stage aims to instill foundational reasoning patterns and behaviors into the model.

Reinforcement learning with diffu-GRPO: After SFT, the model undergoes RL training using a novel algorithm called diffu-GRPO. This algorithm adapts the principles of GRPO to dLLMs. It introduces an efficient method for estimating log probabilities while avoiding the costly computations previously required. It also incorporates a clever technique called “random prompt masking.”

During RL training, parts of the input prompt are randomly masked in each update step. This acts as a form of regularization and data augmentation, allowing the model to learn more effectively from each batch of data.
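Random prompt masking itself is simple to express. A hypothetical helper follows; the mask-token id and masking rate are placeholders, not values from the paper.

```python
import random

MASK_ID = 0  # hypothetical placeholder for the tokenizer's [MASK] token id

def mask_prompt(prompt_ids, mask_prob, rng=random):
    """Corrupt the prompt for one RL update: each token is independently
    replaced by the mask token with probability `mask_prob`. Re-drawing
    the mask on every update gives the policy a fresh view of the same
    prompt, acting as regularization / data augmentation."""
    return [MASK_ID if rng.random() < mask_prob else t for t in prompt_ids]
```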

d1 in real-world applications

The researchers applied the d1 framework to LLaDA-8B-Instruct, an open-source dLLM. They fine-tuned it using the s1k reasoning dataset for the SFT stage. They then compared several versions: the base LLaDA model, LLaDA with only SFT, LLaDA with only diffu-GRPO and the full d1-LLaDA (SFT followed by diffu-GRPO).

These models were tested on mathematical reasoning benchmarks (GSM8K, MATH500) and logical reasoning tasks (4×4 Sudoku, Countdown number game).

The results showed that the full d1-LLaDA consistently achieved the best performance across all tasks. Impressively, diffu-GRPO applied alone also significantly outperformed SFT alone and the base model. 

“Reasoning-enhanced dLLMs like d1 can fuel many different kinds of agents for enterprise workloads,” Grover said. “These include coding agents for instantaneous software engineering, as well as ultra-fast deep research for real-time strategy and consulting… With d1 agents, everyday digital workflows can become automated and accelerated at the same time.”

Interestingly, the researchers observed qualitative improvements, especially when generating longer responses. The models began to exhibit “aha moments,” demonstrating self-correction and backtracking behaviors learned from the examples in the s1k dataset. This suggests the model isn’t just memorizing answers but learning more robust problem-solving strategies.

Autoregressive models have a first-mover advantage in terms of adoption. However, Grover believes that advances in dLLMs can change the dynamics of the playing field. For an enterprise, one way to decide between the two is whether its application is currently bottlenecked by latency or cost constraints.

According to Grover, reasoning-enhanced dLLMs such as d1 can help in one of two complementary ways: 

If an enterprise is currently unable to migrate to a reasoning model based on an autoregressive LLM, reasoning-enhanced dLLMs offer a plug-and-play alternative that lets it get the superior quality of reasoning models at the same speed as a non-reasoning autoregressive LLM. 

If the enterprise application allows for a larger latency and cost budget, d1 can generate longer reasoning traces using the same budget and further improve quality. 

“In other words, d1-style dLLMs can Pareto-dominate autoregressive LLMs on the axis of quality, speed, and cost,” Grover said.
