Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

OpenAI Gives Us a Glimpse of How It Monitors for Misuse on ChatGPT

IBM integrates Anthropic Claude into AI IDE and other tools

Microsoft retires AutoGen and debuts Agent Framework to unify and govern enterprise AI agents

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Business AI
    • Advanced AI News Features
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
LinkedIn Instagram YouTube Threads X (Twitter)
Advanced AI News
Alibaba Cloud (Qwen)

I tested DeepSeek R1 vs Qwen 2.5 vs ChatGPT o3-mini with 7 prompts – here’s the winner

By Advanced AI EditorAugust 8, 2025No Comments9 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


OpenAI’s o3-mini model, now available within the free tier of ChatGPT, is a compact, yet powerful AI model designed to excel in advanced reasoning, coding proficiency, and mathematical problem-solving, scoring 96.7% on the American Invitational Mathematics Examination (AIME), surpassing its predecessor, o1.

DeepSeek, the popular Chinese chatbot has proven to be particularly strong in mathematical reasoning and coding tasks, effectively solving complex problems and generating code snippets. With superior multilingual capabilities and high inference efficiency, the model has shown versatility in a wide range of applications. Two models, R1 and V3 offer similar answers, yet R1 has the ability to “think” through the answers, offering increased reasoning for more detailed responses.

Yet, since Alibaba’s Qwen 2.5 launched, it has been a top competitor of both DeepSeek and ChatGPT. Also free for users and also excelling at coding proficiency, multilingual understanding, mathematical reasoning, and extended content processing with efficiency and speed, this chatbot is proving to hold its own within the competitive AI space.

So how do these chatbots compare? I put them through a series of the same prompts to test them on everything from advanced reasoning and coding proficiency to problem-solving capabilities. Here’s what happened when these free tier models faced off, including the overall winner.

1. Lateral thinking puzzle

o3-mini vs Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: “You are in a completely dark room with three light switches on a wall. Each switch controls one of three light bulbs in another room, but you cannot see the bulbs from where you are. You can flip the switches as many times as you want, but you can only enter the bulb room once to check the bulbs. How do you determine which switch controls which bulb?”

o3-mini clearly explains why each bulb state corresponds to a specific switch while using natural, conversational language, making the explanation accessible with easy-to-follow steps.

Qwen 2.5 included an extra layer of clarity by explicitly labeling the switches and numbering the steps, making the explanation easier to follow with strong logical reasoning.  


You may like

DeepSeek correctly identifies the key insight with a concise and straight to the point explanation.

Winner: Qwen 2.5 wins for its structured response because it is the easiest to follow. o3-mini comes in second for the thorough explanation, but less structured than Qwen 2.5.

2. Deductive reasoning

o3-mini vs Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: “A detective is investigating a murder case. He interviews three suspects: Alice, Bob, and Charlie. One of them is guilty, and the other two are telling the truth. Here’s what they say

Alice: “Bob is innocent.”

Bob: “Charlie is guilty.”

Get instant access to breaking news, the hottest reviews, great deals and helpful tips.

Charlie: “I am innocent.”

Who is the murderer?”

o3-mini delivered a step-by-step elimination approach: the model systematically assumes each person is guilty and checks for contradictions. The explanation is clear, logical, and doesn’t overcomplicate.

Qwen 2.5 offers a very structured and logical explanation with well-marked steps, ensuring no contradiction remains in the final conclusion.

DeepSeek offered detailed reasoning and checks for contradictions effectively while explicitly stating why Alice and Bob cannot be guilty. The model maintains logical consistency throughout.

Winner: o3-mini wins for being the most structured and methodical, making it easier for a reader to follow. Qwen 2.5 was a close second.

3. Coding challenge

o3-mini vs. Qwen 2.5 vs. DeepSeek screenshot

(Image credit: Future)

Prompt: “Write a Python script that simulates a basic banking system with functionalities to deposit, withdraw, and check balance.”

o3-mini Provided a solid implementation using a class-based approach and included meaningful error messages while ensuring proper handling of deposits and withdrawals. It also offers a clear explanation of each method and its functionality.

Qwen 2.5 offered a well-structured breakdown of how the script works, covering class definition, deposit/withdraw methods, error handling, and user experience. It includes try-except blocks to handle invalid inputs, making it more robust. The script is clean and well-commented, making it easy for beginners to understand.

DeepSeek kept the script structured and efficient and introduces an owner name for the account, adding a personal touch. Yet, while the explanation is clear, it is not as detailed as Qwen 2.5.

Winner: Qwen 2.5 wins for providing a clean, well-structured script with strong error handling, detailed explanations, and intuitive user experience. With a good implementation but slightly less comprehensive with error handling, o3-mini was a close second.

4. Mathematical proof

o3-mini vs Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: “Prove the Pythagorean theorem using a geometric approach.”

o3-mini delivered an explanation that follows a well-structured, step-by-step approach, making it easy to understand. The explanation is neither overly verbose nor lacking in necessary details.

Qwen 2.5 offered a similar approach to o3-mini, using the large square and rearranging triangles while breaking down the steps clearly and methodically. The explanation contains formatting issues and some parts, like the ASCII diagram, are slightly unclear or misaligned, making it harder to visualize.

DeepSeek crafted a correct proof that follows a logical structure. Yet it lacks the conversational approach response of 03-mini or Qwen 2.5.

Winner: o3-mini wins for the best combination of clarity, detail and logical flow. Qwen 2.5 is in second place with a solid response but formatting and visualization issues.

5. Scientific explanation

o3-mini vs Qwen 2.5 vs DeepSeek (3)

(Image credit: Future)

Prompt: “Explain the process of photosynthesis in detail.”

o3-mini provided detailed descriptions of both light-dependent and light-independent reactions with clear breakdowns of each step. The step-by-step progression from capturing light to converting energy into glucose is easy to follow. It breaks down complex processes into digestible segments.

Qwen 2.5 provided all the key concepts in photosynthesis with a good step-by-step breakdown of the light-dependent reactions and the Calvin cycle. However, the chatbot placed less emphasis on real-word significance such as climate change, food security and the response feels overly condensed compared to o3-mini’s thorough explanation.

DeepSeek covered both stages of photosynthesis well and included factors affecting photosynthesis (e.g., light intensity, CO₂ levels, water availability) but lacked minor details in comparison to the o3-mini’s response.

Winner: o3-mini wins for best balance of depth, clarity, organization, and accuracy. DeepSeek was a close second for its solid explanation but lacking some finer details.

6. Historical analysis

o3-mini vs Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: “Analyze the causes and effects of the French Revolution.”

o3-mini crafted a comprehensive and well-structured analysis clearly dividing the causes and effects into distinct sections and provided in-depth explanations for each factor, rather than just listing them.

Qwen 2.5 discussed global impact, including Napoleon and later revolutions within its strong explanation and well-organized response. However, the economic consequences could have been explored in more detail.

DeepSeek covered key causes well, including social inequality, economic struggles, and Enlightenment ideas, but did not reference sources.

Winner: o3-mini wins for the best balance of depth, clarity, organization, and historical analysis. DeepSeek comes in second place for a solid response but slightly less detailed.

7. Literary critique

o3-mini vs Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: “Provide a critical analysis of Shakespeare’s ‘Hamlet’ focusing on its themes of madness and revenge.”

o3-mini explored both themes of madness and revenge and how they intertwine rather than treating them as separate topics. It explored Hamlet’s psychological struggle, examining whether his madness is feigned or real, which is a central debate in Shakespearean scholarship.

Qwen 2.5 offered a very detailed discussion of feigned vs. real madness. Yet, there was some redundancy in explaining revenge, which felt more descriptive than analytical.

DeepSeek provided a solid comparison between Hamlet, Laertes, and Fortinbras in their approach to revenge, but the response felt like a well-structured summary rather than a deep analysis. The list-like structure made it feel less like a flowing critical argument.

Winner: o3-mini wins again for the best blend of depth, structure, and thematic connection. DeepSeek is second for a strong response, but it was more summary-like and less interwoven.

8. Philosophical discussion

o3-mini vs Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: “Discuss the concept of utilitarianism and its implications in modern ethics.”

o3-mini clearly outlined the core principles of utilitarianism (consequentialism, hedonistic calculus, impartiality) and discussed their modern applications (policy-making, healthcare, environmental ethics) in greater detail than the other responses.

Qwen 2.5 delivered a solid breakdown of act vs. rule utilitarianism and covered business ethics, technology, AI, and medical ethics well. But there was some redundancy and over-explanation in defining utilitarian concepts.

DeepSeek covered the core principles well and includes historical context but it failed at exploring critiques as deeply as the other two agents. Additionally, the response lacked a strong thematic connection between theory and real-world issues.

Winner: o3-mini delivered the best in-depth response with clarity and connection to modern ethical issues. Qwen 2.5 is in second place for a good explanation but slightly weaker structure and conclusion.

9. Urban planning

o3-mini vs. Qwen 2.5 vs. DeepSeek screenshot

(Image credit: Future)

Prompt: “Design an integrated strategy to optimize urban transportation in a rapidly growing megacity. Your plan should address the following aspects.”

o3-mini covered all major aspects required to optimize urban transportation with smart references and strong logical flow with clear implementation steps.

Qwen 2.5 delivered a well-structured response and covered most essential components with a good use of data-driven decision-making. However it lacked a strong global case study and did not emphasize implementation phases.

DeepSeek included in-depth transport electrification plans and had a solid focus on equity and gender safety in transit. However, the chatbot lacked a strong focus on governance and long-term futureproofing. It also is missing a well-defined policy execution framework from its response.

Winner: o3-mini wins for its execution roadmap, innovation, depth, and realism. Qwen 2.5 came in second for a strong but slightly-less structured response.

Overall winner: o3-mini

ChatGPT’s o3-mini emerged as the most well-rounded and consistently high-performing chatbot in this chatbot face-off. Across a diverse range of challenges — including coding, mathematics, historical analysis, literary critique, philosophical discussions, and problem solving — o3-mini repeatedly demonstrated superior depth, clarity, organization, and real-world applicability.

03 mini excelled in balancing detail with readability, offering well-structured and insightful responses that blended theoretical understanding with practical implications.

While DeepSeek and Qwen 2.5 had their strengths, neither could match o3-mini’s versatility across all tested domains.

Notably, Qwen 2.5 edged out o3-mini in the coding challenge due to its well-commented script and error-handling capabilities, and DeepSeek occasionally placed second when it provided a more comprehensive but less nuanced response.

Consistently ranking first in five out of the seven challenges, o3-mini proved to be the most balanced AI model for users seeking thoughtful, well-articulated, and logically sound answers. While all three models provide valuable assistance in various tasks, o3-mini currently offers the most polished and reliable experience among these free-tier chatbot options.

More from Tom’s Guide

Arrow

Back to MacBook Air

SORT BYPrice (low to high)Price (high to low)Product Name (A to Z)Product Name (Z to A)Retailer name (A to Z)Retailer name (Z to A)Arrow

Show more



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleTariffs will halve Under Armour’s profitability this year
Next Article DeepSeek shows enterprises model distillation opportunity
Advanced AI Editor
  • Website

Related Posts

Thinking Machines debuts Tinker, a developer tool to simplify fine-tuning of AI models | Technology News

October 5, 2025

Mapping shifts in the geography of tech innovation: China becomes a big player in AI research

October 5, 2025

We Tested the Best Free AI Image Editors—Here’s What You’ll Love and Hate

October 4, 2025

Comments are closed.

Latest Posts

Basquiat Work on Paper Headline’s Phillips’ Frieze Week Sales

Charges Against Isaac Wright ‘to Be Dropped’ After His Arrest by NYPD

What the Los Angeles Wildfires Taught the Art Insurance Industry

Musée d’Orsay Puts Manet on (Mock) Trial for Obscenity

Latest Posts

OpenAI Gives Us a Glimpse of How It Monitors for Misuse on ChatGPT

October 7, 2025

IBM integrates Anthropic Claude into AI IDE and other tools

October 7, 2025

Microsoft retires AutoGen and debuts Agent Framework to unify and govern enterprise AI agents

October 7, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • OpenAI Gives Us a Glimpse of How It Monitors for Misuse on ChatGPT
  • IBM integrates Anthropic Claude into AI IDE and other tools
  • Microsoft retires AutoGen and debuts Agent Framework to unify and govern enterprise AI agents
  • Wall Street analysts explain how AMD’s own stock will pay for OpenAI’s billions in chip purchases 
  • Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails – Takara TLDR

Recent Comments

  1. lukamodric on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  2. Samepicl7Nalay on Study: AI-Powered Research Prowess Now Outstrips Human Experts, Raising Bioweapon Risks
  3. digi 995 ammo abyss on Foundation AI: Cisco launches AI model for integration in security applications
  4. DannyhougS on Foundation AI: Cisco launches AI model for integration in security applications
  5. Lindseyror on Google DeepMind Taught Itself to Play Minecraft

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

LinkedIn Instagram YouTube Threads X (Twitter)
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.