Advanced AI News

OpenAI’s GPT-5 Touts Medical Benchmarks and Mental Health Guidelines

By Advanced AI Editor | August 11, 2025 | 5 min read


OpenAI CEO Sam Altman speaks on the Aug. 7 livestream at which the AI model GPT-5 was announced. Screenshot: TechRepublic

Generative AI has become increasingly mainstream, but hallucinations and misinformation persist, and concerns about its reliability remain. What do OpenAI’s attempts to mitigate these downsides in GPT-5 say about the state of large language model assistants today?

“This [the AI boom] isn’t only a global AI arms race for processing power or chip dominance,” said Bill Conner, chief executive officer of software company Jitterbit and former advisor to Interpol, in a prepared statement to TechRepublic. “It’s a test of trust, transparency, and interoperability at scale where AI, security and privacy are designed together to deliver accountability for governments, businesses and citizens.”

GPT-5 responds to sensitive safety questions in a more nuanced way

OpenAI safety training team lead Saachi Jain discussed both reducing hallucinations and mitigating deception in GPT-5 during the release livestream last Thursday. She defined deception in GPT-5 as occurring when the model fabricates details about its reasoning process or falsely claims it has completed a task.

An AI coding tool from Replit, for example, exhibited this kind of behavior when it attempted to explain why it had deleted an entire production database. When OpenAI demonstrated GPT-5, the presentation included examples of medical advice, as well as a skewed chart shown for humor.

“GPT-5 is significantly less deceptive than o3 and o4-mini,” Jain said.

HealthBench Hard hallucination rates (inaccuracies on challenging conversations): GPT-5’s hallucination rate is much lower than that of o3 or 4o. Image: OpenAI livestream of GPT-5

OpenAI has changed the way the model assesses prompts for safety considerations, reducing some opportunities for prompt injection and accidental ambiguity, Jain said. As an example, she demonstrated how the model answers questions about lighting pyrogen, a chemical used in fireworks.

The formerly cutting-edge model o3 “over-rotates on intent” when asked this question, Jain said. o3 provides technical details if the request is framed neutrally, or refuses if it detects implied harm. GPT-5 uses a “safe completions” safety measure instead that “tries to maximize helpfulness within safety constraints,” Jain said. In the prompt about lighting fireworks, for example, that means referring the user to the manufacturer’s manuals for professional pyrotechnic composition.
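The contrast Jain described can be sketched as two decision policies. The function names and response shapes below are illustrative assumptions for this article, not OpenAI's actual implementation:

```python
# Illustrative contrast between an o3-style refusal policy and a GPT-5-style
# "safe completion" policy, as described in the livestream. All names and
# response shapes here are invented for illustration.

def o3_style_policy(implies_harm: bool) -> dict:
    """Binary: answer fully or refuse outright based on inferred intent."""
    if implies_harm:
        return {"answer": None, "refused": True}
    return {"answer": "full technical details", "refused": False}

def safe_completion_policy(implies_harm: bool) -> dict:
    """Maximize helpfulness within safety constraints: when detail must be
    withheld, explain why and point to safer alternatives instead of
    issuing a bare refusal."""
    if implies_harm:
        return {
            "answer": "high-level guidance only",
            "refused": False,
            "explanation": "detailed instructions withheld for safety",
            "alternatives": ["consult the manufacturer's manual",
                             "contact a licensed professional"],
        }
    return {"answer": "full technical details", "refused": False}
```

The practical difference under this sketch: an ambiguously framed prompt no longer flips between a full answer and a hard refusal depending on inferred intent; the bounded answer plus explanation is returned either way.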

“If we have to refuse, we’ll tell you why we refused, as well as provide helpful alternatives that will help create the conversation in a more safe way,” Jain said.

The new tuning does not eliminate the risk of cyberattacks or malicious prompts that exploit the flexibility of natural language models. Cybersecurity researchers at SPLX conducted a red team exercise on GPT-5 and found it to still be vulnerable to certain prompt injection and obfuscation attacks. Among the models tested, SPLX reported GPT-4o performed best.

OpenAI’s HealthBench tested GPT-5 against real doctors

Consumers have used ChatGPT as a sounding board for physical and mental health concerns, but its advice still carries more caveats than Googling for symptoms online. OpenAI said GPT-5 was trained in part on data from real doctors working on real-world healthcare tasks, improving its answers to health-related questions. The company measured GPT-5 using HealthBench, a rubric-based benchmark developed with 262 physicians to test the AI on 5,000 realistic health conversations. GPT-5 scored 46.2% on HealthBench Hard, compared to o3’s score of 31.6%.
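A rubric-based benchmark of the kind described can be sketched as follows. The criteria, point values, and aggregation below are invented for illustration and are not OpenAI's actual HealthBench rubric:

```python
# Minimal sketch of rubric-based benchmark scoring in the style described
# for HealthBench: each conversation has physician-written criteria, each
# worth points, and the overall score is earned points over total possible
# points. All criteria and weights below are fictional.

def score_conversation(criteria: list[tuple[str, int, bool]]) -> tuple[int, int]:
    """criteria: (description, point_value, model_met_criterion)."""
    earned = sum(points for _, points, met in criteria if met)
    possible = sum(points for _, points, _ in criteria)
    return earned, possible

def benchmark_score(conversations) -> float:
    """Aggregate earned/possible points across all conversations."""
    earned = possible = 0
    for criteria in conversations:
        e, p = score_conversation(criteria)
        earned += e
        possible += p
    return earned / possible

# Two fictional conversations with fictional rubric items:
convs = [
    [("Advises seeing a clinician", 5, True),
     ("Avoids a definitive diagnosis", 3, False)],
    [("Flags emergency symptoms", 10, True)],
]
print(round(benchmark_score(convs), 3))  # 15/18 -> 0.833
```

Under this framing, a score such as GPT-5's 46.2% simply means the model earned just under half of the available rubric points across the benchmark's conversations.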

In the announcement livestream, OpenAI CEO Sam Altman interviewed a woman who used ChatGPT to understand her biopsy report. The AI helped her translate the report into plain language and decide whether to pursue radiation treatment after her doctors disagreed on next steps.

However, consumers should remain cautious about making major health decisions based on chatbot responses or sharing highly personal information with the model.

Sample fictional health question for GPT-5. Image: Corey Noles/TechnologyAdvice

OpenAI adjusted responses to mental health questions

To reduce risks when users seek mental health advice, OpenAI added guardrails to GPT-5 to prompt users to take breaks and to avoid giving direct answers to major life decisions.

“There have been instances where our 4o model fell short in recognizing signs of delusion or emotional dependency,” OpenAI staff wrote in an Aug. 4 blog post. “While rare, we’re continuing to improve our models and are developing tools to better detect signs of mental or emotional distress so ChatGPT can respond appropriately and point people to evidence-based resources when needed.”

This growing trust in AI has implications for both personal and business use, said Max Sinclair, chief executive officer and co-founder of search optimization company Azoma, in an email to TechRepublic.

“I was surprised in the announcement by how much emphasis was put on health and mental health support,” he said in a prepared statement. “Studies have already shown that people put a high degree of trust in AI results – for shopping even more than in-store retail staff. As people turn more and more to ChatGPT for support with the most pressing and private problems in their lives, this trust of AI is only likely to increase.”

