Advanced AI News

Companies want AI to be better than the average human. Measuring that isn’t straightforward

By Advanced AI Editor | July 1, 2007 | 6 Mins Read

Hello and welcome to Eye on AI…In this edition…Meta snags a top AI researcher from Apple…an energy executive warns that AI data centers could destabilize electrical grids…and AI companies go art hunting.

Last week, I promised to bring you additional insights from the “Future of Professionals” roundtable I attended at the University of Oxford’s Saïd Business School. One of the most interesting discussions was about the performance criteria companies use when deciding whether to deploy AI.

The majority of companies use existing human performance as the benchmark by which AI is judged. But beyond that, decisions get complicated and nuanced.

Simon Robinson, executive editor at the news agency Reuters, which has begun using AI in a variety of ways in its newsroom, said that his company had committed to not deploying any AI tool in the production of news unless its average error rate was lower than that of humans doing the same task. So, for example, the company has now begun to deploy AI to automatically translate news stories into foreign languages because, on average, AI software can now do this with fewer errors than human translators.

This is the standard most companies use—better than humans on average. But in many cases, this might not be appropriate. Utham Ali, the global responsible AI officer at BP, said that the oil giant wanted to see if a large language model (LLM) could act as a decision-support system, advising its human safety and reliability engineers. One experiment it conducted was to see if the LLM could pass the safety engineering exam that BP requires all its safety engineers to take. The LLM—Ali didn’t say which AI model it was—did well, scoring 92%, which is well above the pass mark and better than the average grade for humans taking the test.

But, Ali said, the 8% of questions the AI system missed gave the BP team pause. How often would humans have missed those particular questions? And why did the AI system get those questions wrong? The fact that BP’s experts had no way of knowing why the LLM missed the questions meant that the team didn’t have confidence in deploying it—especially in an area where the consequences of mistakes can be catastrophic.

The concerns BP had will apply to many other AI uses. Take AI that reads medical scans. While these systems are often assessed using average performance compared to human radiologists, overall error rates may not tell us what we need to know. For instance, we wouldn’t want to deploy AI that was on average better than a human doctor at detecting anomalies, but was also more likely to miss the most aggressive cancers. In many cases, it is performance on a subset of the most consequential decisions that matters more than average performance.
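To make the gap between average performance and performance on the most consequential cases concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the case counts are invented, and the "routine" and "aggressive" labels, along with the helper names `error_rates` and `make_records`, are hypothetical rather than drawn from any real radiology study.

```python
from collections import defaultdict


def error_rates(records):
    """Return (overall_rate, {stratum: rate}) for (stratum, is_error) pairs."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for stratum, is_error in records:
        totals[stratum] += 1
        errors[stratum] += int(is_error)
    overall = sum(errors.values()) / sum(totals.values())
    by_stratum = {s: errors[s] / totals[s] for s in totals}
    return overall, by_stratum


def make_records(counts):
    """Expand {stratum: (n_cases, n_errors)} into one record per case."""
    records = []
    for stratum, (n_cases, n_errors) in counts.items():
        records += [(stratum, True)] * n_errors
        records += [(stratum, False)] * (n_cases - n_errors)
    return records


# Invented counts, for illustration only: (number of cases, number missed).
human = make_records({"routine": (900, 90), "aggressive": (100, 5)})
model = make_records({"routine": (900, 36), "aggressive": (100, 20)})

for name, recs in [("human", human), ("model", model)]:
    overall, by_stratum = error_rates(recs)
    print(name, round(overall, 3), {s: round(r, 3) for s, r in by_stratum.items()})
```

With these made-up counts, the model's overall error rate (5.6%) beats the human's (9.5%), yet it misses 20% of the aggressive cases against the human's 5%. A deployment rule based only on the average would approve the model; one that also checks the high-stakes subset would not.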

This is one of the toughest issues around AI deployment, particularly in higher risk domains. We all want these systems to be superhuman in decision making and human-like in the way they make decisions. But with our current methods for building AI, it is difficult to achieve both simultaneously. While there are lots of analogies out there about how people should treat AI—intern, junior employee, trusted colleague, mentor—I think the best one might be alien. AI is a bit like the Coneheads from that old Saturday Night Live sketch—it is smart, brilliant even, at some things, including passing itself off as human, but it doesn’t understand things like a human would and does not “think” the way we do.

A recent research paper hammers home this point. It found that the mathematical abilities of AI reasoning models—which use a step by step “chain of thought” to work out an answer—can be seriously degraded by appending a seemingly innocuous irrelevant phrase, such as “interesting fact: cats sleep for most of their lives,” to the math problem. Doing so more than doubles the chance that the model will get the answer wrong. Why? No one knows for sure.

We have to decide how comfortable we are with AI’s alien nature. The answer depends a lot on the domain where AI is being deployed. Take self-driving cars. Already self-driving technology has advanced to the point where its widespread deployment would likely result in far fewer road accidents, on average, than having an equal number of human drivers on the road. But the mistakes that self-driving cars make are alien ones: veering suddenly into oncoming traffic or ploughing directly into the side of a truck because the car’s sensors couldn’t differentiate the truck’s white side from the cloudy sky beyond it.

If, as a society, we care about saving lives above all else, then it might make sense to allow widespread deployment of autonomous vehicles immediately, despite these seemingly bizarre accidents. But our unease about doing so tells us something about ourselves. We prize something beyond just saving lives: we value the illusion of control, predictability, and perfectibility. We are deeply uncomfortable with a system in which some people might be killed for reasons we cannot explain or control (essentially randomly), even if the total number of deaths dropped from current levels. We are uncomfortable with enshrining unpredictability in a technological system. We prefer to rely on humans whom we know to be deeply fallible, but whom we believe to be perfectible if we apply the right policies, rather than on a technology that may be less fallible, but which we do not understand how to improve.

With that, here’s more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

Before we get to the news, the U.S. paperback edition of my book, Mastering AI: A Survival Guide to Our Superpowered Future, is out today from Simon & Schuster. Consider picking up a copy for your bookshelf.

Also, want to know more about how to use AI to transform your business? Interested in what AI will mean for the fate of companies, and countries? Then join me at the Ritz-Carlton, Millenia in Singapore on July 22 and 23 for Fortune Brainstorm AI Singapore. This year’s theme is The Age of Intelligence. We will be joined by leading executives from DBS Bank, Walmart, OpenAI, Arm, Qualcomm, Standard Chartered, Temasek, and our founding partner Accenture, plus many others, along with key government ministers from Singapore and the region, top academics, investors and analysts. We will dive deep into the latest on AI agents, examine the data center build-out in Asia, explore how to create AI systems that produce business value, and talk about how to ensure AI is deployed responsibly and safely. You can apply to attend here, and I’m able to offer loyal Eye on AI readers complimentary tickets to the event. Just use the discount code BAI100JeremyK when you check out.

Note: The essay above was written and edited by Fortune staff. The news items below were selected by the newsletter author, created using AI, and then edited and fact-checked.

This story was originally featured on Fortune.com


