Paper Page - Clinical Knowledge In LLMs Does Not Translate To Human Interactions

Global healthcare providers are exploring the use of large language models (LLMs)
to provide medical advice to the public. LLMs now achieve nearly perfect scores
on medical licensing exams, but this does not necessarily translate to accurate
performance in real-world settings. We tested whether LLMs can assist members of the
public in identifying underlying conditions and choosing a course of action
(disposition) in ten medical scenarios in a controlled study with 1,298
participants. Participants were randomly assigned to receive assistance from an
LLM (GPT-4o, Llama 3, Command R+) or a source of their choice (control). Tested
alone, LLMs complete the scenarios accurately, correctly identifying conditions
in 94.9% of cases and disposition in 56.3% on average. However, participants
using the same LLMs identified relevant conditions in less than 34.5% of cases
and disposition in less than 44.2%, both no better than the control group. We
identify user interactions as a challenge to the deployment of LLMs for medical
advice. Standard benchmarks for medical knowledge and simulated patient
interactions do not predict the failures we find with human participants.
Moving forward, we recommend systematic human user testing to evaluate
interactive capabilities prior to public deployments in healthcare.
