For Ghassemi, it’s just another revelation of the flaws and foibles of medical AI. She’s been at it for years, with dismaying results.
She and her colleagues have already found that an AI model that produces accurate chest X-ray diagnoses in Canada becomes far less reliable in California, thanks to different lifestyles and risk factors. They’ve discovered that AI chatbots that deliver mental health advice often respond to Black and Asian users with less empathy than to white users. And Ghassemi’s latest paper finds that some AIs are predisposed to give bad medical advice to women.
Ghassemi is no Luddite. “I love developing AI systems,” she said. “I’m a professor at MIT for a reason. But it’s clear to me that naive deployments of these systems, that do not recognize the baggage that human data comes with, will lead to harm.”
Ghassemi was born in Iowa to Iranian immigrant parents who relocated to Texas and later New Mexico. “Growing up as a visibly Muslim woman in the US was not easy,” Ghassemi said, “and I learned from a young age the importance of standing firm when people scream, throw things, and otherwise threaten you for your identity.”
While earning computer science and engineering degrees from New Mexico State University, the University of Oxford, and MIT, Ghassemi developed a deep concern about the potential hazards of the systems she designed. “As technologists,” she said, “we have a responsibility to improve society with our tools rather than allowing them free rein.”
Among other things, that means making medical AI products that won’t make dangerous mistakes because of a few misspelled words.
In a paper published by the Association for Computing Machinery in June, a team led by Ghassemi assembled hundreds of medical records, along with the clinical advice offered by human physicians who reviewed the files.
The researchers next added oddities to the files that might occur if a non-English speaker or someone with a limited education exchanged emails with a doctor. They introduced spelling errors, extra white spaces, and inexact turns of phrase. They even added colorful language reflecting patient anxiety — phrases like “I thought I was going to die.”
Then the records were fed to four AI systems that decided whether the patient needed an in-person doctor visit, some lab tests, or no treatment at all. The results were compared with the decisions of the human doctors. Ghassemi found that the presence of faulty content made it 7 to 9 percent more likely that the AI would tell patients they needed no additional treatment.
It might sound like a relatively small error. But each case represents a human being who might not get the care they need.
Worse yet, Ghassemi and her MIT colleagues found that AI systems were more likely to leave female patients untreated. The researchers tried removing explicit references to the patients’ gender. But the AI systems still correctly identified some of them as women, and these patients were more likely than males to be told they didn’t need a doctor visit or a medical test.
“The model reduced care more in the patients it thought were female,” said Ghassemi.
Paul Hager, a researcher at the Institute for AI and Informatics in Medicine at the Technical University of Munich, said that Ghassemi’s work supports his own findings that AIs can be easily misled.
“Adding additional information, even if true and relevant, often reduced the accuracy of models,” said Hager, who was not involved with Ghassemi’s research. “This is a complex issue that I think is being addressed to some degree through more advanced reasoning models … but there is little research on how to solve it on a more fundamental level.”
Ghassemi can only theorize about how the AIs pinpoint a patient’s gender. She believes the systems pick up on subtle clues in the medical records, such as word choices or the specific questions patients ask. “It is kind of amazing,” she said. “It’s a little scary.”
It’s not a new issue for her. Ghassemi was part of a team that discovered in 2022 that an AI could detect a person’s race merely by looking at an X-ray — something no human physician can do. (Ghassemi thinks the melanin in Black skin must leave traces on an X-ray that a human eye can’t detect, but an AI can.)
But Ghassemi has a roadmap for eliminating AI bias against women and people of color. “You have to train on diverse, representative data sets,” she said. That means including plenty of Black, Asian, and other ethnic groups and plenty of women, but also people of varied economic status and educational background. Ghassemi also wants regular audits of these systems to make sure they remain fair even as they are retrained on fresh data. The clinicians who rely on the systems should be prepared to overrule them when it is in the best interest of the patient.
And such standards should be embedded in law, she said. “We need regulation that makes equity a mandatory performance standard for clinical AI.”
Ghassemi says the rise of AI diagnostics is a rare opportunity to root out race and gender bias from all kinds of medical systems, and not just the digital ones.
“People have known that the health care system is killing more women and minorities for a very long time. But we haven’t been able to make them angry about it,” she said. “We might be able to make them angry enough about AI doing it that we can fix the underlying system.”
Hiawatha Bray can be reached at hiawatha.bray@globe.com. Follow him @GlobeTechLab.