With its latest research on AMIE (Articulate Medical Intelligence Explorer), Google is giving its diagnostic AI the ability to understand visual medical information.
Imagine chatting with an AI about a health concern, and instead of just processing your words, it could actually look at the photo of that worrying rash or make sense of your ECG printout. That’s what Google is aiming for.
We already knew AMIE showed promise in text-based medical chats, thanks to earlier work published in Nature. But let’s face it, real medicine isn’t just about words.
Doctors rely heavily on what they can see – skin conditions, readings from machines, lab reports. As the Google team rightly points out, even simple instant messaging platforms “allow static multimodal information (e.g., images and documents) to enrich discussions.”
Text-only AI was missing a huge piece of the puzzle. The big question, as the researchers put it, was “whether LLMs can conduct diagnostic clinical conversations that incorporate this more complex type of information.”
Google teaches AMIE to look and reason
Google’s engineers have beefed up AMIE using their Gemini 2.0 Flash model as the brains of the operation. They’ve combined this with what they call a “state-aware reasoning framework.” In plain English, this means the AI doesn’t just follow a script; it adapts its conversation based on what it’s learned so far and what it still needs to figure out.
It’s close to how a human clinician works: gathering clues, forming ideas about what might be wrong, and then asking for more specific information – including visual evidence – to narrow things down.
“This enables AMIE to request relevant multimodal artifacts when needed, interpret their findings accurately, integrate this information seamlessly into the ongoing dialogue, and use it to refine diagnoses,” Google explains.
Think of the conversation flowing through stages: first gathering the patient’s history, then moving towards diagnosis and management suggestions, and finally follow-up. The AI constantly assesses its own understanding, asking for that skin photo or lab result if it senses a gap in its knowledge.
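To make that stage-by-stage, gap-driven behaviour concrete, it could be sketched roughly as the toy Python loop below. This is only an illustration of the idea; every name in it (DialogueState, next_turn, the phase labels) is a hypothetical stand-in, not AMIE’s actual implementation.

```python
# A minimal sketch of a state-aware dialogue loop in the spirit described above.
# All names and thresholds here are hypothetical illustrations, not AMIE's code.
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    phase: str = "history_taking"          # then "diagnosis_and_management", "follow_up"
    transcript: list = field(default_factory=list)
    differential: list = field(default_factory=list)      # ranked candidate diagnoses
    missing_evidence: list = field(default_factory=list)  # e.g. ["skin photo", "ECG"]

def next_turn(state: DialogueState, patient_message: str) -> str:
    """Pick the next action from the conversation state rather than a fixed script."""
    state.transcript.append(("patient", patient_message))

    # If an information gap has been flagged, request the artifact
    # (a photo, lab report, ECG, ...) instead of guessing.
    if state.missing_evidence:
        artifact = state.missing_evidence.pop(0)
        reply = f"Could you share your {artifact}? It would help narrow things down."
    # Otherwise keep working through the current phase, advancing once its goals are met
    # (the length check is a crude stand-in for the model judging the history complete).
    elif state.phase == "history_taking" and len(state.transcript) > 6:
        state.phase = "diagnosis_and_management"
        reply = "Based on what you've described, here are the most likely possibilities..."
    else:
        reply = "Tell me more about when the symptoms started and how they have changed."

    state.transcript.append(("agent", reply))
    return reply
```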
To get this right without endless trial-and-error on real people, Google built a detailed simulation lab.
Google created lifelike patient cases, pulling realistic medical images and data from sources like the PTB-XL ECG database and the SCIN dermatology image set, and adding plausible backstories using Gemini. Then, they let AMIE ‘chat’ with simulated patients within this setup and automatically checked how well it performed on measures like diagnostic accuracy and avoidance of errors (or ‘hallucinations’).
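In rough code terms, that simulation-and-autograding setup might look something like the sketch below. The functions, fields, and the file name are invented placeholders; Google has not published AMIE’s pipeline, prompts, or scoring rubrics in this form.

```python
# A rough sketch of the simulation setup described above: pair a real artifact
# with a generated backstory, then auto-grade the agent's output against the
# dataset's labels. All names and the example file are hypothetical placeholders.

def build_case(record: dict) -> dict:
    """Pair a real artifact (e.g. a SCIN photo or PTB-XL ECG) with a plausible backstory."""
    return {
        "artifact": record["file"],                     # path to the image/waveform
        "ground_truth_dx": record["label"],             # diagnosis label from the dataset
        "annotated_findings": record["findings"],       # findings annotated in the source data
        "backstory": f"Patient reports a history consistent with {record['label']}.",
    }

def autograde(case: dict, agent_output: dict) -> dict:
    """Automated checks of the kind the article mentions: accuracy and hallucinations."""
    correct_dx = case["ground_truth_dx"] in agent_output["differential"]
    # Flag findings the agent claims to see that are not in the artifact's annotations.
    hallucinated = [f for f in agent_output["image_findings"]
                    if f not in case["annotated_findings"]]
    return {"diagnosis_correct": correct_dx, "hallucinated_findings": hallucinated}

case = build_case({"file": "scin_00123.jpg", "label": "atopic dermatitis",
                   "findings": ["erythematous patches"]})
grade = autograde(case, {"differential": ["atopic dermatitis", "contact dermatitis"],
                         "image_findings": ["erythematous patches"]})
print(grade)  # {'diagnosis_correct': True, 'hallucinated_findings': []}
```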
The virtual OSCE: Google puts AMIE through its paces
The real test came in a setup designed to mirror how medical students are assessed: the Objective Structured Clinical Examination (OSCE).
Google ran a remote study involving 105 different medical scenarios. Real actors, trained to portray patients consistently, interacted either with the new multimodal AMIE or with actual human primary care physicians (PCPs). These chats happened through an interface where the ‘patient’ could upload images, just like you might in a modern messaging app.
Afterwards, specialist doctors (in dermatology, cardiology, and internal medicine) and the patient actors themselves reviewed the conversations.
The human doctors scored everything from how well the history was taken and the accuracy of the diagnosis to the quality of the suggested management plan, right down to communication skills, empathy, and, of course, how well the AI interpreted the visual information.
Surprising results from the simulated clinic
Here’s where it gets really interesting. In this head-to-head comparison within the controlled study environment, Google found AMIE didn’t just hold its own—it often came out ahead.
The AI was rated as being better than the human PCPs at interpreting the multimodal data shared during the chats. It also scored higher on diagnostic accuracy, producing differential diagnoses (ranked lists of possible conditions) that specialists deemed more accurate and complete based on the case details.
Specialist doctors reviewing the transcripts tended to rate AMIE’s performance higher across most areas. They particularly noted “the quality of image interpretation and reasoning,” the thoroughness of its diagnostic workup, the soundness of its management plans, and its ability to flag when a situation needed urgent attention.
Perhaps one of the most surprising findings came from the patient actors: they often found the AI to be more empathetic and trustworthy than the human doctors in these text-based interactions.
And, on a critical safety note, the study found no statistically significant difference between how often AMIE hallucinated findings from the images and how often the human physicians did.
Technology never stands still, so Google also ran some early tests swapping out the Gemini 2.0 Flash model for the newer Gemini 2.5 Flash.
Using their simulation framework, the results hinted at further gains, particularly in getting the diagnosis right (Top-3 Accuracy) and suggesting appropriate management plans.
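For context, Top-3 accuracy simply measures how often the correct diagnosis appears among the model’s three highest-ranked candidates, as in this tiny illustration (with invented data, not figures from the study):

```python
# Top-3 accuracy: fraction of cases where the true diagnosis appears among the
# model's three highest-ranked candidates. The cases below are invented examples.
cases = [
    ("cellulitis",          ["cellulitis", "erysipelas", "abscess"]),
    ("atrial fibrillation", ["sinus tachycardia", "atrial flutter", "atrial fibrillation"]),
    ("psoriasis",           ["eczema", "tinea corporis", "contact dermatitis"]),
]
hits = sum(truth in ranked[:3] for truth, ranked in cases)
print(f"Top-3 accuracy: {hits / len(cases):.2f}")  # Top-3 accuracy: 0.67
```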
While promising, the team is quick to add a dose of realism: these are just automated results, and “rigorous assessment through expert physician review is essential to confirm these performance benefits.”
Important reality checks
Google is commendably upfront about the limitations here. “This study explores a research-only system in an OSCE-style evaluation using patient actors, which substantially under-represents the complexity… of real-world care,” they state clearly.
Simulated scenarios, however well-designed, aren’t the same as dealing with the unique complexities of real patients in a busy clinic. They also stress that the chat interface doesn’t capture the richness of a real video or in-person consultation.
So, what’s the next step? Moving carefully towards the real world. Google is already partnering with Beth Israel Deaconess Medical Center for a research study to see how AMIE performs in actual clinical settings with patient consent.
The researchers also acknowledge the need to eventually move beyond text and static images towards handling real-time video and audio—the kind of interaction common in telehealth today.
Giving AI the ability to ‘see’ and interpret the kind of visual evidence doctors use every day offers a glimpse of how AI might one day assist clinicians and patients. However, the path from these promising findings to a safe and reliable tool for everyday healthcare is still a long one that requires careful navigation.
(Photo by Alexander Sinn)