Can AI do better than a human clinician?

Various research groups have been teasing the idea of an AI doctor for the better part of the past decade. In late December, computer scientists from Google and DeepMind put forth their version of an AI clinician that can diagnose a patient’s medical conditions based on their symptoms, using a large language model called PaLM.

Per a preprint paper published by the group, their model scored 67.6 percent on a benchmark test containing questions from the US Medical Licensing Exam, a result they claim surpasses previous state-of-the-art software by 17 percent. One version of it performed at a similar level to human clinicians. But there are plenty of caveats that come with this algorithm, and others like it.

Here are some quick facts about the model: It was trained on a dataset of over 3,000 commonly searched medical questions, and six other existing open datasets for medical questions and answers, including medical exams and medical research literature. In their testing phase, the researchers compared the answers from two versions of the AI to those of a human clinician, and evaluated these responses for accuracy, factuality, relevance, helpfulness, consistency with current scientific consensus, safety, and bias.

Adriana Porter Felt, a software engineer who works on Google Chrome and was not involved in the paper, noted on Twitter that the version of the model that answered medical questions similarly to human clinicians relies on the added feature of “instruction prompt tuning, which is a human process that is laborious and does not scale.” This includes carefully tweaking the wording of the question in a specific way that allows the AI to retrieve the correct information.

The researchers even wrote in the paper that their model “performs encouragingly, but remains inferior to clinicians,” and that the model’s “comprehension [of medical context], recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning.” For example, every version of the AI missed important information and included incorrect or inappropriate content in its answers at a higher rate than the human clinicians did.

Language models are getting better at parsing information with more complexity and volume. And they seem to do okay with tasks that require scientific knowledge and reasoning. Several small models, including SciBERT and PubMedBERT, have pushed the boundaries of how well language models can understand texts loaded with jargon and specialty terms.

But in the biomedical and scientific fields, there are complicated factors at play and many unknowns. And if the AI is wrong, then who takes responsibility for malpractice? Can the error be traced back to its source when much of the algorithm works like a black box? Additionally, these algorithms (mathematical instructions given to the computer by programmers) are imperfect and need complete and correct training data, which is not always available for various conditions across different demographics. Plus, buying and organizing health data can be expensive.

Answering questions correctly on a multiple-choice standardized test does not demonstrate intelligence. And the computer’s analytical ability might fall short if it were presented with a real-life clinical case. So while these test results look impressive on paper, most of these AI systems are not ready for deployment. Consider IBM’s Watson AI health project. Even with millions of dollars in investment, it still had numerous problems and was not practical or flexible enough at scale (it ultimately imploded and was sold for parts).

Google and DeepMind do recognize the limitations of this technology. They wrote in their paper that there are still several areas that need to be developed and improved for this model to be actually useful, such as the grounding of the responses in authoritative, up-to-date medical sources and the ability to detect and communicate uncertainty effectively to the human clinician or patient.
