ChatGPT gives better, more empathetic responses to patient questions than doctors

A panel of healthcare professionals much preferred responses that came from the chatbot in a recent study.
Key Takeaways
  • In a recent study, scientists explored how well ChatGPT can answer health-related questions from real patients.
  • ChatGPT outperformed human doctors, especially on metrics related to empathy, according to a panel of medical experts.
  • As healthcare increasingly goes virtual, expect artificial intelligence to play a growing role in patient care.

If the results of a recent study are any indication, ChatGPT may soon be professionally responding to patients’ written health-related questions.

Virtual healthcare drastically expanded during the COVID-19 pandemic, and use has remained elevated ever since. At most healthcare providers, patients can now message their physicians through an online portal. In many of these exchanges, patients pose serious medical questions that require time and knowledge to answer. Since answering these questions adds to the workload of physicians, some healthcare providers have started charging for these question-and-answer services.

Artificial intelligence could help ease the strain on doctors and lower costs for patients. The meteoric rise of OpenAI’s ChatGPT since its public launch last November inspired Dr. John Ayers, vice chief of innovation in the Division of Infectious Disease and Global Public Health at the UC San Diego School of Medicine, to see whether the chatbot could accurately and empathetically respond to real health-related questions.

As Ayers and his co-authors described in a paper published in April in JAMA Internal Medicine, they randomly selected 195 question-and-answer exchanges posted last October on Reddit’s /r/AskDocs, a community with 481,000 members where vetted doctors respond to publicly asked medical questions. The researchers then posed the same queries to ChatGPT (version 3.5) in December. Afterward, they showed the exchanges to a group of healthcare professionals to evaluate the physicians’ and AI’s answers to the questions. Responses were graded for both quality and empathy on 5-point scales.

To ensure the raters didn’t know whether an answer came from ChatGPT or a doctor, “Responses were randomly ordered, stripped of revealing information, and labeled response 1 or response 2 to blind evaluators to the identity of the author,” the researchers detailed.
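
For readers curious what that blinding step looks like in practice, here is a minimal, hypothetical Python sketch (not the authors’ actual code) that shuffles each physician/chatbot answer pair and relabels it generically before an evaluator sees it:

```python
import random

def blind_pair(physician_answer: str, chatbot_answer: str, seed=None):
    """Randomly order an answer pair and hide authorship.

    Returns the generically labeled pair plus a key recording which label
    maps to which author, so preferences can be tallied after grading.
    """
    rng = random.Random(seed)
    pair = [("physician", physician_answer), ("chatbot", chatbot_answer)]
    rng.shuffle(pair)
    labeled = {f"response {i + 1}": text for i, (_, text) in enumerate(pair)}
    key = {f"response {i + 1}": author for i, (author, _) in enumerate(pair)}
    return labeled, key

# The evaluator sees only "response 1" and "response 2", never the author.
labeled, key = blind_pair("Sounds like you will be fine...",
                          "I'm sorry to hear that...")
print(list(labeled.keys()))  # ['response 1', 'response 2']
```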

When the ratings were tabulated, the results vastly favored ChatGPT over human doctors. The evaluators preferred the AI response 78.6% of the time, giving ChatGPT’s answers an average quality rating of 4.1 out of 5, corresponding to “good.” Physicians’ answers were given an average rating of 3.26, corresponding to “acceptable.”

A better bedside manner?

Moreover, the chatbot soundly beat the humans on scores of empathy. Evaluators rated ChatGPT’s answers as “empathetic” or “very empathetic” 9.8 times more often than they did doctors’ responses.

Here are snippets of a doctor’s response and ChatGPT’s response to a patient’s question about getting bleach in their eye.

The doctor:

“Sounds like you will be fine. You should flush the eye anytime you get a chemical or foreign body in the eye. You can also contact Poison Control 1-800-222-1222.”

And ChatGPT:

“I’m sorry to hear that you got bleach splashed in your eye. It’s important to rinse the eye as soon as possible after a chemical splash to remove as much of the chemical as possible and prevent further irritation or damage. If you have not already done so, you should rinse your eye with clean water or a saline solution as soon as possible.”


A significant limitation of the study was that doctors’ responses were drawn from a casual, anonymous online forum and not a professional setting, so their answers to questions may not have been as thoughtfully worded or as meticulous as they could have been. This shows in the average response length: ChatGPT’s responses averaged 211 words, while doctors’ averaged 52.

This drawback prevents ChatGPT from conclusively claiming victory over trained doctors. Still, the study suggests that ChatGPT could satisfactorily assist medical professionals in responding to patient queries.

“Chatbots could assist clinicians when messaging with patients, by drafting a message based on a patient’s query for physicians or support staff to edit,” the authors wrote. “Such an AI-assisted approach could unlock untapped productivity so that clinical staff can use the time-savings for more complex tasks.”
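
To make that draft-and-edit workflow concrete, here is a rough, hypothetical sketch using OpenAI’s Python client. The model name, prompt, and review step are illustrative assumptions, not details from the study, and any real deployment would need far more safeguards:

```python
from openai import OpenAI  # assumes the official openai Python package

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def draft_reply(patient_message: str) -> str:
    """Draft a patient-portal reply for a clinician to review and edit."""
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative; the study used ChatGPT (GPT-3.5)
        messages=[
            {"role": "system",
             "content": "Draft an empathetic, medically cautious reply to a "
                        "patient's portal message. A clinician will review "
                        "and edit this draft before anything is sent."},
            {"role": "user", "content": patient_message},
        ],
    )
    return completion.choices[0].message.content

# The draft goes to the physician or support staff; it is never sent directly.
draft = draft_reply("I splashed bleach in my eye and rinsed it. Will I go blind?")
print(draft)
```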

Ayers, the study’s first author, thinks this is just the beginning. “The opportunities for improving healthcare with AI are massive,” he said in a statement. “AI-augmented care is the future of medicine.”
