A new study investigates whether GPT-4 can provide more in-depth analysis than medical professionals.
Although it’s only a few months old, OpenAI’s GPT-4 chatbot already hints at tantalizing possibilities for improving efficiency in health care delivery. Results of a recent study suggest one such possibility: using the technology to help respond to patients’ medical questions.
The study was conducted using 195 randomly selected medical questions posted to Reddit r/AskDocs, an online social media forum where users can post medical questions and verified health care professionals submit answers. The authors entered the questions into the GPT-4 chatbot, then had a group of health care professionals compare the answers the chatbot generated with those provided on the r/AskDocs forum.
Evaluators were asked to choose which response they thought was better and to rate each response in two categories: “the quality of information provided” and “the empathy or bedside manner provided.” For the former, the rating options were ‘very poor,’ ‘poor,’ ‘acceptable,’ ‘good,’ and ‘very good.’
Responses for the latter were ‘not empathetic,’ ‘slightly empathetic,’ ‘moderately empathetic,’ ‘empathetic,’ and ‘very empathetic.’ The researchers then converted the ratings to a 1-to-5 scale and compared the chatbot’s mean scores with those of the physicians.
The results showed that the evaluators preferred the chatbot responses over the physician responses in 78.6% of their overall evaluations. Broken down by category, the chatbot responses received an average rating of 4.13 for quality (between ‘good’ and ‘very good’), compared to 3.26, or ‘acceptable,’ for physicians. For the empathy category, chatbot responses received an average rating of 3.65, or ‘empathetic,’ while those of physicians were rated 2.15, or ‘slightly empathetic.’ The proportion of chatbot responses rated ‘empathetic’ or ‘very empathetic’ was 45%, compared to just 4.6% for physicians.
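For readers curious how ratings like these turn into the averages and percentages reported, the short Python sketch below shows one plausible way to do the arithmetic. It is an illustration only, not the authors’ code: the function names are hypothetical, and the sample ratings are invented placeholders rather than data from the study. It assumes each label maps to a whole number from 1 (lowest) to 5 (highest).

```python
from statistics import mean

# Assumed mapping of the quality labels onto a 1-to-5 scale.
QUALITY_SCALE = {
    "very poor": 1,
    "poor": 2,
    "acceptable": 3,
    "good": 4,
    "very good": 5,
}


def mean_score(labels, scale):
    """Average the numeric scores for a list of rating labels."""
    return mean(scale[label] for label in labels)


def share_at_or_above(labels, scale, threshold):
    """Fraction of ratings whose numeric score is at least `threshold`."""
    scores = [scale[label] for label in labels]
    return sum(score >= threshold for score in scores) / len(scores)


# Invented placeholder ratings, NOT results from the study.
chatbot_ratings = ["good", "very good", "good", "acceptable"]
physician_ratings = ["acceptable", "good", "poor", "acceptable"]

chatbot_mean = mean_score(chatbot_ratings, QUALITY_SCALE)
physician_mean = mean_score(physician_ratings, QUALITY_SCALE)

# Share of responses rated 'good' or 'very good' (score of 4 or higher).
chatbot_good_share = share_at_or_above(chatbot_ratings, QUALITY_SCALE, 4)
physician_good_share = share_at_or_above(physician_ratings, QUALITY_SCALE, 4)
```

The same two functions would apply to the empathy category by swapping in a mapping from ‘not empathetic’ (1) to ‘very empathetic’ (5).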
The authors say the study’s outcome should serve as a catalyst for research into adapting AI for messaging purposes by, for example, using the technology to draft responses to patient questions that the physician or a staff member could then edit. This approach could produce time savings that clinical staff could use for more complex tasks.
In addition, they say, AI messaging could have beneficial effects on the use of clinical resources. “If more patients’ questions are answered quickly, with empathy, and to a high standard, it might reduce unnecessary clinical visits, freeing up resources for those who need them.”
The study, “Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum,” was published online April 28 in JAMA Internal Medicine.