AI-based chat applications are software applications that offer a variety of services through text-based interfaces. They can be used for different purposes, such as providing information on specific topics, answering questions, delivering customer service, and even offering a form of therapy through an artificial conversational environment [18]. These applications are utilised in various domains within the healthcare system, including dentistry. Researchers pose a range of questions to AI systems and assess their integration into their respective areas of expertise. For instance, when ChatGPT-4 was queried on dentistry licence examination questions, it provided accurate responses, indicating its potential for application in patient management and dentistry education [19]. A recent study showed that one of these applications, ChatGPT, can even be used to write scientific articles. Dentistry is progressing toward a digital workflow with significant changes, and the emergence of new technologies has allowed patients to be better informed about their treatment plans [20].
The capacity of AI applications such as ChatGPT to diagnose clinical conditions by generating diagnoses based on existing symptoms has not yet been fully established. While Hirosawa et al. reported that ChatGPT-3.5 was able to generate differential diagnoses with a high degree of diagnostic accuracy for a group of patients with general complaints [21], another study by Strong et al. emphasised that the accuracy of the responses to the case questions posed to ChatGPT varied between iterations of the study, revealing limitations in their consistency [22]. A literature review revealed that no studies have evaluated the quality of responses to questions about oral hygiene provided by AI applications such as ChatGPT-4 and Gemini. The aim of the current study was to evaluate the adequacy and usability of the responses given by the ChatGPT-4 and Gemini applications to oral health questions frequently asked by patients.
The utilization of AI-supported applications that are specifically designed for the management of oral health in patients can facilitate enhanced treatment adherence by enabling patients to access information without the need for direct interaction with a dental practitioner. This can lead to better clinical outcomes while reducing wasted time for both the patient and the dentist [23].
In a comparative study involving the ChatGPT-4 and Bing AI chat applications in the domain of ophthalmological triage, ChatGPT-4 exhibited a higher degree of diagnostic and triage accuracy, with fewer erroneous responses than the Bing application [24]. In the present study, the ability of ChatGPT-4 to respond to common patient questions was evaluated by two researchers, and it was found to be more successful than the Gemini application.
A study investigating the potential benefits and limitations of utilising ChatGPT-4 in the field of oral and maxillofacial surgery reported that ChatGPT-4 provides accurate and helpful responses to frequently asked questions posed by patients [24]. Among the applications evaluated in the present study, ChatGPT-4 was found to be more successful than Gemini in providing accurate and sufficient responses to frequently asked questions when judged against the FDI responses.
Muttanahally et al. [25] conducted a study evaluating the efficiency of four AI-supported voice virtual assistants (Google Assistant, Siri, Alexa, Cortana) in writing oral and maxillofacial radiology reports. They reported that Google Assistant was the most efficient, followed by Cortana, Siri, and Alexa. The study concluded that while virtual assistants were useful and practical for answering questions related to oral, dental, and maxillofacial radiology, more specialised knowledge was needed for report writing [26]. Studies indicate that the use of AI in healthcare is promising. Nevertheless, it should be used responsibly alongside human health professionals, taking its limitations into account, and is expected to become more widespread in the field of dentistry in the future [24,25,26]. Another study reported that ChatGPT has potential applications in dental education and the creation of radiology reports but noted limitations, such as the inability to respond to image-based questions or verify content [26]. A comparative study evaluating the responses of experts and ChatGPT to endodontic questions revealed that while ChatGPT is not yet sufficient for clinical decision-making, it could be used after further development [27].
In another study, the responses of ChatGPT to 30 questions regarding tooth-supported fixed prostheses and removable dental prostheses were evaluated by a panel of experts. The findings indicated that ChatGPT is not a suitable replacement for a dentist.
The training data of ChatGPT-4 may include questions and responses from the examined FDI website, raising the concern that its responses might merely be rephrased versions of the website's content. Importantly, ChatGPT lacks independent scientific reasoning; it can only generate responses on the basis of recognised patterns and structures within the texts it has been trained on [28, 29]. However, the authors used a plagiarism detection program to verify the originality of ChatGPT-4's responses compared with the information on the website, alleviating this concern. Considering the importance of oral and dental health, the high prevalence of oral and dental diseases, differing opinions on dental materials and methods, and technological advancements, dentists' efforts to maintain oral and dental health are very important [30]. In this context, patients need to be accurately informed. Owing to the impact of the pandemic, many patients have preferred to research various subtopics on web-based platforms instead of visiting clinics, where the risk of contamination is high [31]. However, the validity and reliability of information on web-based sites and applications are debatable. A number of studies have assessed the veracity of information on diverse online platforms and have concluded that the internet, which is akin to an infinite ocean, cannot supplant the patient–doctor relationship in healthcare [32]. Nevertheless, the considerable increase in the utilisation of the internet and AI in the developing world evinces a growing tendency among individuals to seek online resources for health-related matters, as is the case in numerous other domains [33]. Therefore, ensuring that those who require it have access to accurate and reliable information is highly important.
Bloom's taxonomy provides a question-type classification based on cognitive and hierarchical criteria, which underlies medical education, with a scale of question types ranging from lower to higher levels [34]. Lower-level cognitive skills include abilities such as recall and understanding, whereas higher-order thinking skills involve application, analysis, evaluation, and creation. LLM-based applications can respond sufficiently to questions targeting lower-level skills but are less successful with higher-level skills [35]. Therefore, the Bloom's taxonomy level of the questions posed to AI applications in such studies is important and should be considered a potential limitation [36]. In the present study, the fact that all four questions obtained from the FDI website targeted lower-level skills may explain the accuracy of the responses given by ChatGPT-4 and Gemini regarding oral health. In future studies, it will be crucial to evaluate the responses of LLM-based applications to more complex, higher-level questions. Indeed, the insufficiency of some responses in certain studies also supports this view. In the context of dentistry, it is believed that studies on topics that frequently cause concern among patients and result in the dissemination of misinformation within society will contribute to the development of the scientific literature and enhance the health and well-being of individuals and communities.
Despite extensive deployment in medical applications, no AI technology has been developed explicitly for dentistry. Some AI systems have demonstrated potential benefits in dentistry; however, the inconsistencies in their ability to provide accurate and sufficient information require further investigation by researchers. A dedicated AI-supported oral health module, or an AI-supported application prepared specifically for the field of periodontology, is needed. Implementing such technology could increase the efficiency of oral health practices, just as it has in their medical counterparts.
The present study evaluated the efficacy of two artificial intelligence programmes in informing the public about oral health in comparison with the FDI website. Although the answers on the FDI website were concise and adequate, the answers given by the AI programmes were more detailed and offered guidance on options such as lifestyle changes and alternative medicine. However, as these criteria could not be assessed objectively, the three sources were not compared in this respect.
Furthermore, an analysis of the plagiarism rates in the answers provided by the AI programmes revealed that neither the FDI website nor any academic database was among the sources used by these programmes. While the responses provided by the AI programmes were accurate and adequate, their reliance on non-scientific sources of information is a cause for concern.
In the future, similar studies should be conducted with larger samples, multiple users, and a broader range of questions. AI-supported applications show promise in medicine and dentistry for answering patients' treatment-related questions accurately and sufficiently. However, the breadth of topics and the depth of information must be increased before these applications can be relied upon in the field of oral health.

