ChatGPT’s performance in dentistry and allergy-immunology assessments: a comparative study

Authors

  • Alexander Fuchs Department of Periodontology, Endodontology, and Cariology, University Center for Dental Medicine Basel UZB, University of Basel, Basel, Switzerland
  • Tina Trachsel Division of Allergy, University Children's Hospital Basel, Basel, Switzerland
  • Roland Weiger Department of Periodontology, Endodontology, and Cariology, University Center for Dental Medicine Basel UZB, University of Basel, Basel, Switzerland
  • Florin Eggmann Department of Periodontology, Endodontology, and Cariology, University Center for Dental Medicine Basel UZB, University of Basel, Basel, Switzerland

DOI:

https://doi.org/10.61872/sdj-2024-06-01

PMID:

38726506

Keywords:

Allergology, Artificial intelligence, Dental education, Clinical immunology, Machine learning, Medical informatics applications

Abstract

Large language models (LLMs) such as ChatGPT have potential applications in healthcare, including dentistry. Priming, the practice of providing LLMs with initial, relevant information, is one approach to improving their output quality. This study aimed to evaluate the performance of ChatGPT 3 and ChatGPT 4 on self-assessment questions in dentistry (Swiss Federal Licensing Examination in Dental Medicine, SFLEDM) and in allergy and clinical immunology (European Examination in Allergy and Clinical Immunology, EEAACI). A second objective was to assess the impact of priming on ChatGPT’s performance. The SFLEDM and EEAACI multiple-choice questions from the University of Bern’s Institute for Medical Education platform were administered to both ChatGPT versions, with and without priming. Performance was analyzed based on correct responses. Statistical analysis used Wilcoxon rank sum tests (α=0.05). The average accuracy rates in the SFLEDM and EEAACI assessments were 63.3% and 79.3%, respectively. Both ChatGPT versions performed better on the EEAACI than on the SFLEDM, and ChatGPT 4 outperformed ChatGPT 3 across all tests. Priming significantly improved ChatGPT 3’s performance in both the EEAACI (p=0.017) and SFLEDM (p=0.024) assessments. For ChatGPT 4, the priming effect was significant only in the SFLEDM assessment (p=0.038). The performance disparity between the SFLEDM and EEAACI assessments underscores ChatGPT’s varying proficiency across medical domains, likely tied to the nature and amount of training data available in each field. Priming can enhance output quality, especially in earlier LLMs. The advances from ChatGPT 3 to ChatGPT 4 highlight the rapid development of LLM technology. Yet LLMs must be used cautiously in critical fields such as healthcare owing to their inherent limitations and risks.
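
The Wilcoxon rank sum test compares two independent samples by ranking all observations together and asking whether one sample's ranks are systematically lower or higher than the other's. The following is a minimal Python sketch of the test, assuming the large-sample normal approximation with a tie correction and no continuity correction, so p-values may differ slightly from exact or continuity-corrected implementations. The function names and the per-run accuracy values are hypothetical illustrations, not the study's data, and the abstract does not state how the authors grouped responses for testing.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (W, z, p), where W is the rank sum of sample x.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2
    # Tie correction: subtract sum(t^3 - t) over groups of tied values
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


if __name__ == "__main__":
    # Hypothetical per-run accuracy rates (NOT the study's data)
    unprimed = [0.60, 0.62, 0.58, 0.61, 0.59]
    primed = [0.66, 0.68, 0.64, 0.67, 0.65]
    w, z, p = wilcoxon_rank_sum(unprimed, primed)
    print(f"W={w}, z={z:.3f}, two-sided p={p:.4f}, significant at 0.05: {p < 0.05}")
```

A rank-based test suits this kind of comparison because it does not assume normally distributed scores; with very small samples, an exact test would be preferable to the normal approximation used here.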

Published

2023-10-04

How to Cite

Fuchs, A., Trachsel, T., Weiger, R., & Eggmann, F. (2024). ChatGPT’s performance in dentistry and allergy-immunology assessments: a comparative study. SWISS DENTAL JOURNAL SSO – Science and Clinical Topics, 134(2), 1–17. https://doi.org/10.61872/sdj-2024-06-01