Evaluating Artificial Intelligence in Orthopaedics: A Pilot Study on Accuracy and Reliability in Medical Student Competency Tests

Domy Pradana Putra; Farhan Ardiyanta Setyagisna; Heksa Trisnawati

doi:10.23886/ejki.14.1229.1

Authors

Domy Pradana Putra, Department of Orthopaedic and Traumatology, Faculty of Medicine, Universitas Brawijaya – Dr. Saiful Anwar General Hospital
Farhan Ardiyanta Setyagisna, Faculty of Medicine, Universitas Brawijaya, Malang, Indonesia
Heksa Trisnawati, Faculty of Medicine, Universitas Brawijaya, Malang, Indonesia

Keywords:

orthopedics, reliability, UKMPPD, artificial intelligence, medical education

Abstract

Artificial intelligence (AI) is increasingly recognized as a valuable tool in medical education, yet how well different AI platforms perform remains underexplored. This study evaluated the performance of nine AI models (ChatGPT-4o, ChatGPT Mini, Gemini, Gemini Advanced, Perplexity, Perplexity Pro, Ortho Research Pro, Ortho AI, and Claude) in answering 30 expert-validated multiple-choice questions (MCQs) from the orthopaedics section of the UKMPPD, Indonesia's national competency examination for medical students. All models were evaluated during the same period, August to September 2024, through their official web interfaces. Each model was tested five times on the question set to assess accuracy and consistency. Statistical analysis was conducted in SPSS version 30.0. Normality and homogeneity of variance were assessed with the Shapiro-Wilk test and Levene's test, respectively. Differences in accuracy were analyzed using one-way ANOVA followed by Tukey's HSD post hoc test, with significance set at p < 0.05. Reliability was evaluated using the intraclass correlation coefficient (ICC). Gemini achieved the highest mean accuracy (89 ± 2.79%) and ChatGPT Mini the lowest (66 ± 3.33%). Accuracy differed significantly across models (p < 0.05); in post hoc comparisons, Gemini differed significantly only from ChatGPT Mini and Perplexity. Most models demonstrated excellent reliability (ICC > 0.90), with Ortho Research Pro among the most reliable (ICC 0.956; 95% CI 0.925–0.977), while the remaining models showed good reliability. Overall, the AI models showed strong potential for supporting medical exam preparation, although performance varied across platforms.
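The abstract names the statistics but not how the inputs were arranged. The following is a minimal Python sketch of the accuracy and reliability summaries, written under stated assumptions rather than as the authors' procedure. It assumes that each model's answers are scored as a binary matrix `scores[run][question]` (1 = correct), that accuracy is summarized as mean ± sample SD over the five runs, and that reliability is ICC(2,1) (two-way random effects, absolute agreement, single measurement), with questions as subjects and runs as raters. The function names and the toy data are hypothetical. The study computed p-values, Shapiro-Wilk, Levene's test, and Tukey's HSD in SPSS, so those are omitted here. In Python they are available as `scipy.stats.shapiro`, `scipy.stats.levene`, `scipy.stats.f_oneway`, and `scipy.stats.tukey_hsd`.

```python
"""Sketch of accuracy and ICC summaries for repeated AI runs on an MCQ set.

Assumptions (not stated in the paper): answers are scored as a binary
matrix scores[run][question] (1 = correct); accuracy is mean +/- sample SD
over runs; reliability is ICC(2,1) (two-way random effects, absolute
agreement, single measurement) with questions as subjects, runs as raters.
"""
from statistics import mean, stdev


def run_accuracies(scores):
    """Percentage of questions answered correctly in each run."""
    return [100.0 * sum(run) / len(run) for run in scores]


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA (between-group MS / within-group MS)."""
    observations = [x for g in groups for x in g]
    grand = mean(observations)
    k, n = len(groups), len(observations)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def icc_2_1(scores):
    """ICC(2,1) computed from a run x question score matrix."""
    # Transpose to question x run: rows are subjects, columns are raters.
    data = [list(row) for row in zip(*scores)]
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(col) for col in zip(*data)]
    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical toy data, NOT the study's data: 5 runs x 10 questions.
    models = {
        "Model A": [
            [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
            [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
            [1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
            [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
            [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
        ],
        "Model B": [
            [1, 0, 1, 0, 1, 1, 0, 1, 0, 1],
            [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
            [1, 0, 1, 1, 1, 1, 0, 1, 0, 1],
            [1, 0, 1, 0, 1, 1, 0, 1, 0, 1],
            [1, 1, 1, 0, 1, 1, 0, 1, 0, 1],
        ],
    }
    accuracies = {name: run_accuracies(s) for name, s in models.items()}
    for name, scores in models.items():
        acc = accuracies[name]
        print(f"{name}: {mean(acc):.1f} ± {stdev(acc):.2f}%, "
              f"ICC(2,1) = {icc_2_1(scores):.3f}")
    print(f"One-way ANOVA F = {one_way_anova_f(list(accuracies.values())):.2f}")
```

The ICC form matters. SPSS offers one-way random, two-way random, and two-way mixed models, each with consistency or absolute-agreement definitions, and it reports both single and average measures. The abstract does not state which combination was used, so values from this sketch are not expected to reproduce the reported ICCs.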
