Evaluating Artificial Intelligence in Orthopaedics: A Pilot Study on Accuracy and Reliability in Medical Student Competency Tests
DOI:
https://doi.org/10.23886/ejki.14.1229.1
Keywords:
orthopedics, reliability, UKMPPD, artificial intelligence, medical education
Abstract
Artificial intelligence (AI) is increasingly recognized as a valuable tool in medical education, yet its effectiveness across platforms remains underexplored. This study evaluated the performance of nine AI models (ChatGPT-4o, ChatGPT Mini, Gemini, Gemini Advanced, Perplexity, Perplexity Pro, Ortho Research Pro, Ortho AI, and Claude) in answering 30 expert-validated multiple-choice questions (MCQs) from the orthopaedics section of the UKMPPD. All models were evaluated concurrently between August and September 2024 using their official web interfaces. Each model was tested five times to assess accuracy and consistency. Statistical analysis was conducted using SPSS version 30.0. Normality and homogeneity of variance were assessed using the Shapiro-Wilk and Levene's tests, respectively. Accuracy differences were analyzed using one-way ANOVA followed by Tukey's HSD post hoc test, with significance set at p < 0.05. Reliability was evaluated using the intraclass correlation coefficient (ICC). Gemini demonstrated the highest mean accuracy (89 ± 2.79%), while ChatGPT Mini had the lowest (66 ± 3.33%). Accuracy differed significantly among models (p < 0.05); in post hoc comparisons, Gemini differed significantly only from ChatGPT Mini and Perplexity. Most models demonstrated excellent reliability (ICC > 0.90), with Ortho Research Pro among the best (ICC 0.956; 95% CI 0.925–0.977), while the remaining models showed good reliability. Overall, AI models showed strong potential for supporting medical exam preparation, although performance varied across platforms.
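As a rough illustration of the accuracy comparison described in the abstract, the following sketch scores repeated runs against an answer key, summarizes each model as mean ± SD, and computes a one-way ANOVA F statistic. It is a minimal sketch under stated assumptions, not the authors' SPSS workflow: the model names, the per-run accuracies, and the helper names `run_accuracy` and `one_way_anova_f` are all hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which form was reported.

```python
from statistics import mean, stdev


def run_accuracy(responses, answer_key):
    """Percentage of questions answered correctly in a single run."""
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return 100.0 * correct / len(answer_key)


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over groups of per-run accuracies."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of models
    n = len(all_values)    # total number of runs
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual runs around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical per-run accuracies (%) for five runs each.
# These are illustrative values, NOT the study's data.
accuracies = {
    "model_a": [90.0, 86.7, 90.0, 93.3, 86.7],
    "model_b": [66.7, 63.3, 70.0, 66.7, 63.3],
    "model_c": [80.0, 83.3, 76.7, 80.0, 83.3],
}

for name, runs in accuracies.items():
    # stdev() is the sample standard deviation (an assumption here)
    print(f"{name}: {mean(runs):.1f} ± {stdev(runs):.2f}%")

f_stat = one_way_anova_f(list(accuracies.values()))
print(f"F = {f_stat:.2f}")
```

The sketch stops at the F statistic because converting it to a p-value requires the F distribution, which the Python standard library does not provide. The Tukey HSD comparisons and the ICC are likewise left out, since the abstract does not specify which ICC form was used.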
License
Copyright (c) 2026 Domy Pradana Putra, Farhan Ardiyanta Setyagisna, Heksa Trisnawati

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Accepted 2026-05-04
Published 2026-06-28



