Evaluating Artificial Intelligence in Orthopaedics: A Pilot Study on Accuracy and Reliability in Medical Student Competency Tests

Authors

  • Domy Pradana Putra, Department of Orthopaedic and Traumatology, Faculty of Medicine, Universitas Brawijaya – Dr. Saiful Anwar General Hospital
  • Farhan Ardiyanta Setyagisna, Faculty of Medicine, Universitas Brawijaya, Malang, Indonesia
  • Heksa Trisnawati, Faculty of Medicine, Universitas Brawijaya, Malang, Indonesia

DOI:

https://doi.org/10.23886/ejki.14.1229.1

Keywords:

orthopedics, reliability, UKMPPD, artificial intelligence, medical education

Abstract

Artificial intelligence (AI) is increasingly recognized as a valuable tool in medical education, yet its effectiveness across platforms remains underexplored. This study evaluated the performance of nine AI models (ChatGPT-4o, ChatGPT Mini, Gemini, Gemini Advanced, Perplexity, Perplexity Pro, Ortho Research Pro, Ortho AI, and Claude) in answering 30 expert-validated multiple-choice questions (MCQs) from the orthopaedics section of the UKMPPD, the Indonesian national competency examination for medical students. All models were evaluated concurrently between August and September 2024 using their official web interfaces. Each model was tested five times to assess accuracy and consistency. Statistical analysis was conducted using SPSS version 30.0. Normality and homogeneity of variance were assessed using the Shapiro-Wilk and Levene’s tests, respectively. Accuracy differences were analyzed using one-way ANOVA followed by Tukey’s HSD post hoc test, with significance set at p < 0.05. Reliability was evaluated using the intraclass correlation coefficient (ICC). Gemini demonstrated the highest mean accuracy (89 ± 2.79%), while ChatGPT Mini had the lowest (66 ± 3.33%). Accuracy differed significantly across models (p < 0.05), with Gemini differing significantly only from ChatGPT Mini and Perplexity. Most models demonstrated excellent reliability (ICC > 0.90), with Ortho Research Pro among the best (ICC 0.956; 95% CI 0.925–0.977), while the remaining models showed good reliability. Overall, AI models showed strong potential for supporting medical exam preparation, although performance varied across platforms.
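
The accuracy comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SPSS workflow: the function names, model labels, and per-run scores below are hypothetical and are not the study's data. The sketch only shows how five runs on 30 questions become a mean ± SD accuracy per model, and how a one-way ANOVA F statistic compares the models. The p-value, Tukey's HSD, the assumption checks, and the ICC are omitted and would normally come from a statistics package.

```python
from statistics import mean, stdev

N_QUESTIONS = 30  # number of MCQs per run, as stated in the abstract


def accuracy_percent(n_correct, n_questions=N_QUESTIONS):
    """Convert a count of correct answers into a percentage."""
    return 100.0 * n_correct / n_questions


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of score lists, one list per model.
    Assumes the within-group sum of squares is non-zero.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical correct-answer counts for five runs per model (NOT study data).
correct_counts = {
    "model_a": [27, 26, 27, 28, 26],
    "model_b": [20, 19, 21, 20, 19],
    "model_c": [24, 25, 24, 23, 24],
}

scores = {
    name: [accuracy_percent(c) for c in counts]
    for name, counts in correct_counts.items()
}

for name, runs in scores.items():
    print(f"{name}: {mean(runs):.1f} ± {stdev(runs):.2f}%")

f_stat, df_b, df_w = one_way_anova_f(list(scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value relative to the F distribution with the reported degrees of freedom indicates that at least one model's mean accuracy differs from the others; a post hoc test such as Tukey's HSD is then needed to identify which pairs differ.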

Published

2026-06-28

How to Cite

Putra, D. P., Setyagisna, F. A., & Trisnawati, H. (2026). Evaluating Artificial Intelligence in Orthopaedics: A Pilot Study on Accuracy and Reliability in Medical Student Competency Tests. EJournal Kedokteran Indonesia, 14(1), 1. https://doi.org/10.23886/ejki.14.1229.1
Received 2025-09-24
Accepted 2026-05-04
Published 2026-06-28