Comparative evaluation of artificial intelligence models GPT-4 and GPT-3.5 in clinical decision-making in sports surgery and physiotherapy: a cross-sectional study

dc.authoridDalaslan, Rasit Emin/0000-0001-5068-8024
dc.authoridSaglam, Sonmez/0000-0003-2651-8003
dc.contributor.authorSaglam, Soenmez
dc.contributor.authorUludag, Veysel
dc.contributor.authorKaraduman, Zekeriya Okan
dc.contributor.authorArican, Mehmet
dc.contributor.authorYucel, Muecahid Osman
dc.contributor.authorDalaslan, Rasit Emin
dc.date.accessioned2025-10-11T20:48:09Z
dc.date.available2025-10-11T20:48:09Z
dc.date.issued2025
dc.departmentDüzce Üniversitesien_US
dc.description.abstractBackground: The integration of artificial intelligence (AI) in healthcare has rapidly expanded, particularly in clinical decision-making. Large language models (LLMs) such as GPT-4 and GPT-3.5 have shown potential in various medical applications, including diagnostics and treatment planning. However, their efficacy in specialized fields like sports surgery and physiotherapy remains underexplored. This study aims to compare the performance of GPT-4 and GPT-3.5 in clinical decision-making within these domains using a structured assessment approach. Methods: This cross-sectional study included 56 professionals specializing in sports surgery and physiotherapy. Participants evaluated 10 standardized clinical scenarios generated by GPT-4 and GPT-3.5 using a 5-point Likert scale. The scenarios encompassed common musculoskeletal conditions, and assessments focused on diagnostic accuracy, treatment appropriateness, surgical technique detailing, and rehabilitation plan suitability. Data were collected anonymously via Google Forms. Statistical analysis included paired t-tests for direct model comparisons, one-way ANOVA to assess performance across multiple criteria, and Cronbach's alpha to evaluate inter-rater reliability. Results: GPT-4 significantly outperformed GPT-3.5 across all evaluated criteria. Paired t-test results (t(55) = 10.45, p < 0.001) demonstrated that GPT-4 provided more accurate diagnoses, superior treatment plans, and more detailed surgical recommendations. ANOVA results confirmed the higher suitability of GPT-4 in treatment planning (F(1, 55) = 35.22, p < 0.001) and rehabilitation protocols (F(1, 55) = 32.10, p < 0.001). Cronbach's alpha values indicated higher internal consistency for GPT-4 (alpha = 0.478) compared to GPT-3.5 (alpha = 0.234), reflecting more reliable performance. Conclusions: GPT-4 demonstrates superior performance compared to GPT-3.5 in clinical decision-making for sports surgery and physiotherapy. These findings suggest that advanced AI models can aid in diagnostic accuracy, treatment planning, and rehabilitation strategies. However, AI should function as a decision-support tool rather than a substitute for expert clinical judgment. Future studies should explore the integration of AI into real-world clinical workflows, validate findings using larger datasets, and compare additional AI models beyond the GPT series.en_US
dc.identifier.doi10.1186/s12911-025-02996-8
dc.identifier.issn1472-6947
dc.identifier.issue1en_US
dc.identifier.pmid40229819en_US
dc.identifier.scopus2-s2.0-105003614578en_US
dc.identifier.scopusqualityQ1en_US
dc.identifier.urihttps://doi.org/10.1186/s12911-025-02996-8
dc.identifier.urihttps://hdl.handle.net/20.500.12684/21749
dc.identifier.volume25en_US
dc.identifier.wosWOS:001466689200002en_US
dc.identifier.wosqualityQ2en_US
dc.indekslendigikaynakWeb of Scienceen_US
dc.indekslendigikaynakScopusen_US
dc.indekslendigikaynakPubMeden_US
dc.language.isoenen_US
dc.publisherBMCen_US
dc.relation.ispartofBMC Medical Informatics and Decision Makingen_US
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.snmzKA_WOS_20250911
dc.subjectArtificial intelligenceen_US
dc.subjectLarge language modelsen_US
dc.subjectSports surgeryen_US
dc.subjectPhysiotherapyen_US
dc.subjectClinical decision-makingen_US
dc.subjectRehabilitationen_US
dc.subjectMusculoskeletal disordersen_US
dc.titleComparative evaluation of artificial intelligence models GPT-4 and GPT-3.5 in clinical decision-making in sports surgery and physiotherapy: a cross-sectional studyen_US
dc.typeArticleen_US
