Comparative evaluation of artificial intelligence models GPT-4 and GPT-3.5 in clinical decision-making in sports surgery and physiotherapy: a cross-sectional study

Saglam, Soenmez; Uludag, Veysel; Karaduman, Zekeriya Okan; Arican, Mehmet; Yucel, Muecahid Osman; Dalaslan, Rasit Emin

dc.authorid	Dalaslan, Rasit Emin/0000-0001-5068-8024
dc.authorid	Saglam, Sonmez/0000-0003-2651-8003
dc.contributor.author	Saglam, Soenmez
dc.contributor.author	Uludag, Veysel
dc.contributor.author	Karaduman, Zekeriya Okan
dc.contributor.author	Arican, Mehmet
dc.contributor.author	Yucel, Muecahid Osman
dc.contributor.author	Dalaslan, Rasit Emin
dc.date.accessioned	2025-10-11T20:48:09Z
dc.date.available	2025-10-11T20:48:09Z
dc.date.issued	2025
dc.department	Düzce Üniversitesi	en_US
dc.description.abstract	Background: The integration of artificial intelligence (AI) in healthcare has rapidly expanded, particularly in clinical decision-making. Large language models (LLMs) such as GPT-4 and GPT-3.5 have shown potential in various medical applications, including diagnostics and treatment planning. However, their efficacy in specialized fields like sports surgery and physiotherapy remains underexplored. This study aims to compare the performance of GPT-4 and GPT-3.5 in clinical decision-making within these domains using a structured assessment approach. Methods: This cross-sectional study included 56 professionals specializing in sports surgery and physiotherapy. Participants evaluated 10 standardized clinical scenarios generated by GPT-4 and GPT-3.5 using a 5-point Likert scale. The scenarios encompassed common musculoskeletal conditions, and assessments focused on diagnostic accuracy, treatment appropriateness, surgical technique detailing, and rehabilitation plan suitability. Data were collected anonymously via Google Forms. Statistical analysis included paired t-tests for direct model comparisons, one-way ANOVA to assess performance across multiple criteria, and Cronbach's alpha to evaluate inter-rater reliability. Results: GPT-4 significantly outperformed GPT-3.5 across all evaluated criteria. Paired t-test results (t(55) = 10.45, p < 0.001) demonstrated that GPT-4 provided more accurate diagnoses, superior treatment plans, and more detailed surgical recommendations. ANOVA results confirmed the higher suitability of GPT-4 in treatment planning (F(1, 55) = 35.22, p < 0.001) and rehabilitation protocols (F(1, 55) = 32.10, p < 0.001). Cronbach's alpha values indicated higher internal consistency for GPT-4 (alpha = 0.478) compared to GPT-3.5 (alpha = 0.234), reflecting more reliable performance. Conclusions: GPT-4 demonstrates superior performance compared to GPT-3.5 in clinical decision-making for sports surgery and physiotherapy. These findings suggest that advanced AI models can aid in diagnostic accuracy, treatment planning, and rehabilitation strategies. However, AI should function as a decision-support tool rather than a substitute for expert clinical judgment. Future studies should explore the integration of AI into real-world clinical workflows, validate findings using larger datasets, and compare additional AI models beyond the GPT series.	en_US
dc.identifier.doi	10.1186/s12911-025-02996-8
dc.identifier.issn	1472-6947
dc.identifier.issue	1	en_US
dc.identifier.pmid	40229819	en_US
dc.identifier.scopus	2-s2.0-105003614578	en_US
dc.identifier.scopusquality	Q1	en_US
dc.identifier.uri	https://doi.org/10.1186/s12911-025-02996-8
dc.identifier.uri	https://hdl.handle.net/20.500.12684/21749
dc.identifier.volume	25	en_US
dc.identifier.wos	WOS:001466689200002	en_US
dc.identifier.wosquality	Q2	en_US
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	Scopus	en_US
dc.indekslendigikaynak	PubMed	en_US
dc.language.iso	en	en_US
dc.publisher	BMC	en_US
dc.relation.ispartof	BMC Medical Informatics and Decision Making	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.snmz	KA_WOS_20250911
dc.subject	Artificial intelligence	en_US
dc.subject	Large language models	en_US
dc.subject	Sports surgery	en_US
dc.subject	Physiotherapy	en_US
dc.subject	Clinical decision-making	en_US
dc.subject	Rehabilitation	en_US
dc.subject	Musculoskeletal disorders	en_US
dc.title	Comparative evaluation of artificial intelligence models GPT-4 and GPT-3.5 in clinical decision-making in sports surgery and physiotherapy: a cross-sectional study	en_US
dc.type	Article	en_US

Collection

WoS-Indexed Publications Collection
PubMed-Indexed Publications Collection
Scopus-Indexed Publications Collection