Assessing the performance of large language models (GPT-3.5 and GPT-4) and accurate clinical information for pediatric nephrology

dc.contributor.author: Sav, Nadide Melike
dc.date.accessioned: 2025-10-11T20:48:46Z
dc.date.available: 2025-10-11T20:48:46Z
dc.date.issued: 2025
dc.department: Düzce Üniversitesi [en_US]
dc.description.abstract: Background: Artificial intelligence (AI) has emerged as a transformative tool in healthcare, offering significant advancements in providing accurate clinical information. However, the performance and applicability of AI models in specialized fields such as pediatric nephrology remain underexplored. This study aimed to evaluate the ability of two AI-based language models, GPT-3.5 and GPT-4, to provide accurate and reliable clinical information in pediatric nephrology. The models were evaluated on four criteria: accuracy, scope, patient friendliness, and clinical applicability. Methods: Forty pediatric nephrology specialists with ≥ 5 years of experience rated GPT-3.5 and GPT-4 responses to 10 clinical questions using a 1-5 scale via Google Forms. Ethical approval was obtained, and informed consent was secured from all participants. Results: Both GPT-3.5 and GPT-4 demonstrated comparable performance across all criteria, with no statistically significant differences observed (p > 0.05). GPT-4 exhibited slightly higher mean scores in all parameters, but the differences were negligible (Cohen's d < 0.1 for all criteria). Reliability analysis revealed low internal consistency for both models (Cronbach's alpha ranged between 0.019 and 0.162). Correlation analysis indicated no significant relationship between participants' years of professional experience and their evaluations of GPT-3.5 (correlation coefficients ranged from -0.026 to 0.074). Conclusions: While GPT-3.5 and GPT-4 provided a foundational level of clinical information support, neither model exhibited superior performance in addressing the unique challenges of pediatric nephrology. The findings highlight the need for domain-specific training and integration of updated clinical guidelines to enhance the applicability and reliability of AI models in specialized fields. This study underscores the potential of AI in pediatric nephrology while emphasizing the importance of human oversight and the need for further refinements in AI applications. [en_US]
dc.description.sponsorship: Scientific and Technological Research Council of Turkiye (TUBITAK) [en_US]
dc.description.sponsorship: Open access funding provided by the Scientific and Technological Research Council of Turkiye (TUBITAK). [en_US]
dc.identifier.doi: 10.1007/s00467-025-06723-3
dc.identifier.endpage: 2885 [en_US]
dc.identifier.issn: 0931-041X
dc.identifier.issn: 1432-198X
dc.identifier.issue: 9 [en_US]
dc.identifier.pmid: 40045013 [en_US]
dc.identifier.scopus: 2-s2.0-86000316527 [en_US]
dc.identifier.scopusquality: Q1 [en_US]
dc.identifier.startpage: 2879 [en_US]
dc.identifier.uri: https://doi.org/10.1007/s00467-025-06723-3
dc.identifier.uri: https://hdl.handle.net/20.500.12684/22093
dc.identifier.volume: 40 [en_US]
dc.identifier.wos: WOS:001438360200001 [en_US]
dc.identifier.wosquality: Q1 [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.indekslendigikaynak: PubMed [en_US]
dc.institutionauthor: Sav, Nadide Melike
dc.language.iso: en [en_US]
dc.publisher: Springer [en_US]
dc.relation.ispartof: Pediatric Nephrology [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.snmz: KA_WOS_20250911
dc.subject: Artificial intelligence [en_US]
dc.subject: ChatGPT [en_US]
dc.subject: Clinical decision support systems [en_US]
dc.subject: Cohen's d [en_US]
dc.subject: Cronbach's alpha [en_US]
dc.subject: Pediatric nephrology [en_US]
dc.title: Assessing the performance of large language models (GPT-3.5 and GPT-4) and accurate clinical information for pediatric nephrology [en_US]
dc.type: Article [en_US]
