Cyclical hybrid imputation technique for missing values in data sets

Date

2025

Publisher

Nature Portfolio

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Handling missing data is the most important first step of the preprocessing phase, because incorrect imputation of missing values increases the error in the modeling phase and reduces the predictive performance of the model. In health applications in particular, choosing the models with the highest success rate is essential. When data are missing, the performance of machine learning models may vary with the amount of data available in the data set. Missing data, especially at high rates, reduce the amount of complete data and therefore affect the accuracy and reliability of analysis and modeling studies. Estimating missing values precisely, close to their true values, yields a significant and visible performance gain in the subsequent modeling phase. A model trained on data imputed with an artificial intelligence model, rather than a random method, achieves higher accuracy than a model trained on data filled with classical methods such as the mean and mode. In this study, we propose a new algorithm, tested on multiple datasets, to address the problems caused by missing data. The algorithm aims to impute missing values more effectively by applying row-based and column-based imputation techniques together and cyclically: column-based imputation accounts for individual missing values, while row-based imputation accounts for the overall data structure. On the three datasets used in the study, the proposed algorithm achieved 100% accuracy with some combinations of row- and column-based imputation techniques, and it achieved higher accuracy than the other imputation techniques it was compared with.
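The record does not include the algorithm itself, so the following is only a minimal sketch of the general idea of combining row-based and column-based imputation in cycles; it is not the authors' method. Several choices here are assumptions: missing entries are represented as `None`; the row-based step is k-nearest-neighbour averaging over similar rows; the column-based step is a simple linear regression of the target column on the single most correlated other column; the two estimates are blended with equal weights in every cycle; every column is assumed to have at least one observed value. All function names (`cyclical_hybrid_impute`, `knn_estimate`, `regression_estimate`) are hypothetical.

```python
from statistics import fmean


def column_means(data):
    # Mean of the observed (non-None) values in each column; used only to initialise.
    return [fmean(row[c] for row in data if row[c] is not None) for c in range(len(data[0]))]


def knn_estimate(filled, mask, i, j, k):
    # Row-based estimate (assumption: k-NN): average column j over the k rows
    # closest to row i, measured on all other columns, among rows where j is observed.
    cands = []
    for r, row in enumerate(filled):
        if r == i or mask[r][j]:
            continue
        d = sum((row[c] - filled[i][c]) ** 2 for c in range(len(row)) if c != j)
        cands.append((d, row[j]))
    cands.sort()
    return fmean(v for _, v in cands[:k])


def regression_estimate(filled, mask, i, j):
    # Column-based estimate (assumption): regress column j on the single other
    # column with the highest R^2, fitted on rows where column j is observed.
    obs = [r for r in range(len(filled)) if not mask[r][j]]
    ys = [filled[r][j] for r in obs]
    my = fmean(ys)
    syy = sum((y - my) ** 2 for y in ys)
    best_r2, best_pred = -1.0, my  # fall back to the column mean
    for c in range(len(filled[0])):
        if c == j:
            continue
        xs = [filled[r][c] for r in obs]
        mx = fmean(xs)
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0 or syy == 0:
            continue  # no usable linear relationship
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        r2 = sxy * sxy / (sxx * syy)
        if r2 > best_r2:
            best_r2 = r2
            best_pred = my + sxy / sxx * (filled[i][c] - mx)
    return best_pred


def cyclical_hybrid_impute(data, k=2, n_cycles=10, tol=1e-6):
    # mask[r][c] is True where the original value is missing.
    mask = [[v is None for v in row] for row in data]
    means = column_means(data)
    # Start from column means; later cycles refine these guesses.
    filled = [[means[c] if v is None else v for c, v in enumerate(row)] for row in data]
    missing = [(i, j) for i, row in enumerate(mask) for j, m in enumerate(row) if m]
    for _ in range(n_cycles):
        # Both estimates use the current imputed values of the other cells,
        # so each cycle can improve on the previous one.
        updates = {
            (i, j): 0.5 * (knn_estimate(filled, mask, i, j, k) + regression_estimate(filled, mask, i, j))
            for i, j in missing
        }
        change = max((abs(updates[p] - filled[p[0]][p[1]]) for p in missing), default=0.0)
        for (i, j), v in updates.items():
            filled[i][j] = v
        if change < tol:
            break
    return filled
```

For example, with `data = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [4.0, 8.0], [5.0, 10.0]]`, the second column is exactly twice the first, so both the k-NN and the regression estimates give 6.0 for the missing cell. Unlike mean or mode filling, each cycle reuses the other imputed values, which is what lets the row-based and column-based views refine each other.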

Keywords

Artificial intelligence, Machine learning, Deep learning, Imputation, Missing values

Source

Scientific Reports

WoS Quartile

Q1

Scopus Quartile

Q1

Volume

15

Issue

1