Comparison of fast regression algorithms in large datasets

dc.authoridAnkaralı, Handan/0000-0002-3613-0523en_US
dc.authorwosidAnkaralı, Handan/HTN-2985-2023en_US
dc.contributor.authorCangur, Sengul
dc.contributor.authorAnkarali, Handan
dc.date.accessioned2024-08-23T16:07:20Z
dc.date.available2024-08-23T16:07:20Z
dc.date.issued2023en_US
dc.departmentDüzce Üniversitesien_US
dc.description.abstractThe aim is to compare the performance of fast regression methods, namely dimensional reduction of correlation matrix (DRCM), nonparametric dimensional reduction of correlation matrix (N-DRCM), variance inflation factor (VIF) regression, and robust VIF (R-VIF) regression, in the presence of multicollinearity and outlier problems. In all simulation scenarios, all the target variables were selected for the final models by all four methods. DRCM and N-DRCM reached the final model in the shortest and second-shortest times, respectively. The time to reach the final model using R-VIF regression was approximately half that of VIF regression. For each method, the time taken to reach the final model increased as the number of variables and the level of outliers increased. When the level of multicollinearity and the number of variables (p > 500) increased, the times to reach the final models using DRCM in datasets with outliers were slightly shorter than those of N-DRCM. The largest numbers of noise variables were selected into the model by DRCM and N-DRCM, and the smallest number by R-VIF regression. The RMSE values obtained using DRCM, N-DRCM and VIF regression were similar in each scenario. In the real dataset, the final model selected using R-VIF regression had the highest R². It also had the lowest RMSE value among those obtained with the other approaches, excluding VIF regression. As such, the R-VIF regression method demonstrated better performance than the others in all datasets.en_US
dc.identifier.issn2307-4108
dc.identifier.issn2307-4116
dc.identifier.issue2en_US
dc.identifier.urihttps://hdl.handle.net/20.500.12684/14601
dc.identifier.volume50en_US
dc.identifier.wosWOS:000957969300017en_US
dc.identifier.wosqualityQ3en_US
dc.indekslendigikaynakWeb of Scienceen_US
dc.language.isoenen_US
dc.publisherAcademic Publication Councilen_US
dc.relation.ispartofKuwait Journal of Scienceen_US
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectDimensional reductionen_US
dc.subjectlarge dataen_US
dc.subjectrobusten_US
dc.subjectvariance inflation factoren_US
dc.subjectVariable Selectionen_US
dc.subjectVif Regressionen_US
dc.titleComparison of fast regression algorithms in large datasetsen_US
dc.typeArticleen_US
