Benchmarking the Clustering Performances of Evolutionary Algorithms: A Case Study on Varying Data Size
Yükleniyor...
Dosyalar
Tarih
2020
Yazarlar
Dergi Başlığı
Dergi ISSN
Cilt Başlığı
Yayıncı
Elsevier Science Inc
Erişim Hakkı
info:eu-repo/semantics/closedAccess
Özet
Background and objective: Clustering is a widely used popular method for data analysis within many clustering algorithms for years. Today it is used in many predictions, collaborative filtering and automatic segmentation systems on different domains. Also, to be broadly used in practice, such clustering algorithms need to give both better performance and robustness when compared to the ones currently used. In recent years, evolutionary algorithms are used in many domains since they are robust and easy to implement. And many clustering problems can be easily solved with such algorithms if the problem is modeled as an optimization problem. In this paper, we present an optimization approach for clustering by using four well-known evolutionary algorithms which are Biogeography-Based Optimization (BBO), Grey Wolf Optimization (GWO), Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). Method: the objective function has been specified to minimize the total distance from cluster centers to the data points. Euclidean distance is used for distance calculation. We have applied this objective function to the given algorithms both to find the most efficient clustering algorithm and to compare the clustering performances of algorithms against different data sizes. In order to benchmark the clustering performances of algorithms in the experiments, we have used a number of datasets with different data sizes such as some small scale, medium and big data. The clustering performances have been compared to K-means as it is a widely used clustering algorithm for years in literature. Rand Index, Adjusted Rand Index, Mirkin's Index and Hubert's Index have been considered as parameters for evaluating the clustering performances. Result: As a result of the clustering experiments of algorithms over different datasets with varying data sizes according to the specified performance criteria, GA and GWO algorithms show better clustering performances among the others. Conclusions: The results of the study showed that although the algorithms have shown satisfactory clustering results on small and medium scale datasets, the clustering performances on Big data need to be improved. (C) 2020 AGBM. Published by Elsevier Masson SAS. All rights reserved.
Açıklama
Anahtar Kelimeler
Clustering, Optimization, Evolutionary algorithms, PSO, GWO, BBO, K-means, Genetic Algorithm, Big Data
Kaynak
Irbm
WoS Q DeÄŸeri
Q4
Scopus Q DeÄŸeri
Q2
Cilt
41
Sayı
5