Benchmarking the Clustering Performances of Evolutionary Algorithms: A Case Study on Varying Data Size

Yükleniyor...
Küçük Resim

Tarih

2020

Dergi Başlığı

Dergi ISSN

Cilt Başlığı

Yayıncı

Elsevier Science Inc

Erişim Hakkı

info:eu-repo/semantics/closedAccess

Özet

Background and objective: Clustering is a widely used popular method for data analysis within many clustering algorithms for years. Today it is used in many predictions, collaborative filtering and automatic segmentation systems on different domains. Also, to be broadly used in practice, such clustering algorithms need to give both better performance and robustness when compared to the ones currently used. In recent years, evolutionary algorithms are used in many domains since they are robust and easy to implement. And many clustering problems can be easily solved with such algorithms if the problem is modeled as an optimization problem. In this paper, we present an optimization approach for clustering by using four well-known evolutionary algorithms which are Biogeography-Based Optimization (BBO), Grey Wolf Optimization (GWO), Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). Method: the objective function has been specified to minimize the total distance from cluster centers to the data points. Euclidean distance is used for distance calculation. We have applied this objective function to the given algorithms both to find the most efficient clustering algorithm and to compare the clustering performances of algorithms against different data sizes. In order to benchmark the clustering performances of algorithms in the experiments, we have used a number of datasets with different data sizes such as some small scale, medium and big data. The clustering performances have been compared to K-means as it is a widely used clustering algorithm for years in literature. Rand Index, Adjusted Rand Index, Mirkin's Index and Hubert's Index have been considered as parameters for evaluating the clustering performances. Result: As a result of the clustering experiments of algorithms over different datasets with varying data sizes according to the specified performance criteria, GA and GWO algorithms show better clustering performances among the others. Conclusions: The results of the study showed that although the algorithms have shown satisfactory clustering results on small and medium scale datasets, the clustering performances on Big data need to be improved. (C) 2020 AGBM. Published by Elsevier Masson SAS. All rights reserved.

Açıklama

Anahtar Kelimeler

Clustering, Optimization, Evolutionary algorithms, PSO, GWO, BBO, K-means, Genetic Algorithm, Big Data

Kaynak

Irbm

WoS Q DeÄŸeri

Q4

Scopus Q DeÄŸeri

Q2

Cilt

41

Sayı

5

Künye