GSelf-MapReduce: A Method for Enhancing Mapreduce Performance in Distributed Heterogeneous Data Centers

dc.authorid: 0000-0002-1284-3916
dc.authorid: 0000-0002-4416-6657
dc.contributor.author: Şeşen, Emin
dc.contributor.author: Kırışoğlu, Serdar
dc.contributor.author: Kara, Resul
dc.date.accessioned: 2025-09-03T20:43:15Z
dc.date.available: 2025-09-03T20:43:15Z
dc.date.issued: 2024
dc.department: DÜ, Faculty of Engineering, Department of Computer Engineering
dc.description.abstract: Big data are often stored close to where they are generated, owing to the cost of data transfer. These stored data are either moved to a single location for processing or processed where they are stored. The literature offers several methods for processing data in distributed data centers. In this study, we present a new data processing method called GSelf-MapReduce. In the proposed method, shuffling is performed among the heterogeneous data centers (DCs) that have completed their data processing. To calculate the data processing cost of the reduce function at each DC, a polynomial regression model was built from data obtained in the test environment, and the coefficients of this model were used in the decision process. The key/value pairs to be shuffled are distributed according to the cost of the DCs, taking their location into account. In addition, shuffling does not wait for all DCs to finish their jobs: DCs that have completed their jobs shuffle among themselves, so the keys are deduplicated between these DCs. This reduces both the shuffling volume in the last phase and the total job completion time. The performance of the proposed method was compared with that of four distributed data processing methods from the literature. The proposed method generates 15% less shuffled data than the closest competing method.
dc.identifier.citation: Şeşen, E., Kırışoğlu, S., Kara, R. "GSelf-MapReduce: A Method for Enhancing Mapreduce Performance in Distributed Heterogeneous Data Centers," in IEEE Access, vol. 12, pp. 159503-159518, 2024, doi: 10.1109/ACCESS.2024.3487936.
dc.identifier.doi: 10.1109/ACCESS.2024.3487936
dc.identifier.endpage: 159518
dc.identifier.scopus: 2-s2.0-85208272733
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 159503
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2024.3487936
dc.identifier.uri: https://hdl.handle.net/20.500.12684/20204
dc.identifier.volume: 12
dc.identifier.wos: WOS:001349764400001
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.institutionauthor: Şeşen, Emin
dc.institutionauthor: Kırışoğlu, Serdar
dc.institutionauthor: Kara, Resul
dc.institutionauthorid: 0000-0002-1284-3916
dc.institutionauthorid: 0000-0002-4416-6657
dc.language.iso: en
dc.publisher: IEEE
dc.relation.ispartof: IEEE Access
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Big data
dc.subject: data reduction
dc.subject: geo-distributed mapreduce
dc.subject: heterogeneous data centers
dc.title: GSelf-MapReduce: A Method for Enhancing Mapreduce Performance in Distributed Heterogeneous Data Centers
dc.type: Article
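
The abstract describes a cost-driven shuffle: each DC's reduce cost is predicted from polynomial regression coefficients, and the key/value pairs are then divided among the DCs according to that cost. The sketch below is only an illustration of that idea and is not the authors' implementation. The DC names, the coefficient values, and the inverse-proportional split rule are all assumptions made for this example, and DC location, which the abstract says is also considered, is ignored here.

```python
def predicted_cost(coeffs, size):
    """Evaluate a polynomial cost model with Horner's rule.

    coeffs are ordered from the highest-degree term to the constant term.
    """
    cost = 0.0
    for c in coeffs:
        cost = cost * size + c
    return cost


def split_by_cost(total_pairs, costs):
    """Divide key/value pairs among DCs in inverse proportion to cost.

    Assumption for this sketch: a DC with a lower predicted cost
    receives a proportionally larger share of the pairs.
    """
    weights = {dc: 1.0 / cost for dc, cost in costs.items()}
    weight_sum = sum(weights.values())
    shares = {dc: int(total_pairs * w / weight_sum) for dc, w in weights.items()}
    # Pairs lost to integer truncation go to the cheapest DC.
    cheapest = min(costs, key=costs.get)
    shares[cheapest] += total_pairs - sum(shares.values())
    return shares


# Hypothetical regression coefficients (a*x^2 + b*x + c) per DC.
coeffs = {"dc_a": [0.0, 1.0, 0.0], "dc_b": [0.0, 3.0, 0.0]}
costs = {dc: predicted_cost(c, 100) for dc, c in coeffs.items()}
shares = split_by_cost(1000, costs)
print(shares)
```

In this made-up setting, dc_b is three times as costly as dc_a for the same input size, so it receives roughly a quarter of the pairs.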

Files

Original bundle
Name: emin-sesen.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.17 KB
Format: Item-specific license agreed upon to submission
Description: