Dağıtık hadoop kümelerinde yeni eşle/indirge programlama algoritması modeli

Şeşen, Emin

dc.contributor.advisor	Kara, Resul
dc.contributor.author	Şeşen, Emin
dc.date.accessioned	2025-10-11T20:34:42Z
dc.date.available	2025-10-11T20:34:42Z
dc.date.issued	2024
dc.department	DÜ, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği Ana Bilim Dalı	en_US
dc.description.abstract	Because of the cost of data transfer, big data are usually stored close to where they are produced. The stored data are either moved to a single location for processing or processed where they reside. Various methods for processing data can be found in the literature. This study presents a new method for processing data. In the proposed method, data centers (DCs) with different hardware that have completed their data processing perform shuffling among themselves. To calculate the data processing cost of the DCs' reduce function, a polynomial regression model was built from data obtained in a test environment, and the coefficients obtained from this model were used in the decision process. The key/value pairs to be shuffled are distributed according to the costs of the DCs, taking the pairs' locations into account. Shuffling between DCs does not wait for all DCs to finish their work. As a result, both the shuffled and the processed data volumes are lower than when all DCs shuffle at the same time. The performance of the proposed method was compared with four different methods from the literature. In terms of data size, this study produced 15% less shuffle data than its closest counterpart.	en_US
dc.description.abstract	Big data are often stored close to where they are generated because transferring them is costly. The stored data are then either moved to a single location for processing or processed in place. The literature offers several methods for processing data in distributed datacenters. This study presents a new data processing method called GSelf-MapReduce. In the proposed method, shuffling is performed among heterogeneous datacenters (DCs) that have completed their data processing. To calculate the data processing cost of each DC's reduce function, a polynomial regression model was built from data obtained in a test environment, and the coefficients of this model were used in the decision process. The key/value pairs to be shuffled are distributed according to the costs of the DCs, taking the pairs' locations into account. Because shuffling between DCs does not wait for all DCs to complete their jobs, the cost is reduced in terms of both the data to be moved and the data to be processed. The performance of the proposed method was compared with that of four distributed data processing methods from the literature. The proposed method generates 15% less shuffled data than the closest competing method.	en_US
dc.identifier.endpage	85	en_US
dc.identifier.startpage	1	en_US
dc.identifier.uri	https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=usXiZIM9Lp0wk-YzRoaT-xueVoHgg5WHArMrp55xYCvb_ijPNyffgu1FYZCCTzq1
dc.identifier.uri	https://hdl.handle.net/20.500.12684/20294
dc.identifier.yoktezid	887213	en_US
dc.institutionauthor	Şeşen, Emin
dc.language.iso	tr	en_US
dc.publisher	Düzce Üniversitesi	en_US
dc.relation.publicationcategory	Tez	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.snmz	KA_TEZ_20250911
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	en_US
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Dağıtık hadoop kümelerinde yeni eşle/indirge programlama algoritması modeli	en_US
dc.title.alternative	New map/reduce programming algorithm model in distributed hadoop clusters	en_US
dc.type	Doctoral Thesis	en_US

Collection

Fen Bilimleri Enstitüsü Tez Koleksiyonu
