DSpace Archive :: Browsing by Author "Başarslan, Muhammet Sinan"


Showing 1 - 11 of 11

Classification of a bank data set on various data mining platforms
(Institute of Electrical and Electronics Engineers Inc., 2018) Başarslan, Muhammet Sinan; Argun, İrem Düzdar
The process of extracting meaningful rules from big and complex data is called data mining. Data mining is becoming increasingly popular in every field. In customer-oriented industries such as marketing, finance, and telecommunications in particular, data units are established to work on customer churn and acquisition. Among data mining methods, classification algorithms are used in customer acquisition studies to predict a company's potential customers in the related industry. In this study, models were created with the same classification algorithms in different data mining programs, using the bank marketing data set from the UCI Machine Learning Repository. Accuracy, precision, and F-measure criteria were used to test the performance of the classification models. When creating the models, the data set was randomly divided into training and test sets with the holdout method, using 60-40%, 75-25%, and 80-20% split ratios. The data mining programs used for these processes are R, KNIME, RapidMiner, and WEKA, and the classification algorithms, which are commonly used on all of these platforms, are k-nearest neighbor (k-NN), Naive Bayes, and the C4.5 decision tree. © 2018 IEEE.
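A minimal sketch of the holdout procedure and the three metrics named above, written in Python rather than in the R, KNIME, RapidMiner, or WEKA tools the study used. The function names, the seed, and the toy records are hypothetical, and "yes" is assumed to be the positive class.

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(rows: Sequence, train_ratio: float, seed: int = 42) -> Tuple[List, List]:
    """Randomly shuffle rows, then cut them into (train, test) at train_ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> dict:
    """Accuracy, precision and F-measure, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "f_measure": f_measure}


if __name__ == "__main__":
    rows = list(range(100))  # stand-in for 100 customer records
    for train_pct in (60, 75, 80):  # the three split ratios from the abstract
        train, test = holdout_split(rows, train_pct / 100)
        print(f"{train_pct}-{100 - train_pct}: {len(train)} train / {len(test)} test")
```

A classifier would be fitted on `train` and evaluated on `test` once per ratio, and `binary_metrics` would then be applied to its predictions.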
Data Mining Through Data Visualization: A Case Study on Predicting Churners on Telecommunications Data Set
(2018) Başarslan, Muhammet Sinan; Kayaalp, Fatih
Data mining is the process of extracting meaningful information from large raw data. This process is carried out with various detailed methods, and the results are used to make interpretations and draw conclusions. Deductions can be made either by interpreting the data after various operations or by plotting the data in various forms of graphs. This type of interpretation through graphics is called data mining through data visualization. The paper presents graphs, generated with the help of some packages included in R, that can be used to draw various conclusions on a telecommunications data set. Interpreting these graphics does not require upper-level math skills: anyone with knowledge of the industry and of the data set behind the graphs can plot similar graphs and analyze and interpret the results obtained on the data set at hand. In this study, the R language was preferred as the software infrastructure for data mining applications, and graphs were plotted for interpretation through data visualization with data mining
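As a rough illustration of drawing a conclusion from a graph rather than from a model, the sketch below aggregates the churn rate per customer group and draws it as a text bar chart. It is written in Python with only the standard library, not with the R packages the paper used, and the toy records and the "contract type" attribute are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def churn_rate_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the share of churned customers within each group."""
    totals: Dict[str, int] = defaultdict(int)
    churned: Dict[str, int] = defaultdict(int)
    for group, has_churned in records:
        totals[group] += 1
        churned[group] += int(has_churned)
    return {g: churned[g] / totals[g] for g in totals}


def text_bar_chart(values: Dict[str, float], width: int = 40) -> str:
    """Render proportions in [0, 1] as horizontal '#' bars, one line per label."""
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in sorted(values.items()):
        bar = "#" * round(value * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value:.0%}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical toy records: (contract type, churned?)
    sample = [("monthly", True), ("monthly", True), ("monthly", False),
              ("yearly", False), ("yearly", False), ("yearly", True),
              ("yearly", False)]
    print(text_bar_chart(churn_rate_by_group(sample)))
```

Comparing bar lengths across groups is the kind of reading the abstract describes: a domain expert can see which segment churns more without any statistical background.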
A Hybrid Classification Example in the Diagnosis of Skin Disease with Cryotherapy and Immunotherapy Treatment
(IEEE, 2018) Başarslan, Muhammet Sinan; Kayaalp, Fatih
Tumors that grow uncontrollably in the human body are called cancer. An unbalanced diet, alcohol and cigarette use, food additives, and a variety of viruses can cause people to develop cancer. Tumors can be benign or malignant. This study measures the responses to treatments for a skin disease caused by human papillomavirus (HPV), also called the wart virus, which is directly related to cancer. HPV is infectious and can spread to another person through contact. There are multiple types of HPV, and although it is usually benign, it can cause cancers such as cervical cancer and skin cancer. Apart from cancer, warts caused by HPV are generally seen on the hands, feet, face, and genital areas. As the lesions grow and progress, they can lead to cancer at advanced stages. Treatment methods include drug use, surgical removal, and HPV vaccination. These methods may require various surgical interventions, can cause a variety of reactions in allergic patients, or can cause a slight dependence on drugs. In addition to these methods, cryotherapy (ice treatment) and immunotherapy are used, which were developed to obtain faster and less costly results than drugs and surgical interventions. In this study, the treatment responses of 180 patients with warts on their hands and feet who applied to the dermatology clinic of Ghaem Hospital in Iran were predicted. The patients were divided into two groups, one half treated with cryotherapy and the other with immunotherapy, yielding two separate data sets that are available in the UCI Machine Learning Repository. Naive Bayes, C4.5 decision tree, logistic regression, and k-nearest neighbor classifier models were developed for the prediction. In addition, feature selection was applied to the immunotherapy and cryotherapy data sets to test its effect on classification.
The performance of the data sets after attribute selection was compared with that of the raw data sets in the classification models, using 5-fold and 10-fold cross-validation. Of the four classifiers, the C4.5 decision tree gave the best performance on all performance criteria for both data sets. In addition, attribute selection clearly increased the performance criteria of all models.
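The k-fold cross-validation used for this comparison can be sketched as follows. This is a generic Python illustration, not the authors' code; the function name and the seed are assumptions.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; every sample is in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```

For one 90-patient data set (half of the 180 patients), `k=5` gives five test folds of 18 patients and `k=10` gives ten folds of 9; a model is trained on the remaining patients for each fold and the fold scores are averaged.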
MBi-GRUMCONV: A novel Multi Bi-GRU and Multi CNN-Based deep learning model for social media sentiment analysis
(Springer, 2023) Başarslan, Muhammet Sinan; Kayaalp, Fatih
Today, the internet and social media are used by many people, both for communication and for expressing opinions about various topics in many domains of life. Various approaches based on artificial intelligence technologies have emerged for analyzing these opinions, giving rise to different natural language processing tasks. One of these tasks is sentiment analysis, a popular method for analyzing people's opinions that provides people, companies, governments, and researchers with a powerful decision-making tool. This study investigates the effect of combining multiple layers of different neural networks on the performance of a sentiment analysis model. A new deep learning-based model was proposed for sentiment analysis on the IMDB movie reviews dataset. This model performs sentiment classification on reviews vectorized with two Word2Vec methods, Skip Gram and Continuous Bag of Words, in three different vector sizes (100, 200, 300), using 6 Bidirectional Gated Recurrent Unit layers and 2 Convolution layers (MBi-GRUMCONV). In the experiments, the dataset was split into 80%-20% and 70%-30% training-test sets, and 10% of each training split was used for validation. Accuracy and F1 score were used to evaluate classification performance. The proposed model achieved 95.34% accuracy, outperforming the studies in the literature. The experiments also showed that Skip Gram contributed more to classification success.
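The difference between the two Word2Vec variants lies in how training examples are formed from a sentence, which the sketch below shows. It only builds the (input, target) examples; training the embeddings and the MBi-GRUMCONV network itself is not reproduced, and the window size and sample sentence are arbitrary.

```python
from typing import List, Tuple


def skip_gram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip Gram: predict each context word from the centre word -> (centre, context)."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[Tuple[str, ...], str]]:
    """CBOW: predict the centre word from its surrounding words -> (context, centre)."""
    examples = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tuple(tokens[j] for j in range(lo, hi) if j != i)
        examples.append((context, centre))
    return examples


if __name__ == "__main__":
    review = "the movie was great".split()
    print(skip_gram_pairs(review, window=1))
    print(cbow_examples(review, window=1))
```

Skip Gram produces one example per (centre, context) pair, so rare words are seen more often as inputs; this is one commonly cited reason it can suit smaller corpora, although the abstract does not state why it helped here.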
Open Source Data Mining Programs: A Case Study on R
(2018) Kayaalp, Fatih; Başarslan, Muhammet Sinan
The processes on the way from raw data to meaningful information are called data mining. Various data mining methods are applied to the data in order to extract hidden information from raw data. The processed data then becomes usable in the next steps of data mining. There are many open-source and commercial applications for data mining and data processing. In this study, information about data mining programs is given, and a case study on R is presented. R was chosen because, as various graphs show, it is widely preferred among users.
Performance Analysis Of Fuzzy Rough Set-Based And Correlation-Based Attribute Selection Methods On Detection Of Chronic Kidney Disease With Various Classifiers
(IEEE, 2019) Başarslan, Muhammet Sinan; Kayaalp, Fatih
Technological developments generally have positive effects on our daily lives, especially in the health domain. Diagnosing diseases with new machines or methods is easier than in the past. This study aims to benchmark the effect of attribute selection methods on the performance of classification algorithms used to diagnose chronic kidney disease (CKD). The CKD data set from the UCI Machine Learning Repository was used for the experiments. After pre-processing, normalization, and attribute selection, classifier models were designed. To determine the attributes that contribute most to the classification results, the Correlation-Based Attribute Selection (CBAS) and Fuzzy Rough Set-Based Attribute Selection (FRSBAS) methods were used. The two data sets obtained with these attribute selection methods and the raw data set were classified with four classifiers: k-Nearest Neighbor, Naive Bayes, Random Forest, and Logistic Regression. The training and test data were separated with 5-fold cross-validation. The accuracy, precision, sensitivity, ROC curve, and F-measure values obtained from the confusion matrix were used to compare and evaluate the models. The results show that applying FRSBAS to the CKD data set gives better performance with all classification algorithms.
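The abstract does not spell out how CBAS scores attributes. Assuming it follows the widely used correlation-based feature selection (CFS) merit of Hall, a subset of k attributes is rated by Merit = k·r̄_cf / sqrt(k + k(k−1)·r̄_ff), where r̄_cf is the mean attribute-class correlation and r̄_ff the mean attribute-attribute correlation. The sketch below uses absolute Pearson correlations and a greedy forward search; both choices, and the function names, are assumptions, and FRSBAS is not shown.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cfs_merit(subset: List[str], features: Dict[str, Sequence[float]], target: Sequence[float]) -> float:
    """High when attributes track the class but are not redundant with each other."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features: Dict[str, Sequence[float]], target: Sequence[float]) -> List[str]:
    """Forward selection: keep adding the attribute that raises the merit most."""
    selected: List[str] = []
    best = 0.0
    remaining = set(features)
    while remaining:
        candidate = max(sorted(remaining), key=lambda f: cfs_merit(selected + [f], features, target))
        merit = cfs_merit(selected + [candidate], features, target)
        if merit <= best:
            break  # no remaining attribute improves the subset
        selected.append(candidate)
        best = merit
        remaining.remove(candidate)
    return selected
```

On real CKD data the numeric attributes would be normalized first and the class label encoded as 0/1 before computing correlations.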
Sentiment Analysis on Social Media Reviews Datasets with Deep Learning Approach
(2021) Başarslan, Muhammet Sinan; Kayaalp, Fatih
Thanks to social media, people are now able to quickly leave guiding comments about their favorite restaurants, movies, etc. This has paved the way for the field of sentiment analysis, which brings together various disciplines. In this study, the Yelp restaurant reviews and IMDB movie reviews datasets were used together with data collected from Twitter. The Word2Vec (W2V), Global Vectors (GloVe), and Bidirectional Encoder Representations from Transformers (BERT) word embedding methods, Term Frequency-Inverse Document Frequency (TF-IDF), and Bag-of-Words (BOW) were applied to these datasets. Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), Support Vector Machine (SVM), and Naive Bayes (NB) were used in the sentiment analysis models. Accuracy, F-measure (F), Sensitivity (Sens), Precision (Pre), and Receiver Operating Characteristics (ROC) were used to evaluate model performance. On the IMDB dataset, the accuracy of the models built with Machine Learning (ML) and Deep Learning (DL) methods ranged from 81% to 90% and from 84% to 94%, respectively. These ranges were 80%-86% and 81%-89% for the Yelp dataset, and 75%-79% and 85%-98% for the Twitter dataset. The models that incorporated the BERT word embedding method performed best among both the ML and DL models. Therefore, BERT is recommended for this type of analysis in future studies.
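The two count-based representations named above can be sketched in a few lines of Python. This uses one common TF-IDF variant (term frequency divided by document length, idf = ln(N / df)); libraries differ in smoothing and normalization, and the study does not state which settings it used. The toy reviews are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(docs: List[List[str]]) -> List[Dict[str, int]]:
    """BOW: raw term counts per document, word order discarded."""
    return [dict(Counter(doc)) for doc in docs]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF: down-weights terms that occur in many documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    reviews = [["good", "movie"], ["bad", "movie"]]
    print(bag_of_words(reviews))
    print(tf_idf(reviews))
```

A word such as "movie" that appears in every review receives weight 0, while sentiment-bearing words like "good" and "bad" keep a positive weight; embedding methods such as Word2Vec, GloVe, and BERT instead map words to dense vectors.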
Shiny ile R Uygulaması Geliştirme ve Müşteri Kayıp Analizine İlişkin Web Destekli Bir Örnek Çalışma [Developing R Applications with Shiny and a Web-Supported Case Study on Customer Churn Analysis]
(2017) Kayaalp, Fatih; Başarslan, Muhammet Sinan
Various platforms are used to solve data mining problems, and the R programming language is one of the most widely used among them. Because it is open source and backed by community support, R attracts researchers' attention. Shiny is an R package that allows R code to be shared through a web interface. Customer churn analysis is used to predict which of the customers currently receiving a service are considering stopping it. It is widely used for customer analysis, particularly in subscription-based sectors such as telecommunications and pay TV. In this study, a web-supported customer churn analysis is presented, performed with the C4.5 decision tree algorithm on a customer data set from the telecommunications sector by means of Shiny. The study is original for three reasons: the application was carried out on a real-life data set; it allows users with access permission to perform the analysis remotely over the web instead of on a local computer; and web-based customer churn analysis in telecommunications had not previously been done with R.
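The idea of running a churn model behind a web interface, rather than on a local computer, can be sketched with Python's standard `http.server` module standing in for Shiny. The endpoint path, the query parameters, and the two hand-written rules in `predict_churn` are hypothetical placeholders for a trained C4.5 tree, not the study's model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def predict_churn(monthly_calls: float, complaints: int) -> str:
    """Hypothetical placeholder rules standing in for a trained C4.5 tree."""
    if complaints > 2:
        return "churn"
    if monthly_calls < 10:
        return "churn"
    return "stay"


def handle_query(query: str) -> dict:
    """Parse a query string such as 'monthly_calls=25&complaints=1' and predict."""
    params = parse_qs(query)
    monthly_calls = float(params.get("monthly_calls", ["0"])[0])
    complaints = int(params.get("complaints", ["0"])[0])
    return {"prediction": predict_churn(monthly_calls, complaints)}


class ChurnHandler(BaseHTTPRequestHandler):
    """Answers GET /predict?monthly_calls=...&complaints=... with a JSON body."""

    def do_GET(self) -> None:
        url = urlparse(self.path)
        if url.path != "/predict":
            self.send_error(404)
            return
        body = json.dumps(handle_query(url.query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block and serve predictions on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ChurnHandler).serve_forever()
```

After calling `serve()`, opening `http://127.0.0.1:8000/predict?monthly_calls=25&complaints=1` would return `{"prediction": "stay"}`. Access control, which the study mentions, is left out of this sketch.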
Sosyal Ağlarda Duygu Analizi için Hibrit Bir Yöntem Geliştirilmesi [Development of a Hybrid Method for Sentiment Analysis in Social Networks]
(Düzce Üniversitesi, 2023) Başarslan, Muhammet Sinan; Kayaalp, Fatih
The development of technology has changed many of people's social habits. Partly because of the pandemic, face-to-face communication has decreased considerably in many areas, especially in education. People have come to spend their time on mobile devices, through which they can quickly access social media and websites to socialize. Although the pandemic has ended, people still widely use websites and social media applications to socialize. People instantly share their opinions about many experiences, such as a movie they watched together or a restaurant they went to. This leads to continuous data sharing through social media and websites, which in turn produces very large volumes of data. These data are of close interest to many organizations, from marketing to advertising, and these organizations carry out studies to make use of the data shared about themselves or their sectors. With developing artificial intelligence technologies, these studies add value to companies. One of the fields that provides this value is sentiment analysis, a natural language processing task within the artificial intelligence subfield. In this study, sentiment analysis experiments were carried out on a dataset collected from the IMDB website. A new method named MBiGRUMCONV was proposed for the IMDB dataset: after Word2Vec word embedding, six BiGRU layers are followed by two convolutional neural network layers. With the 80%-20% training-test split and a 10% validation set, the configuration in which the proposed method gave its best accuracy, the test result was 90.59% and the validation result was 94.67%. In addition, 3-, 5-, and 10-fold cross-validation results were obtained for the proposed model; the best result, 90.67% accuracy, was obtained with 5-fold cross-validation.
Overall, the proposed method performs competitively compared with studies in the literature. Within the scope of the study, models were also built with various deep learning, machine learning, and ensemble learning methods, after different text representation methods, on open-source datasets collected from TripAdvisor, Rotten Tomatoes, and Twitter. To achieve high model performance when building hybrid models, future studies are recommended to use bidirectional neural networks after extracting text representations with pretrained BERT variants.
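The 80%-20% training-test split with 10% of the training portion held out for validation can be sketched as follows. This assumes the validation set is carved from the training portion after the test set is removed, as in the MBi-GRUMCONV abstract above; the function name and seed are illustrative.

```python
import random
from typing import List, Tuple


def train_val_test_split(n: int, test_ratio: float, val_ratio: float,
                         seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Return disjoint (train, validation, test) index lists covering range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)
    test, train_full = idx[:n_test], idx[n_test:]
    # The validation set is a fraction of the training portion, not of the whole data.
    n_val = round(len(train_full) * val_ratio)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = train_val_test_split(1000, test_ratio=0.2, val_ratio=0.1)
    print(len(train), len(val), len(test))
```

With 1,000 reviews this yields 720 training, 80 validation, and 200 test examples. The validation set guides model selection during training, while the test set is only used for the final score.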
Telekomünikasyon sektöründe müşteri kayıp analizi [Customer Churn Analysis in the Telecommunications Sector]
(Düzce Üniversitesi, 2017) Başarslan, Muhammet Sinan; Kayaalp, Fatih
People's consumption preferences differ according to their needs, and institutions that invest in customers cannot foresee these preferences. Customer-oriented institutions in particular try to acquire new customers and to prevent customer loss by keeping existing customers satisfied. Telecommunications companies, one of the customer-oriented sectors, likewise want to gain customers and not lose existing ones, so they carry out studies to predict customer churn in various ways. In this thesis, customer churn analysis was performed with classification algorithms, which are among the data mining and machine learning methods. The analysis followed the Cross-Industry Standard Process model for data mining (CRISP), data mining being one of the steps of the machine learning process. The performance of the models built with the classification algorithms was evaluated with cross-validation and hold-out methods, using 4-fold, 5-fold, and 10-fold cross-validation. In the 4-, 5-, and 10-fold cross-validation evaluations, the models built with decision tree algorithms performed better than the other models. The best-performing algorithm, the C4.5 decision tree, reached a performance of approximately 0.98. It was followed, in order, by the models built with the ID3 and Gini decision trees, k-nearest neighbor, and Bayes algorithms. Although the k-nearest neighbor algorithm ranks after the decision trees, its performance is close to that of the C4.5 decision tree. With the hold-out method, the data set was split into training and test sets with ratios of 60%-40%, 75%-25%, and 80%-20%. In the performance evaluations on these splits, the C4.5 decision tree again performed best, consistent with the k-fold cross-validation results.
It was followed by the k-nearest neighbor algorithm, which surpassed the ID3 and Gini decision trees, whose values had been close to its own in k-fold cross-validation. The Bayes algorithm ranked last. The k-nearest neighbor algorithm surpassed the ID3 and Gini decision trees because it performed better on the random hold-out splits. Using R, the data mining program employed in the thesis, a data visualization study was also carried out. In addition, the model built with the C4.5 decision tree algorithm, which gave the best result among the classification algorithms, was made dynamic by turning it into a web application with Shiny, one of the R packages.
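The three decision trees compared in the thesis differ mainly in their split criterion: ID3 uses information gain, C4.5 uses the gain ratio, and Gini trees use the reduction in Gini impurity (as in CART-style trees). The Python sketch below shows only these textbook criteria for a categorical attribute, not full tree induction or pruning; the attribute values and labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: probability that two random draws have different labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values: Sequence[str], labels: Sequence[str]) -> Dict[str, List[str]]:
    """Group the class labels by the attribute value of each record."""
    groups: Dict[str, List[str]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return groups


def information_gain(values: Sequence[str], labels: Sequence[str]) -> float:
    """ID3 criterion: entropy before the split minus weighted entropy after it."""
    n = len(labels)
    children = partition(values, labels).values()
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in children)


def gain_ratio(values: Sequence[str], labels: Sequence[str]) -> float:
    """C4.5 criterion: information gain divided by the attribute's split information."""
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return information_gain(values, labels) / split_info if split_info else 0.0


def gini_gain(values: Sequence[str], labels: Sequence[str]) -> float:
    """Gini criterion: impurity before the split minus weighted impurity after it."""
    n = len(labels)
    children = partition(values, labels).values()
    return gini(labels) - sum(len(g) / n * gini(g) for g in children)
```

For labels `["churn", "churn", "stay", "stay"]`, a contract attribute `["monthly", "monthly", "yearly", "yearly"]` and a unique customer ID `["a", "b", "c", "d"]` both have an information gain of 1.0. The gain ratio, however, is 1.0 for the contract attribute and only 0.5 for the ID. This penalty on many-valued attributes is the main motivation for C4.5's criterion.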
TSCBAS: A Novel Correlation Based Attribute Selection Method and Application on Telecommunications Churn Analysis
(IEEE, 2018) Kayaalp, Fatih; Başarslan, Muhammet Sinan; Polat, Kemal
Attribute selection has a significant effect on the performance of machine learning studies: it selects the attributes that most affect the result, reduces the number of attributes, and reduces computation cost. In this study, a new attribute selection method called Two-Stage Correlation-Based Attribute Selection (TSCBAS), which combines R-correlation coefficient-based attribute selection (RCBAS) and rho-correlation coefficient-based attribute selection (rho-CBAS), is proposed to select significant attributes. For performance evaluation, the proposed method was applied to customer churn prediction on a telecommunications dataset containing real customer call record details for 2013 and 2014, obtained from a major telecommunications company in Turkey. Apart from the proposed method, four other methods (R-correlation coefficient-based attribute selection, rho-correlation coefficient-based attribute selection, ReliefF, and Gain Ratio) were used, yielding five datasets in total. Four classifier algorithms were then applied: Random Forest, C4.5 Decision Tree, Naive Bayes, and AdaBoost.M1. The results were compared using Accuracy (ACC), Sensitivity (TPR), Specificity (SPC), F-measure (F), AUC (area under the ROC curve), and run time. The comparisons show that the proposed attribute selection algorithm outperforms state-of-the-art methods in customer churn prediction.
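The abstract does not give the exact rule by which TSCBAS combines its two stages. The sketch below is therefore a hypothetical reading: stage one keeps attributes whose absolute Pearson correlation (r) with the class meets a threshold, and stage two keeps those survivors whose absolute Spearman correlation (rho, computed on average ranks) also meets a threshold. The function names, thresholds, and toy columns are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def two_stage_selection(features: Dict[str, Sequence[float]], target: Sequence[float],
                        r_threshold: float, rho_threshold: float) -> List[str]:
    """Hypothetical two-stage filter: Pearson screen first, Spearman screen second."""
    stage1 = [name for name, col in features.items()
              if abs(pearson_r(col, target)) >= r_threshold]
    return [name for name in stage1
            if abs(spearman_rho(features[name], target)) >= rho_threshold]
```

In this reading, the second stage removes attributes whose linear correlation is inflated by a few extreme values but which do not rise and fall consistently with the class.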
