DSpace Repository :: Browsing by Author "Kayaalp, Fatih"


Showing 1 - 20 of 20

Android Operating System-Based Orienteering Game Mobile Application [Android İşletim Sistemi Temelli Oryantiring Oyunu Mobil Uygulaması]
(Türkiye Bilişim Vakfı, 2018) Kayaalp, Fatih; Satılmış, Ayşe
An application was developed that determines users' locations with the help of Google Maps services on phones with the Android operating system and GPS capability. The application is based on the game of orienteering. The aim of the game is, once the user's location has been determined, to collect the flags along the generated route in the shortest possible time. The application was developed with Android Studio. Dates, durations, and route lengths are recorded in a local SQLite database.
Data Mining Through Data Visualization: A Case Study on Predicting Churners on Telecommunications Data Set
(2018) Başarslan, Muhammet Sinan; Kayaalp, Fatih
Data mining is the process of extracting meaningful information from large, raw data. These processes are carried out by various detailed methods, and the results obtained are used to make interpretations and draw conclusions. Deductions can be made either by interpreting the data after various operations or by plotting the data in various forms of graphs. This type of interpretation over graphics is called data mining through data visualization. This paper presents the generation of graphs that can be used to draw conclusions on a telecommunications data set with the help of some packages included in the R program. Interpreting these graphics does not require upper-level math skills; anyone with knowledge of the industry and of the data set behind the graphs can plot similar graphs and analyze and interpret the results obtained on the data set at hand. In this study, the R language was preferred as the software infrastructure for data mining applications, and graphs were plotted for interpretation through data visualization with data mining
Performance Comparison of Different Classification Algorithms and Text Representation Methods in Sentiment Analysis [Farklı Sınıflandırma Algoritmaları ve Metin Temsil Yöntemlerinin Duygu Analizinde Performans Karşılaştırılması]
(2021) Öğe, Batuhan Cem; Kayaalp, Fatih
In recent years, with growing access to the internet and the spread of smartphone use, services known as social media, where people share their opinions on various topics, have come into very wide use. Sentiment analysis, which is essentially a classification task and means drawing meaningful inferences about people's feelings on different topics by analyzing social media data, has been one of the prominent research areas of recent years. In this study, the performance of six classification algorithms, namely Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Artificial Neural Network (ANN), was compared for sentiment analysis using libraries available in the Python programming language. Open-source, labeled user reviews from the IMDB site were used as the data set. To represent the data set numerically after cleaning it with natural language processing methods, the Bag of Words (BoW), TF-IDF, FastText, and Word2Vec text representation methods were used. k-fold cross-validation with k=5 was used for training and testing. The results obtained for the six classification methods were compared in detail by computing accuracy, precision, recall, and F1 score, and the results were recorded. As the highest accuracy values, LR and SVM gave similar results of about 86% with BoW, 87% with TF-IDF, 87% with Word2Vec, and 83% with FastText, respectively.
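The evaluation protocol named in the abstract above, k-fold cross-validation with k=5, can be sketched in plain Python. This is a minimal illustration and not the authors' code: the helper name `k_fold_indices` and the toy size of 10 samples are assumptions, and the study itself presumably relied on a Python library for this step.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: samples are split into k contiguous folds and
    each fold serves as the test set exactly once.
    """
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Toy run with 10 samples (assumed size, for illustration only).
folds = list(k_fold_indices(10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each classifier would be trained on `train_idx` and scored on `test_idx` five times, and the metrics averaged over the folds. In practice the samples are usually shuffled before splitting.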
A Hybrid Classification Example in the Diagnosis of Skin Disease with Cryotherapy and Immunotherapy Treatment
(Ieee, 2018) Başarslan, Muhammet Sinan; Kayaalp, Fatih
Uncontrolled tumors in the human body are called cancer. An unbalanced diet, alcohol and cigarette use, food additives, and a variety of viruses can cause cancer. Tumors can be benign or malignant. This study measures the responses to treatments for a skin disease caused by the human papillomavirus (HPV), also called the wart virus, which is directly related to cancer. The virus is infectious and can be transmitted to another person by contact. There are multiple types of HPV, and although it is usually benign, it can cause cancers such as cervical cancer and skin cancer. Apart from cancer, warts caused by HPV are generally seen on the hands, feet, face, and genital areas. As the skin growth and sagging progress, it causes cancer at advanced stages. Drug use, surgical removal, and HPV vaccination are used as treatment methods. These methods may require various surgical interventions. They can also cause a variety of reactions in allergic patients or a slight drug dependence. In addition to these methods, cryotherapy (ice treatment) and immunotherapy are used, which were developed to obtain faster results at lower cost than drugs and surgical interventions. In this study, the response to treatment was predicted for 180 patients with warts on the hands and feet who presented to the dermatology clinic of Ghaem Hospital in Iran. The patients were divided into two groups, and two separate data sets were obtained by applying cryotherapy to one half and immunotherapy to the other half. These data sets are available in the UCI repository. Naive Bayes, C4.5 decision tree, logistic regression, and k-nearest neighbor classifier models were developed for the prediction task. In addition, the features included in the immunotherapy and cryotherapy data sets were tested in classification by applying a feature selection process.
The performance of the data sets after attribute selection is compared with the performance of the raw data sets in the classification models. 5-fold and 10-fold cross-validation is used to compare the performance of these models. Among the four classifiers applied as common models to both data sets, the C4.5 decision tree gave the best performance on all performance criteria. In addition, it is clearly seen that the attribute selection process increased the performance criteria of all models.
Leakage detection and localization on water transportation pipelines: a multi-label classification approach
(Springer London Ltd, 2017) Kayaalp, Fatih; Zengin, Ahmet; Kara, Resul; Zavrak, Sultan
One of the main problems of water transportation pipelines is leakage, which can cause loss of water resources, possible human injuries, and damage to the environment. Many studies in the literature focus on the detection and localization of leaks in water pipeline systems. In this study, we designed a wireless sensor network-based real-time monitoring system to detect and locate leaks at multiple positions on water pipelines by using pressure data. First, the pressure data are collected from wireless pressure sensor nodes. Then, unlike previous works in the literature, both the detection and the localization of leakages are carried out by using multi-label learning methods. We used three multi-label classification methods: RAkELd, BRkNN, and BR with SVM. After evaluating and comparing the methods with each other, we observe that the RAkELd method performs best on almost all measures, with an accuracy of 98%. As a result, multi-label classification methods can be used successfully for the detection and localization of leaks in pipeline systems.
MBi-GRUMCONV: A novel Multi Bi-GRU and Multi CNN-Based deep learning model for social media sentiment analysis
(Springer, 2023) Başarslan, Muhammet Sinan; Kayaalp, Fatih
Today, the internet and social media are used by many people, both for communication and for expressing opinions about various topics in many domains of life. Various artificial intelligence-based approaches to analyzing these opinions have emerged within natural language processing, in the form of different tasks. One of these tasks is sentiment analysis, a popular method for analyzing people's opinions that provides a powerful decision-making tool for people, companies, governments, and researchers. This study investigates the effect of using multi-layered and different neural networks together on the performance of a model developed for the sentiment analysis task. A new deep learning-based model is proposed for sentiment analysis on the IMDB movie reviews dataset. This model performs sentiment classification on reviews vectorized with two Word2Vec methods, namely Skip Gram and Continuous Bag of Words, in three different vector sizes (100, 200, 300), with the help of 6 Bidirectional Gated Recurrent Units and 2 Convolution layers (MBi-GRUMCONV). In the experiments conducted with the proposed model, the dataset was split into 80%-20% and 70%-30% training-test sets, and 10% of the training splits were used for validation purposes. Accuracy and F1 score criteria were used to evaluate the classification performance. With 95.34% accuracy, the proposed model outperformed the studies in the literature. The experiments showed that Skip Gram contributes more to classification success.
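The two Word2Vec variants named in the abstract above differ in what they predict: Skip Gram predicts each context word from the center word, while Continuous Bag of Words (CBOW) predicts the center word from its whole context. The sketch below only generates the training pairs for each variant. The function names, the window size of 1, and the toy sentence are assumptions for illustration, not details from the paper.

```python
def skip_gram_pairs(tokens, window=2):
    """Skip Gram: each center word is paired with every context word separately."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=2):
    """CBOW: the whole context window is paired with the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs


# Toy review text (hypothetical), tokenized on whitespace.
review = "the movie was great".split()
sg = skip_gram_pairs(review, window=1)
cb = cbow_pairs(review, window=1)
print(sg)
print(cb)
```

Skip Gram produces one training example per (center, context) pair, so it yields more examples per sentence than CBOW, which produces one example per position.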
A mobile app for algorithms learning in engineering education: Drag and drop approach
(Wiley, 2021) Kayaalp, Fatih; Dinc, Fatih
Algorithms, taken in the first year of university, is one of the core courses of the computer science department. Traditional learning systems have some restrictions, as they are teacher-, class-, and time-based. Distance education, however, is one of today's popular concepts since it offers flexibility of time and choice of location. Mobile learning, a subbranch of distance education, gives users opportunities to learn through educational mobile applications. The present study describes an application that runs on smart devices with the iOS operating system and supports basic algorithm education for engineering students. The student can both acquire theoretical knowledge about algorithm concepts and test himself/herself with questions from the question bank. There are three types of questions: multiple-choice, true/false, and flowchart questions designed with the drag-and-drop method. The main contribution of this application is that users can design a flowchart from beginning to end on screen by placing flowchart components provided on the palette. They can also check their own solutions against the questions and answers stored in the database. The application was tested on a group of computer engineering undergraduate students from Mus Alparslan University and evaluated by both an exam and user surveys. The results of the exam show that the presented application had a positive effect on students' success. The surveys also yielded positive feedback.
A Novel and Robust LSTM Model for Customer Churn Analysis Using Deep, Machine Learning, and Ensemble Learning: A Telecommunications Case
(Istanbul Univ, 2025) Basarslan, Muhammet Sinan; Unal, Aslihan; Kayaalp, Fatih
Customer churn is an important issue in increasing both the long- and short-term revenues. If companies identify customers' churn behavior, they can prevent churn, ensure customer loyalty, and, in turn, gain better financial returns. The telecommunications sector is a customer-oriented sector that requires customer retention to survive in the market. In this sector, customer churn is observed at a high level. In recent years, artificial intelligence-based customer churn analysis has been widely used to predict customer churn behavior. In this study, a customer churn analysis was conducted using publicly shared Telco telecommunications data. Predictive models were constructed using machine learning (LR, KNN, SVM, DT, RF, ANN), ensemble learning (XGBoost, Majority Voting), and deep learning (LSTM) methods. In addition, a 3-layered LSTM model was proposed. Accuracy (Acc), F1-score (F1), Precision (Prec), and Recall (Rec) rates were used to evaluate the models. As a result, the novel 3-layered LSTM model achieved 91.90% Acc, 91.49% Prec, 92.31% Rec, and 91.90% F1 values. The proposed model is competitive with the existing models.
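The four metrics reported above are all derived from confusion-matrix counts. The following sketch shows the standard definitions and then checks that the reported F1 is consistent with the reported precision and recall. The counts passed to `classification_metrics` are hypothetical; the paper's confusion matrix is not given in the abstract.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts, for illustration only.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(acc, prec, rec, f1)

# Consistency check using the precision and recall reported in the abstract.
reported_f1 = 2 * 91.49 * 92.31 / (91.49 + 92.31)
print(round(reported_f1, 2))
```

The harmonic mean of 91.49% and 92.31% rounds to 91.90%, which matches the F1 value stated in the abstract.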
A Novel Deep Learning-Based Hybrid Method for the Determination of Productivity of Agricultural Products: Apple Case Study
(Ieee-Inst Electrical Electronics Engineers Inc, 2023) Bal, Fatih; Kayaalp, Fatih
The production of agricultural products and the high yield in these products are of critical importance for the continuation of human life. In recent years, machine learning and deep learning technologies have been widely used in determining agricultural productivity. The purpose of this study was to estimate the yield of apple fruit by using a novel deep learning-based hybrid method. First, by using images belonging to the golden and royal gala apple varieties, a classification was made with the help of a convolutional neural network (CNN) that was designed for the study. Then, using classical machine learning algorithms and bagging and boosting algorithms, a hybrid application was performed by classifying the images whose feature extractions were done with the designed CNN. The results of the study, presented on 4 separate datasets (Datasets A, B, C, and D), were evaluated based on accuracy, precision, recall, F-measure, and Cohen kappa scores. Considering the accuracy results for Datasets B, C, and D, it was determined that the hybrid model that gave the best result was the CNN-SVM model. For Dataset A, the CNN-SVM and CNN-Gradient Boosting hybrid models gave the best and same accuracy. Dataset C was determined as the most appropriate dataset in terms of the more balanced distribution of train, test, and validation size in the datasets, the results of the proposed hybrid CNN model, and the evaluation of the results of the model. For Dataset C, it was found that the accuracy of the hybrid model was 99.70%. Precision, recall, f-measure, and Cohen kappa scores were 99%. The results of the study revealed that the hybrid models showed effective results in determining the productivity of apple fruit through images belonging to the golden and royal gala varieties.
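Among the scores listed above, Cohen's kappa is the least familiar: it measures agreement between predicted and true labels after discounting the agreement expected by chance. A minimal sketch follows, with made-up labels for the two apple varieties; the function name and the data are assumptions, not the study's results.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both label sources pick the same class.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical labels, for illustration only.
y_true = ["golden", "golden", "golden", "gala", "gala", "gala"]
y_pred = ["golden", "golden", "gala", "gala", "gala", "gala"]
kappa = cohen_kappa(y_true, y_pred)
print(round(kappa, 3))
```

Here 5 of 6 predictions agree (observed 0.833) while chance agreement is 0.5, so kappa is about 0.667. Perfect agreement gives 1.0.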
Open Source Data Mining Programs: A Case Study on R
(2018) Kayaalp, Fatih; Başarslan, Muhammet Sinan
The processes on the way from raw data to meaningful information are called data mining. The data is processed by applying various data mining methods in order to extract the information hidden in raw data. The processed raw data becomes usable in the next steps of data mining. There are many open source and commercial applications for data mining and data processing. In this study, information about data mining programs is given, and a case study on the R program is presented. The R program was chosen because it has a high preference rate among users, as shown by various graphs.
Performance Analysis Of Fuzzy Rough Set-Based And Correlation-Based Attribute Selection Methods On Detection Of Chronic Kidney Disease With Various Classifiers
(Ieee, 2019) Başarslan, Muhammet Sinan; Kayaalp, Fatih
Technological developments generally have positive effects on our daily lives, especially in the health domain. Diagnosing diseases with new machines or methods is easier than in the past. This study aims to benchmark the effect of attribute selection methods on the performance of classification algorithms used to diagnose chronic kidney disease (CKD). The CKD data set from the UCI machine learning repository was used for the experiments. After a variety of pre-processing, normalization, and attribute selection steps, classifier models were designed. To determine the attributes that contribute more to the classification results, the Correlation Based attribute selection (CBAS) method and the Fuzzy Rough Set Based attribute selection (FRSBAS) method were used. The two data sets obtained by each attribute selection method and the raw data were classified by 4 classifiers: k-Nearest Neighbor, Naive Bayes, Random Forest, and Logistic Regression. The test and training data were separated by 5-fold cross validation. The accuracy, precision, sensitivity, ROC curve, and F-measure parameters obtained from the confusion matrix were used to compare and evaluate the results of the models. The study shows that applying the FRSBAS method to the CKD data set performs better with all classification algorithms.
Performance Comparison of Association Rule Algorithms with SPMF on Automotive Industry Data
(2019) Naır, Melih; Kayaalp, Fatih
With recent developments in information technologies, companies can store their data faster and more easily at lower cost. All transactions (sales, current card, invoicing, etc.) performed in companies during the day combine at the end of the day to form big datasets. Valuable information can be extracted from these datasets with data mining, and this has become more important for companies under today's conditions, where competition in the market is high. In this study, a dataset of a company selling car maintenance and repair products in Turkey is used. Association rules are applied to this dataset to determine the items that customers buy together. These rules, which are calculated specifically for the company, can be used to redefine sales and marketing strategies, to revise storage areas efficiently, and to create sales campaigns suited to customers and regions. The algorithms used are also called Frequent Itemset Mining algorithms. The 11 most recent of these algorithms are applied to this dataset on the SPMF platform in order to compare their performance in terms of memory usage and execution time against varying support values and varying record numbers. Three different datasets covering 6 months, 12 months, and 22 months are created from the whole dataset. According to the experiments, execution times generally increase as support values decrease, as nearly all algorithms have their highest execution times for the lowest support value of 0.1. The dEclat_bitset algorithm has the most efficient performance on the 6-month and 12-month datasets. On the 22-month dataset, however, the Eclat algorithm can be said to be the most efficient for the 0.7 and 0.3 support values, while dEclat_bitset is the most efficient for the 0.3 and 0.1 support values.
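The support and confidence values behind association rules can be illustrated with the vertical layout that Eclat-style algorithms use, where each item maps to the set of transaction ids containing it and the support of an itemset comes from intersecting those sets. The sketch below is a toy illustration in Python, not the SPMF implementation used in the study; the transactions, item names, and the 0.3 threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical sales transactions, for illustration only.
transactions = [
    {"oil", "filter"},
    {"oil", "filter", "wiper"},
    {"oil", "wiper"},
    {"filter"},
    {"oil", "filter"},
]

# Vertical layout: item -> set of ids of the transactions that contain it.
tidsets = {}
for tid, items in enumerate(transactions):
    for item in items:
        tidsets.setdefault(item, set()).add(tid)


def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    tids = set.intersection(*(tidsets[item] for item in itemset))
    return len(tids) / len(transactions)


min_support = 0.3
frequent_pairs = {
    pair: support(pair)
    for pair in combinations(sorted(tidsets), 2)
    if support(pair) >= min_support
}
print(frequent_pairs)

# Confidence of the rule oil -> filter: support(oil, filter) / support(oil).
confidence = support(("filter", "oil")) / support(("oil",))
print(confidence)
```

Lowering `min_support` admits more candidate itemsets, which is in line with the longer execution times the abstract reports for the lowest support value.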
PERFORMANCE COMPARISON OF SMOTE-BASED MACHINE LEARNING MODELS ON UNBALANCED DATASETS: A STUDY ON DATE AND PISTACHIO FRUITS
(2025) Bal, Fatih; Kayaalp, Fatih
Creating balanced datasets is a significant challenge that substantially affects the performance of machine learning models in the classification of agricultural products. In this research, we address this challenge using an unbalanced dataset containing information on 7 date palm (Phoenix dactylifera L.) and 2 pistachio (Pistacia vera L.) cultivars. The aim of the study is to compare the classification performance of machine learning models on the unbalanced dataset and on a dataset balanced with the SMOTE technique. Initially, classification was performed on the unbalanced dataset using machine learning approaches. Among the models applied to the unbalanced dataset, the Linear-SVM model showed the highest accuracy, at 92.62%. On the dataset extended by applying the SMOTE technique, the RBF-SVM model showed the highest accuracy, at 95.55%. In summary, our study highlights the difficulties that data imbalance causes in machine learning-based agricultural crop classification. Oversampling with the SMOTE technique was effective in overcoming this obstacle and improving classification accuracy.
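The core idea of SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below shows only that interpolation step in plain Python. The function name, the parameter values, and the two-feature toy points are assumptions, and the study presumably used a library implementation rather than code like this.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest minority-class neighbours of x, excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=4)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies.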
Review of Customer Churn Analysis Studies in Telecommunications Industry
(2017) Kayaalp, Fatih
Churn analysis is one of the analyses used worldwide in subscription-oriented industries to analyze customer behavior and predict which customers are about to end their service agreement with a company. It is based on data mining methods and algorithms, and it has become important for companies under today's commercial conditions, as gaining a new customer costs more than retaining an existing one. This paper reviews the relevant studies on customer churn analysis in the telecommunications industry to give readers general information about the frequently used data mining methods, their results and performance, and to shed light on further studies. To keep the review up to date, studies published in the last five years, and mainly in the last two years, have been included.
Review of machine learning and deep learning models in agriculture
(2021) Bal, Fatih; Kayaalp, Fatih
Machine learning (ML) refers to the processes that enable computers to think based on various learning methods. It can also be described as a subdomain of Artificial Intelligence (AI). Deep learning (DL) has been a promising, new, and modern technique for data analysis in recent years. It can be seen as an improved version of Artificial Neural Networks (ANN), one of today's popular AI methods. The population of the world is increasing day by day, and the importance of agriculture is increasing in parallel. For this reason, many researchers have focused on this issue and have tried to apply machine learning and deep learning methods in agriculture, under the name of smart farm technologies, both to increase agricultural production and to solve some of the challenges of agriculture. This study aims to give detailed information about these up-to-date studies. 77 articles based on machine learning and deep learning algorithms in the agriculture field, published in the IEEE Xplore, ScienceDirect, Web of Science, and Scopus publication databases between 2016 and 2020, were reviewed. The articles were classified under five categories: plant recognition, disease detection, weed and pest detection, soil mapping-drought index, and yield forecast. They were examined in detail in terms of machine learning/deep learning architectures, data sets, performance metrics (Accuracy, Precision, Recall, F-Score, R2, MAPE, RMSE, MAE), and the experimental results obtained. Based on the examined articles, the most popular methods, the data sets and data types used, the chosen performance criteria, and the performance results of the existing studies are presented. It is seen that the number of AI-based applications related to agriculture is increasing compared to the past, and that the outlook for sustainable productivity is promising.
Sentiment Analysis on Social Media Reviews Datasets with Deep Learning Approach
(2021) Başarslan, Muhammet Sinan; Kayaalp, Fatih
Thanks to social media, people are now able to leave guiding comments quickly about their favorite restaurants, movies, etc. This has paved the way for the field of sentiment analysis, which brings together various disciplines. In this study, the Yelp restaurant reviews and IMDB movie reviews datasets were used together with data collected from Twitter. The Word2Vec (W2V), Global Vectors (GloVe), and Bidirectional Encoder Representations from Transformers (BERT) word embedding methods, Term Frequency-Inverse Document Frequency (TF-IDF), and Bag-of-Words (BOW) were used on these datasets. Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), Support Vector Machine (SVM), and Naive Bayes (NB) were used in the sentiment analysis models. Accuracy, F-measure (F), Sensitivity (Sens), Precision (Pre), and Receiver Operating Characteristics (ROC) were used to evaluate model performance. The accuracy rates of the models created with the Machine Learning (ML) and Deep Learning (DL) methods on the IMDB dataset were in the range of 81%-90% and 84%-94%, respectively. These rates were in the range of 80%-86% and 81%-89% for the Yelp dataset, and in the range of 75%-79% and 85%-98% for the Twitter dataset. The models that incorporated the BERT word embedding method performed best among the ML and DL models. Therefore, the BERT method is recommended for this type of analysis in future studies.
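Of the representations listed above, TF-IDF is simple enough to compute by hand: a term's weight in a document is its relative frequency in that document multiplied by the logarithm of the inverse fraction of documents containing it. The sketch below uses the plain, unsmoothed formula on three made-up reviews. Libraries typically apply smoothing and normalization, so their values differ, and nothing here reproduces the study's pipeline.

```python
import math
from collections import Counter

# Hypothetical tokenized reviews, for illustration only.
docs = [
    "great food great service".split(),
    "bad food".split(),
    "great movie".split(),
]


def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(docs)
print(weights[0])
```

In the first review, "service" occurs once but appears in no other document, so it outweighs "food", which also occurs once but is shared with the second review.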
Sentiment analysis using a deep ensemble learning model
(Springer, 2023) Basarslan, Muhammet Sinan; Kayaalp, Fatih
The coronavirus pandemic has kept people away from social life, and this has led to an increase in the use of social media over the past two years. Thanks to social media, people can now instantly share their thoughts on various topics such as their favourite movies, restaurants, hotels, etc. This has created a huge amount of data, and many researchers from different sciences have focused on analysing this data. Natural Language Processing (NLP) is one of the areas of computer science that uses artificial intelligence technologies. Sentiment analysis is one of the tasks of NLP and is based on extracting emotions from large volumes of posts. In this study, sentiment analysis was performed on two datasets: tweets about coronavirus and TripAdvisor hotel reviews. A frequency-based word representation method (Term Frequency-Inverse Document Frequency (TF-IDF)) and a prediction-based Word2Vec word embedding method were used to vectorise the datasets. Sentiment analysis models were then built using single machine learning methods (Decision Trees-DT, K-Nearest Neighbour-KNN, Naive Bayes-NB and Support Vector Machine-SVM), single deep learning methods (Long Short Term Memory-LSTM, Recurrent Neural Network-RNN) and heterogeneous ensemble learning methods (Stacking and Majority Voting) based on these single machine learning and deep learning methods. Accuracy was used as a performance measure. The heterogeneous model with stacking (LSTM-RNN) outperformed the other models, with accuracy values of 0.864 on the coronavirus dataset and 0.898 on the TripAdvisor dataset, and these were evaluated as promising results when compared to the literature. It was observed that using single methods together as an ensemble gives better results, which is consistent with the literature and is a step forward in the detection of sentiments through posts.
Investigating the performance of heterogeneous ensemble learning models based on different algorithms in sentiment analysis tasks is planned as future work.
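Of the two ensemble strategies named above, majority voting is the simpler: each base model predicts a label and the most frequent label wins. A minimal sketch follows; the three prediction lists are invented and only stand in for the outputs of base models such as DT, KNN, and NB.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote."""
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest count for this sample.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical predictions of three base models on four posts.
dt = ["pos", "neg", "pos", "neg"]
knn = ["pos", "pos", "neg", "neg"]
nb = ["neg", "pos", "pos", "neg"]
ensemble = majority_vote([dt, knn, nb])
print(ensemble)
```

With an odd number of models and two classes there are no ties. Stacking, the other strategy in the abstract, instead trains a further model on the base models' outputs.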
Sentiment Analysis with Machine Learning Methods on Social Media
(Ediciones Univ Salamanca, 2020) Basarslan, Muhammet Sinan; Kayaalp, Fatih
Social media has become an important part of our everyday life due to the widespread use of the Internet. Of the social media services, Twitter is among the most used around the world. People share their opinions by writing tweets about numerous subjects, such as politics, sports, economy, etc. Millions of tweets per day create a huge dataset, which has drawn the attention of data scientists to these data for sentiment analysis. Sentiment analysis aims to identify the social media posts of users about a specific topic and categorize them as positive, negative, or neutral. The study aims to investigate the effect of the type of text representation on the performance of sentiment analysis. Two datasets were used in the experiments. The first consists of user reviews of movies from IMDB, labeled by Kotzias, and the second consists of English-language tweets on health topics from 2019, collected using the Twitter API. The Python programming language was used both for implementing the classification models with the Naive Bayes (NB), Support Vector Machines (SVM), and Artificial Neural Networks (ANN) algorithms and for categorizing the sentiments as positive, negative, and neutral. Feature extraction from the datasets was performed using the Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec (W2V) modeling techniques. The success percentages of the classification algorithms were then compared. According to the experimental results, the Artificial Neural Network had the best accuracy on both datasets.
Developing R Applications with Shiny and a Web-Supported Case Study on Customer Churn Analysis [Shiny ile R Uygulaması Geliştirme ve Müşteri Kayıp Analizine İlişkin Web Destekli Bir Örnek Çalışma]
(2017) Kayaalp, Fatih; Başarslan, Muhammet Sinan
Various platforms are used to solve problems in data mining. The R programming language is one of the most widely used among them. Because it is open source and offers community support, R is a focus of interest for researchers. Shiny is an R package that allows R code to be shared through a web interface. Customer churn analysis is used to predict which of the customers currently receiving a service are considering discontinuing it. It is widely used in customer analyses, especially in subscription-based sectors such as communications and pay TV. This study presents a web-supported customer churn analysis carried out with Shiny, an R package, using the C4.5 decision tree algorithm on a customer data set from the telecommunications sector. The study is original in that the application was carried out on a real-life data set, that it allows the analysis to be performed remotely over the web by authorized users rather than on a local computer, and that no web-based customer churn analysis in telecommunications had previously been done with R.
TSCBAS: A Novel Correlation Based Attribute Selection Method and Application on Telecommunications Churn Analysis
(Ieee, 2018) Kayaalp, Fatih; Başarslan, Muhammet Sinan; Polat, Kemal
Attribute selection has a significant effect on the performance of machine learning studies by selecting the attributes that most affect the result, reducing the number of attributes, and reducing the calculation cost. In this study, a new attribute selection method called Two-Stage Correlation-Based Attribute Selection (TSCBAS), a combination of R-correlation coefficient-based attribute selection (RCBAS) and rho-correlation coefficient-based attribute selection (rho CBAS), is proposed to select significant attributes. The proposed method has been applied to customer churn prediction on a telecommunications dataset for performance evaluation. The dataset used in the study includes real customer call detail records for the years 2013 and 2014, obtained from a major telecommunications company in Turkey. Apart from the proposed method, four other methods, namely R-correlation coefficient-based attribute selection, rho-correlation coefficient-based attribute selection, ReliefF, and Gain Ratio, have been used, creating five datasets in total. After that, four classifier algorithms, Random Forest, C4.5 Decision Tree, Naive Bayes, and AdaBoost.M1, have been applied. The results have been compared according to performance metrics comprising Accuracy (ACC), Sensitivity (TPR), Specificity (SPC), F-measure (F), AUC (area under the ROC curve), and run-time. The comparisons show that the proposed attribute selection algorithm outperforms the state-of-the-art methods on customer churn prediction.
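The abstract above does not say how the two correlation stages are combined, so the following is only one plausible reading, labeled as an assumption: stage one keeps the attributes whose absolute Pearson r with the target reaches a threshold, and stage two keeps those whose absolute Spearman rho also reaches it. The function `two_stage_select`, the 0.5 threshold, and the toy churn data are all hypothetical and are not taken from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def two_stage_select(attributes, target, threshold=0.5):
    """ASSUMED combination rule: filter by |r| first, then by |rho|."""
    stage1 = [name for name, col in attributes.items()
              if abs(pearson_r(col, target)) >= threshold]
    return [name for name in stage1
            if abs(spearman_rho(attributes[name], target)) >= threshold]


# Hypothetical churn labels and attributes, for illustration only.
churn = [0, 0, 0, 1, 1, 1]
attributes = {
    "complaints": [0, 1, 0, 3, 4, 5],
    "call_minutes": [300, 280, 310, 120, 90, 100],
    "customer_id_digit": [3, 7, 1, 2, 8, 4],
}
selected = two_stage_select(attributes, churn)
print(selected)
```

In this toy data the two informative attributes pass both stages, while the arbitrary id digit is dropped in stage one because its Pearson r is only about 0.2.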
