Sentiment analysis of coronavirus data with ensemble and machine learning methods

Başarslan, M.S.; Kayaalp, F.

dc.authorscopusid	57203003458	en_US
dc.authorscopusid	56495320500	en_US
dc.contributor.author	Başarslan, M.S.
dc.contributor.author	Kayaalp, F.
dc.date.accessioned	2024-08-23T16:07:28Z
dc.date.available	2024-08-23T16:07:28Z
dc.date.issued	2024	en_US
dc.department	Düzce Üniversitesi	en_US
dc.description.abstract	The coronavirus pandemic has distanced people from social life and increased the use of social media. People's emotions can be determined from text data collected from social media applications, and such analysis is used in many fields, especially in commerce. This study aims to predict people's sentiments about the pandemic by applying sentiment analysis to tweets about the pandemic using single machine learning classifiers (Decision Tree-DT, K-Nearest Neighbor-KNN, Logistic Regression-LR, Naïve Bayes-NB, Random Forest-RF) and ensemble learning methods (Majority Voting (MV), Probabilistic Voting (PV), and Stacking (STCK)). The tweets were vectorized using two predictive methods, Word2Vec (W2V) and Doc2Vec, and two traditional word representation methods, Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BOW). Classification models built with single machine learning classifiers were then compared to models built with ensemble learning methods (MV, PV and STCK) that heterogeneously combine the single machine learning classifier algorithms. Accuracy (ACC), F-measure (F), precision (P), and recall (R) were used as performance measures, with training/test splits of 70%-30% and 80%-20%. Among these models, the ACC of ensemble learning models ranged from 73% to 89%, while the ACC of single classifier models ranged from 60% to 80%. Among the ensemble learning methods, STCK with the Doc2Vec text representation/embedding method gave the best ACC result of 89%. According to the experimental results, ensemble models built with heterogeneous machine learning classifier algorithms gave better results than single machine learning classifier algorithms. © Author(s) 2024.	en_US
dc.identifier.doi	10.31127/tuje.1352481
dc.identifier.endpage	185	en_US
dc.identifier.issn	2587-1366
dc.identifier.issue	2	en_US
dc.identifier.scopus	2-s2.0-85192945148	en_US
dc.identifier.scopusquality	N/A	en_US
dc.identifier.startpage	175	en_US
dc.identifier.trdizinid	1232985	en_US
dc.identifier.uri	https://doi.org/10.31127/tuje.1352481
dc.identifier.uri	https://search.trdizin.gov.tr/tr/yayin/detay/1232985
dc.identifier.uri	https://hdl.handle.net/20.500.12684/14662
dc.identifier.volume	8	en_US
dc.indekslendigikaynak	Scopus	en_US
dc.indekslendigikaynak	TR-Dizin	en_US
dc.language.iso	en	en_US
dc.publisher	Murat Yakar	en_US
dc.relation.ispartof	Turkish Journal of Engineering	en_US
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Ensemble learning	en_US
dc.subject	Machine learning	en_US
dc.subject	Sentiment analysis	en_US
dc.subject	Text representation	en_US
dc.subject	Word embedding	en_US
dc.title	Sentiment analysis of coronavirus data with ensemble and machine learning methods	en_US
dc.type	Article	en_US
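
The abstract contrasts Majority Voting (MV) with Probabilistic Voting (PV) as ways of combining heterogeneous classifiers. The following is a minimal sketch of the difference, assuming that PV means averaging per-class probabilities across classifiers (commonly called soft voting); the record does not spell out the authors' exact formulation. The function names, class labels, and probability values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Hard voting: each classifier casts one label; the most frequent label wins."""
    # most_common orders equal counts by first occurrence, so ties go to the earliest label
    return Counter(labels).most_common(1)[0][0]


def probabilistic_vote(probabilities):
    """Soft voting (assumed meaning of PV): average class probabilities, then take the argmax."""
    classes = list(probabilities[0].keys())
    n = len(probabilities)
    averaged = {c: sum(p[c] for p in probabilities) / n for c in classes}
    return max(averaged, key=averaged.get), averaged


# Hypothetical class probabilities from three base classifiers for a single tweet
clf_probs = [
    {"negative": 0.55, "positive": 0.45},
    {"negative": 0.60, "positive": 0.40},
    {"negative": 0.10, "positive": 0.90},
]

# Each classifier's hard label is its most probable class
hard_labels = [max(p, key=p.get) for p in clf_probs]

mv_label = majority_vote(hard_labels)
pv_label, pv_scores = probabilistic_vote(clf_probs)

print("MV:", mv_label)
print("PV:", pv_label, pv_scores)
```

In this made-up case two weakly confident classifiers outvote one strongly confident classifier under MV, while PV lets the confident classifier dominate, so the two schemes can disagree on the same inputs.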

Files

Original bundle

Name: 14662.pdf
Size: 3.97 MB
Format: Adobe Portable Document Format

Collection

Scopus Indexed Publications Collection
TR-Dizin Indexed Publications Collection