Towards the Importance of the Type of Deep Neural Network and Employment of Pre-trained Word Vectors for Toxicity Detection: An Experimental Study

dc.authoridKabakuş, Abdullah Talha/0000-0003-2181-4292
dc.authorwosidKabakuş, Abdullah Talha/J-8361-2019
dc.contributor.authorKabakuş, Abdullah Talha
dc.date.accessioned2023-07-26T11:50:05Z
dc.date.available2023-07-26T11:50:05Z
dc.date.issued2021
dc.departmentDÜ, Faculty of Engineering, Department of Computer Engineeringen_US
dc.description.abstractAs a natural consequence of offering many advantages to their users, social media platforms have become a part of their daily lives. Recent studies emphasize the necessity of an automated way of detecting offensive posts in social media since these 'toxic' posts have become pervasive. To this end, a novel toxic post detection approach based on Deep Neural Networks was proposed within this study. Given that several word embedding methods exist, we shed light on which word embedding method produces better results when employed with the five most common types of deep neural networks, namely, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Bidirectional Long Short-Term Memory (BiLSTM), and a combination of CNN and BiLSTM. To this end, the word vectors for the given comments were obtained through four different methods, namely, (i) Word2vec, (ii) fastText, (iii) GloVe, and (iv) the Embedding layer of deep neural networks. Eventually, a total of twenty benchmark models were proposed, and both trained and evaluated on a gold-standard dataset that consists of 16K tweets. According to the experimental results, the best F1-score, 84.844%, was obtained by the proposed CNN model without employing pre-trained word vectors, which outperformed the state-of-the-art works and implies the effective embedding ability of CNNs. Another key finding of the conducted experiments is that the models that constructed word embeddings through their Embedding layers obtained higher F1-scores and converged much faster than the models that utilized pre-trained word vectors.en_US
dc.identifier.doi10.13052/jwe1540-9589.2082
dc.identifier.endpage2268en_US
dc.identifier.issn1540-9589
dc.identifier.issn1544-5976
dc.identifier.issue8en_US
dc.identifier.scopus2-s2.0-85124826025en_US
dc.identifier.scopusqualityQ4en_US
dc.identifier.startpage2243en_US
dc.identifier.urihttps://doi.org/10.13052/jwe1540-9589.2082
dc.identifier.urihttps://hdl.handle.net/20.500.12684/12227
dc.identifier.volume20en_US
dc.identifier.wosWOS:000722006700002en_US
dc.identifier.wosqualityQ4en_US
dc.indekslendigikaynakWeb of Scienceen_US
dc.indekslendigikaynakScopusen_US
dc.institutionauthorKabakuş, Abdullah Talha
dc.language.isoenen_US
dc.publisherRiver Publishersen_US
dc.relation.ispartofJournal of Web Engineeringen_US
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.snmz$2023V1Guncelleme$en_US
dc.subjectWord Embedding; Word Vector; Deep Neural Network; Convolutional Neural Network; Recurrent Neural Network; Toxic Comment Detectionen_US
dc.titleTowards the Importance of the Type of Deep Neural Network and Employment of Pre-trained Word Vectors for Toxicity Detection: An Experimental Studyen_US
dc.typeArticleen_US
