A deep learning analysis on question classification task using Word2vec representations

dc.contributor.author: Yilmaz, Seyhmus
dc.contributor.author: Toklu, Sinan
dc.date.accessioned: 2021-12-01T18:47:11Z
dc.date.available: 2021-12-01T18:47:11Z
dc.date.issued: 2020
dc.department: [To be determined] (en_US)
dc.description.abstract: Question classification is an essential first step for automatic question answering systems. Linguistic features play a significant role in developing an accurate question classifier. Recently, deep learning systems have achieved remarkable success in various text-mining problems such as sentiment analysis, document classification, spam filtering, document summarization, and web mining. In this study, we investigate several deep learning architectures for a question classification task in Turkish, a highly inflectional, agglutinative language in which words are formed by adding suffixes (morphemes) to a root word. As a non-Indo-European language, Turkish has some unique features that make it challenging for natural language processing. For instance, Turkish has no grammatical gender or noun classes. User questions in Turkish are used to train and test the deep learning architectures, which are compared in terms of test accuracy and 10-fold cross-validation accuracy. We use two major deep learning models: long short-term memory (LSTM) and convolutional neural networks (CNN). We also implement the combined CNN-LSTM and CNN-SVM structures, along with a number of variants of those architectures obtained by changing the vector sizes and the embedding types. In addition, we build word embeddings using the Word2vec method with CBOW and skip-gram models and different vector sizes on a large corpus composed of user questions. We further investigate the effect of using different pre-trained Word2vec word embeddings on these deep learning architectures. Experimental results show that the choice of Word2vec model has a significant impact on the accuracy of the different deep learning models. Additionally, since there was no labeled Turkish question dataset, another contribution of this study is a new Turkish question dataset translated from the UIUC English question dataset. Using these techniques, we reach an accuracy of 94% on the question dataset. (en_US)
dc.identifier.doi: 10.1007/s00521-020-04725-w
dc.identifier.endpage: 2928 (en_US)
dc.identifier.issn: 0941-0643
dc.identifier.issn: 1433-3058
dc.identifier.issue: 7 (en_US)
dc.identifier.scopus: 2-s2.0-85078256408 (en_US)
dc.identifier.scopusquality: Q1 (en_US)
dc.identifier.startpage: 2909 (en_US)
dc.identifier.uri: https://doi.org/10.1007/s00521-020-04725-w
dc.identifier.uri: https://hdl.handle.net/20.500.12684/10176
dc.identifier.volume: 32 (en_US)
dc.identifier.wos: WOS:000522553100069 (en_US)
dc.identifier.wosquality: Q1 (en_US)
dc.indekslendigikaynak: Web of Science (en_US)
dc.indekslendigikaynak: Scopus (en_US)
dc.language.iso: en (en_US)
dc.publisher: Springer London Ltd (en_US)
dc.relation.ispartof: Neural Computing & Applications (en_US)
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member (en_US)
dc.rights: info:eu-repo/semantics/closedAccess (en_US)
dc.subject: Deep learning (en_US)
dc.subject: Question classification (en_US)
dc.subject: SVM (en_US)
dc.subject: Word embedding (en_US)
dc.subject: Word2vec (en_US)
dc.subject: Sentiment Classification (en_US)
dc.subject: Nets (en_US)
dc.title: A deep learning analysis on question classification task using Word2vec representations (en_US)
dc.type: Article (en_US)
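
The abstract mentions two Word2vec training schemes, CBOW and skip-gram, without showing how they differ. The sketch below is an illustration only and is not the authors' code: it builds the training examples each scheme would draw from one tokenized question. The function names, the window size, and the sample question are assumptions made for this example, and real Word2vec implementations add steps such as subsampling and negative sampling that are omitted here.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: each center word predicts every neighbor in the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens: List[str], window: int) -> List[Tuple[Tuple[str, ...], str]]:
    """CBOW: the neighbors in the window jointly predict the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tuple(tokens[j] for j in range(lo, hi) if j != i)
        if context:  # skip single-token inputs, which have no context
            pairs.append((context, center))
    return pairs


if __name__ == "__main__":
    # Hypothetical Turkish question, whitespace-tokenized for simplicity.
    question = "türkiye'nin başkenti neresidir".split()
    for center, context in skipgram_pairs(question, window=1):
        print("skip-gram:", center, "->", context)
    for context, center in cbow_pairs(question, window=1):
        print("CBOW:", context, "->", center)
```

Skip-gram yields one example per (center, neighbor) pair, while CBOW yields one example per position with the whole context as input, so the same corpus produces differently shaped training data under the two schemes.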

Files

Original bundle
Name: 10176.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format
Description: Full Text