A deep learning analysis on question classification task using Word2vec representations

Yilmaz, Seyhmus; Toklu, Sinan

dc.contributor.author	Yilmaz, Seyhmus
dc.contributor.author	Toklu, Sinan
dc.date.accessioned	2021-12-01T18:47:11Z
dc.date.available	2021-12-01T18:47:11Z
dc.date.issued	2020
dc.department	[To be determined]	en_US
dc.description.abstract	Question classification is an essential first step for automatic question answering systems. Linguistic features play a significant role in developing an accurate question classifier. Recently, deep learning systems have achieved remarkable success in various text-mining problems such as sentiment analysis, document classification, spam filtering, document summarization, and web mining. In this study, we investigate several deep learning architectures for a question classification task in Turkish, a highly inflectional, agglutinative language in which words are formed by adding suffixes (morphemes) to a root word. As a non-Indo-European language, Turkish has some unique features that make it challenging for natural language processing; for instance, it has no grammatical gender or noun classes. User questions in Turkish are used to train and test the deep learning architectures, which are compared in terms of test accuracy and 10-fold cross-validation accuracy. We use two major deep learning models, long short-term memory (LSTM) networks and convolutional neural networks (CNN), and we also implement combined CNN-LSTM and CNN-SVM structures, along with a number of variants of these architectures obtained by changing vector sizes and embedding types. In addition, we build word embeddings with Word2vec, using both the CBOW and skip-gram models with different vector sizes, on a large corpus composed of user questions. We also investigate the effect of different pre-trained Word2vec embeddings on these deep learning architectures. Experimental results show that the choice of Word2vec model has a significant impact on the accuracy of the different deep learning models. 
Additionally, no labeled Turkish question dataset was available, so another contribution of this study is a new Turkish question dataset translated from the UIUC English question dataset. Using these techniques, we reached an accuracy of 94% on the question dataset.	en_US
dc.identifier.doi	10.1007/s00521-020-04725-w
dc.identifier.endpage	2928	en_US
dc.identifier.issn	0941-0643
dc.identifier.issn	1433-3058
dc.identifier.issue	7	en_US
dc.identifier.scopus	2-s2.0-85078256408	en_US
dc.identifier.scopusquality	Q1	en_US
dc.identifier.startpage	2909	en_US
dc.identifier.uri	https://doi.org/10.1007/s00521-020-04725-w
dc.identifier.uri	https://hdl.handle.net/20.500.12684/10176
dc.identifier.volume	32	en_US
dc.identifier.wos	WOS:000522553100069	en_US
dc.identifier.wosquality	Q1	en_US
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	Scopus	en_US
dc.language.iso	en	en_US
dc.publisher	Springer London Ltd	en_US
dc.relation.ispartof	Neural Computing & Applications	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	Deep learning	en_US
dc.subject	Question classification	en_US
dc.subject	SVM	en_US
dc.subject	Word embedding	en_US
dc.subject	Word2vec	en_US
dc.subject	Sentiment Classification	en_US
dc.subject	Nets	en_US
dc.title	A deep learning analysis on question classification task using Word2vec representations	en_US
dc.type	Article	en_US

Files

Original bundle

Name:: 10176.pdf
Size:: 1.5 MB
Format:: Adobe Portable Document Format
Description:: Full Text

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection