Word2vec temsillerini kullanarak Türkçede soru sınıflandırmasında derin öğrenme analizi

Yılmaz, Şeyhmus

dc.contributor.advisor	Toklu, Sinan
dc.contributor.author	Yılmaz, Şeyhmus
dc.date.accessioned	2021-02-25T15:04:40Z
dc.date.available	2021-02-25T15:04:40Z
dc.date.issued	2020
dc.department	DÜ, Fen Bilimleri Enstitüsü, Elektrik-Elektronik ve Bilgisayar Mühendisliği Ana Bilim Dalı	en_US
dc.description	YÖK Thesis No: 639475	en_US
dc.description.abstract	Question classification is a fundamental area of study for automatic question answering applications. Linguistic features play an important role in developing an accurate question classifier. Recently, deep learning systems have achieved remarkable success in various text mining problems such as sentiment analysis, document classification, spam filtering, document summarization, and web mining. This study addresses question classification in Turkish, an agglutinative language, using several deep learning architectures. As non-Indo-European languages, languages like Turkish have some unique features that make natural language processing difficult. User questions translated from English into Turkish were used to train and test the deep learning architectures. Two major deep learning models were used in this thesis, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), along with the combined CNN-LSTM and CNN-SVM structures. First, word vectors with different properties and dimensions were produced from a large corpus using the word2vec methods Skip-gram and CBOW, because a further aim of this study is to examine how using different pre-trained word2vec word embeddings in deep learning architectures affects the results. The results show that the use of different word2vec models in different deep learning models has a significant effect on accuracy. In addition, because no labeled Turkish question dataset suitable for this work was available when the study began, another contribution is the introduction of a new Turkish question dataset translated from the UIUC English question dataset. The success rates of the deep learning architectures were also compared in terms of accuracy and 10-fold cross-validation. Using the techniques described in this thesis, an accuracy of 94% was reached on the question dataset.	en_US
dc.description.abstract	Question classification is a fundamental task for automatic question answering systems. Linguistic features play a significant role in developing an accurate question classifier. Recently, deep learning systems have achieved remarkable success in various text-mining problems such as sentiment analysis, document classification, spam filtering, document summarization, and web mining. In this study, we investigate several deep learning architectures for question classification in Turkish, a highly inflectional, agglutinative language in which words are formed by adding suffixes (morphemes) to a root word. As non-Indo-European languages, languages like Turkish have some unique features that make natural language processing challenging. For instance, Turkish has no grammatical gender or noun classes. In this study, user questions in Turkish are used to train and test the deep learning architectures, and the architectures are compared in terms of test accuracy and 10-fold cross-validation accuracy. We use two major deep learning models: Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN). We also implemented the combined CNN-LSTM and CNN-SVM structures, as well as a number of variants of these architectures obtained by changing the vector sizes and the embedding types. In addition, we built word embeddings using the word2vec method with the CBOW and Skip-gram models and different vector sizes on a large corpus composed of user questions. We also investigate the effect of using different pre-trained word2vec word embeddings on these deep learning architectures. Experimental results show that the use of different word2vec models has a significant impact on the accuracy of the different deep learning models. Additionally, because no labeled Turkish question dataset was available, another contribution of this study is a new Turkish question dataset translated from the UIUC English question dataset. Using these techniques, we reached an accuracy of 94% on the question dataset.	en_US
dc.identifier.endpage	92	en_US
dc.identifier.startpage	1	en_US
dc.identifier.uri	https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=_F5QEpayDXGqGZlp9XiFtM9oFVhjk2aHJ5sQHgS-HzExbTSExta15gLEfEpokjHV
dc.identifier.uri	https://hdl.handle.net/20.500.12684/7257
dc.institutionauthor	Yılmaz, Şeyhmus	en_US
dc.language.iso	tr	en_US
dc.publisher	Düzce Üniversitesi	en_US
dc.relation.publicationcategory	Tez	en_US
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	en_US
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Word2vec temsillerini kullanarak Türkçede soru sınıflandırmasında derin öğrenme analizi	en_US
dc.title.alternative	A deep learning analysis on Turkish question classification task using word2vec representations	en_US
dc.type	Doctoral Thesis	en_US
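
The abstract names three techniques without describing how they were implemented: the CBOW and Skip-gram variants of word2vec, and 10-fold cross-validation. The following is a minimal illustrative sketch in plain Python, not the thesis's code. It shows only how the two word2vec variants frame their training examples and how a dataset can be split into ten folds. The function names, the window size, and the example question are all assumptions made for illustration.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: each (center, context) pair is one training example."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: the surrounding context words jointly predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index lists, without shuffling."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Hypothetical example question, tokenized by whitespace (an assumption).
question = "türkiye'nin başkenti neresidir".split()
print(skipgram_pairs(question, window=1))
print(cbow_examples(question, window=1))
print([len(test) for _, test in k_fold_indices(25, k=10)])
```

In a 10-fold setup like the one the abstract mentions, a model would be trained on each `train` index list and scored on the matching `test` list, with the ten scores then averaged. Whether the thesis shuffled or stratified its folds is not stated in this record.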

Files

Original bundle

Name: 639475.pdf
Size: 3.68 MB
Format: Adobe Portable Document Format
Description: Full Text

Collection

Fen Bilimleri Enstitüsü Tez Koleksiyonu