Application of novel filter banks and fundamental frequency detection method in speech emotion recognition with deep learning

Parlak, Cevahir

dc.contributor.advisor	Altun, Yusuf
dc.contributor.author	Parlak, Cevahir
dc.date.accessioned	2023-04-04T16:04:51Z
dc.date.available	2023-04-04T16:04:51Z
dc.date.issued	2022
dc.department	Düzce University, Graduate Education Institute, Department of Computer Engineering	en_US
dc.description.abstract	This thesis proposes new filter banks for speech emotion recognition applications and a new method for detecting the fundamental frequency of the human voice. The new filter banks are expected to pave the way for major advances in speech emotion recognition applications. Many different filter banks have been proposed for speech recognition applications to date. However, these models generally contain too many parameters or require complex mathematical operations. When MFCC (Mel Frequency Cepstral Coefficients) are derived from Mel filter banks, a DCT (Discrete Cosine Transform) is applied. Moreover, it is almost impossible to interpret MFCC values acoustically. Although Mel filter banks are easier to interpret, they contain a very large number of parameters. The proposed EFB (Emotional Filter Banks) are easier to interpret and also faster to compute. In this study, we apply these filter banks with SVM-SMO (Support Vector Machine-Sequential Minimal Optimization) and deep neural network models and compare them with MFCC and Mel filter banks on the EmoSTAR, EmoDB (Berlin Emotional Database), IEMOCAP (Interactive Emotional Dyadic Motion Capture Database), and MELD (Multimodal EmotionLines Dataset) datasets. Feature selection and data augmentation are also examined. For fundamental frequency detection, the HDM (Harmonic Differences Method) is proposed and investigated for wideband and narrowband (telephone) speech. HDM works on the basis of the differences between harmonics. For fundamental frequency evaluation, the Hillenbrand and Texas vowel datasets and the entire vowel portion of the TIMIT (Texas Instruments Massachusetts Institute of Technology) dataset are used. The HDM algorithm is compared with the autocorrelation, cepstrum, YIN, YAAPT (Yet Another Algorithm for Pitch Tracking), CREPE (Convolutional Representation for Pitch Estimation), and FCN (Fully Convolutional Network) methods. The results show that the differences between harmonics are a good choice for the fundamental frequency and that HDM mostly produces more successful results than the other methods.	en_US
dc.description.abstract	In this manuscript, a novel filter bank design, named EFB, and a pitch determination algorithm, HDM, are proposed. The proposed filter banks are intended to replace the current state-of-the-art MFCC and Mel filter banks. We hope that EFB filters will have a great impact on speech emotion recognition applications. Today, most speech processing applications use Mel filters or their transformed and reduced version, MFCC. Various other filter banks have been proposed to imitate the structure of the human ear. However, these models have too many redundant frequency regions. MFCC contains fewer coefficients, but computing the DCT slows it down. Another disadvantage of these filters is that MFCC values are difficult to interpret. It is very hard to gain insight by inspecting the Mel filters or MFCC. The EFB filter banks are not only fast and easy to compute compared to Mel and MFCC, but they also provide more insight into the underlying structure of the speech waveform. In this study, the EFB filter bank is applied to emotional speech datasets (EmoSTAR, EmoDB, IEMOCAP, MELD) with various deep learning architectures and an SVM-SMO classifier and compared with MFCC and Mel filter banks. We also investigate feature selection and data augmentation methods. Prosodic features are used extensively in speech emotion applications. For this part, we developed a novel fundamental frequency calculation method called HDM, which exploits the intervals between the harmonics of vowel sounds. We test HDM against prominent algorithms such as autocorrelation, CREPE, YIN, YAAPT, cepstrum, and FCN on the Hillenbrand Vowel dataset, the Texas Vowel dataset, and the vowel part of the TIMIT dataset, for narrowband telephony speech as well as wideband speech.	en_US
dc.identifier.endpage	101	en_US
dc.identifier.startpage	1	en_US
dc.identifier.uri	https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=kScA8XnrRb0WogX-qPGFkmvGwn_SHzXhsh0aRyC4nJWAiGqXQbXP2SeHhxx3QdGS
dc.identifier.uri	https://hdl.handle.net/20.500.12684/11056
dc.identifier.yoktezid	730882	en_US
dc.institutionauthor	Parlak, Cevahir
dc.language.iso	tr	en_US
dc.publisher	Düzce Üniversitesi	en_US
dc.relation.publicationcategory	Thesis	en_US
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	en_US
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Geliştirilen yeni filtrelerin ve temel frekans tespit yönteminin derin öğrenme ile konuşma duygu analizinde uygulanması	en_US
dc.title.alternative	Application of novel filter banks and fundamental frequency detection method in speech emotion recognition with deep learning	en_US
dc.type	Doctoral Thesis	en_US
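
The abstract states that HDM works from the differences between harmonics. The following is a minimal Python sketch of that general idea only; it is not the thesis's HDM algorithm, whose details are not given in this record. The function name `harmonic_spacing_f0`, its parameters and thresholds (`fmin`, `fmax`, `n_peaks`, the 10% peak threshold), and the synthetic test tones are all assumptions made for illustration.

```python
import numpy as np


def harmonic_spacing_f0(signal, sample_rate, fmin=60.0, fmax=400.0, n_peaks=6):
    """Estimate F0 as the median spacing between strong spectral peaks.

    Illustrative sketch only; names and thresholds are hypothetical and
    this is not the HDM implementation from the thesis.
    """
    signal = np.asarray(signal, dtype=float)
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    # Local maxima of the magnitude spectrum (interior bins only).
    is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] >= spectrum[2:])
    peak_idx = np.flatnonzero(is_peak) + 1

    # Discard weak peaks (assumed threshold: 10% of the largest magnitude).
    peak_idx = peak_idx[spectrum[peak_idx] > 0.1 * spectrum.max()]
    if peak_idx.size < 2:
        return None

    # Keep the n_peaks strongest peaks and sort them by frequency.
    strongest = peak_idx[np.argsort(spectrum[peak_idx])[-n_peaks:]]
    peak_freqs = np.sort(freqs[strongest])

    # Differences between neighbouring harmonics, limited to a plausible F0 range.
    diffs = np.diff(peak_freqs)
    diffs = diffs[(diffs >= fmin) & (diffs <= fmax)]
    if diffs.size == 0:
        return None
    return float(np.median(diffs))


# Synthetic vowel-like tones with a 150 Hz fundamental (assumed test data).
sample_rate = 16000
t = np.arange(1600) / sample_rate
f0_true = 150.0
# Harmonics 1..6 present, as in wideband speech.
wideband = sum(np.sin(2 * np.pi * k * f0_true * t) / k for k in range(1, 7))
# Fundamental removed, loosely mimicking a band-limited telephone channel.
narrowband = sum(np.sin(2 * np.pi * k * f0_true * t) / k for k in range(2, 7))

wideband_estimate = harmonic_spacing_f0(wideband, sample_rate)
narrowband_estimate = harmonic_spacing_f0(narrowband, sample_rate)
print(wideband_estimate, narrowband_estimate)
```

On these noise-free tones both estimates should fall within one FFT bin (10 Hz here) of 150 Hz, even though the second signal has no energy at the fundamental itself. Real speech would additionally need voicing detection and more robust peak picking.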

Files

Original bundle

Name: 730882.pdf
Size: 3.87 MB
Format: Adobe Portable Document Format
Description: Full Text

Collection

Graduate Education Institute Thesis Collection