Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition

Parlak, Cevahir; Diri, Banu; Altun, Yusuf

dc.authorid	PARLAK (PhD), Cevahir/0000-0002-5500-7379	en_US
dc.authorid	ALTUN, Yusuf/0000-0002-2099-0959	en_US
dc.authorid	Diri, Banu/0000-0002-6652-4339	en_US
dc.authorscopusid	55807221400	en_US
dc.authorscopusid	22978771800	en_US
dc.authorscopusid	25031391400	en_US
dc.authorwosid	Diri, Banu/AAA-1020-2021	en_US
dc.authorwosid	PARLAK (PhD), Cevahir/ABA-4914-2021	en_US
dc.authorwosid	ALTUN, Yusuf/AAA-9929-2020	en_US
dc.contributor.author	Parlak, Cevahir
dc.contributor.author	Diri, Banu
dc.contributor.author	Altun, Yusuf
dc.date.accessioned	2024-08-23T16:07:03Z
dc.date.available	2024-08-23T16:07:03Z
dc.date.issued	2024	en_US
dc.department	Düzce Üniversitesi	en_US
dc.description.abstract	In this study, novel Spectro-Temporal Energy Ratio (STER) features, based on the formants of vowels and on the linearly spaced low-frequency and logarithmically spaced high-frequency parts of the human auditory system, are introduced for single- and cross-corpus speech emotion recognition experiments. Because the underlying dynamics and characteristics of speech recognition and speech emotion recognition differ substantially, an emotion-recognition-specific filter bank is necessary. The proposed features rely on a novel filter bank strategy that constructs 7 trapezoidal filter banks. These filter banks differ from the Mel and Bark scales in shape and frequency regions and are intended to generalize the feature space. Cross-corpus experimentation is a step forward in speech emotion recognition, but its results are usually disappointing. Our goal is to create a feature set that is robust to cross-corpus variations using various feature selection algorithms. We demonstrate this by shrinking the dimension of the feature space from 6984 down to 128 while boosting the accuracy using SVM, RBM, and sVGG (small-VGG) classifiers. Although RBMs are no longer considered fashionable, we show that they can perform very well when tuned properly. This paper reports a 90.65% accuracy rate using STER features on EmoDB.	en_US
dc.identifier.doi	10.1007/s13369-023-07920-8
dc.identifier.endpage	3223	en_US
dc.identifier.issn	2193-567X
dc.identifier.issn	2191-4281
dc.identifier.issue	3	en_US
dc.identifier.scopus	2-s2.0-85160269669	en_US
dc.identifier.scopusquality	Q1	en_US
dc.identifier.startpage	3209	en_US
dc.identifier.uri	https://doi.org/10.1007/s13369-023-07920-8
dc.identifier.uri	https://hdl.handle.net/20.500.12684/14450
dc.identifier.volume	49	en_US
dc.identifier.wos	WOS:000995842100002	en_US
dc.identifier.wosquality	Q2	en_US
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	Scopus	en_US
dc.language.iso	en	en_US
dc.publisher	Springer Heidelberg	en_US
dc.relation.ispartof	Arabian Journal For Science and Engineering	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	Speech emotion recognition	en_US
dc.subject	Filter banks	en_US
dc.subject	Feature selection	en_US
dc.subject	SVM	en_US
dc.subject	Restricted Boltzmann machines	en_US
dc.subject	Convolutional neural networks	en_US
dc.subject	Fundamental-Frequency	en_US
dc.subject	Loudness	en_US
dc.subject	Classification	en_US
dc.subject	Extraction	en_US
dc.subject	Networks	en_US
dc.title	Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition	en_US
dc.type	Article	en_US

Collection

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
