Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition

dc.authorid: PARLAK (PhD), Cevahir/0000-0002-5500-7379 [en_US]
dc.authorid: ALTUN, Yusuf/0000-0002-2099-0959 [en_US]
dc.authorid: Diri, Banu/0000-0002-6652-4339 [en_US]
dc.authorscopusid: 55807221400 [en_US]
dc.authorscopusid: 22978771800 [en_US]
dc.authorscopusid: 25031391400 [en_US]
dc.authorwosid: Diri, Banu/AAA-1020-2021 [en_US]
dc.authorwosid: PARLAK (PhD), Cevahir/ABA-4914-2021 [en_US]
dc.authorwosid: ALTUN, Yusuf/AAA-9929-2020 [en_US]
dc.contributor.author: Parlak, Cevahir
dc.contributor.author: Diri, Banu
dc.contributor.author: Altun, Yusuf
dc.date.accessioned: 2024-08-23T16:07:03Z
dc.date.available: 2024-08-23T16:07:03Z
dc.date.issued: 2024 [en_US]
dc.department: Düzce Üniversitesi [en_US]
dc.description.abstract: This study introduces novel Spectro-Temporal Energy Ratio (STER) features for single- and cross-corpus speech emotion recognition experiments. The features are based on the formants of vowels and on the linearly spaced low-frequency and logarithmically spaced high-frequency parts of the human auditory system. Because the underlying dynamics and characteristics of speech recognition and speech emotion recognition differ substantially, an emotion-recognition-specific filter bank is needed. The proposed features rest on a novel filter bank strategy that constructs 7 trapezoidal filter banks. These filter banks differ from the Mel and Bark scales in shape and frequency regions and are intended to generalize the feature space. Cross-corpus experimentation is a step forward in speech emotion recognition, but its results usually disappoint researchers. Our goal is to create a feature set that is robust to cross-corpus variations using various feature selection algorithms. We demonstrate this by reducing the dimension of the feature space from 6984 to 128 while improving accuracy with SVM, RBM, and sVGG (small-VGG) classifiers. Although RBMs are no longer considered fashionable, we show that they can perform very well when tuned properly. Using STER features, the paper reports an accuracy of 90.65% on EmoDB. [en_US]
dc.identifier.doi: 10.1007/s13369-023-07920-8
dc.identifier.endpage: 3223 [en_US]
dc.identifier.issn: 2193-567X
dc.identifier.issn: 2191-4281
dc.identifier.issue: 3 [en_US]
dc.identifier.scopus: 2-s2.0-85160269669 [en_US]
dc.identifier.scopusquality: Q1 [en_US]
dc.identifier.startpage: 3209 [en_US]
dc.identifier.uri: https://doi.org/10.1007/s13369-023-07920-8
dc.identifier.uri: https://hdl.handle.net/20.500.12684/14450
dc.identifier.volume: 49 [en_US]
dc.identifier.wos: WOS:000995842100002 [en_US]
dc.identifier.wosquality: Q2 [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.language.iso: en [en_US]
dc.publisher: Springer Heidelberg [en_US]
dc.relation.ispartof: Arabian Journal For Science and Engineering [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Speech emotion recognition [en_US]
dc.subject: Filter banks [en_US]
dc.subject: Feature selection [en_US]
dc.subject: SVM [en_US]
dc.subject: Restricted Boltzmann machines [en_US]
dc.subject: Convolutional neural networks [en_US]
dc.subject: Fundamental-Frequency [en_US]
dc.subject: Loudness [en_US]
dc.subject: Classification [en_US]
dc.subject: Extraction [en_US]
dc.subject: Networks [en_US]
dc.title: Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition [en_US]
dc.type: Article [en_US]
