Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition
dc.authorid | PARLAK (PhD), Cevahir/0000-0002-5500-7379 | en_US |
dc.authorid | ALTUN, Yusuf/0000-0002-2099-0959 | en_US |
dc.authorid | Diri, Banu/0000-0002-6652-4339 | en_US |
dc.authorscopusid | 55807221400 | en_US |
dc.authorscopusid | 22978771800 | en_US |
dc.authorscopusid | 25031391400 | en_US |
dc.authorwosid | Diri, Banu/AAA-1020-2021 | en_US |
dc.authorwosid | PARLAK (PhD), Cevahir/ABA-4914-2021 | en_US |
dc.authorwosid | ALTUN, Yusuf/AAA-9929-2020 | en_US |
dc.contributor.author | Parlak, Cevahir | |
dc.contributor.author | Diri, Banu | |
dc.contributor.author | Altun, Yusuf | |
dc.date.accessioned | 2024-08-23T16:07:03Z | |
dc.date.available | 2024-08-23T16:07:03Z | |
dc.date.issued | 2024 | en_US |
dc.department | Düzce Üniversitesi | en_US |
dc.description.abstract | In this study, novel Spectro-Temporal Energy Ratio (STER) features are introduced for single-corpus and cross-corpus speech emotion recognition experiments. The features are based on the formants of vowels, the linearly spaced low-frequency region, and the logarithmically spaced high-frequency region of the human auditory system. Because the underlying dynamics and characteristics of speech recognition and speech emotion recognition differ substantially, a filter bank designed specifically for emotion recognition is necessary. The proposed features rely on a novel filter bank strategy that constructs 7 trapezoidal filter banks. These filter banks differ from the Mel and Bark scales in shape and frequency regions and are intended to generalize the feature space. Cross-corpus experimentation is a step forward in speech emotion recognition, but its results are usually disappointing. Our goal is to create a feature set that is robust to cross-corpus variations by using various feature selection algorithms. We demonstrate this by shrinking the dimension of the feature space from 6984 down to 128 while improving accuracy with SVM, RBM, and sVGG (small-VGG) classifiers. Although RBMs are no longer considered fashionable, we show that they can perform outstandingly when tuned properly. This paper reports a 90.65% accuracy rate using STER features on EmoDB. | en_US |
dc.identifier.doi | 10.1007/s13369-023-07920-8 | |
dc.identifier.endpage | 3223 | en_US |
dc.identifier.issn | 2193-567X | |
dc.identifier.issn | 2191-4281 | |
dc.identifier.issue | 3 | en_US |
dc.identifier.scopus | 2-s2.0-85160269669 | en_US |
dc.identifier.scopusquality | Q1 | en_US |
dc.identifier.startpage | 3209 | en_US |
dc.identifier.uri | https://doi.org/10.1007/s13369-023-07920-8 | |
dc.identifier.uri | https://hdl.handle.net/20.500.12684/14450 | |
dc.identifier.volume | 49 | en_US |
dc.identifier.wos | WOS:000995842100002 | en_US |
dc.identifier.wosquality | Q2 | en_US |
dc.indekslendigikaynak | Web of Science | en_US |
dc.indekslendigikaynak | Scopus | en_US |
dc.language.iso | en | en_US |
dc.publisher | Springer Heidelberg | en_US |
dc.relation.ispartof | Arabian Journal For Science and Engineering | en_US |
dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Speech emotion recognition | en_US |
dc.subject | Filter banks | en_US |
dc.subject | Feature selection | en_US |
dc.subject | SVM | en_US |
dc.subject | Restricted Boltzmann machines | en_US |
dc.subject | Convolutional neural networks | en_US |
dc.subject | Fundamental-Frequency | en_US |
dc.subject | Loudness | en_US |
dc.subject | Classification | en_US |
dc.subject | Extraction | en_US |
dc.subject | Networks | en_US |
dc.title | Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition | en_US |
dc.type | Article | en_US |