A Quest for Formant-Based Compact Nonuniform Trapezoidal Filter Banks for Speech Processing with VGG16
dc.authorid | ALTUN, Yusuf/0000-0002-2099-0959 | en_US |
dc.authorscopusid | 55807221400 | en_US |
dc.authorscopusid | 25031391400 | en_US |
dc.authorwosid | ALTUN, Yusuf/AAA-9929-2020 | en_US |
dc.contributor.author | Parlak, Cevahir | |
dc.contributor.author | Altun, Yusuf | |
dc.date.accessioned | 2024-08-23T16:07:17Z | |
dc.date.available | 2024-08-23T16:07:17Z | |
dc.date.issued | 2024 | en_US |
dc.department | Düzce Üniversitesi | en_US |
dc.description.abstract | In this paper, we discuss the filter banks used for speech analysis and propose a novel filter bank for speech processing applications. Filter banks are building blocks of speech processing applications. Several filter strategies have been proposed, including Mel, PLP, Seneff, Lyon, and Gammatone filters. MFCC is a transformed version of Mel filters and is still a state-of-the-art method for speech recognition applications. However, 40 years after their debut, it is time to introduce new structures as novel speech features. The proposed acoustic filter banks (AFB) are intended as alternatives to Mel filters, PLP filters, and MFCC features. AFB filters are founded on the formant regions of vowels and consonants. In this study, we introduce an acoustic filter bank comprising 11 frequency regions and conduct experiments using the VGG16 model on the TIMIT and Speech Commands V2 datasets. The results indicate that MFCC, Mel, and PLP filters can effectively be replaced with the novel AFB filter bank features. | en_US |
dc.description.sponsorship | Data availability statement: Data are available at https://github.com/cevparlak/AFB-Filters. | en_US |
dc.identifier.doi | 10.1007/s00034-024-02794-z | |
dc.identifier.issn | 0278-081X | |
dc.identifier.issn | 1531-5878 | |
dc.identifier.scopus | 2-s2.0-85200054858 | en_US |
dc.identifier.scopusquality | Q2 | en_US |
dc.identifier.uri | https://doi.org/10.1007/s00034-024-02794-z | |
dc.identifier.uri | https://hdl.handle.net/20.500.12684/14562 | |
dc.identifier.wos | WOS:001281629000005 | en_US |
dc.identifier.wosquality | N/A | en_US |
dc.indekslendigikaynak | Web of Science | en_US |
dc.indekslendigikaynak | Scopus | en_US |
dc.language.iso | en | en_US |
dc.publisher | Springer Birkhäuser | en_US |
dc.relation.ispartof | Circuits Systems and Signal Processing | en_US |
dc.relation.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Speech processing | en_US |
dc.subject | MFCC | en_US |
dc.subject | Mel filters | en_US |
dc.subject | PLP | en_US |
dc.subject | Filter banks | en_US |
dc.subject | Convolutional neural networks | en_US |
dc.subject | Discrimination | en_US |
dc.subject | Recognition | en_US |
dc.subject | Frequency | en_US |
dc.subject | Loudness | en_US |
dc.subject | Perception | en_US |
dc.subject | Model | en_US |
dc.title | A Quest for Formant-Based Compact Nonuniform Trapezoidal Filter Banks for Speech Processing with VGG16 | en_US |
dc.type | Article | en_US |