Towards Understanding Cat Vocalizations: A Novel Cat Sound Classification Model Based on Vision Transformers

dc.authorscopusid: 57195223464 [en_US]
dc.authorscopusid: 56539908500 [en_US]
dc.contributor.author: Kucukkulahli, E.
dc.contributor.author: Kabakus, A.T.
dc.date.accessioned: 2024-08-23T16:07:39Z
dc.date.available: 2024-08-23T16:07:39Z
dc.date.issued: 2024 [en_US]
dc.department: Düzce Üniversitesi [en_US]
dc.description.abstract: Animal sound perception systems are highly developed compared to those of humans and are crucial for survival in natural environments, with some species possessing specialized sensory capabilities spanning vision, hearing, touch, and environmental awareness. Understanding animal sounds not only aids the animals' own communication and survival but also benefits humans in various fields, including security, natural disaster prediction, ecological research, bioacoustics, precision agriculture, and search and rescue operations. Motivated by this fact, this study investigated the classification of cat sounds using deep learning models based on Vision Transformer (ViT) and Convolutional Neural Network (CNN) architectures. Cat vocalizations, represented as mel-spectrograms, were classified using models trained on a diverse dataset of cat sounds. Experimental results demonstrated the superiority of the proposed model based on Microsoft's BERT Pre-Training of Image Transformers (BEiT) over the state of the art, as it obtained an exceptional accuracy of 96.95%. Additionally, it was observed that the proposed ViT-based models outperformed CNN-based models, highlighting the efficacy of transformer architectures in capturing complex patterns within audio data. These findings underscore the potential of ViT architectures in decoding animal communication systems and advancing wildlife conservation efforts. © 2024 Elsevier Ltd [en_US]
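The preprocessing step described in the abstract (rendering a cat vocalization as a mel-spectrogram "image" that a ViT/BEiT classifier can consume) can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual pipeline: the synthetic chirp standing in for a cat sound, and all parameter values (`sr`, `n_fft`, `hop`, `n_mels`), are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=64):
    # Frame the signal, window it, and take the magnitude-squared STFT.
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(signal[s:s + n_fft] * window)) ** 2
              for s in range(0, len(signal) - n_fft, hop)]
    spec = np.array(frames).T                       # (n_fft//2+1, n_frames)
    mel = mel_filterbank(n_mels, n_fft, sr) @ spec  # project onto mel bands
    return np.log(mel + 1e-10)                      # log-mel "image" for a ViT

# Synthetic rising chirp standing in for a cat vocalization (illustrative only).
sr = 16000
t = np.linspace(0.0, 1.0, sr, endpoint=False)
meow = np.sin(2.0 * np.pi * (400.0 + 300.0 * t) * t)
img = mel_spectrogram(meow, sr=sr)
print(img.shape)  # 2-D array: mel bands x time frames
```

The resulting 2-D log-mel array is what would be resized and normalized into the fixed-size input expected by an image-classification backbone such as ViT or BEiT.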
dc.identifier.doi: 10.1016/j.apacoust.2024.110218
dc.identifier.issn: 0003-682X
dc.identifier.scopus: 2-s2.0-85200721711 [en_US]
dc.identifier.scopusquality: Q1 [en_US]
dc.identifier.uri: https://doi.org/10.1016/j.apacoust.2024.110218
dc.identifier.uri: https://hdl.handle.net/20.500.12684/14761
dc.identifier.volume: 226 [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.language.iso: en [en_US]
dc.publisher: Elsevier Ltd [en_US]
dc.relation.ispartof: Applied Acoustics [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Convolutional neural network [en_US]
dc.subject: Deep learning [en_US]
dc.subject: Sound classification [en_US]
dc.subject: Transfer learning [en_US]
dc.subject: Vision transformer [en_US]
dc.subject: Animals [en_US]
dc.subject: Convolution [en_US]
dc.subject: Convolutional neural networks [en_US]
dc.subject: Disasters [en_US]
dc.subject: Image reconstruction [en_US]
dc.subject: Learning systems [en_US]
dc.subject: Network architecture [en_US]
dc.subject: Classification models [en_US]
dc.subject: Model-based OPC [en_US]
dc.subject: Natural environments [en_US]
dc.subject: Perception systems [en_US]
dc.subject: Sound perception [en_US]
dc.subject: Audition [en_US]
dc.title: Towards Understanding Cat Vocalizations: A Novel Cat Sound Classification Model Based on Vision Transformers [en_US]
dc.type: Article [en_US]

Files