Towards Understanding Cat Vocalizations: A Novel Cat Sound Classification Model Based on Vision Transformers

Kucukkulahli, E.; Kabakus, A.T.

Date

2024

Authors

Kucukkulahli, E.

Kabakus, A.T.

Publisher

Elsevier Ltd

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Animal sound perception systems are highly developed compared with those of humans and are crucial for survival in natural environments; some species possess specialized sensory capabilities such as vision, hearing, touch, and environmental awareness. Understanding animal sounds not only aids animals' own communication and survival but also benefits humans in fields including security, natural disaster prediction, ecological research, bioacoustics, precision agriculture, and search and rescue operations. Motivated by this, the study investigated the classification of cat sounds using deep learning models based on Vision Transformer (ViT) and Convolutional Neural Network (CNN) architectures. Cat vocalizations, represented as mel-spectrograms, were classified using models trained on a diverse dataset of cat sounds. Experimental results showed that the proposed model, based on Microsoft's BERT Pre-Training of Image Transformers (BEiT), outperformed the state of the art, achieving an accuracy of 96.95%. Additionally, the proposed ViT-based models outperformed the CNN-based models, highlighting the efficacy of transformer architectures in capturing complex patterns within audio data. These findings underscore the potential of ViT architectures in decoding animal communication systems and advancing wildlife conservation efforts. © 2024 Elsevier Ltd

Keywords

Convolutional neural network, Deep learning, Sound classification, Transfer learning, Vision transformer, Animals, Convolution, Disasters, Image reconstruction, Learning systems, Network architecture, Classification models, Model-based OPC, Natural environments, Perception systems, Sound perception, Audition

Source

Applied Acoustics

Scopus Quartile

Q1

Volume

226

Link

https://doi.org/10.1016/j.apacoust.2024.110218
https://hdl.handle.net/20.500.12684/14761

Collection

Scopus Indexed Publications Collection
