Comparison of Different Dimensionality Reduction Methods for Parkinson's Disease Detection
Date
2019
Access Rights
info:eu-repo/semantics/openAccess
Abstract
Parkinson's Disease (PD) is a progressive neurological disease that directly affects multiple motor and non-motor characteristics of individuals. In the first stage of the disease, individuals often experience voice impairments, so voice recordings are used for the early detection of PD. Features extracted from the voice recordings by signal processing methods are given as input to machine learning methods to determine whether an individual has the disease. In this study, features extracted from individuals' voice recordings were given as input to two different machine learning methods, and individuals were classified as PD patients or healthy. The models were trained with a dataset obtained from the UCI Machine Learning Repository. Two different dimensionality reduction methods were applied to the features, both to reduce the complexity of the trained models and to prevent over-fitting. The first method, Principal Component Analysis (PCA), projects the original feature set onto a new subspace that has fewer dimensions than the original. In the new feature space, components with high variance are selected, while components with low variance are discarded. In the second method, Recursive Feature Elimination (RFE), relevance scores are assigned to the features by using machine learning methods. In the first step, a model that uses the entire feature set is built and a relevance score is calculated for each feature. In the next step, the feature with the lowest relevance score is discarded, the model is rebuilt, and the relevance scores are recalculated. This process continues until the desired number of features remains in the feature set. After dimensionality reduction, Support Vector Machine (SVM) and Gradient Boosting Machine (GBM) classifiers were trained with the reduced feature vectors. Since the number of instances in the dataset was relatively small, One Person Out Cross Validation (OPOCV), which holds out one individual at a time, was used in classifier training.
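The elimination loop and the person-wise validation split described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the study's implementation: the relevance score here is the absolute weight of a tiny logistic regression (a stand-in for whatever model-derived score the authors used), the split assumes that a person may contribute more than one recording, and the names `fit_logistic_weights`, `recursive_feature_elimination`, `leave_one_person_out`, and the toy data are all invented for this sketch.

```python
import math


def fit_logistic_weights(rows, labels, epochs=200, lr=0.1):
    """Fit a small logistic regression by batch gradient descent; return weights."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights


def recursive_feature_elimination(rows, labels, n_keep):
    """Drop the least relevant feature, refit, and repeat until n_keep remain."""
    remaining = list(range(len(rows[0])))
    while len(remaining) > n_keep:
        subset = [[row[j] for j in remaining] for row in rows]
        weights = fit_logistic_weights(subset, labels)
        # Assumption: relevance score = absolute weight of the refitted model.
        worst = min(range(len(remaining)), key=lambda k: abs(weights[k]))
        del remaining[worst]
    return remaining


def leave_one_person_out(person_ids):
    """Yield (person, train_indices, test_indices), holding out one person at a time."""
    for person in sorted(set(person_ids)):
        test = [i for i, p in enumerate(person_ids) if p == person]
        train = [i for i, p in enumerate(person_ids) if p != person]
        yield person, train, test


# Invented toy data: feature 0 is informative, feature 1 is noise,
# feature 2 is a weak (scaled-down) copy of feature 0.
toy_rows = [
    [1.0, 0.5, 0.1],
    [1.0, -0.5, 0.1],
    [-1.0, 0.5, -0.1],
    [-1.0, -0.5, -0.1],
]
toy_labels = [1, 1, 0, 0]

print(recursive_feature_elimination(toy_rows, toy_labels, n_keep=2))  # [0, 2]
for person, train, test in leave_one_person_out(["a", "a", "b", "c"]):
    print(person, train, test)
```

Because the model is refitted after every removal, the scores of the surviving features can change from round to round, which is what distinguishes this procedure from a one-shot ranking. Holding out all recordings of a person together keeps that person's voice out of the training fold.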
Because the dataset also has an imbalanced class distribution, the F-measure and the Matthews Correlation Coefficient (MCC) were used along with accuracy in the performance evaluation. Across all experimental results, the highest classification performance was achieved with only 13 features: a GBM classifier trained with 13 features selected by RFE reached an accuracy of 0.881. Accuracy increased by about 2% compared with the results obtained without feature selection. A similar increase was seen in MCC, which reflects how well the classes can be distinguished: MCC was 0.62 without dimensionality reduction and rose to 0.67 when features were selected with RFE. PCA, the other dimensionality reduction method, did not improve classification performance over the models without feature selection, but achieved the same performance with fewer features.
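To illustrate why accuracy alone can mislead on an imbalanced dataset, the three reported metrics can be computed from confusion-matrix counts as in the sketch below. The function names and the 90/10 toy split are invented for illustration and do not reflect the class proportions or results of the study. The F-measure is taken here to be the balanced F1 score, which is an assumption since the abstract does not state the weighting.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def f_measure(tp, fp, tn, fn):
    """Balanced F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC in [-1, 1]; defined as 0.0 here when a row or column sum is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented example: 90 positive and 10 negative samples, and a trivial
# classifier that labels every sample as positive.
y_true = [1] * 90 + [0] * 10
y_pred = [1] * 100
counts = confusion_counts(y_true, y_pred)

print(accuracy(*counts))           # 0.9
print(f_measure(*counts))          # about 0.947
print(matthews_corrcoef(*counts))  # 0.0
```

In this toy case the trivial classifier scores well on accuracy and F-measure but gets an MCC of zero, because MCC also accounts for the negative class that was never predicted.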
Journal or Series
Avrupa Bilim ve Teknoloji Dergisi
Volume
0
Issue
17