Decision of the Optimal Rank of a Nonnegative Matrix Factorization Model for Gene Expression Data Sets Utilizing the Unit Invariant Knee Method: Development and Evaluation of the Elbow Method for Rank Selection

dc.authorscopusid | 57203489662 | en_US
dc.contributor.author | Guven, E.
dc.date.accessioned | 2024-08-23T16:07:29Z
dc.date.available | 2024-08-23T16:07:29Z
dc.date.issued | 2023 | en_US
dc.department | Düzce Üniversitesi | en_US
dc.description.abstract | Background: There is a great need for computational approaches that analyze and exploit the information contained in gene expression data. Recent applications of nonnegative matrix factorization (NMF) in computational biology have shown that it can extract essential details from large amounts of data, in particular gene expression microarrays. A common problem in NMF is choosing the proper factorization rank (r), that is, the number of factors in the reduced representation, but there is no agreement on which technique is most appropriate for this purpose. Consequently, various techniques have been suggested for selecting the optimal value of r. Objective: In this work, a new metric for rank selection based on the elbow method is proposed and systematically compared against the cophenetic metric. Methods: To determine the optimal rank (r), this study applied the unit invariant knee (UIK) method to NMF of gene expression data sets. The UIK method relies on an extremum distance estimator to locate the inflection point and identify the knee point; accordingly, the proposed method finds the first inflection point of the residual sum of squares curve obtained from the NMF algorithms considered, with the gene expression data set as the target matrix. Results: The UIK procedure was run on gene expression data from acute lymphoblastic leukemia and acute myeloid leukemia samples, and the NMF results obtained with different algorithms were compared. The proposed UIK method is easy to perform, fast, requires no a priori rank value as input, and does not require initial parameters that significantly influence the model’s functionality. Conclusions: This study demonstrates that the elbow method provides a credible rank prediction for gene expression data and precisely estimates the rank of simulated mutational processes data with known dimensions. The proposed UIK method is faster than conventional methods, including metrics that use the consensus matrix as a criterion for rank selection, and achieves significantly better computational efficiency without requiring visual inspection of the curves. Finally, the suggested rank tuning method based on the elbow method for gene expression data is arguably theoretically superior to the cophenetic measure. © Emine Guven. Originally published in JMIR Bioinformatics and Biotechnology. | en_US
dc.identifier.doi | 10.2196/43665
dc.identifier.issn | 2563-3570
dc.identifier.issue | 1 | en_US
dc.identifier.scopus | 2-s2.0-85169437004 | en_US
dc.identifier.scopusquality | N/A | en_US
dc.identifier.uri | https://doi.org/10.2196/43665
dc.identifier.uri | https://hdl.handle.net/20.500.12684/14679
dc.identifier.volume | 4 | en_US
dc.indekslendigikaynak | Scopus | en_US
dc.institutionauthor | Guven, E. | en_US
dc.language.iso | en | en_US
dc.publisher | JMIR Publications Inc. | en_US
dc.relation.ispartof | JMIR Bioinformatics and Biotechnology | en_US
dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | en_US
dc.rights | info:eu-repo/semantics/openAccess | en_US
dc.subject | consensus matrix | en_US
dc.subject | elbow method | en_US
dc.subject | gene expression data | en_US
dc.subject | nonnegative matrix factorization | en_US
dc.subject | optimal rank | en_US
dc.subject | rank factorization | en_US
dc.subject | unit invariant knee method | en_US
dc.title | Decision of the Optimal Rank of a Nonnegative Matrix Factorization Model for Gene Expression Data Sets Utilizing the Unit Invariant Knee Method: Development and Evaluation of the Elbow Method for Rank Selection | en_US
dc.type | Article | en_US
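
The abstract describes picking the NMF rank at the knee of the residual sum of squares (RSS) curve. The sketch below illustrates that general idea with a simple, widely used elbow heuristic: after rescaling both axes, it takes the point farthest from the straight line joining the first and last points of the curve. It is not the article's UIK procedure or its extremum distance estimator; the function name `elbow_rank` and the RSS values are hypothetical and serve only as an illustration.

```python
import numpy as np


def elbow_rank(ranks, rss):
    """Return the rank whose (rank, RSS) point lies farthest from the chord
    joining the first and last points of the curve (a generic elbow
    heuristic, assumed here for illustration; not the UIK estimator)."""
    x = np.asarray(ranks, dtype=float)
    y = np.asarray(rss, dtype=float)
    # Rescale both axes to [0, 1] so the answer does not depend on units.
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y.min()) / (y.max() - y.min())
    # Perpendicular distance from every point to the line through the
    # first and last points of the rescaled curve.
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    dist = np.abs(dx * (yn - yn[0]) - dy * (xn - xn[0])) / np.hypot(dx, dy)
    return int(x[np.argmax(dist)])


# Hypothetical RSS values for candidate ranks 2..8 (not taken from the article).
ranks = [2, 3, 4, 5, 6, 7, 8]
rss = [100.0, 60.0, 30.0, 25.0, 22.0, 20.0, 19.0]
best = elbow_rank(ranks, rss)
print(best)
```

In practice the RSS values would come from fitting an NMF model at each candidate rank. Rescaling the axes makes this heuristic insensitive to the units of the RSS, which is in the same spirit as, but not equivalent to, the unit invariance the article's method is named for.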
