Post-clustering interpretation of gene expression data using functional enrichment and network analysis
Abstract
Clustering of gene expression profiles is a core technique for revealing hidden biological structure and differentiating disease subtypes in high-dimensional biomedical datasets. Translating cluster structure into biologically meaningful insight, however, requires integrative analytical strategies that go beyond unsupervised learning. In this work, we introduce a novel integrative computational approach to post-clustering interpretation that combines statistical functional enrichment with network-based modeling. Clusters of gene expression profiles, previously identified in patients with distinct cancer types, were subjected to enrichment analysis against the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Reactome databases. The enrichment was performed with the g:Profiler tool, which detected significantly overrepresented biological processes, molecular functions, cellular components, and signaling pathways within each cluster. To visualize and further interpret the enriched functional categories, we used Cytoscape. Functional interaction networks were constructed with two Cytoscape plug-ins: ClueGO, which integrates Gene Ontology and pathway annotations into a functionally grouped network, and CluePedia, which expands these networks by showing relationships between genes and enriched terms. This network-based visualization enabled deeper biological interpretation and helped identify core functional themes. The analysis showed that each gene cluster is associated with distinct biological processes, such as immune signaling, metabolic pathways, DNA repair, or cell cycle regulation. The novelty of the proposed approach lies in its systematic integration of enrichment statistics with graph-based visualization, which supports both computational rigor and biological interpretability. These findings indicate that the method can extract biologically consistent knowledge from complex gene expression data.
In summary, the study presents an innovative post-clustering interpretation strategy that bridges unsupervised machine learning and functional genomics. This approach advances the explainability of computational analysis and supports its application in disease subtyping, biomarker discovery, and personalized medicine research.
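
The statistical idea behind the enrichment step can be illustrated with a minimal sketch. It is not g:Profiler's implementation: it assumes a plain one-sided hypergeometric over-representation test followed by Benjamini-Hochberg correction, whereas g:Profiler applies its own g:SCS correction by default. The gene identifiers, term annotations, and function names below are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: M genes in the background,
    n of them annotated to the term, N genes drawn (the cluster)."""
    total = comb(M, N)
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(cluster_genes, background, term_to_genes):
    """Over-representation test of every term for one gene cluster.
    Returns (term, overlap size, p-value, adjusted p-value), sorted by p-value."""
    background = set(background)
    cluster = set(cluster_genes) & background
    rows = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        overlap = cluster & annotated
        p = hypergeom_sf(len(overlap), len(background), len(annotated), len(cluster))
        rows.append((term, len(overlap), p))
    adj = benjamini_hochberg([p for _, _, p in rows])
    results = [(t, k, p, q) for (t, k, p), q in zip(rows, adj)]
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 background genes, one cluster of 5 genes.
background = [f"G{i}" for i in range(20)]
cluster = ["G0", "G1", "G2", "G3", "G4"]
terms = {
    "DNA repair": ["G0", "G1", "G2", "G3", "G10"],
    "cell cycle": ["G4", "G11", "G12", "G13", "G14"],
}

results = enrich(cluster, background, terms)
for term, k, p, q in results:
    print(f"{term}: overlap={k}, p={p:.4g}, adjusted p={q:.4g}")
```

In this toy example the cluster shares four of its five genes with the "DNA repair" term and so ranks that term first. In a real analysis the background would be the full set of measured genes, and the term annotations would come from GO, KEGG, or Reactome.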
