Development of a unified feature space for ensemble classification of polystructural heterogeneous small data
Abstract
Relevance. The paper addresses the problem of classifying polystructural heterogeneous small data, which include structured tabular information and unstructured audio signals.
Objective. The study proposes a unified methodological framework for constructing an ensemble classification system based on a common feature space that ensures compatibility between different data types and machine learning models.
Main Research Material. A unified feature space model is developed, incorporating a linear representation for structured data and a three-level representation for audio data: tabular spectral features, a spectrogram-based image representation, and temporal feature sequences. To improve the quality of input data, a Backward Feature Elimination procedure is applied to adapt feature subsets to the specifics of individual classifiers and to remove non-informative features. The classification framework is based on a stacking ensemble architecture that combines multiple base models, including classical machine learning algorithms and deep learning models. Three aggregation strategies are considered: hard voting, soft voting, and soft voting with Gompertz fuzzy ranking, which enables nonlinear adjustment of classifier probabilities and improves robustness under uncertainty.
Results. Experimental evaluation was conducted on five datasets from different domains, including healthcare, finance, audio signal analysis, and deepfake detection. The results demonstrate that the proposed approach consistently improves classification performance compared to individual models. Feature selection combined with ensemble integration provides significant gains on polystructural data.
Conclusions. The proposed model offers a flexible and scalable solution for handling heterogeneous small data and can be effectively applied across multiple domains, providing improved generalization, robustness to noise, and adaptability to different data representations.
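The tabular branch of the described pipeline can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's implementation: Backward Feature Elimination is approximated with scikit-learn's `SequentialFeatureSelector` in backward mode, the stacking ensemble uses two generic base models with a logistic-regression meta-learner, and the Gompertz fuzzy ranking step is stood in for by a simple Gompertz-shaped transform of the class probabilities (the shape parameter `b` and the renormalization are assumptions).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic "small data" stand-in for a structured tabular dataset.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Backward Feature Elimination: start from all features and iteratively drop
# the least informative ones for the given estimator.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=10,
    direction="backward",
)
X_tr_sel = selector.fit_transform(X_tr, y_tr)
X_te_sel = selector.transform(X_te)

# Stacking ensemble: heterogeneous base classifiers feed a meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr_sel, y_tr)
proba = stack.predict_proba(X_te_sel)

# Hypothetical Gompertz-style re-ranking of probabilities:
# g(p) = exp(-exp(-b * p)), renormalized per sample, so that the nonlinear
# curve reshapes confidence before the final argmax decision.
b = 5.0
g = np.exp(-np.exp(-b * proba))
g /= g.sum(axis=1, keepdims=True)
pred = g.argmax(axis=1)
print("test accuracy:", (pred == y_te).mean())
```

For audio inputs, the same ensemble stage would sit on top of the three-level representation the abstract describes (spectral feature tables, spectrogram images, and temporal sequences), with each base model consuming the representation it is suited to.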

