Hybrid graphs for code smells: a multi-level model for anti-pattern detection in software components

Дмитро Дмитрович Курінько

doi:10.15276/aait.08.2025.18

Published:
2025-09-26

Keywords:

Machine learning, software engineering, program analysis, graph representation learning, static analysis, uncertainty estimation, transfer learning, empirical evaluation

How to cite

Курінько Д. Д. "Hybrid Graphs for Code Smells: A Multi-Level Model for Anti-Pattern Detection in Software Components". Publ. Nauka i Tekhnika. Odesa: Ukraine. ААІТ 8 (3), 274–285. https://doi.org/10.15276/aait.08.2025.18.

Dmytro D. Kurinko

Odesa Polytechnic National University, 1, Shevchenko Ave. Odesa, 65044, Ukraine

https://orcid.org/0000-0001-8304-3257

Abstract

The paper proposes a hybrid, multi-level method for detecting code smells and anti-patterns in software components, in which structure, semantics, metrics, and evolution are treated as first-class signals. A heterogeneous Code Property Graph (Abstract Syntax Tree + Control-Flow Graph + Program Dependence Graph) is constructed and enriched with textual embeddings from a pretrained code language model, classical quality metrics (Chidamber–Kemerer, Halstead), and version-control history (churn, co-change, recency). Local idioms are summarized by a sequence–graph encoder at the method/block level, component structure is aggregated by a relation-aware Graph Neural Network at the class/module level, and project context is propagated over a component-interaction graph. To support deployment in evolving codebases, an open-set head is introduced: energy, entropy, and stochastic variance are combined to enable calibrated abstention on unfamiliar patterns. The approach is evaluated on polyglot Java Virtual Machine corpora using time-aware, cross-project splits with multi-label targets (Long Method, God Class, Feature Envy, Data Class, Shotgun-Surgery–like, No-smell). Improvements in macro Area Under the Precision–Recall Curve and F1 over rule/metric baselines, Abstract Syntax Tree-only models, and text-only models are observed, while FPR@95TPR is maintained or reduced. Withheld-class experiments show that open-set gating increases Area Under ROC for Open-Set Recognition and TNR@TPR and lowers calibration error, yielding probabilities suitable for thresholded automation and human triage. Cross-language transfer (train on Java, test on Kotlin/Scala) is shown to be stronger than with single-view models, aided by language-agnostic typing and per-project normalization. Incremental graph maintenance confines computation to changed regions, keeping inference time within CI/CD budgets. Exposing hierarchical attention and channel gates produces explanations that align with practitioner reasoning. It is concluded that hybrid graphs with hierarchical reasoning and selective prediction deliver detectors that are more accurate, more transferable, and operationally safer for evolving software systems.
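
The open-set head mentioned in the abstract combines energy, entropy, and stochastic variance to decide when to abstain. The abstract does not give the formula, so the following is only a minimal sketch under stated assumptions: the three signals are merged by a hypothetical weighted sum, the stochastic variance is taken over repeated forward passes (for example with dropout enabled), a softmax over the class logits is used for simplicity even though the paper uses multi-label targets, and the names `open_set_score`, `predict_or_abstain`, the weights, and the threshold are all illustrative.

```python
import math
from statistics import fmean, pvariance


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def energy(logits):
    """Negative log-sum-exp of the logits; higher means less familiar."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(x - m) for x in logits)))


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_probabilities(passes):
    """Average the softmax outputs of several stochastic forward passes."""
    per_pass = [softmax(p) for p in passes]
    n_classes = len(passes[0])
    return [fmean(p[c] for p in per_pass) for c in range(n_classes)]


def open_set_score(passes, w_energy=1.0, w_entropy=1.0, w_variance=1.0):
    """Hypothetical combination: weighted sum of the three uncertainty signals.

    passes: list of logit vectors, one per stochastic forward pass.
    The weights are assumptions, not values from the paper.
    """
    n_classes = len(passes[0])
    mean_logits = [fmean(p[c] for p in passes) for c in range(n_classes)]
    per_pass = [softmax(p) for p in passes]
    # Mean per-class variance of the predicted probabilities across passes.
    variance = fmean(
        pvariance([p[c] for p in per_pass]) for c in range(n_classes)
    )
    return (
        w_energy * energy(mean_logits)
        + w_entropy * entropy(mean_probabilities(passes))
        + w_variance * variance
    )


def predict_or_abstain(passes, threshold):
    """Return (class_index, score), or (None, score) when abstaining."""
    score = open_set_score(passes)
    if score > threshold:
        return None, score
    probs = mean_probabilities(passes)
    return max(range(len(probs)), key=probs.__getitem__), score
```

In this sketch a sample whose combined score exceeds the threshold is routed to human triage instead of receiving an automatic label. In practice the threshold and weights would presumably be tuned on held-out data, for example to hit a target true-positive rate.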

Issue

Vol. 8 No. 3 (2025): Applied Aspects of Information Technology

Section

Computer science and software engineering

Author Biography

Dmytro D. Kurinko, Odesa Polytechnic National University, 1, Shevchenko Ave. Odesa, 65044, Ukraine

PhD Student, Artificial Intelligence and Data Analysis Department
