Three language political leaning text classification using natural language processing methods

Authors

DOI:

https://doi.org/10.15276/aait.05.2022.24

Keywords:

Text classification, political leaning, machine learning algorithms, neural networks, ensembles of models, natural language processing

Abstract

This article addresses the problem of classifying the political leaning of a text resource. First, ten studies on the topic were analyzed in detail by comparing the methodologies they used. The sources were compared by problem-solving method, type of learning, evaluation metrics, and vectorization method. This analysis showed that machine learning algorithms and neural networks, together with the TF-IDF and Word2Vec vectorization methods, were used most often to solve the problem.

Next, various models for classifying textual information as pro-Ukrainian or pro-Russian were built on a dataset of social media messages about the events of the large-scale Russian invasion of Ukraine that began on February 24, 2022. The problem was solved using the Support Vector Machines, Decision Tree, Random Forest, Naïve Bayes classifier, eXtreme Gradient Boosting, and Logistic Regression (LR) machine learning algorithms; the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and BERT neural networks; the Random Oversampling, Random Undersampling, SMOTE, and SMOTETomek techniques for working with unbalanced data; and stacking ensembles of models.

Among the machine learning algorithms, LR performed best, with a macro F1-score of 0.7966 when features were transformed by TF-IDF vectorization and 0.7933 with BoW. Among the neural networks, the best macro F1-score of 0.76 was obtained using CNN and LSTM. Applying data balancing techniques did not improve the results of the machine learning algorithms.

Next, ensembles of models were built from the machine learning algorithms. Two of the constructed ensembles achieved the same macro F1-score of 0.7966 as LR. Both used TF-IDF vectorization and the B-NBC meta-model; their base models were SVC, NuSVC, and LR in one ensemble and SVC and LR in the other. Thus, three classifiers demonstrated the highest macro F1-score of 0.7966: the LR machine learning algorithm and two ensembles of models, which were defined as combinations of existing methods of solving the problem. The obtained models can be used for a detailed review of news publications according to their political leaning, which can help people recognize when they are isolated in a filter bubble.
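
All results in the abstract are reported as macro F1-scores, which weight every class equally regardless of how many examples it has. The following is a minimal sketch of that metric in plain Python, not the authors' code. The function name `macro_f1` and the labels `pro_ua` and `pro_ru` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical unbalanced sample: 8 "pro_ua" messages and 2 "pro_ru" messages.
y_true = ["pro_ua"] * 8 + ["pro_ru"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["pro_ua"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

In this made-up sample the majority-class predictor is right on 8 of 10 messages, yet its macro F1-score is much lower because the minority class contributes an F1 of zero. This is one reason macro averaging is a common choice when the classes are unbalanced.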


Author Biographies

Yurii A. Kosiv, Lviv Polytechnic National University, 12, Bandery Str. Lviv, 79013, Ukraine

Student of Artificial Intelligence Department. Lviv Polytechnic National University, 12, Bandery Str. Lviv, 79013, Ukraine

Vitaliy S. Yakovyna, Lviv Polytechnic National University, 12, Bandery Str. Lviv, 79013, Ukraine

Dr. Sci. (Eng), Professor, Professor of Artificial Intelligence Department. Lviv Polytechnic National University, 12, Bandery Str. Lviv, 79013, Ukraine

Scopus Author ID: 8393582500

Published

2022-12-24

How to Cite

[1]
Kosiv Y.A., Yakovyna V.S. “Three language political leaning text classification using natural language processing methods”. Applied Aspects of Information Technology. 2022; Vol. 5, No. 4: 359–370. DOI: https://doi.org/10.15276/aait.05.2022.24.