Sentiment analysis of user texts based on the tuning of the training parameters of a distilled model of the BERT family

DOI:

https://doi.org/10.17308/sait/1995-5499/2022/3/139-151

Keywords:

sentiment analysis, text sentiment classification, distillation, learning model, data preprocessing, data normalization, BERT, ruBert, Python

Abstract

The increasing complexity of neural network architectures and the growing volume of data processed in machine learning raise the question of applying more productive approaches that optimize the development of text classification models for sentiment analysis tasks. The aim of this work is to train and optimize an approach to data classification as part of solving the sentiment analysis of Russian-language text. This research proposes the application of pre-trained BERT bidirectional encoder models, as well as the ruBERT-tiny knowledge distillation model, to perform multiclass text classification for the sentiment analysis of user texts. Applying a data compaction step to knowledge distillation models makes it possible to optimize the training phase of the text classification models. A program was developed in Python using machine learning libraries. The technical solution allows testing pre-trained data classification models and, on their basis, creating optimized classification models for the sentiment analysis of user texts that take the specifics of the subject area into account.
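The preprocessing, normalization, and data compaction steps mentioned above are not detailed in the abstract; the following is a minimal sketch of what such a stage might look like in Python. The function names (`normalize_text`, `compact_dataset`) and the specific normalization rules are illustrative assumptions, not the authors' implementation.

```python
import re


def normalize_text(text: str) -> str:
    """Normalize a raw user text before tokenization: lowercase,
    strip URLs, keep only Cyrillic/Latin letters, digits, and basic
    punctuation, and collapse whitespace (assumed rules)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)         # drop URLs
    text = re.sub(r"[^0-9a-zа-яё.,!? ]+", " ", text)  # keep allowed characters
    text = re.sub(r"\s+", " ", text)                  # collapse whitespace
    return text.strip()


def compact_dataset(texts, labels):
    """A simple form of data compaction before training: deduplicate
    normalized texts, keeping the first label seen for each unique
    normalized form."""
    seen = {}
    for text, label in zip(texts, labels):
        key = normalize_text(text)
        if key and key not in seen:
            seen[key] = label
    return list(seen.keys()), list(seen.values())
```

A compacted dataset produced this way could then be tokenized and fed to a fine-tuning loop for a distilled BERT-family model such as ruBERT-tiny.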

Author Biography

  • Nikita Evgenievich Kosykh, Petersburg State University of Communications Emperor Alexander I

    post-graduate student of the Department of Information and Computing Systems, Petersburg State University of Communications Emperor Alexander I

Published

2022-11-09

Section

Computer Linguistics and Natural Language Processing

How to Cite

Sentiment analysis of user texts based on the tuning of the training parameters of a distilled model of the BERT family. (2022). Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 3, 139-151. https://doi.org/10.17308/sait/1995-5499/2022/3/139-151