Algorithms for constructing non-elementary linear regressions using the forward selection method
DOI: https://doi.org/10.17308/sait/1995-5499/2024/1/104-113

Keywords: regression analysis, subset selection, non-elementary linear regression, forward selection, algorithm, unemployment

Abstract
The article addresses the subset selection problem in non-elementary linear regressions, which generally include not only the explanatory variables but also all possible pairs of them transformed with the binary operations min and max. The optimal solution of this problem can be obtained by exhaustive enumeration of all possible models; however, even for ordinary linear regression exhaustive search remains the most time-consuming of all existing subset selection methods, and for non-elementary linear regressions, where the number of regressors is an order of magnitude greater, its complexity grows substantially. The forward selection method, by contrast, quickly yields a good, though often suboptimal, solution. Since non-elementary linear regressions include not only explanatory variables but also regressors that contain unknown parameters inside the binary operations, such models require new forward selection algorithms. In this article, the set of regressors in non-elementary linear regressions is further expanded by allowing binary operations with an intercept. Two forward selection algorithms are proposed: the first does not adjust the coefficients inside the binary operations, while the second does. The computational complexity of the second algorithm is therefore higher, but it produces better solutions. The algorithms were tested on modeling the number of unemployed and the unemployment rate in the Irkutsk region, where the second algorithm showed the best results. The resulting high-precision models with five regressors and coefficients of determination of 0.982 and 0.971 outperformed in quality even overfitted polynomial regressions with fourteen regressors.
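The idea of the simpler of the two algorithms can be illustrated with a minimal sketch: expand the regressor pool with min/max of every pair of explanatory variables (without adjusting coefficients inside the binary operations), then greedily add the candidate that most improves R². All function names below are hypothetical, and the sketch omits the intercept inside the binary operations that the article introduces.

```python
import numpy as np

def candidate_regressors(X):
    """Expanded regressor pool for a non-elementary linear regression:
    the original variables plus min and max of every pair of variables.
    (No adjustment of coefficients inside the binary operations.)"""
    n, k = X.shape
    cols, names = [], []
    for i in range(k):
        cols.append(X[:, i])
        names.append(f"x{i + 1}")
    for i in range(k):
        for j in range(i + 1, k):
            cols.append(np.minimum(X[:, i], X[:, j]))
            names.append(f"min(x{i + 1},x{j + 1})")
            cols.append(np.maximum(X[:, i], X[:, j]))
            names.append(f"max(x{i + 1},x{j + 1})")
    return np.column_stack(cols), names

def r_squared(Z, y):
    """Coefficient of determination of the OLS fit of y on [1, Z]."""
    A = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

def forward_selection(X, y, m):
    """Greedy forward selection: at each step add the candidate regressor
    that most improves R^2, until m regressors have been chosen."""
    pool, names = candidate_regressors(X)
    chosen, best_r2 = [], -np.inf
    for _ in range(m):
        best = None
        best_r2 = -np.inf
        for idx in range(pool.shape[1]):
            if idx in chosen:
                continue
            r2 = r_squared(pool[:, chosen + [idx]], y)
            if r2 > best_r2:
                best, best_r2 = idx, r2
        chosen.append(best)
    return [names[i] for i in chosen], best_r2
```

On synthetic data generated as y ≈ 2·min(x1, x2), this sketch selects the min(x1,x2) regressor first, since it alone explains almost all of the variance. The article's second algorithm would additionally re-estimate the parameters inside the min/max operations at each step, which is costlier but yields better fits.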