The Method of Mixed Estimation of Linear Regression Parameters: Features of Application

Sergey Ivanovich Noskov

doi:10.17308/sait.2021.1/3377

Authors

Sergey I. Noskov Irkutsk State Transport University https://orcid.org/0000-0003-4097-2720

DOI:

https://doi.org/10.17308/sait.2021.1/3377

Keywords:

linear regression equation, data sampling, mixed estimation method, Manhattan distance, Chebyshev distance, outliers

Abstract

This study builds on the mixed estimation method for the unknown parameters of linear regression equations, proposed earlier by the author. The method minimises different loss functions simultaneously on different parts of the processed data sample. Its main advantage is that it combines the strengths of each parameter estimation method involved while processing a single data sample. The article discusses ways to form subsamples of the initial sample for the loss functions corresponding to the Manhattan and Chebyshev distances. These functions react differently to observations that are inconsistent with the rest of the sample: the former essentially ignores them, while the latter, on the contrary, is extremely sensitive to them. The article demonstrates that implementing the mixed estimation method for such a combined loss function reduces to a linear programming problem. When dividing the initial sample into subsamples, we relied on the following properties of the methods for estimating the parameters of linear regression equations: the least absolute deviations method ensures that the number of zero approximation errors equals the number of parameters; the anti-robust estimation method ensures that the number of approximation errors that are maximum in absolute value is at least the number of parameters plus one. The article considers a numerical example with ten observations and three independent variables. We compared the parameter estimates and the values of certain adequacy criteria obtained with the least squares method, the least absolute deviations method, the anti-robust estimation method, and the mixed estimation method. In this example, the initial sample is divided into two subsamples. For one subsample, the mixed estimation method tends to ignore outlying observations, and for the other, on the contrary, it implicitly gives them more weight. It thus combines the advantages of the least absolute deviations and anti-robust estimation methods when applied to the same data, generally enhancing the adequacy of the data processing.
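
The reduction to linear programming mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the article: the combined loss is written as an unweighted sum of the Manhattan term on one subsample and the Chebyshev term on the other, the function name `mixed_estimate` and the index lists `lad_idx` and `cheb_idx` are hypothetical, and the six observations are invented for illustration (they are not the article's ten-observation example). The solver is `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def mixed_estimate(X, y, lad_idx, cheb_idx):
    """Minimise sum_{i in lad_idx} |e_i| + max_{i in cheb_idx} |e_i|,
    where e = y - X @ a. Assumed (unweighted) form of the combined loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = X.shape[1]          # number of regression parameters
    k = len(lad_idx)        # one auxiliary bound u_j per Manhattan observation

    # Decision vector: [a (m, free), u (k, >= 0), r (1, >= 0)]
    c = np.concatenate([np.zeros(m), np.ones(k), [1.0]])

    rows, rhs = [], []
    for j, i in enumerate(lad_idx):
        aux = np.zeros(k + 1)
        aux[j] = -1.0       # |y_i - x_i a| <= u_j
        rows.append(np.concatenate([X[i], aux]))
        rhs.append(y[i])
        rows.append(np.concatenate([-X[i], aux]))
        rhs.append(-y[i])
    for i in cheb_idx:
        aux = np.zeros(k + 1)
        aux[k] = -1.0       # |y_i - x_i a| <= r (shared bound)
        rows.append(np.concatenate([X[i], aux]))
        rhs.append(y[i])
        rows.append(np.concatenate([-X[i], aux]))
        rhs.append(-y[i])

    bounds = [(None, None)] * m + [(0, None)] * (k + 1)
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m], res.fun


# Invented data: y = 1 + 2x, except for one outlier at x = 3.
x = np.arange(6, dtype=float)
X = np.column_stack([np.ones_like(x), x])   # intercept column plus x
y = np.array([1.0, 3.0, 5.0, 27.0, 9.0, 11.0])

# The outlier (index 3) is placed in the Manhattan subsample.
a_hat, loss = mixed_estimate(X, y, lad_idx=[0, 1, 2, 3], cheb_idx=[4, 5])
print(a_hat, loss)
```

In this toy setting the outlier sits in the Manhattan subsample, so the fitted line still follows the remaining points and the outlier's residual accounts for the whole loss. How the subsamples should be formed in practice is the subject of the article itself and is not reproduced here.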

Author Biography

Sergey I. Noskov, Irkutsk State Transport University

DSc in Technical Sciences, Professor, Department of Information Systems and Information Security, Irkutsk State Transport University