Viscous gravitational algorithm for clustering inaccurate data

DOI:

https://doi.org/10.17308/sait.2022.1/9203

Keywords:

data clustering, imprecise data, gravity algorithm, viscosity, Pauli repulsion

Abstract

Clustering is one of the basic problems of machine learning, along with pattern recognition, classification, and forecasting. Its role is especially important in the analysis of Big Data, which can only be processed with computer technologies. At the same time, the problem of automatically partitioning data into clusters while taking into account the errors of the initial data still has no unambiguous solution and calls for more adequate approaches, including automatic determination of the number of clusters. The paper proposes a new method for data clustering based on a modification of the gravitational algorithm, which uses an analogy with the formation of stellar clusters through the attraction of masses in accordance with the law of universal gravitation. When this approach is applied to data clustering, real physical masses are replaced by points in a multidimensional data space, and the motion of these points under their mutual attraction leads to the formation of clusters. The disadvantage of the original gravitational method is the effect of inertia, which can hinder the clustering process and lead to the ejection of accelerated particles from a cluster while it is forming. To exclude such phenomena, we use a model of viscous particle dynamics for the points representing the data, together with a natural limitation of the cluster size due to the repulsion of particles. The repulsive force between particles is modeled by the Pauli interaction for fermions with the same spins, assuming a Gaussian distribution of the error density. The basic equations describing the steps of the presented modification of the gravitational algorithm are given. A numerical example demonstrates the features and advantages of the viscous gravity algorithm in comparison with the k-means method and the density-based DBSCAN method, including automatic termination of the procedure when the main clustering process is completed. The results obtained allow for blind clustering of Big Data and can be generalized to solving multidimensional optimization problems.
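The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the equations, so the softened inverse-square attraction, the Gaussian-shaped short-range repulsion standing in for the Pauli term, the unit masses, and all names and parameter values (`soft`, `sigma`, `rep`, `dt`, `tol`) are assumptions made for illustration only. Viscous motion is modeled here as overdamped dynamics, where the velocity of a point is proportional to the net force on it, so no inertia accumulates.

```python
import numpy as np

def viscous_gravity_step(x, dt=1e-3, soft=0.2, sigma=0.1, rep=200.0):
    """One explicit Euler step of overdamped (viscous) motion, dx/dt = F.

    x is an (N, d) array of data points treated as unit masses.
    All force laws and parameter values are illustrative assumptions.
    """
    diff = x[None, :, :] - x[:, None, :]   # diff[i, j] = x_j - x_i
    r2 = np.sum(diff ** 2, axis=-1)        # squared pairwise distances
    # Assumed attraction: softened inverse-square law, pulls i toward j.
    attract = diff / ((r2 + soft ** 2) ** 1.5)[..., None]
    # Assumed repulsion: Gaussian-shaped short-range term, pushes i away from j.
    repel = -rep * diff * np.exp(-r2 / (2.0 * sigma ** 2))[..., None]
    force = (attract + repel).sum(axis=1)  # self-terms vanish since diff[i, i] = 0
    return x + dt * force

def run_viscous_gravity(x, tol=1e-6, max_steps=5000, **params):
    """Iterate until the largest per-step displacement drops below tol."""
    x = np.asarray(x, dtype=float)
    for step in range(1, max_steps + 1):
        x_new = viscous_gravity_step(x, **params)
        shift = np.max(np.linalg.norm(x_new - x, axis=1))
        x = x_new
        if shift < tol:                    # automatic termination
            break
    return x, step
```

With these assumed parameters, two distant points drift toward each other, two nearly coincident points are pushed apart, and the motion stops at a finite separation. This mirrors, in a toy setting, the bounded cluster size and automatic termination that the abstract attributes to the method.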

Author Biography

  • Pavel A. Golovinski, Voronezh State Technical University

    Dr. Phys.-Math. Sci., Professor of the Department of Innovation and Building Physics named after I. S. Surovtsev, Voronezh State Technical University

Published

2022-04-26

Section

Intelligent Information Systems, Data Analysis and Machine Learning

How to Cite

Viscous gravitational algorithm for clustering inaccurate data. (2022). Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 1, 79-89. https://doi.org/10.17308/sait.2022.1/9203
