Formation of features of machine learning on the basis of topological data analysis

DOI:

https://doi.org/10.17308/sait/1995-5499/2022/3/115-126

Keywords:

simplicial complex, persistent homology, persistent landscape, machine learning, RKHS, Hilbert space

Abstract

Interest is growing in applying methods of algebraic topology to topological data analysis and in applying topological data analysis across various fields of knowledge. The goal of topological data analysis is to identify informative topological properties and use them as descriptors in machine learning. Applying machine learning methods to complex, high-dimensional systems is difficult because adequate feature representations are hard to construct. The persistent homology method from computational topology strikes a balance between reducing the dimension of the data and characterizing the internal structure of an object. Combining persistent homology with machine learning is complicated by the choice of topological representations of the data, distance metrics, and representations of data objects. The paper uses the method of persistent homology, which relies on a filtration to assign a geometric scale to each topological feature. The filtration generates a sequence of simplicial complexes that encode structural information at various scales. Persistent homology can be represented by a persistent barcode or a persistent diagram. The paper considers mathematical models and functions for representing objects as persistent landscapes based on the persistent homology method. Persistent Betti functions and persistent landscape functions are considered. Persistent landscape functions map persistent diagrams and persistent barcodes into a Hilbert space. Representations of topological characteristics in various machine learning models are considered. The structure of a kernel for the analysis of persistent diagrams and the persistent weighted Gaussian kernel are considered. The persistent weighted kernel method makes it possible to control the influence of persistence in data analysis. Distances between persistent landscapes are defined using the norm of the space L^p. Examples of finding the distance between images are given.
The appendices present the basic concepts of algebraic topology and the reproducing kernel Hilbert space (RKHS) method for the purposes of machine learning.
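
To make the landscape construction and the L^p distance mentioned in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a diagram is given as a list of finite (birth, death) pairs, uses the usual tent-function definition of a landscape (the k-th landscape function at t is the k-th largest value of max(0, min(t - b, d - t)) over the diagram points), and approximates the integral with the trapezoid rule for a finite p. The function names `landscape` and `landscape_distance`, the grid, and the sample diagrams are all hypothetical.

```python
import numpy as np


def landscape(diagram, grid, k_max):
    """Evaluate the first k_max landscape functions on a 1-D grid.

    diagram: iterable of finite (birth, death) pairs (assumed input format).
    Returns an array of shape (k_max, len(grid)).
    """
    grid = np.asarray(grid, dtype=float)
    points = np.asarray(list(diagram), dtype=float).reshape(-1, 2)
    births = points[:, 0][:, None]
    deaths = points[:, 1][:, None]
    # One tent function per diagram point, evaluated at every grid value.
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births,
                                       deaths - grid[None, :]))
    # Sort the tent values in descending order at each grid value.
    tents = -np.sort(-tents, axis=0)
    layers = np.zeros((k_max, grid.size))
    n = min(k_max, tents.shape[0])
    layers[:n] = tents[:n]  # layers beyond the number of points stay zero
    return layers


def landscape_distance(diag_a, diag_b, grid, k_max=5, p=2):
    """Approximate the L^p distance between two landscapes (finite p)."""
    grid = np.asarray(grid, dtype=float)
    diff = np.abs(landscape(diag_a, grid, k_max)
                  - landscape(diag_b, grid, k_max)) ** p
    # Trapezoid rule along the grid, summed over all landscape layers.
    widths = np.diff(grid)
    integral = np.sum(0.5 * (diff[:, 1:] + diff[:, :-1]) * widths)
    return integral ** (1.0 / p)


# Hypothetical diagrams, for illustration only.
diag_a = [(0.0, 4.0), (1.0, 3.0)]
diag_b = [(0.0, 4.0)]
grid = np.linspace(0.0, 4.0, 4001)
print(landscape_distance(diag_a, diag_b, grid, k_max=3, p=2))
```

The grid must cover all birth and death values, and `k_max` should be at least the number of diagram points that overlap at any single grid value; otherwise the computed distance underestimates the true one.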

Author Biographies

  • Sergey Nikolayevich Chukanov, Sobolev Institute of Mathematics of the Siberian Branch of the Russian Academy of Sciences

    Leading researcher at the Sobolev Institute of Mathematics of the Siberian Branch of the Russian Academy of Sciences (Omsk branch), professor, Doctor of Technical Sciences

  • Ilya Stanislavovich Chukanov, Ural Federal University named after the First President of Russia B. N. Yeltsin

    Student at Ural Federal University named after the First President of Russia B. N. Yeltsin

Published

2022-11-09

Section

Intelligent Information Systems, Data Analysis and Machine Learning

How to Cite

Chukanov, S. N., & Chukanov, I. S. (2022). Formation of features of machine learning on the basis of topological data analysis. Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 3, 115-126. https://doi.org/10.17308/sait/1995-5499/2022/3/115-126