Algorithm for the classification of pseudorandom sequences

DOI:

https://doi.org/10.17308/sait.2020.1/2595

Keywords:

statistical analysis of data, machine learning, classification of binary sequences, DLP systems, data leak prevention

Abstract

The number of information leaks caused by insiders has increased recently. One possible cause is the inability of modern data leak prevention (DLP) systems to stop leaks of information in encrypted or compressed form. This article proposes an algorithm for classifying sequences generated by encryption algorithms, compression algorithms, and pseudorandom number generators. The classification problem is solved with machine learning methods based on a decision tree algorithm. The feature space is an array of frequencies of binary subsequences of length N bits. Neither file headers nor any other contextual information were used to construct the feature space. The choice of classifier hyperparameters is substantiated. The proposed algorithm classified the described sequences with an accuracy of 0.98. It can be implemented in DLP systems to prevent the transmission of information in encrypted or compressed form.
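The feature space described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `subsequence_frequencies`, the default N = 8, the use of non-overlapping windows, and the most-significant-bit-first ordering are all assumptions made for illustration, since the abstract does not specify them.

```python
from __future__ import annotations

from collections import Counter


def subsequence_frequencies(data: bytes, n: int = 8) -> list[float]:
    """Return relative frequencies of all 2**n possible n-bit words in data.

    Assumptions (not stated in the abstract): words are read as
    non-overlapping windows, most significant bit first, and trailing
    bits that do not fill a whole window are dropped.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")

    # Expand every byte into its 8-bit binary representation.
    bits = "".join(f"{byte:08b}" for byte in data)

    # Keep only as many bits as fit into whole n-bit windows.
    usable = len(bits) - len(bits) % n
    total = usable // n
    if total == 0:
        return [0.0] * (1 << n)

    # Count how often each n-bit word occurs.
    counts = Counter(int(bits[i:i + n], 2) for i in range(0, usable, n))

    # One feature per possible word value, normalised by the window count.
    return [counts.get(value, 0) / total for value in range(1 << n)]
```

A vector produced this way has 2^N entries and uses only the raw bits of the sequence, with no file headers or other context. Such vectors could then be passed to any decision tree implementation as training features; the classifier and the hyperparameters used in the article are not reproduced here.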

Author Biographies

  • Alexander V. Kozachok, Guard Service Federal Academy

    DSc in Technical Sciences, Russian Federation Security Guard Service Federal Academy

  • Andrey A. Spirin, Guard Service Federal Academy

    Russian Federation Security Guard Service Federal Academy

Published

2020-03-24

Section

Information Security

How to Cite

Algorithm for the classification of pseudorandom sequences. (2020). Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 1, 87-98. https://doi.org/10.17308/sait.2020.1/2595