Algorithm for the classification of pseudorandom sequences
DOI:
https://doi.org/10.17308/sait.2020.1/2595
Keywords:
statistical analysis of data, machine learning, classification of binary sequences, DLP systems, data leak prevention
Abstract
The number of information leaks caused by insiders has increased recently. One possible cause is that modern DLP systems cannot prevent leaks of information sent in encrypted or compressed form. The article proposes an algorithm for classifying sequences generated by encryption algorithms, compression algorithms, and pseudorandom number generators. To solve the classification problem, we propose machine learning methods based on a decision tree algorithm. The feature space is an array of frequencies of binary subsequences of length N bits. Neither file headers nor any other contextual information was used in constructing the feature space. The choice of classifier hyperparameters is substantiated. The proposed algorithm achieved a classification accuracy of 0.98 on the described sequences. It can be implemented in DLP systems to prevent the transmission of information in encrypted or compressed form.
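The feature space described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `bit_subsequence_frequencies`, the default N = 8, the most-significant-bit-first ordering, and the choice of non-overlapping windows are all assumptions, since the abstract does not say how the subsequences are extracted.

```python
from collections import Counter


def bit_subsequence_frequencies(data: bytes, n: int = 8) -> list[float]:
    """Return relative frequencies of all 2**n possible n-bit subsequences.

    Assumptions (not stated in the abstract): the bit stream is read most
    significant bit first and split into non-overlapping n-bit windows;
    trailing bits that do not fill a whole window are dropped.
    """
    if n <= 0:
        raise ValueError("n must be positive")

    # Expand the bytes into a string of '0'/'1' characters.
    bits = "".join(f"{byte:08b}" for byte in data)

    # Count how often each n-bit value occurs among the full windows.
    total = len(bits) // n
    counts = Counter(int(bits[i * n:(i + 1) * n], 2) for i in range(total))

    if total == 0:
        return [0.0] * (2 ** n)

    # One feature per possible n-bit value, in numeric order.
    return [counts[value] / total for value in range(2 ** n)]


# Example: one byte 0x0F split into two 4-bit windows, 0000 and 1111.
features = bit_subsequence_frequencies(b"\x0f", n=4)
```

A vector like this, computed over the payload only and without file headers, could then be passed to a decision tree classifier (for example, scikit-learn's `DecisionTreeClassifier`, which is an assumed choice here rather than one named in the abstract).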