A Survey of Research on Applying Machine Learning Methods to Improve the Efficiency of Fuzz Testing

Alexander Vasilievich Kozachok; Vasily Ivanovich Kozachok; Natalia Sergeevna Osipova; Dmitry Vladimirovich Ponomarev

doi:10.17308/sait.2021.4/3800

Alexander Vasilievich Kozachok, Academy of the Federal Guard Service (FSO) of Russia, https://orcid.org/0000-0002-6501-2008
Vasily Ivanovich Kozachok, Academy of the Federal Guard Service (FSO) of Russia, https://orcid.org/0000-0001-5384-2269
Natalia Sergeevna Osipova, Academy of the Federal Guard Service (FSO) of Russia, https://orcid.org/0000-0002-4878-0865
Dmitry Vladimirovich Ponomarev, NTC Fobos-NT LLC, https://orcid.org/0000-0003-4912-500X

Keywords: software defects, software vulnerabilities, fuzz testing, machine learning

Abstract

This article presents a detailed survey of existing research on applying machine learning methods to improve the efficiency of fuzz testing. Fuzz testing appeared as early as 1988, but over time it was forgotten. Two important trends in the modern software industry make it possible to look at this technology in a new light. On the one hand, as the volume and complexity of software keep growing, any automated means of detecting errors and controlling quality can prove useful and in demand. On the other hand, the continuous growth in the performance of modern computing systems makes it possible to solve increasingly complex computational problems efficiently. Improving the efficiency of fuzz testing is a pressing problem in information security, as confirmed by the guidance documents of the Federal Service for Technical and Export Control of Russia (FSTEC) on secure software development. Integrating fuzz testing into the software development process makes it possible to detect errors and vulnerabilities at early stages of development. The article presents the most complete classification of modern fuzzers. It examines the key problems characteristic of the various types of existing fuzzers, and presents existing ways of overcoming them together with the shortcomings of those solutions. The article also reviews current approaches to applying machine learning methods at different stages of fuzz testing, with real examples from the work of foreign researchers. A comparative analysis of existing work on this topic was carried out, and the conclusions drawn clearly demonstrate that applying machine learning methods improves the efficiency of fuzzing. The efficiency of fuzzing was assessed along two lines: the effectiveness of applying machine learning to fuzzing, and the improvement in the ability to detect vulnerabilities. The improvement in fuzz testing results when machine learning methods are applied is shown clearly. The article also proposes promising directions for introducing machine learning methods to improve the efficiency of fuzz testing.

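Author Biographies

Alexander Vasilievich Kozachok, Academy of the Federal Guard Service (FSO) of Russia

Dr. Sci. (Engineering), Associate Professor, staff member, Academy of the Federal Guard Service (FSO) of Russia

Vasily Ivanovich Kozachok, Academy of the Federal Guard Service (FSO) of Russia

Dr. Sci. (Sociology), staff member, Academy of the Federal Guard Service (FSO) of Russia

Natalia Sergeevna Osipova, Academy of the Federal Guard Service (FSO) of Russia

Staff member, Academy of the Federal Guard Service (FSO) of Russia

Dmitry Vladimirovich Ponomarev, NTC Fobos-NT LLC

Technical Director, NTC Fobos-NT LLC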
