Study and Comparative Analysis of Optimization Methods Used in Training Neural Networks
Abstract
Modern deep learning reduces to finding a minimum of a continuous error (loss) function. In recent years, a variety of optimization algorithms have been proposed that take different approaches to updating model parameters. This article analyzes the optimization methods most commonly used for training neural networks and, based on the properties identified, formulates recommendations for choosing an algorithm when fitting neural networks to different data sets. The analysis covers various implementations of gradient descent, momentum methods, adaptive methods, and quasi-Newton methods; the problems arising in their use are summarized, and the main advantages of each method are highlighted.
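The method families named in the abstract differ only in their parameter-update rule. As a minimal sketch (not code from the article; all function names, hyperparameter values, and the toy objective are assumptions chosen for illustration), the updates for plain gradient descent, Polyak momentum, and Adam can be written side by side in NumPy:

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # Plain gradient descent: a fixed-size step against the gradient.
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.05, beta=0.9):
    # Polyak (heavy-ball) momentum: accumulate a velocity that
    # smooths and accelerates the updates.
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, m, s, grad, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: adapt the step per parameter using bias-corrected
    # estimates of the first and second moments of the gradient.
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Toy comparison: minimize f(w) = w^2 (gradient 2w) from w = 5.
w_sgd = w_mom = w_adam = 5.0
v = m = s = 0.0
for t in range(1, 501):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_mom, v = momentum_step(w_mom, v, 2 * w_mom)
    w_adam, m, s = adam_step(w_adam, m, s, 2 * w_adam, t)
print(w_sgd, w_mom, w_adam)  # all three approach the minimum at 0
```

On this convex one-dimensional objective all three iterations converge; the practical differences the article discusses (sensitivity to the learning rate, behavior in ravines and on sparse gradients) only appear on harder, higher-dimensional loss surfaces.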