Detecting Metaphorical Collocability Using Machine Learning Methods

O. V. Donina

doi:10.17308/lic/1680-5755/2022/4/128-143

O. V. Donina, Voronezh State University


Keywords: machine learning, Text Mining, Natural Language Processing, automatic metaphor identification, cryptoclass analysis, neural networks, supervised learning

Abstract

This article examines the possibility of building a classifier that automatically identifies metaphors using machine learning methods. We compiled a representative dataset of 389,857 manually annotated examples, on which the model was trained. The article describes a series of experiments, the difficulties encountered, and the ways they were resolved. Three approaches were applied to the task: a naive Bayes classifier, logistic regression, and artificial neural networks. The experiments varied the following parameters: presence of stop words, lemmatization, stemming, and the number of N-grams; for the neural networks, the number of epochs, the batch size, the number of training and validation examples, and other settings were also adjusted. The best results (Accuracy = 0.88, F1-score = 0.87) were achieved with a convolutional neural network with the following parameters: epochs = 10; layers = 6 (including 2 dropout layers); batch_size = 500; training on 70% of the data, validation on 30%; vectorization = character 2- and 3-grams; activation functions = relu and sigmoid; optimizer = Adamax; loss_func = binary_crossentropy. As a result, we developed tools for automating the classification of corpus examples of metaphorical collocability, which should in the long run help intensify and popularize metaphor research by reducing the labor and time researchers spend processing corpus examples.
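To make the setup in the abstract more concrete, the following is a minimal sketch of one of the baselines it names: a naive Bayes classifier over character 2- and 3-gram features, scored with accuracy and F1. It is not the author's implementation. The function and class names (`char_ngrams`, `NaiveBayes`, `accuracy_and_f1`), the add-one smoothing, and the tiny English toy phrases are all assumptions made for illustration; the real experiments used 389,857 annotated corpus examples and a 70/30 split.

```python
import math
from collections import Counter


def char_ngrams(text, sizes=(2, 3)):
    """Return all character n-grams of the given sizes (2 and 3 by default)."""
    text = text.lower()
    return [text[i:i + n] for n in sizes for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.docs = Counter(labels)                      # documents per class
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = set().union(*self.counts.values())  # all n-grams seen
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.docs.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.docs[c] / n_docs)      # class prior
            denom = self.totals[c] + len(self.vocab)
            for gram in char_ngrams(text):
                if gram in self.vocab:                   # skip unseen n-grams
                    score += math.log((self.counts[c][gram] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and the F1-score of the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


# Toy data, invented for illustration: 1 = metaphorical, 0 = literal.
train_texts = ["flow of time", "stream of thoughts", "wave of anger",
               "flow of water", "stream of lava", "wave of the sea"]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["flow of ideas", "flow of oil"]
test_labels = [1, 0]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
print(accuracy_and_f1(test_labels, predictions))
```

Character n-grams are used here because the abstract reports vectorization on 2 and 3 characters for the best model. With so few toy examples the printed scores carry no meaning; the sketch only shows how the features, the classifier, and the two reported metrics fit together.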


Author Biography

O. V. Donina, Voronezh State University

Candidate of Philological Sciences, Associate Professor at the Department of Theoretical and Applied Linguistics
