Identification of metaphors with the help of machine learning
DOI:
https://doi.org/10.17308/lic/1680-5755/2022/4/128-143
Keywords:
machine learning, Text Mining, Natural Language Processing, metaphor identification, cryptotype analysis, CNN, supervised learning
Abstract
The article discusses the feasibility of building a classifier for automatic metaphor identification using machine learning. The model was trained on a representative dataset of 389,857 examples that we annotated manually. We describe a series of experiments, the difficulties encountered, and the ways we resolved them. Three machine learning methods were used: naive Bayes, logistic regression, and artificial neural networks. The following parameters were varied: stop words, lemmatization, stemming, and the number of N-grams; for neural networks we additionally tuned the number of epochs, the batch size, the number of training and validation examples, etc. The best results (Accuracy = 0.88, F1-score = 0.87) were achieved with a convolutional neural network with the following parameters: epochs = 10, layers = 6 (including 2 dropout layers), batch_size = 500, training = 70% of data, validation = 30% of data, vectorization = 2 and 3 characters, activation functions = relu and sigmoid, optimizer = Adamax, loss_func = binary_crossentropy. As a result, we developed tools that automate the classification of corpus examples of metaphorical compatibility. In the future these tools should help intensify and popularize research in this area by reducing the labor and time researchers spend processing and classifying corpus query results.
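To make two of the terms above concrete, here is a minimal sketch in plain Python of character-level vectorization over 2- and 3-character sequences and of the reported metrics (Accuracy and F1-score). It is an illustration only, not the authors' implementation: the abstract does not say how the vectorization was performed, so reading it as overlapping character n-grams is an assumption, and the function names `char_ngrams` and `accuracy_and_f1` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, sizes=(2, 3)):
    """Count overlapping character n-grams of the given sizes.

    Assumption: "vectorization = 2 and 3 characters" is read here as
    character bigrams and trigrams; the abstract does not confirm this.
    """
    text = text.lower()
    counts = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Toy usage with hypothetical labels (1 = metaphorical, 0 = literal).
features = char_ngrams("sea of grief")
acc, f1 = accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1])
```

In a full pipeline, counts like these would be mapped to fixed-length numeric vectors before being passed to a classifier such as the convolutional network described in the abstract.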











