Automatic Construction of a Bilingual Dictionary Based on GIZA++ Output

Albina M. Khusainova; Vitaly A. Romanov; Adil M. Khan

doi:10.17308/sait/1995-5499/2022/4/189-201

Authors

Albina M. Khusainova, Innopolis University, https://orcid.org/0000-0002-0636-3449
Vitaly A. Romanov, Innopolis University, https://orcid.org/0000-0003-3772-0039
Adil M. Khan, Innopolis University, https://orcid.org/0000-0003-2220-8518

Keywords:

phrase translation, collocation translation, construction, bilingual dictionary, phrase dictionary, machine translation, automatic dictionary, language resources

Abstract

Modern encoder-decoder neural machine translation (NMT) models are normally trained on parallel sentences, so they give the best results when translating full sentences rather than sentence fragments. As a result, NMT models do not address the task of translating commonly used phrases, which often arises for language learners. While human-built phrase dictionaries exist for high-resource language pairs, less-resourced pairs lack them. In this paper, we propose an automatic approach to building such a dictionary from the output of the statistical tool GIZA++, followed by heuristic filtering. We analyze the translation quality obtained with this approach and compare it with reference translations and with phrase translations produced by an NMT system trained on sentences. The results show that, despite the problems identified, the phrase translations are most often correct, and even when they do not match the reference translation, they represent valid alternative translations. Another important result is that this approach works significantly better than phrase translation with the NMT system. Using the proposed approach, we obtained a Russian-English dictionary of lexical expressions, which can be used both as a ready-made dictionary and as a raw resource for manual dictionary construction. The resulting Russian-English phrase dictionary has been published online as a linguistic resource.

Author Biographies

Albina M. Khusainova, Innopolis University

Fourth-year postgraduate student, assistant at the Machine Learning and Knowledge Representation Laboratory, Innopolis University

Vitaly A. Romanov, Innopolis University

Fourth-year postgraduate student, assistant at the Industrial Software Production Laboratory, Innopolis University

Adil M. Khan, Innopolis University

Candidate of Science in Physics and Mathematics, Professor, Head of the Machine Learning and Knowledge Representation Laboratory, Innopolis University