Automatic bilingual phrase dictionary construction from GIZA++ output
DOI: https://doi.org/10.17308/sait/1995-5499/2022/4/189-201
Keywords: phrase translation, collocation translation, construction, bilingual dictionary, phrase dictionary, machine translation, automatic dictionary language resources
Abstract
Modern encoder-decoder neural machine translation (NMT) models are normally trained on parallel sentences. Hence, they give the best results when translating full sentences rather than sentence fragments. As a result, the task of translating commonly used phrases, which often arises for language learners, is not addressed by NMT models. While human-built phrase dictionaries exist for high-resource language pairs, low-resource pairs lack them. In this paper, we propose an automatic approach to creating such a dictionary, based on the output of the statistical word alignment tool GIZA++ followed by heuristic filtering. We analyze the translation quality obtained with this approach and compare it both with reference translations and with phrase translations produced by an NMT system trained on sentences. The results show that, despite the problems identified, the phrase translations are most often correct, and even when they do not match the reference translation, they represent valid alternative translations. Another important result is that this approach works significantly better than phrase translation with the NMT system. Using the proposed approach, we obtained a Russian-English dictionary of lexical expressions, which can be used both as a ready-made dictionary and as a raw resource for manual dictionary construction. The resulting Russian-English phrase dictionary has been published online as a linguistic resource.
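The abstract does not spell out the extraction procedure or the heuristics, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes GIZA++ alignment output in the usual `A3.final` layout, where a plain English sentence is followed by a line of Russian words, each annotated with `({ ... })` indices into that English sentence. The function names, the contiguous-span rule, and the `min_count` and `min_share` thresholds are all hypothetical choices made for illustration.

```python
import re
from collections import Counter, defaultdict

# Matches one annotated word, e.g. "решение ({ 2 3 })" or "NULL ({ })".
TOKEN_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def parse_alignment(plain_line, annotated_line):
    """Return (plain-side words, [(annotated word, [1-based indices]), ...])."""
    plain = plain_line.split()
    matches = TOKEN_RE.findall(annotated_line)
    # The first entry is the NULL pseudo-word; it carries no translation.
    if matches and matches[0][0] == "NULL":
        matches = matches[1:]
    pairs = [(word, [int(i) for i in idx.split()]) for word, idx in matches]
    return plain, pairs


def phrase_translation(phrase, plain, pairs):
    """Find the phrase on the annotated side and return the plain-side span
    covering every word aligned to it (assumed rule: min..max index)."""
    words = phrase.split()
    n = len(words)
    source = [w for w, _ in pairs]
    for start in range(len(source) - n + 1):
        if source[start:start + n] == words:
            idx = [i for _, ids in pairs[start:start + n] for i in ids]
            if not idx:
                return None  # phrase is entirely unaligned in this sentence
            return " ".join(plain[min(idx) - 1:max(idx)])
    return None


def build_dictionary(phrases, aligned_sentences, min_count=2, min_share=0.5):
    """Count candidate translations per phrase, then keep the most frequent
    one if it passes two hypothetical filters: an absolute count threshold
    and a minimum share of all candidates seen for that phrase."""
    counts = defaultdict(Counter)
    for plain_line, annotated_line in aligned_sentences:
        plain, pairs = parse_alignment(plain_line, annotated_line)
        for phrase in phrases:
            translation = phrase_translation(phrase, plain, pairs)
            if translation:
                counts[phrase][translation] += 1
    dictionary = {}
    for phrase, candidates in counts.items():
        best, n = candidates.most_common(1)[0]
        if n >= min_count and n / sum(candidates.values()) >= min_share:
            dictionary[phrase] = best
    return dictionary
```

Counting candidates across the whole corpus before filtering is what lets a single noisy alignment be outvoted by the more frequent translation; a real pipeline would also need lemmatization and handling of non-contiguous phrases, which this sketch ignores.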