Multilingual machine translation using hierarchical transformer

DOI:

https://doi.org/10.17308/sait.2022.1/9207

Keywords:

neural machine translation, multilingual translation, parameter organization, language trees, hierarchical architecture, low-resource translation, related languages

Abstract

The way parameters are organized in multilingual machine translation models determines how effectively the parameter space is used, and therefore directly influences translation quality. This work explores the idea of using language trees as the basis for the architecture of multilingual machine translation models. Language trees describe how languages are related to each other, and the core idea is to organize multilingual models according to these expert hierarchies: the more closely related two languages are, the more parameters they share. We test this approach with the Transformer architecture and demonstrate that, despite the success reported in previous works, there are persistent problems inherent in training hierarchical models. We investigate these problems, propose a solution, and show that with the suggested training fix the hierarchical model can considerably outperform both bilingual models and multilingual models with full parameter sharing.
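The sharing scheme described in the abstract can be illustrated with a minimal sketch (this is not the authors' code; the tree, language codes, and function names below are hypothetical): each language is assigned the chain of modules along its path from the root of a language tree, so closely related languages share a longer prefix of modules than distant ones.

```python
# Toy language tree, for illustration only: internal nodes name shared
# parameter modules, leaves are language codes.
LANGUAGE_TREE = {
    "root": ["germanic", "slavic"],
    "germanic": ["en", "de"],
    "slavic": ["ru", "uk"],
}

def path_to(lang, node="root"):
    """Return the list of tree nodes (shared module names) from the root to lang."""
    if node == lang:
        return [node]
    for child in LANGUAGE_TREE.get(node, []):
        sub = path_to(lang, child)
        if sub:
            return [node] + sub
    return []

def shared_modules(a, b):
    """Modules two languages share = the common prefix of their root paths."""
    common = []
    for x, y in zip(path_to(a), path_to(b)):
        if x != y:
            break
        common.append(x)
    return common

# Related languages share more parameter modules than distant ones:
print(shared_modules("ru", "uk"))  # ['root', 'slavic']
print(shared_modules("ru", "en"))  # ['root']
```

In an actual hierarchical Transformer each named node would correspond to trainable layers; the sketch only shows how a tree induces the "more related, more shared" allocation.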

Author Biographies

  • Albina M. Khusainova, Innopolis University

    4th year post-graduate student, assistant in Machine Learning and Knowledge Representation Laboratory, Innopolis University

  • Vitaly A. Romanov, Innopolis University

    4th year post-graduate student, assistant in Industrial Software Production Laboratory, Innopolis University

  • Adil M. Khan, Innopolis University

    Candidate of Science in Physics and Mathematics, Professor, Head of the Machine Learning and Knowledge Representation Laboratory, Innopolis University

Published

2022-04-26

Section

Computer Linguistics and Natural Language Processing

How to Cite

Multilingual machine translation using hierarchical transformer. (2022). Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 1, 125-138. https://doi.org/10.17308/sait.2022.1/9207
