Multilingual machine translation using hierarchical transformer

DOI:

https://doi.org/10.17308/sait.2022.1/9207

Keywords:

neural machine translation, multilingual translation, parameter organization, language trees, hierarchical architecture, low-resource translation, related languages

Abstract

The way parameters are organized in multilingual machine translation models determines how effectively the parameter space is used, and therefore directly influences translation quality. This work explores the idea of using language trees as the basis for the architecture of multilingual machine translation models. Language trees describe how languages are related to each other, and the core idea is to organize multilingual models according to these expert hierarchies: the more closely related two languages are, the more parameters they share. We test this approach with the Transformer architecture and demonstrate that, despite the success reported in previous works, there are persistent problems inherent in training hierarchical models. We investigate these problems, propose a solution, and show that with the suggested training fix the hierarchical model can considerably outperform both bilingual models and multilingual models with full parameter sharing.
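The sharing scheme described in the abstract can be illustrated with a minimal sketch (this is not the authors' code; the tree, language codes, and function names below are hypothetical): each language is assigned the chain of modules along its path from the root of a language tree, so closely related languages share a longer prefix of modules than distant ones.

```python
# Toy language tree, for illustration only: internal nodes name shared
# parameter modules, leaves are language codes.
LANGUAGE_TREE = {
    "root": ["germanic", "slavic"],
    "germanic": ["en", "de"],
    "slavic": ["ru", "uk"],
}

def path_to(lang, node="root"):
    """Return the list of tree nodes (shared module names) from the root to lang."""
    if node == lang:
        return [node]
    for child in LANGUAGE_TREE.get(node, []):
        sub = path_to(lang, child)
        if sub:
            return [node] + sub
    return []

def shared_modules(a, b):
    """Modules two languages share = the common prefix of their root paths."""
    common = []
    for x, y in zip(path_to(a), path_to(b)):
        if x != y:
            break
        common.append(x)
    return common

# Related languages share more parameter modules than distant ones:
print(shared_modules("ru", "uk"))  # ['root', 'slavic']
print(shared_modules("ru", "en"))  # ['root']
```

In an actual hierarchical Transformer each named node would correspond to trainable layers; the sketch only shows how a tree induces the "more related, more shared" allocation.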

Author Biographies

  • Albina M. Khusainova, Innopolis University

    4th year post-graduate student, assistant in Machine Learning and Knowledge Representation Laboratory, Innopolis University

  • Vitaly A. Romanov, Innopolis University

    4th year post-graduate student, assistant in Industrial Software Production Laboratory, Innopolis University

  • Adil M. Khan, Innopolis University

    Candidate of Science in Physics and Mathematics, Professor, Head of the Machine Learning and Knowledge Representation Laboratory, Innopolis University

Published

2022-04-26

Section

Computer Linguistics and Natural Language Processing

How to Cite

Multilingual machine translation using hierarchical transformer. (2022). Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 1, 125-138. https://doi.org/10.17308/sait.2022.1/9207
