Multilingual Machine Translation with a Hierarchical Transformer

Albina Maratovna Khusainova; Vitaly Anatolyevich Romanov; Adil Mehmood Khan

doi:10.17308/sait.2022.1/9207

Albina Maratovna Khusainova, Innopolis University, https://orcid.org/0000-0002-0636-3449
Vitaly Anatolyevich Romanov, Innopolis University, https://orcid.org/0000-0003-3772-0039
Adil Mehmood Khan, Innopolis University, https://orcid.org/0000-0003-2220-8518

Keywords: neural machine translation, multilingual translation, parameter organization, language trees, hierarchical architecture, low-resource translation, related languages

Abstract

The strategy chosen for sharing parameters across languages in a multilingual machine translation model determines how efficiently the parameter space is used, and therefore directly affects the final translation quality. This work investigates a new approach to organizing parameters in multilingual machine translation based on linguistic trees, which show the degree of relatedness between languages. The main idea is to use these expert-built language hierarchies as the basis for the model architecture: the closer two languages are, the more parameters they should share. We test this idea on the Transformer architecture and show that, despite the success reported in previous work, there are problems inherent to training such hierarchical models. We demonstrate that, with a specially chosen training strategy, the hierarchical architecture can outperform both simple bilingual models and multilingual translation models with a shared parameter space.
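
The sharing rule stated in the abstract (closer languages share more parameters) can be illustrated with a small sketch. It assumes that each node of a language tree stands for one block of parameters and that a language uses every block on its path to the root. The tree below, the `PARENT` mapping, and the helper names are hypothetical illustrations, not the tree or code used in the paper.

```python
# Hypothetical toy tree: each node maps to its parent; the root maps to None.
# Each node is a stand-in for one block of model parameters (an assumption).
PARENT = {
    "Turkic": None,
    "Oghuz": "Turkic",
    "Kipchak": "Turkic",
    "Turkish": "Oghuz",
    "Azerbaijani": "Oghuz",
    "Kazakh": "Kipchak",
    "Tatar": "Kipchak",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def shared_nodes(lang_x, lang_y):
    """Return the tree nodes that lie on the root paths of both languages."""
    on_y_path = set(path_to_root(lang_y))
    return [n for n in path_to_root(lang_x) if n in on_y_path]


print(shared_nodes("Turkish", "Azerbaijani"))  # same branch: two shared nodes
print(shared_nodes("Turkish", "Kazakh"))       # different branches: root only
```

In this sketch, two languages from the same branch share both the branch node and the root, while languages from different branches share only the root, which mirrors the "closer means more shared parameters" principle.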

Author Biographies

Albina Maratovna Khusainova, Innopolis University

fourth-year PhD student, assistant at the Laboratory of Machine Learning and Data Representation, Innopolis University

Vitaly Anatolyevich Romanov, Innopolis University

fourth-year PhD student, assistant at the Laboratory of Industrial Software Development, Innopolis University

Adil Mehmood Khan, Innopolis University

Candidate of Physical and Mathematical Sciences, professor, head of the Laboratory of Machine Learning and Data Representation, Innopolis University
