Possible algorithm for calculating the limit size of a writer’s dictionary

Authors

DOI:

https://doi.org/10.17308/sait.2021.1/3378

Keywords:

lexical diversity rate, Zipf’s law, extrapolation, lemmatized frequency dictionary, limit size of the glossary

Abstract

The article proposes a method for estimating the maximum size of a writer’s dictionary by extrapolating an empirically determined function that expresses the dependence of the lexical diversity rate on the size of the text corpus. Problems in approximating the data with the chosen extrapolation method are discussed. Calculations are carried out for Leo Tolstoy as an example, using logarithmic basis functions for approximation and extrapolation.
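The idea summarized above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that the lexical diversity rate is the ratio of distinct lemmas to tokens, that the basis consists of only two functions, 1 and ln N, and that the limit is taken as the maximum of V(N) = N·(a + b·ln N) under the fitted model. The function names `fit_log_model` and `vocabulary_limit` and all numbers are hypothetical; the data are synthetic and are not Tolstoy figures.

```python
import math


def fit_log_model(sizes, rates):
    """Least-squares fit of rate ~ a + b*ln(size) on the basis {1, ln N}."""
    xs = [math.log(n) for n in sizes]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(rates) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def vocabulary_limit(a, b):
    """Maximum of V(N) = N*(a + b*ln N) under the assumed model (needs b < 0).

    Setting dV/dN = a + b*ln N + b = 0 gives N* = exp(-(a + b)/b),
    and substituting back gives V(N*) = -b * N*.
    """
    if b >= 0:
        raise ValueError("the model has a finite maximum only when b < 0")
    n_star = math.exp(-(a + b) / b)
    v_star = -b * n_star
    return n_star, v_star


# Synthetic, purely illustrative measurements (assumed, not from the article):
# corpus sizes in tokens and diversity rates generated from a known model.
TRUE_A, TRUE_B = 0.5, -0.03
sizes = [10_000, 50_000, 100_000, 500_000, 1_000_000]
rates = [TRUE_A + TRUE_B * math.log(n) for n in sizes]

a, b = fit_log_model(sizes, rates)
n_star, v_star = vocabulary_limit(a, b)
print(f"a={a:.4f}, b={b:.4f}, N*={n_star:.0f}, V*={v_star:.0f}")
```

With real data, the rates would be measured on growing prefixes of a lemmatized corpus, and the result would depend strongly on which basis functions are chosen, which is presumably the approximation problem the abstract mentions.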

Author Biographies

  • Alexey A. Kretov, Voronezh State University

    Doctor of Philology, Professor at the Department of Theoretical and Applied Linguistics, Voronezh State University

  • Maria V. Lomets, Voronezh State University

    Student at the Department of Theoretical and Applied Linguistics, Faculty of Romano-Germanic Philology, Voronezh State University

  • Igor P. Polovinkin, Voronezh State University

    Doctor of Physical and Mathematical Sciences, Professor at the Department of Mathematical and Applied Analysis, Docent at the Department of Theoretical and Applied Linguistics, Voronezh State University

Published

2021-04-29

Section

Computer Linguistics and Natural Language Processing

How to Cite

Possible algorithm for calculating the limit size of a writer’s dictionary. (2021). Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 1, 133-145. https://doi.org/10.17308/sait.2021.1/3378