Choice of variant in case of fuzzy string comparison

Authors: Irina E. Voronina, Nikita A. Ekert

DOI:

https://doi.org/10.17308/sait/1995-5499/2023/2/181-191

Keywords:

fuzzy string comparison, Levenshtein edit distance, Damerau-Levenshtein edit distance, Jaro-Winkler distance, Hamming distance, correction of typographical errors, correction of spelling errors

Abstract

The article addresses the topical problem of correcting typographical and spelling errors when analyzing comments in social and corporate networks. The object of research is fuzzy string comparison algorithms, namely algorithms for computing the Levenshtein, Damerau-Levenshtein, Jaro-Winkler and Hamming distances. The speed of these methods is compared and their algorithmic complexity is evaluated. A method is proposed for context-independent selection of a variant from the set of candidate solutions produced by fuzzy string comparison. Hypotheses are formulated, and their validity is confirmed by a computational experiment. The proposed linear algorithm is evaluated using the accuracy metric. The theoretical significance of the study lies in assessing the quality of existing fuzzy string comparison algorithms and in formulating hypotheses for developing an algorithm that corrects typographical and spelling errors in text. The practical significance lies in the software implementation of this correction algorithm and in a computational experiment that yields a dictionary of character substitution frequencies. The novelty of the result lies in an algorithm for correcting typographical and spelling errors that is distinguished by the quality of its results.
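The abstract names four fuzzy string comparison measures but gives no code for them. The following is a minimal sketch of their standard textbook definitions, not the authors' implementation; the function names are hypothetical. Assumptions: the Damerau-Levenshtein variant shown is the restricted "optimal string alignment" (OSA) form, and Jaro-Winkler uses the commonly cited prefix scale p = 0.1 with a prefix length capped at 4. The nested loops make the quadratic cost of the edit distances visible, while Hamming distance is a single linear pass.

```python
# Sketch of textbook definitions of four fuzzy string measures.
# Not the authors' code; function names and defaults are illustrative assumptions.


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b. O(len(a) * len(b)) time, two rows of memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


def damerau_levenshtein_osa(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment): Levenshtein
    operations plus transposition of two adjacent characters. O(len(a) * len(b))."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # adjacent transposition, e.g. "ca" -> "ac"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def hamming(a: str, b: str) -> int:
    """Number of positions whose characters differ; defined only for
    strings of equal length. O(n)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]; 1 means identical strings."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # characters match only if equal and within this distance of each other
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_flags = [False] * len(a)
    b_flags = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_flags[j] and b[j] == ca:
                a_flags[i] = b_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions: half the number of matched characters that are out of order
    a_matched = [c for c, f in zip(a, a_flags) if f]
    b_matched = [c for c, f in zip(b, b_flags) if f]
    t = sum(x != y for x, y in zip(a_matched, b_matched)) / 2
    m = matches
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (capped at max_prefix)."""
    j = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:max_prefix], b[:max_prefix]):
        if ca != cb:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))          # 3
    print(damerau_levenshtein_osa("ca", "ac"))       # 1 (one transposition)
    print(hamming("karolin", "kathrin"))             # 3
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

The OSA restriction matters for some inputs: it never edits a substring twice, so for "CA" and "ABC" it returns 3, whereas the unrestricted Damerau-Levenshtein distance is 2. For typo correction, where adjacent-key transpositions are common, the transposition step is what distinguishes the Damerau-Levenshtein family from plain Levenshtein.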

Author Biographies

  • Irina E. Voronina, Voronezh State University

    DSc in Technical Sciences, Professor, Department of Software Development and Information Systems Administration, Faculty of Applied Mathematics, Informatics and Mechanics

  • Nikita A. Ekert, Voronezh State University

    postgraduate student, Department of Software and Information Systems Administration, Voronezh State University

Published

2023-09-29

Issue

No. 2 (2023)

Section

Computer Linguistics and Natural Language Processing

How to Cite

Choice of variant in case of fuzzy string comparison. (2023). Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 2, 181-191. https://doi.org/10.17308/sait/1995-5499/2023/2/181-191