Choice of variant in case of fuzzy string comparison
DOI:
https://doi.org/10.17308/sait/1995-5499/2023/2/181-191
Keywords:
fuzzy string comparison, Levenshtein edit distance, Damerau-Levenshtein edit distance, Jaro-Winkler distance, Hamming distance, correction of typographical errors, correction of spelling errors
Abstract
The article addresses the pressing problem of correcting typographical and spelling errors when analyzing comments in social and corporate networks. The objects of study are fuzzy string comparison algorithms, namely the algorithms for computing the Levenshtein, Damerau-Levenshtein, Jaro-Winkler and Hamming distances. The speed of the methods is compared and their algorithmic complexity is evaluated. A context-independent method is proposed for selecting one variant from the set of candidate solutions produced by fuzzy string comparison. Hypotheses are formulated and confirmed by a computational experiment. The proposed linear algorithm is evaluated using the accuracy metric. The theoretical significance of the study lies in assessing the quality of existing fuzzy string comparison algorithms and in formulating hypotheses for developing an algorithm that corrects typographical and spelling errors in text. The practical significance lies in the software implementation of this correction algorithm and in a computational experiment that yields a dictionary of character substitution frequencies. The novelty of the result lies in the development of an algorithm for correcting typographical and spelling errors that is distinguished by the quality of its results.
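For orientation, three of the distances named in the abstract can be sketched in a few lines of Python. This is a minimal illustration using the textbook definitions, not the article's implementation: the function names, the sample word list and the helper `nearest` are assumptions made for this example, and the Damerau-Levenshtein variant shown is the restricted form (optimal string alignment). Jaro-Winkler is omitted for brevity.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein: also counts adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Two adjacent characters swapped, e.g. "ca" vs "ac".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def nearest(word: str, dictionary: list[str]) -> list[str]:
    """Hypothetical helper: all dictionary words at the minimum distance."""
    scored = [(levenshtein(word, w), w) for w in dictionary]
    best = min(score for score, _ in scored)
    return [w for score, w in scored if score == best]


if __name__ == "__main__":
    # Several candidates tie at distance 1, so a further rule is needed
    # to choose a single variant.
    print(nearest("cxt", ["cat", "cut", "cot", "dog"]))
```

The tie in the usage example shows why a distance measure alone does not settle the correction: several dictionary words can be equally close to the misspelled one, and a separate selection rule is required.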













