Using a Text Corpus for Automatic Classification in a Toolkit for Automated Text Analysis

Authors

  • Сергей Александрович Полицын, Moscow Aviation Institute (National Research University)
  • Екатерина Валерьевна Полицына, Moscow Aviation Institute (National Research University)

DOI:

https://doi.org/10.17308/sait.2018.2/1224

Keywords:

automated text analysis tools, corpus of texts, classifier training

Abstract

Automatic text classification is one of the pressing tasks of computational linguistics, and it is among the tasks handled by the toolkit for automated text analysis developed by the authors. Training the classifier on a large set of subject areas has to be automated, which requires an annotated text corpus. The article describes the creation of a text corpus with extensible markup and an application for working with it that builds subcorpora from a user-defined set of characteristics. The corpus can therefore be used both to train machine-learning methods for text analysis tasks and to automate the verification of results produced by various methods of computational linguistics.

Author Biographies

  • Сергей Александрович Полицын, Moscow Aviation Institute (National Research University)

    Senior lecturer, Department of Design of Computing Systems, Moscow Aviation Institute (National Research University).

  • Екатерина Валерьевна Полицына, Moscow Aviation Institute (National Research University)

    Candidate of Technical Sciences, associate professor, Department of Design of Computing Systems, Moscow Aviation Institute (National Research University).

Published

2018-01-29

Section

Computer Linguistics and Natural Language Processing

How to Cite

Применение корпуса текстов для автоматической классификации в комплексе инструментов автоматизированного анализа текстов. (2018). Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 2, 162-167. https://doi.org/10.17308/sait.2018.2/1224