Application of a text corpus for automatic classification in a suite of automated text analysis tools
DOI:
https://doi.org/10.17308/sait.2018.2/1224
Keywords:
automated text analysis tools, corpus of texts, classifier training
Abstract
Automatic text classification is one of the pressing tasks of computational linguistics. It is also addressed in the suite of automated text analysis tools developed by the authors. Training the classifier on a large set of subject areas requires automating the training process, which in turn requires an annotated text corpus. The article describes the creation of a text corpus with extensible markup and an application for working with it that allows subcorpora to be built from a user-defined set of characteristics. The corpus can therefore be used both to train machine-learning methods for text analysis tasks and to automate the verification of results produced by various methods of computational linguistics.
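The idea of building a subcorpus from a user-defined set of characteristics can be illustrated with a minimal sketch. It is not the authors' application: the `Document` class, the `attrs` dictionary standing in for extensible markup, the functions `select_subcorpus` and `training_pairs`, and the sample attribute names `domain` and `lang` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One corpus entry: raw text plus an open-ended set of markup attributes."""
    text: str
    # Hypothetical stand-in for extensible markup: any key/value pair can be added.
    attrs: dict = field(default_factory=dict)


def select_subcorpus(corpus, **criteria):
    """Return the documents whose attributes match every given criterion."""
    return [
        doc for doc in corpus
        if all(doc.attrs.get(key) == value for key, value in criteria.items())
    ]


def training_pairs(subcorpus, label_attr):
    """Turn a subcorpus into (text, label) pairs, skipping unlabeled documents."""
    return [
        (doc.text, doc.attrs[label_attr])
        for doc in subcorpus
        if label_attr in doc.attrs
    ]


# Invented sample data, used only to exercise the functions above.
corpus = [
    Document("Neural networks for parsing", {"domain": "linguistics", "lang": "en"}),
    Document("Market trends in 2018", {"domain": "economics", "lang": "en"}),
    Document("Корпусная лингвистика", {"domain": "linguistics", "lang": "ru"}),
]

english = select_subcorpus(corpus, lang="en")
pairs = training_pairs(english, "domain")
```

Under these assumptions, the resulting `pairs` list could be passed to any classifier training routine, and the same selection could supply reference labels for checking the output of other analysis methods.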