Formation of user agreements corpus in Russian
DOI:
https://doi.org/10.17308/sait/1995-5499/2024/3/138-152
Keywords:
text corpus, corpus formation, user agreement, personal data, text corpus cleaning
Abstract
The collection and processing of personal data is now widespread in the provision of digital services on the Internet. Increasingly frequent cases in which operators such as Yandex and Gemotest share personal data directly affect users and are most often undesirable for them. Although the number of such cases continues to grow, users still pay insufficient attention to user agreements, as recent studies show. This happens because user agreements are often lengthy and written in language that is difficult to understand. One possible solution is to improve the readability of user agreements with decision support tools that present them in an easier-to-understand form. However, developing such tools requires a large amount of data for training the corresponding models. Corpora of user agreements do exist, but all of them cover agreements written in English; no corpora of user agreements in Russian currently exist. This gap motivated the author to develop a method for building a corpus of user agreements in Russian, methods for cleaning it, and tools that implement these methods. The developed tools produced a corpus of 7510 cleaned Russian-language user agreements. The work also presents a statistical analysis of the corpus that clarifies some of its features, which can be used in further research aimed at increasing the transparency of user agreements for end users.