Development and evaluation of a RAG system for analysis of semantic relations of specialised texts

DOI:

https://doi.org/10.17308/sait/1995-5499/2025/2/114-126

Keywords:

RAG (Retrieval Augmented Generation), large language models (LLM), vector databases, semantic similarity, quantization, Milvus

Abstract

The article presents the development and evaluation of a system, based on the RAG (Retrieval Augmented Generation) method, that automatically determines the semantic similarity of texts within specialized subject areas. The proposed solution integrates large language models (Llama, Gemma) with vector databases (Milvus) to retrieve context-relevant information. The system's main task is binary classification of pairs of text descriptions (a «target indicator» and an «event») to determine whether they are semantically related. The experimental part compares the effectiveness of various combinations of models and prompts. The best result was achieved by the Llama-3.1-8B-Instruct model, which reached an F1-measure of 0.9 with the RAG approach. An important element of the system is the optimization of the generative models through 4-bit quantization, which reduced computational costs without a significant loss of accuracy. Including relevant examples retrieved from the RAG system's vector database in the prompts is shown to increase classification accuracy by 15–20% compared to the baseline configurations. The results confirm that the RAG system adapts effectively to highly specialized texts such as regulatory documents and strategic plans. The practical significance of the work lies in the possibility of deploying the system in organizational event management processes, where checking the consistency of formulations and identifying contradictions is critical. Directions for further research include integrating newer language models (Qwen, DeepSeek), optimizing indexing methods in vector stores, and improving prompt templates.
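The abstract describes retrieving relevant labeled examples from a vector database, inserting them into the prompt, and scoring the binary «target indicator» / «event» classification with the F1-measure. The following is a minimal sketch of that flow under stated assumptions, not the authors' implementation: embeddings are precomputed toy vectors, a brute-force cosine search in memory stands in for Milvus, the LLM call itself is omitted, and the names `Example`, `retrieve`, `build_prompt` and the prompt wording are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """A labeled pair stored in the (hypothetical) example database."""
    indicator: str
    event: str
    label: int            # 1 = semantically related, 0 = unrelated
    vector: list[float]   # hypothetical precomputed embedding of the pair


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec: list[float], store: list[Example], k: int = 2) -> list[Example]:
    """Brute-force top-k search by cosine similarity.
    A vector database such as Milvus would answer this query via an index instead."""
    return sorted(store, key=lambda ex: cosine(query_vec, ex.vector), reverse=True)[:k]


def build_prompt(indicator: str, event: str, examples: list[Example]) -> str:
    """Few-shot prompt: retrieved examples first, then the pair to classify.
    The instruction text is a hypothetical placeholder."""
    parts = ["Is the event semantically related to the target indicator? Answer 1 or 0.", ""]
    for ex in examples:
        parts.append(f"Indicator: {ex.indicator}\nEvent: {ex.event}\nAnswer: {ex.label}\n")
    parts.append(f"Indicator: {indicator}\nEvent: {event}\nAnswer:")
    return "\n".join(parts)


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1-measure for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy store with made-up vectors, for illustration only
    store = [
        Example("Reduce CO2 emissions", "Install filters at the plant", 1, [0.9, 0.1, 0.0]),
        Example("Increase literacy", "Build a highway", 0, [0.1, 0.9, 0.0]),
        Example("Raise literacy", "Open public libraries", 1, [0.2, 0.1, 0.9]),
    ]
    shots = retrieve([0.85, 0.15, 0.0], store, k=2)
    print(build_prompt("Cut air pollution", "Upgrade boiler filters", shots))
```

In a real pipeline the prompt would be sent to the (optionally 4-bit quantized) model, its "1"/"0" answers collected, and `f1_score` applied to the predictions against the gold labels.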

Author Biographies

  • Irina L. Kashirina, Russian Technological University MIREA, Voronezh State University

    Doctor of Engineering Sciences, Professor, Professor of the Department of Mathematical Methods of Operations Research of Voronezh State University; Professor of the Department of Artificial Intelligence Technologies of the Russian Technological University MIREA

  • Ilya R. Osipov, Voronezh State University

    Postgraduate Student of the Department of Mathematical Methods of the Voronezh State University

  • Vyacheslav A. Yakovlev, Russian Technological University MIREA

    Postgraduate Student of the Department of Artificial Intelligence Technologies of the Russian Technological University MIREA

Published

2025-09-02

Section

Modern Technologies of Software Development

How to Cite

Development and evaluation of a RAG system for analysis of semantic relations of specialised texts. (2025). Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 2, 114-126. https://doi.org/10.17308/sait/1995-5499/2025/2/114-126