Semantic Analysis and Synthesis of Text Data

  • Oksana Igorevna Zakharova, Povolzhskiy State University of Telecommunications and Informatics https://orcid.org/0000-0003-3371-4344
Keywords: semantic analysis, data synthesis, automatic text processing, heterogeneous data analysis

Abstract

This article is a review. Surveying the views of Russian and international researchers is important given current requirements for the study of data processing systems. One aim is to try to determine what machine understanding of text and speech might consist of; the emergence of LLMs such as ChatGPT underscores the importance and timeliness of such a review. At the same time, although the global volume of data grows daily, data can rarely be used in raw form, and solving many applied problems requires processing it to some degree. Applied natural language processing problems cannot be solved without methods of semantic analysis and data synthesis. The growing volume of user-generated information and the digitalization of society demand that these methods be improved, which makes a review of this topic relevant. The work also aims to examine the main trends in natural language processing and in the use of semantic analysis, ontologies, and data synthesis. The article describes the essence of semantic analysis, its applications, and existing approaches to implementing it, both by traditional means and with artificial intelligence methods. The main advantages of using semantic analysis when working with data are identified. The work is based on the method of data analysis and processing; in particular, approaches to text classification in information systems are reviewed. Issues of providing access to aggregated information from different databases by means of a semantic approach and data ontologies are considered. Options for synthesizing data, both from structured data sets and using metadata, are described. As a result of the study, the main problems in natural language processing are identified, such as access to data, the openness of research data, and the detection of sentiment, irony, and sarcasm.
The information presented can be used in planning solutions to natural language processing tasks, in developing software to automate this process, and in developing relational databases, decision support systems, and information and analytical systems.

Author Biography

Oksana Igorevna Zakharova, Povolzhskiy State University of Telecommunications and Informatics

Candidate of Technical Sciences, Docent, Associate Professor at the Department of Information Systems and Technologies, Povolzhskiy State University of Telecommunications and Informatics

Published
2024-02-05
How to Cite
Zakharova, O. I. (2024). Semantic analysis and synthesis of text data. Vestnik VGU. Seriya: Sistemnyi analiz i informatsionnye tekhnologii [Bulletin of Voronezh State University. Series: Systems Analysis and Information Technologies], (4), 182-208. https://doi.org/10.17308/sait/1995-5499/2023/4/182-208
Section
Computational Linguistics and Natural Language Processing