The impact of data preparation techniques on house price prediction task

Authors

DOI:

https://doi.org/10.17308/sait/1995-5499/2025/1/133-142

Keywords:

real estate price prediction, feature engineering, dimensionality reduction, PCA, autoencoders, one-hot encoding, handling outliers, target encoding

Abstract

Accurate house price prediction is considered critical for decision-making in the real estate sector, where datasets are often characterized by missing values, outliers, and skewed distributions. In this study, the impact of various data preprocessing techniques on the performance of the XGBoost algorithm for predicting house prices is investigated. A real estate dataset from Kaggle is used to analyze and compare methods such as missing value imputation, categorical encoding, log transformation, and dimensionality reduction. The results show that preprocessing techniques significantly improve model performance, with certain approaches greatly reducing prediction errors and improving efficiency. Advanced methods, such as PCA with normalization and log transformation, produced the best results, showing the importance of choosing effective preprocessing steps. This study provides practical guidance for using data preprocessing to improve machine learning models, offering insights particularly relevant to real estate price prediction and other structured data applications.
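Two of the techniques named above, log transformation of a skewed target and target encoding of a categorical feature, can be illustrated with a minimal standard-library sketch. This is not the study's pipeline: the toy data, the function names, and the `smoothing` parameter are all assumptions made for illustration, and no model is trained here.

```python
import math
from collections import defaultdict


def log_transform(prices):
    """Compress a right-skewed target with log(1 + x)."""
    return [math.log1p(p) for p in prices]


def inverse_log_transform(values):
    """Map log-scale values back to the original price scale."""
    return [math.expm1(v) for v in values]


def target_encode(categories, targets, smoothing=1.0):
    """Smoothed mean target encoding (hypothetical helper).

    Each category maps to a blend of its own target mean and the
    global mean, so rare categories are pulled toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }


# Hypothetical toy data, not taken from the Kaggle dataset.
neighborhoods = ["A", "A", "B", "B", "C"]
prices = [100_000.0, 120_000.0, 300_000.0, 340_000.0, 1_500_000.0]

log_prices = log_transform(prices)
encoding = target_encode(neighborhoods, log_prices)
encoded_feature = [encoding[n] for n in neighborhoods]
recovered = inverse_log_transform(log_prices)
```

In practice the encoding would be fitted on the training split only, so that target values from validation rows do not leak into the features, and predictions made on the log scale would be mapped back with `inverse_log_transform` before computing errors in price units.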

Author biography

  • Ihcene Zitoune, Kazan University

    Second-year PhD student, Department of Data Analysis and Programming Technologies, Kazan Federal University

Published

2025-05-12

Section

Intelligent systems, data analysis and machine learning

How to cite

The impact of data preparation techniques on house price prediction task. (2025). Вестник ВГУ. Серия: Системный анализ и информационные технологии, 1, 133-142. https://doi.org/10.17308/sait/1995-5499/2025/1/133-142