The impact of data preparation techniques on house price prediction task
DOI:
https://doi.org/10.17308/sait/1995-5499/2025/1/133-142
Keywords:
real estate price prediction, feature engineering, dimensionality reduction, PCA, autoencoders, one-hot encoding, handling outliers, target encoding
Abstract
Accurate house price prediction is considered critical for decision-making in the real estate sector, where datasets are often characterized by missing values, outliers, and skewed distributions. In this study, the impact of various data preprocessing techniques on the performance of the XGBoost algorithm for predicting house prices is investigated. A real estate dataset from Kaggle is used to analyze and compare methods such as missing value imputation, categorical encoding, log transformation, and dimensionality reduction. The results show that preprocessing techniques significantly improve model performance, with certain approaches greatly reducing prediction errors and improving efficiency. Advanced methods, such as PCA with normalization and log transformation, produced the best results, showing the importance of choosing effective preprocessing steps. This study provides practical guidance for using data preprocessing to improve machine learning models, offering insights particularly relevant to real estate price prediction and other structured data applications.
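As a rough illustration of the kinds of preprocessing steps named in the abstract, the sketch below applies median imputation, one-hot encoding, target encoding, and a log transformation of the target to a tiny made-up table. It is not the study's pipeline: the column names (`area`, `district`, `price`), the toy values, and the helper functions are all hypothetical, and only the Python standard library is used so that the example stays self-contained.

```python
import math
from statistics import mean, median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(categories):
    """Return the sorted category levels and one 0/1 indicator row per item."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


def target_encode(categories, targets):
    """Replace each category with the mean target value observed for it."""
    grouped = {}
    for c, t in zip(categories, targets):
        grouped.setdefault(c, []).append(t)
    means = {c: mean(ts) for c, ts in grouped.items()}
    return [means[c] for c in categories]


# Hypothetical toy data (not taken from the study's Kaggle dataset).
area = [50.0, None, 120.0, 80.0]
district = ["north", "south", "north", "east"]
price = [100_000.0, 150_000.0, 300_000.0, 200_000.0]

area_filled = impute_median(area)            # missing value imputation
levels, district_onehot = one_hot(district)  # one-hot encoding
log_price = [math.log1p(p) for p in price]   # log transform of the skewed target
district_te = target_encode(district, log_price)  # target encoding

# Predictions made on the log scale are mapped back with the inverse transform.
recovered_price = [math.expm1(v) for v in log_price]
```

In a real experiment, the imputation statistic and the target-encoding means would normally be computed on the training split only, so that no information from the evaluation data leaks into the features.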