Research Article

Explainable AI in a Real Estate Context – Exploring the Determinants of Residential Real Estate Values

Pages 204-245 | Received 17 Jun 2022, Accepted 11 Jan 2023, Published online: 14 Feb 2023
 

Abstract

A sound understanding of real estate markets is economically important but not simple to achieve, as properties are a heterogeneous asset and no two are alike. Traditionally, parametric or semi-parametric and, thus, assumption-based hedonic pricing models are used to analyze real estate market fundamentals. These models require a priori assumptions regarding their functional form. Usually, the true functional form is unknown and characterized by non-linearities and joint effects, which are hard to capture fully. Therefore, their results should be interpreted with caution. Applying the state-of-the-art non-parametric machine learning algorithm XGBoost in combination with model-agnostic Accumulated Local Effects (ALE) plots enables us to overcome this problem. Using a dataset of 81,166 residential properties in the seven largest German cities, we show how ALE plots enable us to analyze the value-determining effects of several structural, locational and socio-economic hedonic features. Our findings provide a deeper understanding of real estate market fundamentals.

Disclosure statement

The authors report there are no competing interests to declare.

Data availability statement

Due to the nature of this research, participants of this study did not agree to their data being shared publicly, so supporting data is not available.

Notes

1 For neural networks, applications include Worzala et al. (1995), Din et al. (2001), Peterson and Flanagan (2009), McCluskey et al. (2013) and Chiarazzo et al. (2014). For random forests, see Antipov and Pokryshevskaya (2012), Bogin and Shui (2020) and Pace and Hayunga (2020). For boosting-related methods, see van Wezel et al. (2005), Kagie and van Wezel (2007), Gu and Xu (2017), Sangani et al. (2017), Ho et al. (2021) and Stang et al. (2022).

2 In XAI research, Partial Dependence Plots (PDPs), proposed by Friedman (2001), are one of the oldest and most widely used methods (see, e.g., Levantesi & Piscopo, 2020). However, PDPs have been shown to produce biased results when features are correlated (Apley & Zhu, 2020). In real estate, many features are intrinsically dependent, which argues against the use of PDPs. ALE plots do not share this disadvantage and are therefore well suited to real estate market analysis.
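To make the PDP/ALE contrast concrete, the following is a minimal NumPy sketch of a first-order ALE curve for one numeric feature, following the general construction of Apley and Zhu: split the feature into quantile intervals, average the change in prediction across each interval using only the observations that fall inside it, accumulate these local effects, and centre the result. Because each difference is computed on observations that actually lie in the interval, the other (possibly correlated) features keep realistic values, whereas a PDP overwrites the feature for every observation. The function name `ale_1d`, the toy model and the simulated data are illustrative assumptions; this is not the PyALE implementation used in the study.

```python
import numpy as np


def ale_1d(predict, X, feature, n_bins=10):
    """First-order ALE of column `feature` of the 2-D array X under model `predict`.

    Returns the interval edges and the centred ALE value at each edge.
    """
    x = X[:, feature]
    # Quantile grid: each interval holds roughly the same number of observations
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n_int = len(edges) - 1
    # Interval k covers (edges[k], edges[k + 1]]; the minimum goes into interval 0
    idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, n_int - 1)
    counts = np.bincount(idx, minlength=n_int)
    local = np.zeros(n_int)
    for k in range(n_int):
        if counts[k] == 0:
            continue
        lo = X[idx == k].copy()
        hi = lo.copy()
        lo[:, feature] = edges[k]
        hi[:, feature] = edges[k + 1]
        # Only observations inside the interval are used, so the other features
        # keep realistic values (the key difference from a PDP)
        local[k] = np.mean(predict(hi) - predict(lo))
    # Accumulate the local effects along the grid, starting from 0 at the lowest edge
    ale = np.concatenate([[0.0], np.cumsum(local)])
    # Centre so that the count-weighted mean of the interval midpoints is zero
    mid = (ale[:-1] + ale[1:]) / 2
    ale -= np.sum(mid * counts) / counts.sum()
    return edges, ale


def model(A):
    # Hypothetical stand-in for a fitted model such as XGBoost
    return 3.0 * A[:, 0] + A[:, 1] ** 2


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
X[:, 1] += 0.8 * X[:, 0]  # correlated features, the case where PDPs can mislead
edges0, ale0 = ale_1d(model, X, feature=0)
edges1, ale1 = ale_1d(model, X, feature=1)
```

For the linear term, the ALE curve rises by exactly 3 per unit of the feature; for the quadratic term, it traces the parabola up to a constant shift, even though the two features are correlated.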

3 Legally required real estate valuations in Germany use a methodology that differs slightly from internationally common approaches. Detailed explanations can be found in Schnaidt and Sebastian (2012).

4 Acxiom is a US-based provider of international data. Further information can be found at https://www.acxiom.com/.

5 Applies if the property is both partly owner-occupied and partly non-owner-occupied (e.g., single-family home with attached rental unit).

6 The correlation matrix is available on request.

7 The individual summary statistics for each city are available on request.

8 K-fold cross-validation is a method for assessing the predictive power of a statistical model. It randomly splits the dataset into k (approximately) equal-sized folds (i.e., blocks). One fold is used to test the model, and the remaining k − 1 folds are used for training. This process is repeated k times, so that each fold serves as test data exactly once. Finally, the cross-validation error is calculated by averaging the errors of the k test folds.
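The procedure described in this note can be sketched in a few lines of NumPy. This is a minimal illustration, assuming mean squared error as the per-fold error and a plain OLS model fitted by least squares; the helper names (`make_folds`, `k_fold_cv_mse`, `ols_fit`, `ols_predict`) and the simulated data are hypothetical and not taken from the study.

```python
import numpy as np


def make_folds(n, k, seed=0):
    """Shuffle row indices 0..n-1 and split them into k (nearly) equal-sized folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def k_fold_cv_mse(X, y, fit, predict, k=5, seed=0):
    """Average the test MSE over k rounds; each fold is the test set exactly once."""
    folds = make_folds(len(y), k, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        # The remaining k - 1 folds form the training data for this round
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        residuals = y[test_idx] - predict(model, X[test_idx])
        errors.append(np.mean(residuals ** 2))
    # Cross-validation error = mean of the k per-fold test errors
    return float(np.mean(errors))


def ols_fit(X, y):
    """OLS coefficients (intercept first) via least squares."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]


def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


rng = np.random.default_rng(42)
X = rng.normal(size=(103, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(0.0, 0.1, size=103)
cv_mse = k_fold_cv_mse(X, y, ols_fit, ols_predict, k=5)
```

When n is not divisible by k, `np.array_split` yields folds whose sizes differ by at most one, which is the usual reading of "equal-sized" in practice.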

9 Overall, the magnitude of the improvement over the OLS benchmark may be inflated, as the benchmark ignores the well-established literature on the functional forms of the variables. We nevertheless follow this approach for three reasons. First, we want to point out that non-linearities and interactions can imply large performance differences. Second, this baseline OLS also serves as the benchmark for the ALE plots in the next section, to emphasize the non-linearity of the data as much as possible. Third, by using the ALE plots, we want to show how non-parametric ML can help choose suitable functional forms for parametric and semi-parametric models.
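A small simulated example illustrates why a naive linear baseline can exaggerate the gap to a flexible model. Assume, hypothetically, that price depends concavely on living area, the kind of shape an ALE plot might reveal. An OLS that enters area linearly then underfits, while the same OLS with a log-transformed feature recovers the relationship. The data-generating process, coefficients and the helper `ols_r2` are illustrative assumptions, not results from the study.

```python
import numpy as np


def ols_r2(features, y):
    """Fit OLS with an intercept via least squares and return the in-sample R^2."""
    A = np.column_stack([np.ones(len(y))] + list(features))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(1)
# Hypothetical data: price rises with living area at a diminishing rate
area = rng.uniform(30, 250, size=1000)
price = 2000 + 800 * np.log(area) + rng.normal(0, 50, size=1000)

r2_linear = ols_r2([area], price)      # naive baseline, linear in area
r2_log = ols_r2([np.log(area)], price)  # functional form suggested by a concave effect
```

Any performance gap between a tree ensemble and `r2_linear` here would partly reflect the misspecified baseline rather than a genuine advantage of the non-parametric model, which is the caveat this note raises.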

10 The permutation feature importance and accumulated local effects results are obtained with the scikit-learn (https://scikit-learn.org/stable/modules/permutation_importance.html) and PyALE (https://pypi.org/project/PyALE/) packages, respectively.
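Permutation feature importance measures how much a model's error increases when the values of one feature are randomly shuffled, which breaks that feature's link to the target. The sketch below reimplements this idea from scratch in NumPy, using MSE as the error measure; it is not scikit-learn's `permutation_importance`, which works with arbitrary scorers and reports a drop in score. As an assumption going beyond the text, the sketch also permutes groups of columns jointly, one possible way to obtain an aggregate location effect of the kind Note 12 describes. The function name, toy model and data are hypothetical.

```python
import numpy as np


def grouped_permutation_importance(predict, X, y, groups, n_repeats=5, seed=0):
    """Mean increase in MSE when the columns in each group are shuffled jointly.

    groups: list of column-index lists; a single feature is a group of one.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    importances = []
    for cols in groups:
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # One row permutation shared by all columns of the group: their joint
            # distribution is kept, but their link to the target is broken
            perm = rng.permutation(len(X))
            X_perm[:, cols] = X[perm][:, cols]
            increases.append(np.mean((y - predict(X_perm)) ** 2) - baseline)
        importances.append(np.mean(increases))
    return np.array(importances)


def model(A):
    # Hypothetical stand-in for a fitted model: column 2 is ignored entirely
    return 4.0 * A[:, 0] + 3.0 * A[:, 1]


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = model(X)
imp = grouped_permutation_importance(model, X, y, groups=[[0], [1], [2], [0, 1]])
```

The unused column receives an importance of exactly zero, and the joint group `[0, 1]` scores higher than either member alone; with correlated features, such grouped scores need not equal the sum of the individual ones.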

11 An example of how an OLS can be optimized by using the results of the ALEs can be seen in Appendix E.

12 However, these results have to be interpreted with caution. There are several location-based features in our dataset. Besides latitude and longitude, there are the four micro-scores, which also describe the surrounding location of the properties. Furthermore, there are three socio-economic variables in the dataset, which are available at the ZIP code level and thus could also be seen as a proxy for location. To obtain the overall effect of the location on the price, these individual effects would have to be aggregated.

13 However, it bears repeating at this point that the results should also be interpreted with caution, as the unemployment rate can also serve as a simple proxy for the location of a property, due to its availability at the ZIP code level.
