Bayesian Methods

Influential Observations in Bayesian Regression Tree Models

Pages 47-63 | Received 26 Mar 2022, Accepted 19 Mar 2023, Published online: 21 Jun 2023

Abstract

Bayesian Classification and Regression Trees (BCART) and Bayesian Additive Regression Trees (BART) are popular Bayesian regression models widely applicable to modern regression problems. Their popularity is closely tied to their ability to flexibly model complex responses that depend on high-dimensional inputs while simultaneously quantifying uncertainty. This ability to quantify uncertainty is key, as it allows researchers to perform appropriate inferential analyses in settings that have generally been too difficult to handle with the Bayesian approach. However, surprisingly little work has been done to evaluate the sensitivity of these models to violations of modeling assumptions. In particular, we consider influential observations, which one would reasonably expect to be common, or at least a concern, in the big-data setting. In this article, we address both the problem of detecting influential observations and that of adjusting predictions so that they are not unduly affected by such potentially problematic data. We consider three detection diagnostics for Bayesian tree models: an analogue of Cook's distance, a divergence measure, and a conditional predictive density metric. We then propose an importance sampling algorithm that re-weights previously sampled posterior draws to remove the effects of influential data in a computationally efficient manner. Finally, we demonstrate our methods on real-world data where blind application of the models can lead to poor predictions and inference. Supplementary materials for this article are available online.
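The abstract names a divergence measure, a conditional predictive density metric, a Cook's-distance analogue, and importance-sampling re-weighting of saved posterior draws, but gives no formulas. The sketch below illustrates the general idea using standard case-deletion identities, assuming a Gaussian-error tree model whose sampler output has been saved as an S × n array of fitted values `f_draws` and a length-S vector of noise draws `sigma_draws` (hypothetical names). Under those assumptions, the conditional predictive ordinate CPO_i is estimated by the harmonic mean of p(y_i | θ_s). The Kullback-Leibler divergence between the full and case-deleted posteriors equals log E[1/p(y_i | θ)] + E[log p(y_i | θ)]. Re-weighting each draw by w_s ∝ 1/p(y_i | θ_s) approximates the posterior with y_i removed. These are textbook formulas, not necessarily the exact definitions used in this article. `cook_like_distance` is a hypothetical analogue, and the simulated draws are only a toy stand-in for real BART output.

```python
import numpy as np


def _logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def gaussian_loglik(y, f_draws, sigma_draws):
    """(S, n) matrix of log N(y_i | f_s(x_i), sigma_s^2).

    f_draws: (S, n) posterior draws of the fitted function at the training inputs.
    sigma_draws: (S,) posterior draws of the noise standard deviation.
    """
    sig = sigma_draws[:, None]
    z = (y[None, :] - f_draws) / sig
    return -0.5 * np.log(2.0 * np.pi) - np.log(sig) - 0.5 * z**2


def case_deletion_diagnostics(loglik):
    """Per-observation log CPO and KL(full posterior || posterior without y_i)."""
    S = loglik.shape[0]
    # log of the Monte Carlo estimate of E[1 / p(y_i | theta) | y]
    log_mean_inv = _logsumexp(-loglik, axis=0) - np.log(S)
    log_cpo = -log_mean_inv                  # harmonic-mean estimate of CPO_i
    kl = log_mean_inv + loglik.mean(axis=0)  # >= 0 by Jensen's inequality
    return log_cpo, kl


def deletion_weights(loglik, i):
    """Self-normalized importance weights w_s proportional to 1 / p(y_i | theta_s)."""
    logw = -loglik[:, i]
    w = np.exp(logw - logw.max())  # subtract the max to avoid overflow
    return w / w.sum()


def effective_sample_size(w):
    """Kish effective sample size of normalized importance weights."""
    return 1.0 / np.sum(w**2)


def cook_like_distance(f_draws, sigma_draws, w):
    """Hypothetical Cook's-distance-style summary: squared shift in the
    posterior-mean fit after deleting one observation, scaled by E[sigma^2]."""
    shift = f_draws.mean(axis=0) - w @ f_draws
    return np.sum(shift**2) / np.mean(sigma_draws**2)


# Toy stand-in for saved sampler output: independent jitter around a known fit.
rng = np.random.default_rng(0)
S, n = 2000, 50
x = np.linspace(0.0, 1.0, n)
f_true = np.sin(2.0 * np.pi * x)
y = f_true + rng.normal(0.0, 1.0, n)
y[7] += 8.0  # plant one gross outlier
f_draws = f_true[None, :] + rng.normal(0.0, 0.1, (S, n))
sigma_draws = rng.uniform(0.95, 1.05, S)

loglik = gaussian_loglik(y, f_draws, sigma_draws)
log_cpo, kl = case_deletion_diagnostics(loglik)
worst = int(np.argmax(kl))
w = deletion_weights(loglik, worst)
print("most influential:", worst, "KL:", round(float(kl[worst]), 3))
print("ESS after deletion:", round(float(effective_sample_size(w)), 1))
print("Cook-like distance:", round(float(cook_like_distance(f_draws, sigma_draws, w)), 4))
```

In this toy run, the planted outlier should have both the largest divergence and the smallest CPO. Re-weighting then pulls the fitted value at that point away from the outlier without rerunning the sampler. A very small effective sample size after re-weighting is a warning that importance sampling alone may not recover the case-deleted posterior reliably.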

Supplementary Materials

The online supplementary materials include proofs of equations and propositions (supplement.pdf) and example code to reproduce figures in the paper (bartinfluence.zip).

Additional information

Funding

The work of MTP was supported in part by the National Science Foundation (NSF) under Agreement DMS-1916231 and in part by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-2018-CRG7-3800.3. The work of EIG was supported by NSF DMS-1916245. The work of REM was supported by NSF DMS-1916233.
