ABSTRACT
R2 and adjusted R2 may exaggerate a model’s true ability to predict the dependent variable in the presence of overfitting, whereas leave-one-out R2 (LOOR2) is robust to overfitting. We demonstrate this by replicating 279 regressions from 100 papers in top economics journals, where the median increases of R2 and adjusted R2 over LOOR2 reach 40.2% and 21.4% respectively. The inflation of test errors over training errors increases with the severity of overfitting as measured by the number of regressors and nonlinear terms, and the presence of outliers, but decreases with the sample size. These results are further validated by Monte Carlo simulations.
Disclosure statement
No potential conflict of interest was reported by the authors.
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/15140326.2023.2207326.
Notes
1 To be sure, economics is not the only discipline in this regard. For example, Parady et al. (2021) lament the overreliance on statistical goodness-of-fit and the under-reliance on model validation in the transportation literature.
2 The original formula for adjusted R2 was first proposed in a paper by M. J. B. Ezekiel, who read it before the Mathematical Society at its annual meeting in 1928, but gave the credit to B. B. Smith.
3 We ignore the case of linear regression without a constant term, as it is rarely encountered in practice.
4 For example, the short-cut algorithm for computing LOOR2 can be implemented in Stata by running the user-written command “cv_regress” (Rios-Avila, 2018) after the usual “regress” command for OLS regression.
5 These terms are in the same spirit as “variance inflation factor” (VIF).
6 In fact, the presence of many covariates also increases the complexity of the regression function.
7 These four journals are selected partly because their replication data and programs are more easily accessible. See the Appendix for a complete list of these 100 papers.
8 Note that Dower et al. (2021) only report R2.
9 The results of using EIF or adjusted EIF as the dependent variables are qualitatively similar, but the fit is slightly worse. To save space, we only report results using Log(EIF) and Log(Adjusted EIF) as the dependent variables.
10 Typically, the sample size varies across regressions within a paper because adding more variables may introduce missing observations.
11 As pointed out by an anonymous referee, adding nonlinear terms can be viewed as a particular case of including additional correlated covariates.
12 We thank an anonymous referee for useful discussions about the relation between overfitting and parameter significance, and more studies are needed in this direction.
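Note 4 refers to a short-cut algorithm for computing LOOR2. The Python sketch below illustrates one common form of that idea under stated assumptions: it takes LOOR2 to be 1 - PRESS/TSS, where PRESS is the sum of squared leave-one-out prediction errors and TSS is the total sum of squares about the full-sample mean. The exact definition used in the article may differ. The function names loo_r2_shortcut and loo_r2_brute_force are hypothetical, and the code is not the cv_regress implementation. The short cut relies on the standard OLS identity that the leave-one-out residual equals the ordinary residual divided by (1 - h_i), where h_i is the leverage of observation i, so a single regression suffices.

```python
import numpy as np


def _add_constant(X):
    """Prepend a column of ones (the regression is assumed to have an intercept)."""
    return np.column_stack([np.ones(X.shape[0]), X])


def r2(X, y):
    """Ordinary in-sample R2 of an OLS regression of y on X plus a constant."""
    Xc = _add_constant(X)
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    return 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)


def loo_r2_shortcut(X, y):
    """Leave-one-out R2 from one OLS fit, assuming LOOR2 = 1 - PRESS/TSS."""
    Xc = _add_constant(X)
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    # Leverages are the diagonal of the hat matrix; with a reduced QR
    # decomposition they equal the row sums of squares of Q
    # (this assumes Xc has full column rank).
    Q, _ = np.linalg.qr(Xc)
    leverage = np.sum(Q**2, axis=1)
    loo_resid = resid / (1.0 - leverage)
    press = np.sum(loo_resid**2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / tss


def loo_r2_brute_force(X, y):
    """Same quantity computed by refitting the model n times, for checking."""
    Xc = _add_constant(X)
    n = Xc.shape[0]
    loo_resid = np.empty(n)
    for i in range(n):
        X_train = np.delete(Xc, i, axis=0)
        y_train = np.delete(y, i)
        beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
        loo_resid[i] = y[i] - Xc[i] @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(loo_resid**2) / tss


# Simulated data (illustrative only, not from the replicated papers).
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([0.5, -0.2, 0.0]) + rng.normal(size=n)

print("R2              :", r2(X, y))
print("LOOR2 (shortcut):", loo_r2_shortcut(X, y))
print("LOOR2 (brute)   :", loo_r2_brute_force(X, y))
```

Under this definition, each leave-one-out residual is at least as large in absolute value as the corresponding in-sample residual, so the computed LOOR2 cannot exceed R2. The gap widens when leverages are high, for example with many regressors relative to the sample size.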
Additional information
Notes on contributors
Qiang Chen
Qiang Chen is a professor at the School of Economics, Shandong University.
Ji Qi
Ji Qi is a PhD student at the School of Economics, Shandong University.