Abstract
Causal inference plays a crucial role in biomedical studies and social sciences. Over the years, researchers have devised various methods to facilitate causal inference, particularly in observational studies. Among these methods, the doubly robust estimator distinguishes itself through a remarkable feature: it retains its consistency even when only one of the two components – either the propensity score model or the outcome mean model – is correctly specified, rather than demanding correctness in both simultaneously. In this paper, we focus on scenarios where semiparametric models are employed for both the propensity score and the outcome mean. Semiparametric models offer a valuable blend of interpretability akin to parametric models and the adaptability characteristic of nonparametric models. In this context, achieving correct model specification involves both accurately specifying the unknown function and consistently estimating the unknown parameter. We introduce a novel concept: the relaxed doubly robust estimator. It operates in a manner reminiscent of the traditional doubly robust estimator but with a reduced requirement for double robustness. In essence, it only mandates consistent estimation of the unknown parameter, without requiring correct specification of the unknown function. This means that it only necessitates a partially correct model specification. We conduct a thorough analysis to establish the double robustness and semiparametric efficiency of our proposed estimator. Furthermore, we bolster our findings with comprehensive simulation studies to illustrate the practical implications of our approach.
1. Introduction
Causal inference plays a pivotal role across a variety of scientific disciplines, encompassing both randomized experiments and observational studies. While randomized experiments are the gold standard for establishing causal relationships regarding treatment effects, their implementation is often infeasible or challenging due to ethical or logistical constraints in many real-world situations.
Alternatively, one can utilize observational data to draw causal inferences, provided that appropriate corrections are made to mitigate the bias arising from confounding due to nonrandomized treatment assignment. Provided that all confounders affecting the relationship between treatment and outcome are observed, researchers have devised a wide array of techniques to effectively control for confounding, such as weighting and matching methods based on the propensity score (Rosenbaum & Rubin, Citation1983), outcome regression methods, and doubly robust (DR) methods based on the idea of augmenting the inverse probability weighting estimator (Bang & Robins, Citation2005; Lunceford & Davidian, Citation2004; Robins et al., Citation1994, Citation1995; Rotnitzky et al., Citation1998; Rotnitzky & Vansteelandt, Citation2014; Scharfstein et al., Citation1999). These methods are well documented in textbooks such as Imbens and Rubin (Citation2015).
Among all these methods, the doubly robust method has been popular because its consistency relies on the correct specification of either the propensity score model or the outcome mean model, not necessarily both. Further, if both the propensity score model and the outcome mean model are correctly specified, the doubly robust estimator becomes semiparametrically efficient (Bickel et al., Citation1993; Tsiatis, Citation2006). Various semiparametric models such as the single-index model have been widely used in modelling the propensity score or the outcome mean, because semiparametric models typically possess the interpretability of parametric models and the flexibility of nonparametric models. Nonetheless, the advantageous property of double robustness may be compromised when semiparametric models are employed for either the propensity score or the outcome mean. In such cases, correct model specification necessitates both the correct specification of the unknown function and the consistent estimation of the unknown parameter – two critical elements within the semiparametric model. Notably, consistent estimation of the unknown parameter does not necessarily require correct specification of the unknown function.
In this paper, we introduce a novel approach to doubly robust estimation specifically tailored for semiparametric models used in both the propensity score and the outcome mean. As mentioned earlier, the traditional doubly robust estimation hinges on the correct specification of the unknown function and the consistent estimation of the unknown parameter – two essential components that need to be accurately identified within either the propensity score or the outcome mean. In contrast, our newly proposed doubly robust estimation only demands the consistent estimation of the unknown parameter, reducing the requirement to a single piece of unknown information within the semiparametric model. Hence, we refer to our innovative method as the ‘relaxed doubly robust’ (RDR) estimation. Furthermore, when both parameters (one within the propensity score model and the other within the outcome mean model) are consistently estimated, our RDR estimator also attains the semiparametric efficiency bound.
The remainder of the paper is structured as follows. In Section 2, we first review the basics of the traditional DR estimation and point out its limitation when a semiparametric propensity score model or a semiparametric outcome mean model needs to be correctly specified. We then present the proposed RDR estimation in Section 3, which includes both the algorithm (Section 3.1) and the theory (Section 3.2). In Section 4, we conduct comprehensive simulation studies to illustrate our proposed estimator as well as its comparison with the traditional DR estimation. We conclude the paper with some discussion in Section 5.
2. Review of traditional DR estimation
We adopt the potential outcome framework (Neyman, Citation1923; Rubin, Citation1974) to briefly present the traditional DR estimation. We denote by $X$ a $p$-dimensional pre-treatment covariate. Suppose the treatment variable $T$ is binary, $T \in \{0, 1\}$, where 1 stands for treatment and 0 for control. For each level of the treatment $T = t$, under the standard stable unit treatment value assumption (SUTVA) (Rubin, Citation1980), we assume that there exists the potential outcome $Y(t)$, representing the outcome had the subject, possibly contrary to fact, been given the treatment $t$. We denote the observed outcome as $Y$, which has the decomposition $Y = TY(1) + (1-T)Y(0)$ based on the consistency assumption. We have independent and identically distributed data $(X_i, T_i, Y_i)$, $i = 1, \ldots, n$, that are realizations of the triplet $(X, T, Y)$.
Throughout, we focus on estimating the average treatment effect (ATE), defined as $\Delta = E\{Y(1) - Y(0)\}$. The ATE is one of the most widely used causal estimands in a variety of scientific applications and can generate important policy implications. The fundamental difficulty of estimating the ATE in causal inference is that one may observe at most one of $Y(1)$ and $Y(0)$ for every subject. We adopt the following two standard assumptions that are widely used in the causal inference literature. The first assumption is about ignorability.
Assumption 2.1
Ignorability
We assume $T \perp \{Y(1), Y(0)\} \mid X$, where $\perp$ stands for conditional independence.
Assumption 2.1 requires $X$ to include all potential confounders relevant to both treatment and outcome, fully identified and completely observed. It indicates that the treatment assignment is independent of the potential outcomes $Y(1)$ and $Y(0)$ after conditioning on all pre-treatment confounders $X$. Accordingly, we denote the propensity score model as $\pi(X) = P(T = 1 \mid X)$. We also denote the outcome mean models as $\mu_t(X) = E(Y \mid T = t, X)$ for $t = 1, 0$. The second assumption pertains to the overlap of the covariate distributions between the treated group and the controls.
Assumption 2.2
Overlap
There exist constants $c_1$ and $c_2$ with $0 < c_1 \le c_2 < 1$ such that $c_1 \le \pi(X) \le c_2$ almost surely.
Assumption 2.2 means that the covariate distributions of the two groups are sufficiently similar to each other. If this assumption is not satisfied at some value of $X$, the subjects with this value can appear only in the treatment group or only in the control group, which would force extrapolation at this value and make inference about the ATE inappropriate.
Under Assumptions 2.1 and 2.2, the ATE can be identified and then estimated via a variety of well-known methods (Imbens, Citation2004; Rosenbaum, Citation2002). These methods need either an estimate of the propensity score model, say $\hat\pi(X)$, that correctly specifies $\pi(X)$, or estimates of the outcome mean models, say $\hat\mu_1(X)$ and $\hat\mu_0(X)$, that correctly specify $\mu_1(X)$ and $\mu_0(X)$ respectively, or both. These methods also induce different estimation efficiencies. In particular, the efficient influence function (EIF) (Tsiatis, Citation2006) for estimating the ATE $\Delta$ is
$$\varphi(X, T, Y) = \frac{TY}{\pi(X)} - \frac{T - \pi(X)}{\pi(X)}\,\mu_1(X) - \frac{(1-T)Y}{1 - \pi(X)} - \frac{T - \pi(X)}{1 - \pi(X)}\,\mu_0(X) - \Delta.$$
Accordingly, the traditional DR estimator is
$$\hat\Delta_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{T_i Y_i}{\hat\pi(X_i)} - \frac{T_i - \hat\pi(X_i)}{\hat\pi(X_i)}\,\hat\mu_1(X_i)\right\} - \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{(1-T_i) Y_i}{1 - \hat\pi(X_i)} + \frac{T_i - \hat\pi(X_i)}{1 - \hat\pi(X_i)}\,\hat\mu_0(X_i)\right\}.$$
It is called DR since it only needs $\hat\pi(X)$ to correctly specify $\pi(X)$, or $\hat\mu_t(X)$ to correctly specify $\mu_t(X)$ for $t = 1, 0$, but not necessarily both. If all of these nuisance functions are correctly specified, the asymptotic variance of this estimator achieves the semiparametric efficiency bound $E\{\varphi^2(X, T, Y)\}$ and this traditional DR estimator is called semiparametrically efficient. Among all regular and asymptotically linear estimators, $E\{\varphi^2(X, T, Y)\}$ is the minimum possible asymptotic variance that one can attain.
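To make the construction concrete, the following minimal sketch implements the traditional DR (AIPW) estimator in Python. The parametric nuisance fits here – a logistic propensity score fitted by Newton-Raphson and linear outcome means fitted by least squares – and the function name `dr_estimate` are illustrative choices of ours, not specifications from the paper.

```python
import numpy as np

def dr_estimate(X, T, Y, newton_iters=25):
    """Traditional doubly robust (AIPW) estimate of the ATE.

    Illustrative nuisance models: a logistic propensity score fitted by
    Newton-Raphson and linear outcome means fitted by least squares.
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])            # add an intercept
    # Logistic propensity model pi(X) = P(T=1|X), Newton-Raphson fit.
    gamma = np.zeros(p + 1)
    for _ in range(newton_iters):
        ps = 1.0 / (1.0 + np.exp(-Z @ gamma))
        grad = Z.T @ (T - ps)
        hess = (Z * (ps * (1.0 - ps))[:, None]).T @ Z
        gamma += np.linalg.solve(hess, grad)
    ps = 1.0 / (1.0 + np.exp(-Z @ gamma))
    # Linear outcome means mu_t(X) = E(Y|T=t,X), fitted within each arm.
    b1, *_ = np.linalg.lstsq(Z[T == 1], Y[T == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(Z[T == 0], Y[T == 0], rcond=None)
    mu1, mu0 = Z @ b1, Z @ b0
    # Augmented inverse probability weighting terms for E{Y(1)} and E{Y(0)}.
    part1 = T * Y / ps - (T - ps) / ps * mu1
    part0 = (1 - T) * Y / (1 - ps) + (T - ps) / (1 - ps) * mu0
    return float(np.mean(part1 - part0))
```

The returned estimate stays consistent when either the logistic propensity fit or the linear outcome fits are correct, mirroring the double robustness described above.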
2.1. Limitation of traditional DR estimation
In reality, practitioners may choose to model the nuisance functions $\pi(X)$, $\mu_1(X)$ and $\mu_0(X)$ differently based on their own preferences. For example, one may impose parametric restrictions, say $\pi(X) = \pi(X; \gamma)$, $\mu_1(X) = \mu_1(X; \theta_1)$ and $\mu_0(X) = \mu_0(X; \theta_0)$, where only the parameters $\gamma$, $\theta_1$ and $\theta_0$ are unknown. On the other hand, one may choose to impose no parametric restrictions and regard all of $\pi(X)$, $\mu_1(X)$ and $\mu_0(X)$ as nonparametric functions.
As another alternative, semiparametric models such as single-index models generally possess the interpretability of parametric models as well as the flexibility of nonparametric models, and thus have been widely used in practice to model the propensity score, the outcome mean, or both. Throughout, we consider single-index models $\pi(X) = \pi(\alpha^{\mathrm{T}}X)$, $\mu_1(X) = \mu_1(\beta_1^{\mathrm{T}}X)$ and $\mu_0(X) = \mu_0(\beta_0^{\mathrm{T}}X)$, where $\pi(\cdot)$, $\mu_1(\cdot)$ and $\mu_0(\cdot)$ are unknown functions, and $\alpha$, $\beta_1$ and $\beta_0$ are unknown $p$-dimensional vectors whose first elements are all ones. Under such a scenario, the traditional DR estimator becomes
$$\hat\Delta_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{T_i Y_i}{\hat\pi(\hat\alpha^{\mathrm{T}}X_i)} - \frac{T_i - \hat\pi(\hat\alpha^{\mathrm{T}}X_i)}{\hat\pi(\hat\alpha^{\mathrm{T}}X_i)}\,\hat\mu_1(\hat\beta_1^{\mathrm{T}}X_i)\right\} - \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{(1-T_i) Y_i}{1 - \hat\pi(\hat\alpha^{\mathrm{T}}X_i)} + \frac{T_i - \hat\pi(\hat\alpha^{\mathrm{T}}X_i)}{1 - \hat\pi(\hat\alpha^{\mathrm{T}}X_i)}\,\hat\mu_0(\hat\beta_0^{\mathrm{T}}X_i)\right\}.$$
Clearly, the double robustness then refers to either the correct specification of $\pi(\cdot)$ together with the consistent estimation of $\alpha$, or the correct specifications of $\mu_1(\cdot)$ and $\mu_0(\cdot)$ together with the consistent estimation of $\beta_1$ and $\beta_0$. This might not be appealing for practical use, since correct model specification then requires both correct specification of the functional forms and consistent estimation of the unknown parameters.
3. Proposed RDR estimation
We propose a relaxed doubly robust (RDR) estimator for estimating the ATE under the scenario where both the propensity score and the outcome means are single-index models. We term it RDR since its double robustness refers only to the consistent estimation of the parameter $\alpha$ or of the parameters $\beta_1$ and $\beta_0$. Further, when all of the parameters $\alpha$, $\beta_1$ and $\beta_0$ are consistently estimated, the RDR estimator is also semiparametrically efficient. The correct specification of the functional forms $\pi(\cdot)$, $\mu_1(\cdot)$ and $\mu_0(\cdot)$ is never required.
Our key idea is to rewrite the definition of $\Delta$ as
$$\Delta = E\{E(Y \mid T = 1, \alpha^{\mathrm{T}}X, \beta_1^{\mathrm{T}}X)\} - E\{E(Y \mid T = 0, \alpha^{\mathrm{T}}X, \beta_0^{\mathrm{T}}X)\},$$
and then propose the RDR estimator as
$$\hat\Delta_{\mathrm{RDR}} = \frac{1}{n}\sum_{i=1}^{n}\hat E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i) - \frac{1}{n}\sum_{i=1}^{n}\hat E(Y \mid T = 0, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_0^{\mathrm{T}}X_i),$$
where the estimates $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_0$ of the parameters can be obtained through some off-the-shelf dimension reduction methods, and the estimates of the conditional expectations can be obtained from standard kernel regression (Cheng, Citation1994; Hu et al., Citation2012), as detailed below.
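The rewriting of $\Delta$ in terms of bivariate-index conditional means can be justified by a short argument; the derivation below is our reading of the double balancing score idea of Hu et al. (Citation2012), under Assumption 2.1 and the single-index models:

```latex
% Since \pi(X) = \pi(\alpha^{T}X), the pair (\alpha^{T}X, \beta_1^{T}X)
% acts as a balancing score, so ignorability holds given the two indices.
\begin{align*}
E\{E(Y \mid T = 1, \alpha^{\mathrm{T}}X, \beta_1^{\mathrm{T}}X)\}
  &= E\{E(Y(1) \mid T = 1, \alpha^{\mathrm{T}}X, \beta_1^{\mathrm{T}}X)\}
     && \text{(consistency of } Y\text{)} \\
  &= E\{E(Y(1) \mid \alpha^{\mathrm{T}}X, \beta_1^{\mathrm{T}}X)\}
     && \text{(ignorability given the indices)} \\
  &= E\{Y(1)\}.
\end{align*}
```

An analogous argument gives $E\{E(Y \mid T = 0, \alpha^{\mathrm{T}}X, \beta_0^{\mathrm{T}}X)\} = E\{Y(0)\}$, so the difference of the two terms equals $\Delta$.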
3.1. Algorithm
The algorithm entails the following six steps.
(i) Estimate the coefficients $\alpha$, $\beta_1$ and $\beta_0$ from the regression models of $T$ on $X$, of $Y$ on $X$ given $T = 1$, and of $Y$ on $X$ given $T = 0$, using some dimension reduction technique such as sliced inverse regression (SIR). Denote these estimates by $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_0$, respectively.
(ii) Compute $\hat\alpha^{\mathrm{T}}X_i$ and $\hat\beta_t^{\mathrm{T}}X_i$ for $t = 1, 0$ and $i = 1, \ldots, n$. The original data have now been transformed to $(\hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i, T_i, Y_i)$ and $(\hat\alpha^{\mathrm{T}}X_i, \hat\beta_0^{\mathrm{T}}X_i, T_i, Y_i)$.
(iii) For $i = 1, \ldots, n$, estimate $E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i)$ by a Nadaraya-Watson kernel regression of $Y$ on $(\hat\alpha^{\mathrm{T}}X, \hat\beta_1^{\mathrm{T}}X)$ using the subjects with $T_i = 1$, where the kernel is a bivariate kernel function and the indices are standardized by the square root of the 2-by-2 sample covariance matrix of $(\hat\alpha^{\mathrm{T}}X, \hat\beta_1^{\mathrm{T}}X)$. Here $h$ is the bandwidth, and we follow Hu et al. (Citation2012) to choose the optimal bandwidth.
(iv) Similarly, for $i = 1, \ldots, n$, estimate $E(Y \mid T = 0, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_0^{\mathrm{T}}X_i)$ by a Nadaraya-Watson kernel regression of $Y$ on $(\hat\alpha^{\mathrm{T}}X, \hat\beta_0^{\mathrm{T}}X)$ using the subjects with $T_i = 0$, where the kernel is a bivariate kernel function and the indices are standardized by the square root of the 2-by-2 sample covariance matrix of $(\hat\alpha^{\mathrm{T}}X, \hat\beta_0^{\mathrm{T}}X)$.
(v) Estimate $E\{E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X, \hat\beta_1^{\mathrm{T}}X)\}$ and $E\{E(Y \mid T = 0, \hat\alpha^{\mathrm{T}}X, \hat\beta_0^{\mathrm{T}}X)\}$ by the sample averages of the fitted values $\hat E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i)$ and $\hat E(Y \mid T = 0, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_0^{\mathrm{T}}X_i)$ over $i = 1, \ldots, n$.
(vi) The proposed RDR estimator is
$$\hat\Delta_{\mathrm{RDR}} = \frac{1}{n}\sum_{i=1}^{n}\hat E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i) - \frac{1}{n}\sum_{i=1}^{n}\hat E(Y \mid T = 0, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_0^{\mathrm{T}}X_i).$$
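The six steps can be sketched in Python as follows. This is a simplified illustration rather than the paper's exact procedure: the index directions come from least-squares fits as stand-ins for SIR, the kernel is a product Gaussian, the bandwidth $n^{-1/6}$ is an illustrative default rather than the data-driven choice of Hu et al. (Citation2012), and the function name `rdr_estimate` is ours.

```python
import numpy as np

def rdr_estimate(X, T, Y, h=None):
    """Sketch of the relaxed doubly robust (RDR) algorithm."""
    n, p = X.shape
    if h is None:
        h = n ** (-1.0 / 6.0)      # illustrative bandwidth for a 2-d kernel
    # Step (i): index directions; simple least-squares stand-ins for SIR.
    alpha, *_ = np.linalg.lstsq(X, T - T.mean(), rcond=None)
    beta1, *_ = np.linalg.lstsq(X[T == 1], Y[T == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(X[T == 0], Y[T == 0], rcond=None)

    def fitted_mean(beta, arm):
        # Step (ii): transform to the bivariate index (alpha'X, beta'X).
        U = np.column_stack([X @ alpha, X @ beta])
        U = (U - U.mean(0)) / U.std(0)             # standardize the indices
        mask = T == arm
        # Steps (iii)-(iv): bivariate Nadaraya-Watson with a Gaussian kernel,
        # fitted on the subjects in the given arm, evaluated at all points.
        d2 = ((U[:, None, :] - U[None, mask, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2.0 * h ** 2))
        return K @ Y[mask] / np.maximum(K.sum(1), 1e-300)

    # Steps (v)-(vi): average the two fitted means over the full sample.
    return float(fitted_mean(beta1, 1).mean() - fitted_mean(beta0, 0).mean())
```

Note that each fitted conditional mean is evaluated at all $n$ points, including those in the opposite arm, which is what makes the averaging in steps (v)-(vi) target the marginal means $E\{Y(1)\}$ and $E\{Y(0)\}$.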
3.2. Theory
To facilitate the presentation of the theory, we define the following conditions.
Condition (i): $\alpha$ is consistently estimated and its estimate $\hat\alpha$ satisfies $\hat\alpha - \alpha = O_p(n^{-1/2})$.
Condition (ii): $\beta_1$ and $\beta_0$ are consistently estimated and their estimates $\hat\beta_1$ and $\hat\beta_0$ satisfy $\hat\beta_1 - \beta_1 = O_p(n^{-1/2})$ and $\hat\beta_0 - \beta_0 = O_p(n^{-1/2})$.
We defer some discussion of how to consistently estimate the parameters $\alpha$, $\beta_1$ and $\beta_0$ to Section 5. In general, interested readers may refer to the literature on semiparametric modelling techniques, including single-index models and dimension reduction techniques.
Other than Assumptions 2.1 and 2.2, our theory also requires the following regularity conditions.
The kernel function $K(\cdot)$ satisfies $\int K(u)\,\mathrm{d}u = 1$, $\int u K(u)\,\mathrm{d}u = 0$, and $\int \|u\|^2 K(u)\,\mathrm{d}u < \infty$.
The density of $(\alpha^{\mathrm{T}}X, \beta_t^{\mathrm{T}}X)$, $t = 1, 0$, is bounded away from 0.
As , the bandwidth matrix such that and , where is the trace of a matrix.
$E(|Y|^k) < \infty$ for some $k > 6$.
Theorem 3.1
Relaxed Double Robustness
Given Assumptions 2.1 and 2.2 and the above regularity conditions, if either Condition (i) or Condition (ii) is satisfied, then the proposed estimator $\hat\Delta_{\mathrm{RDR}}$ is asymptotically consistent; that is, $\hat\Delta_{\mathrm{RDR}} \rightarrow \Delta$ in probability as $n \rightarrow \infty$.
Theorem 3.2
Semiparametric Efficiency
Given Assumptions 2.1 and 2.2 and the above regularity conditions, if both Condition (i) and Condition (ii) are satisfied, then the proposed estimator $\hat\Delta_{\mathrm{RDR}}$ achieves the semiparametric efficiency; that is, $\sqrt{n}(\hat\Delta_{\mathrm{RDR}} - \Delta) \rightarrow N(0, E\{\varphi^{*2}\})$ in distribution, where $\varphi^{*}$ is the EIF for estimating $\Delta$ under the scenario that both the propensity score and the outcome means are single-index models.
The proofs of Theorems 3.1 and 3.2 are contained in the Appendix.
4. Simulation studies
In this section, we conduct comprehensive simulation studies to illustrate the proposed RDR estimator, as well as its comparison with the oracle estimator (Oracle, using all simulated $Y_i(1)$ and $Y_i(0)$ data), the regression-based estimator (REG), the IPW estimator (IPW) and the traditional DR estimator (DR).
We consider the following three settings pertaining to the misspecification of the nuisance functional forms.
Setting 1: neither $\pi(\cdot)$ nor $\mu_t(\cdot)$, $t = 1, 0$, is correctly specified;
Setting 2: only $\pi(\cdot)$ is correctly specified;
Setting 3: only $\mu_t(\cdot)$, $t = 1, 0$, is correctly specified.
We expect that, in every setting, RDR would perform similarly to the Oracle. Because of the constraints on correct model specification, we expect all three methods REG, IPW and DR to be biased in Setting 1, REG to be biased in Setting 2, and IPW to be biased in Setting 3.
In each setting, we consider four cases with sample size n = 1000 and n = 2000 blended with independent and correlated covariates: For Setting 1, we consider the nuisance models where , and . For Setting 2, we consider and the same outcome mean models as Setting 1. For Setting 3, we consider For every specific case, we use the sliced inverse regression (SIR) method to obtain $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_0$ for the proposed RDR estimator. For the traditional REG, IPW and DR methods, we simply fit a logistic regression for $\pi(X)$ and linear regressions for $\mu_t(X)$, $t = 1, 0$. We conduct 500 Monte Carlo simulations and report the empirical bias and empirical mean squared error in Tables , and , respectively.
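The Monte Carlo comparisons follow a generic pattern that can be sketched as below; `gen_data` and the entries of `estimators` are placeholders for the data-generating settings and the five methods compared in the paper, not their actual implementations.

```python
import numpy as np

def monte_carlo(gen_data, estimators, true_ate, B=500, seed=0):
    """Empirical bias and MSE of ATE estimators over B replications.

    gen_data(rng) returns a dataset (X, T, Y); estimators maps method
    names to functions of (X, T, Y) returning an ATE estimate.
    """
    rng = np.random.default_rng(seed)
    errors = {name: [] for name in estimators}
    for _ in range(B):
        X, T, Y = gen_data(rng)
        for name, estimate in estimators.items():
            errors[name].append(estimate(X, T, Y) - true_ate)
    # Summarize each method by its empirical bias and mean squared error.
    return {name: {"bias": float(np.mean(e)), "mse": float(np.mean(np.square(e)))}
            for name, e in errors.items()}
```

For example, passing a randomized design and a difference-in-means estimator recovers a near-zero empirical bias, which is a useful sanity check before plugging in the REG, IPW, DR and RDR methods.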
The results of simulation Setting 1 are shown in Table . We can see that when both models are misspecified, the relaxed doubly robust estimator has the smallest biases for $E\{Y(1)\}$, $E\{Y(0)\}$ and $\Delta$ among all these estimators, whereas the regression, IPW and DR estimators are significantly biased for these estimands. Also, the mean squared errors of the RDR estimator are quite close to those of the oracle estimator under the different scenarios, while those of the other estimators are not.
From Table , we can examine the numerical results for simulation Setting 2. When the regression models are misspecified, the regression estimator has significantly larger biases than the other estimators, while the IPW and DR estimators have small biases, which echoes our expectation. Our RDR estimator achieves smaller biases and MSEs than the regression estimator, and its MSEs are quite close to those of the oracle estimator, indicating that the RDR estimator performs well in this setting.
Finally, we find similar results in Table for simulation Setting 3. When the regression models are correctly specified, the regression and DR estimators both achieve relatively small biases and mean squared errors, while the IPW estimator has large biases and MSEs due to the misspecification of the propensity score model. Our RDR estimator achieves small biases and mean squared errors close to those of the oracle estimator, showing that it performs well when the propensity score model is misspecified.
5. Discussion
In this paper, we propose a relaxed doubly robust (RDR) estimator for estimating the average treatment effect under the scenario where both the propensity score and the outcome means are single-index models. In contrast to the traditional doubly robust (DR) estimation, our new estimator only necessitates partial correctness in model specification. It only requires consistent estimation of the unknown parameter in either the propensity score model or the outcome mean model, or both, to achieve double robustness and semiparametric efficiency, respectively. The correct specification of the unknown functions in these single-index models is no longer needed. This is why we term it ‘relaxed double robustness’.
To implement the proposed RDR estimator, one needs to consistently estimate $\alpha$, $\beta_1$ and $\beta_0$ in a semiparametric modelling framework. This is a stand-alone problem that fortunately has been well studied in the literature. One can either use standard semiparametric methods for single-index models (Ichimura, Citation1993; Klein & Spady, Citation1993) or estimate these unknown parameters in the context of dimension reduction (Cook, Citation2009; Ma & Zhu, Citation2012, Citation2013). In our numerical studies, we choose the standard sliced inverse regression (Li, Citation1991) from the dimension reduction literature to estimate the parameters $\alpha$, $\beta_1$ and $\beta_0$.
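For completeness, here is a compact sketch of sliced inverse regression extracting a single index direction, normalized so that its first element is one, matching the identification convention used above; the function name `sir_direction` is our own, and a production analysis would typically rely on an established SIR implementation.

```python
import numpy as np

def sir_direction(X, Y, n_slices=10):
    """Leading SIR direction (Li, 1991), first element normalized to one."""
    n, p = X.shape
    Xc = X - X.mean(0)
    # Whiten the covariates: Z = (X - mean) Sigma^{-1/2}.
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / n)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt
    # Slice on the order of Y and average the whitened X within each slice.
    slices = np.array_split(np.argsort(Y), n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvector of the slice-mean covariance, back-transformed.
    _, vecs = np.linalg.eigh(M)
    beta = inv_sqrt @ vecs[:, -1]
    return beta / beta[0]   # first element fixed to one for identifiability
```

Dividing by the first element both imposes the identification convention and resolves the arbitrary sign of the eigenvector.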
Finally, as the Associate Editor suggested, we point out that other relaxed doubly robust estimators also exist. For instance, after estimating the unknown parameters $\alpha$, $\beta_1$ and $\beta_0$ exactly as we propose, one may opt to use some one-dimensional nonparametric regression method such as kernels to estimate $\pi(\cdot)$, $\mu_1(\cdot)$ and $\mu_0(\cdot)$, respectively, and plug these into the traditional DR formula. This ultimately leads to an estimator that is also relaxed doubly robust. That estimator and the proposed estimator rest on different ideas. The proposed RDR estimator achieves double robustness by fitting two augmented outcome mean models, $E(Y \mid T = 1, \alpha^{\mathrm{T}}X, \beta_1^{\mathrm{T}}X)$ and $E(Y \mid T = 0, \alpha^{\mathrm{T}}X, \beta_0^{\mathrm{T}}X)$; see the details in steps (iii)-(v) of the algorithm in Section 3.1. By contrast, the aforementioned estimator achieves double robustness because it is implemented directly according to the efficient influence function.
Acknowledgements
The authors would like to thank the Editor, an Associate Editor, and one reviewer for their insightful comments which have helped improve the manuscript substantially.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- Bang, H., & Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4), 962–973. https://doi.org/10.1111/biom.2005.61.issue-4
- Bickel, P. J., Klaassen, C. A. J., Ritov, Y., & Wellner, J. A. (1993). Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press.
- Cheng, P. E. (1994). Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association, 89(425), 81–87. https://doi.org/10.1080/01621459.1994.10476448
- Cook, R. D. (2009). Regression graphics: Ideas for studying regressions through graphics. John Wiley & Sons.
- Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, 66(2), 315–331. https://doi.org/10.2307/2998560
- Hu, Z., Follmann, D. A., & Qin, J. (2012). Semiparametric double balancing score estimation for incomplete data with ignorable missingness. Journal of the American Statistical Association, 107(497), 247–257. https://doi.org/10.1080/01621459.2012.656009
- Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58(1–2), 71–120. https://doi.org/10.1016/0304-4076(93)90114-K
- Imbens, G. W. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and Statistics, 86(1), 4–29. https://doi.org/10.1162/003465304323023651
- Imbens, G. W., & Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
- Klein, R. W., & Spady, R. H. (1993). An efficient semiparametric estimator for binary response models. Econometrica: Journal of the Econometric Society, 61(2), 387–421. https://doi.org/10.2307/2951556
- Li, K.-C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414), 316–327. https://doi.org/10.1080/01621459.1991.10475035
- Lunceford, J. K., & Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23(19), 2937–2960. https://doi.org/10.1002/sim.v23:19
- Ma, Y., & Zhu, L. (2012). A semiparametric approach to dimension reduction. Journal of the American Statistical Association, 107(497), 168–179. https://doi.org/10.1080/01621459.2011.646925
- Ma, Y., & Zhu, L. (2013). A review on dimension reduction. International Statistical Review, 81(1), 134–150. https://doi.org/10.1111/insr.2013.81.issue-1
- Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. English translation of excerpts by Dabrowska, D., & Speed, T. (1990). Statistical Science, 5(4), 465–472.
- Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427), 846–866. https://doi.org/10.1080/01621459.1994.10476818
- Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90(429), 106–121. https://doi.org/10.1080/01621459.1995.10476493
- Rosenbaum, P. R. (2002). Overt bias in observational studies. Springer.
- Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55. https://doi.org/10.1093/biomet/70.1.41
- Rotnitzky, A., Robins, J. M., & Scharfstein, D. O. (1998). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association, 93(444), 1321–1339. https://doi.org/10.1080/01621459.1998.10473795
- Rotnitzky, A., & Vansteelandt, S. (2014). Double-robust methods. In Handbook of Missing Data Methodology (pp. 185–212). CRC Press.
- Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701. https://doi.org/10.1037/h0037350
- Rubin, D. B. (1980). Randomization analysis of experimental data: The Fisher randomization test comment. Journal of the American Statistical Association, 75(371), 591–593.
- Scharfstein, D. O., Rotnitzky, A., & Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448), 1096–1120. https://doi.org/10.1080/01621459.1999.10473862
- Tsiatis, A. A. (2006). Semiparametric theory and missing data. Springer.
Appendix
Proof
Proof of Theorem 3.1
First, we can prove that the first term of $\hat\Delta_{\mathrm{RDR}}$, i.e., $n^{-1}\sum_{i=1}^{n}\hat E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i)$, is consistent for $E\{Y(1)\}$ if either of the following conditions is satisfied.
Condition (a): $\pi(\alpha^{\mathrm{T}}X)$ is correctly specified, where $\alpha$ is the true coefficient for the propensity score model;
Condition (b): $\mu_1(\beta_1^{\mathrm{T}}X)$ is correctly specified, where $\beta_1$ is the true coefficient for the single-index model of $Y$ on $X$ given $T = 1$.
If is correctly specified, the propensity score should be some function of with probability 1, which means that could also be written as a function of . Denote this function by f, and we have If is correctly specified, should be some function of with probability 1, which means that could also be written as a function of . Denote this function by g, and we have Note that the following equation holds if is correctly specified (A1) (A1) For the estimation in the algorithm, we estimate by a Nadaraya-Watson kernel regression of on . Under the regularity conditions, we can show that converges in probability to which can be shown equal to under either Condition (a) or Condition (b).
When Condition (a) is satisfied, i.e., is correctly specified, we have shown that with probability 1, whereby we can further simplify the limit above. When Condition (b) is satisfied, i.e., is correctly specified, we have shown that with probability 1. The limit could be further simplified as follows. Thus, is consistent for , i.e., is consistent for when either or is correctly specified.
Similarly, we could also show that is consistent for , i.e., is consistent for when either or is correctly specified.
Combining these two consistencies, we know that is a consistent estimator for Δ when either Condition (i) or Condition (ii) is satisfied.
Note that the proposed estimator is instead of . Next, we want to show that and are asymptotically equivalent to and respectively under certain conditions, whereby the proposed estimator is asymptotically equivalent to , and has the same double robustness.
Define and . Thus, and , where b is a continuous function of .
Define We want to show that is asymptotically equivalent to if for some k>6 and are satisfied.
To prove the claim above, it is enough to prove that is asymptotically equivalent to under the given conditions.
We can first simplify the bivariate kernel function by normalization. Note that if we use , can also preserve the same property as since there exists a continuous function g such that . After the linear transformation, has the identity covariance, which means that we can simplify the bivariate kernel function to Thus, we can rewrite the bivariate kernel functions as and respectively.
Notice that is a function of . By a Taylor expansion, it can be further written as where is on the line segment between and . Since , we have almost surely by the Borel-Cantelli lemma. Thus, almost surely by the triangle inequality. Combined with the optimal bandwidth and the condition that , is . Since k>6, this term goes to zero, which means that is asymptotically equivalent to .
Thus, the kernel regression estimator is consistent for its target if $E(|Y|^k) < \infty$ for some $k > 6$ and the bandwidth conditions are satisfied.
Similarly, to prove the consistency of , we can define and . We can use a similar argument to show that is asymptotically equivalent to if for some k>6 and are satisfied.
Hence, is asymptotically equivalent to under the given conditions, which indicates that is asymptotically consistent if either Condition (i) or Condition (ii) is satisfied.
Proof
Proof of Theorem 3.2
To prove the optimal efficiency for , we need to prove the optimal efficiency for and .
We can first show that if is correctly specified, has an asymptotic normal distribution And then, if is additionally satisfied, we can further prove that achieves optimal efficiency.
To prove that has such an asymptotic distribution, we can split it into three parts, where For , we can show that asymptotically converges to due to the central limit theorem. For , we can first rewrite as When is correctly specified, with probability 1. The denominator could be further written as . Hence, could be rewritten as We can show that with by Theorem 2.1 from Cheng (Citation1994). Thus, by the central limit theorem, we can show that asymptotically converges to . For , we know that and Thus, .
Combining all three parts together, we know that we have the following asymptotic distribution: Note that the asymptotic variance could be rewritten as Additionally, if is correctly specified, we have the equation according to the argument in the proof of Theorem 3.1. From this equation, could be further rewritten as : Thus, the second term could be rewritten according to the equation and the condition that with probability 1. Combining all the equations above, we know that the asymptotic variance could be transformed into the following expression, According to Hahn (Citation1998), this is the minimum asymptotic variance that a doubly robust estimator can achieve. This means that when and are both correctly specified, achieves optimal efficiency.
We can use a similar argument to prove that when and are both correctly specified, achieves optimal efficiency.
Note that the optimal efficiency for and indicates that achieves optimal efficiency for Δ when both Condition (i) and Condition (ii) are satisfied.
According to the proof of Theorem 3.1, we know that the proposed estimator is asymptotically equivalent to under the given conditions. It implies that when both Condition (i) and Condition (ii) are satisfied, also achieves optimal efficiency for Δ.