Abstract
Causal inference plays a crucial role in biomedical studies and social sciences. Over the years, researchers have devised various methods to facilitate causal inference, particularly in observational studies. Among these methods, the doubly robust estimator distinguishes itself through a remarkable feature: it retains its consistency even when only one of the two components – either the propensity score model or the outcome mean model – is correctly specified, rather than demanding correctness in both simultaneously. In this paper, we focus on scenarios where semiparametric models are employed for both the propensity score and the outcome mean. Semiparametric models offer a valuable blend of interpretability akin to parametric models and the adaptability characteristic of nonparametric models. In this context, achieving correct model specification involves both accurately specifying the unknown function and consistently estimating the unknown parameter. We introduce a novel concept: the relaxed doubly robust estimator. It operates in a manner reminiscent of the traditional doubly robust estimator but with a reduced requirement for double robustness. In essence, it only mandates consistent estimation of the unknown parameter, without requiring correct specification of the unknown function. This means that it only necessitates a partially correct model specification. We conduct a thorough analysis to establish the double robustness and semiparametric efficiency of our proposed estimator. Furthermore, we bolster our findings with comprehensive simulation studies to illustrate the practical implications of our approach.
1. Introduction
Causal inference plays a pivotal role across a variety of scientific disciplines, encompassing both randomized experiments and observational studies. While randomized experiments are the gold standard for establishing causal relationships regarding treatment effects, their implementation is often infeasible or challenging due to ethical or logistical constraints in many real-world situations.
Alternatively, one can utilize observational data to draw causal inferences, provided that appropriate corrections are made to mitigate the bias arising from confounding due to nonrandomized treatment assignment. Provided that all confounders affecting the relationship between treatment and outcome are observed, researchers have devised a wide array of techniques to effectively control for confounding, such as weighting and matching methods based on the propensity score (Rosenbaum & Rubin, Citation1983), outcome regression methods, and doubly robust (DR) methods based on the idea of augmenting the inverse probability weighting estimator (Bang & Robins, Citation2005; Lunceford & Davidian, Citation2004; Robins et al., Citation1994, Citation1995; Rotnitzky et al., Citation1998; Rotnitzky & Vansteelandt, Citation2014; Scharfstein et al., Citation1999). These methods are well documented in textbooks such as Imbens and Rubin (Citation2015).
Among all these methods, the doubly robust method has been popular because its consistency relies on the correct specification of either the propensity score model or the outcome mean model, not necessarily both. Further, if both the propensity score model and the outcome mean model are correctly specified, the doubly robust estimator becomes semiparametrically efficient (Bickel et al., Citation1993; Tsiatis, Citation2006). Various semiparametric models such as the single-index model have been widely used in modelling the propensity score or the outcome mean, because semiparametric models typically possess the interpretability of parametric models and the flexibility of nonparametric models. Nonetheless, the advantageous property of double robustness may be compromised when semiparametric models are employed for either the propensity score or the outcome mean. In such cases, correct model specification necessitates both the correct specification of the unknown function and the consistent estimation of the unknown parameter – two critical elements within the semiparametric model. Notably, consistent estimation of the unknown parameter does not necessarily require correct specification of the unknown function.
In this paper, we introduce a novel approach to doubly robust estimation specifically tailored for semiparametric models used in both the propensity score and the outcome mean. As mentioned earlier, the traditional doubly robust estimation hinges on the correct specification of the unknown function and the consistent estimation of the unknown parameter – two essential components that need to be accurately identified within either the propensity score or the outcome mean. In contrast, our newly proposed doubly robust estimation only demands the consistent estimation of the unknown parameter, reducing the requirement to a single piece of unknown information within the semiparametric model. Hence, we refer to our innovative method as the ‘relaxed doubly robust’ (RDR) estimation. Furthermore, when both parameters (one within the propensity score model and the other within the outcome mean model) are consistently estimated, our RDR estimator also attains the semiparametric efficiency bound.
The remainder of the paper is structured as follows. In Section 2, we first review the basics of the traditional DR estimation and point out its limitation when a semiparametric propensity score model or a semiparametric outcome mean model needs to be correctly specified. We then present the proposed RDR estimation in Section 3, which includes both the algorithm (Section 3.1) and the theory (Section 3.2). In Section 4, we conduct comprehensive simulation studies to illustrate our proposed estimator as well as its comparison with the traditional DR estimation. We conclude the paper with some discussion in Section 5.
2. Review of traditional DR estimation
We adopt the potential outcome framework (Neyman, Citation1923; Rubin, Citation1974) to briefly present the traditional DR estimation. We denote by $X$ a $p$-dimensional pre-treatment covariate. Suppose the treatment variable $T$ is binary, $T \in \{0, 1\}$, where 1 stands for treatment and 0 for control. For each level of the treatment $T = t$, under the standard stable unit treatment value assumption (SUTVA) (Rubin, Citation1980), we assume that there exists the potential outcome $Y(t)$, representing the outcome had the subject, possibly contrary to fact, been given the treatment $t$. We denote the observed outcome as $Y$, which has the decomposition $Y = TY(1) + (1-T)Y(0)$ based on the consistency assumption. We have independent and identically distributed data $(X_i, T_i, Y_i)$, $i = 1, \ldots, n$, that are realizations of the triplet $(X, T, Y)$.
Throughout, we focus on estimating the average treatment effect (ATE), defined as $\Delta = E\{Y(1) - Y(0)\}$. The ATE is one of the most widely used causal estimands in a variety of scientific applications and can generate important policy implications. The fundamental difficulty of estimating the ATE in causal inference is that one may observe at most one of $Y(1)$ and $Y(0)$ for every subject. We adopt the following two standard assumptions that are widely used in the causal inference literature. The first assumption is about ignorability.
Assumption 2.1
Ignorability
We assume $T \perp \{Y(1), Y(0)\} \mid X$, where $\perp$ stands for conditional independence.
Assumption 2.1 requires $X$ to include all potential confounders relevant to both treatment and outcome, fully identified and completely observed. It indicates that the treatment assignment is independent of the potential outcomes $Y(1)$ and $Y(0)$ after conditioning on all pre-treatment confounders $X$. Accordingly, we denote the propensity score model as $\pi(X) = P(T = 1 \mid X)$. We also denote the outcome mean models as $\mu_t(X) = E(Y \mid T = t, X)$ for $t = 1, 0$. The second assumption pertains to the overlap of the covariate distributions between the treated group and the controls.
Assumption 2.2
Overlap
There exist constants $c_1$ and $c_2$ with $0 < c_1 \le c_2 < 1$ such that $c_1 \le \pi(X) \le c_2$ almost surely.
Assumption 2.2 means that the covariate distributions of the two groups are sufficiently similar to each other. If this assumption is not satisfied at some value of $X$, the subjects with this value can appear only in the treatment group or only in the control group, which would force extrapolation at this value and make inference about the ATE inappropriate.
Under Assumptions 2.1 and 2.2, the ATE can be identified and then estimated via a variety of well-known methods (Imbens, Citation2004; Rosenbaum, Citation2002). These methods need either an estimate of the propensity score model, say $\hat\pi(X)$, that correctly specifies $\pi(X)$, or estimates of the outcome mean models, say $\hat\mu_1(X)$ and $\hat\mu_0(X)$, that correctly specify $\mu_1(X)$ and $\mu_0(X)$ respectively, or both. These methods also induce different estimation efficiencies. In particular, the efficient influence function (EIF) (Tsiatis, Citation2006) for estimating the ATE $\Delta$ is
$$\varphi(X, T, Y) = \frac{TY}{\pi(X)} - \frac{T - \pi(X)}{\pi(X)}\,\mu_1(X) - \frac{(1-T)Y}{1 - \pi(X)} - \frac{T - \pi(X)}{1 - \pi(X)}\,\mu_0(X) - \Delta.$$
Accordingly, the traditional DR estimator is
$$\hat\Delta_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{T_i Y_i}{\hat\pi(X_i)} - \frac{T_i - \hat\pi(X_i)}{\hat\pi(X_i)}\,\hat\mu_1(X_i)\right\} - \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{(1-T_i) Y_i}{1 - \hat\pi(X_i)} + \frac{T_i - \hat\pi(X_i)}{1 - \hat\pi(X_i)}\,\hat\mu_0(X_i)\right\}.$$
It is called DR since it only needs $\hat\pi(X)$ to correctly specify $\pi(X)$, or $\hat\mu_t(X)$ to correctly specify $\mu_t(X)$ for $t = 1, 0$, but not necessarily both. If all of these nuisance functions are correctly specified, the asymptotic variance of this estimator achieves the semiparametric efficiency bound $E\{\varphi^2(X, T, Y)\}$ and this traditional DR estimator is called semiparametrically efficient. Among all regular and asymptotically linear estimators, $E\{\varphi^2(X, T, Y)\}$ is the minimum possible asymptotic variance that one can attain.
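To make the construction concrete, the following minimal sketch implements the traditional DR (AIPW) estimator in Python. The parametric nuisance fits here – a logistic propensity score fitted by Newton-Raphson and linear outcome means fitted by least squares – and the function name `dr_estimate` are illustrative choices of ours, not specifications from the paper.

```python
import numpy as np

def dr_estimate(X, T, Y, newton_iters=25):
    """Traditional doubly robust (AIPW) estimate of the ATE.

    Illustrative nuisance models: a logistic propensity score fitted by
    Newton-Raphson and linear outcome means fitted by least squares.
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])            # add an intercept
    # Logistic propensity model pi(X) = P(T=1|X), Newton-Raphson fit.
    gamma = np.zeros(p + 1)
    for _ in range(newton_iters):
        ps = 1.0 / (1.0 + np.exp(-Z @ gamma))
        grad = Z.T @ (T - ps)
        hess = (Z * (ps * (1.0 - ps))[:, None]).T @ Z
        gamma += np.linalg.solve(hess, grad)
    ps = 1.0 / (1.0 + np.exp(-Z @ gamma))
    # Linear outcome means mu_t(X) = E(Y|T=t,X), fitted within each arm.
    b1, *_ = np.linalg.lstsq(Z[T == 1], Y[T == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(Z[T == 0], Y[T == 0], rcond=None)
    mu1, mu0 = Z @ b1, Z @ b0
    # Augmented inverse probability weighting terms for E{Y(1)} and E{Y(0)}.
    part1 = T * Y / ps - (T - ps) / ps * mu1
    part0 = (1 - T) * Y / (1 - ps) + (T - ps) / (1 - ps) * mu0
    return float(np.mean(part1 - part0))
```

The returned estimate stays consistent when either the logistic propensity fit or the linear outcome fits are correct, mirroring the double robustness described above.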
2.1. Limitation of traditional DR estimation
In reality, practitioners may choose to model the nuisance functions $\pi(X)$, $\mu_1(X)$ and $\mu_0(X)$ differently based on their own preferences. For example, one may impose parametric restrictions, say $\pi(X) = \pi(X; \gamma)$, $\mu_1(X) = \mu_1(X; \theta_1)$ and $\mu_0(X) = \mu_0(X; \theta_0)$, where only the parameters $\gamma$, $\theta_1$ and $\theta_0$ are unknown. On the other hand, one may choose to impose no parametric restrictions and regard all of $\pi(X)$, $\mu_1(X)$ and $\mu_0(X)$ as nonparametric functions.
As another alternative, semiparametric models such as single-index models generally possess the interpretability of parametric models as well as the flexibility of nonparametric models, and thus have been widely used in practice to model the propensity score, the outcome mean, or both. Throughout, we consider single-index models $\pi(X) = \pi(\alpha^{\mathrm{T}}X)$, $\mu_1(X) = \mu_1(\beta_1^{\mathrm{T}}X)$ and $\mu_0(X) = \mu_0(\beta_0^{\mathrm{T}}X)$, where $\pi(\cdot)$, $\mu_1(\cdot)$ and $\mu_0(\cdot)$ are unknown functions, and $\alpha$, $\beta_1$ and $\beta_0$ are unknown $p$-dimensional vectors whose first elements are all ones. Under such a scenario, the traditional DR estimator becomes
$$\hat\Delta_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{T_i Y_i}{\hat\pi(\hat\alpha^{\mathrm{T}}X_i)} - \frac{T_i - \hat\pi(\hat\alpha^{\mathrm{T}}X_i)}{\hat\pi(\hat\alpha^{\mathrm{T}}X_i)}\,\hat\mu_1(\hat\beta_1^{\mathrm{T}}X_i)\right\} - \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{(1-T_i) Y_i}{1 - \hat\pi(\hat\alpha^{\mathrm{T}}X_i)} + \frac{T_i - \hat\pi(\hat\alpha^{\mathrm{T}}X_i)}{1 - \hat\pi(\hat\alpha^{\mathrm{T}}X_i)}\,\hat\mu_0(\hat\beta_0^{\mathrm{T}}X_i)\right\}.$$
Clearly, the double robustness then refers to either the correct specification of $\pi(\cdot)$ together with the consistent estimation of $\alpha$, or the correct specifications of $\mu_1(\cdot)$ and $\mu_0(\cdot)$ together with the consistent estimation of $\beta_1$ and $\beta_0$. This might not be appealing for practical use, since correct model specification then requires both correct specification of the functional forms and consistent estimation of the unknown parameters.
3. Proposed RDR estimation
We propose a relaxed doubly robust (RDR) estimator for estimating the ATE under the scenario where both the propensity score and the outcome means are single-index models. We term it RDR since its double robustness refers only to the consistent estimation of the parameter $\alpha$ or of the parameters $\beta_1$ and $\beta_0$. Further, when all of the parameters $\alpha$, $\beta_1$ and $\beta_0$ are consistently estimated, the RDR estimator is also semiparametrically efficient. The correct specification of the functional forms $\pi(\cdot)$, $\mu_1(\cdot)$ and $\mu_0(\cdot)$ is never required.
Our key idea is to rewrite the definition of $\Delta$ as
$$\Delta = E\{E(Y \mid T = 1, \alpha^{\mathrm{T}}X, \beta_1^{\mathrm{T}}X)\} - E\{E(Y \mid T = 0, \alpha^{\mathrm{T}}X, \beta_0^{\mathrm{T}}X)\},$$
and then propose the RDR estimator as
$$\hat\Delta_{\mathrm{RDR}} = \frac{1}{n}\sum_{i=1}^{n}\hat E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i) - \frac{1}{n}\sum_{i=1}^{n}\hat E(Y \mid T = 0, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_0^{\mathrm{T}}X_i),$$
where the estimates $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_0$ of the parameters can be obtained through some off-the-shelf dimension reduction methods, and the estimates of the conditional expectations can be obtained from standard kernel regression (Cheng, Citation1994; Hu et al., Citation2012), as detailed below.
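The rewriting of $\Delta$ in terms of bivariate-index conditional means can be justified by a short argument; the derivation below is our reading of the double balancing score idea of Hu et al. (Citation2012), under Assumption 2.1 and the single-index models:

```latex
% Since \pi(X) = \pi(\alpha^{T}X), the pair (\alpha^{T}X, \beta_1^{T}X)
% acts as a balancing score, so ignorability holds given the two indices.
\begin{align*}
E\{E(Y \mid T = 1, \alpha^{\mathrm{T}}X, \beta_1^{\mathrm{T}}X)\}
  &= E\{E(Y(1) \mid T = 1, \alpha^{\mathrm{T}}X, \beta_1^{\mathrm{T}}X)\}
     && \text{(consistency of } Y\text{)} \\
  &= E\{E(Y(1) \mid \alpha^{\mathrm{T}}X, \beta_1^{\mathrm{T}}X)\}
     && \text{(ignorability given the indices)} \\
  &= E\{Y(1)\}.
\end{align*}
```

An analogous argument gives $E\{E(Y \mid T = 0, \alpha^{\mathrm{T}}X, \beta_0^{\mathrm{T}}X)\} = E\{Y(0)\}$, so the difference of the two terms equals $\Delta$.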
3.1. Algorithm
The algorithm entails the following six steps.
(i) Estimate the coefficients $\alpha$, $\beta_1$ and $\beta_0$ from the regression models of $T$ on $X$, of $Y$ on $X$ given $T = 1$, and of $Y$ on $X$ given $T = 0$, using some dimension reduction technique such as sliced inverse regression (SIR). Denote these estimates by $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_0$, respectively.
(ii) Compute $\hat\alpha^{\mathrm{T}}X_i$ and $\hat\beta_t^{\mathrm{T}}X_i$ for $t = 1, 0$ and $i = 1, \ldots, n$. The original data have now been transformed to $(\hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i, T_i, Y_i)$ and $(\hat\alpha^{\mathrm{T}}X_i, \hat\beta_0^{\mathrm{T}}X_i, T_i, Y_i)$.
(iii) For $i = 1, \ldots, n$, estimate $E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i)$ by a Nadaraya-Watson kernel regression of $Y$ on $(\hat\alpha^{\mathrm{T}}X, \hat\beta_1^{\mathrm{T}}X)$ using the subjects with $T_i = 1$, where the kernel is a bivariate kernel function and the indices are standardized by the square root of the 2-by-2 sample covariance matrix of $(\hat\alpha^{\mathrm{T}}X, \hat\beta_1^{\mathrm{T}}X)$. Here $h$ is the bandwidth, and we follow Hu et al. (Citation2012) to choose the optimal bandwidth.
(iv) Similarly, for $i = 1, \ldots, n$, estimate $E(Y \mid T = 0, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_0^{\mathrm{T}}X_i)$ by a Nadaraya-Watson kernel regression of $Y$ on $(\hat\alpha^{\mathrm{T}}X, \hat\beta_0^{\mathrm{T}}X)$ using the subjects with $T_i = 0$, where the kernel is a bivariate kernel function and the indices are standardized by the square root of the 2-by-2 sample covariance matrix of $(\hat\alpha^{\mathrm{T}}X, \hat\beta_0^{\mathrm{T}}X)$.
(v) Estimate $E\{E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X, \hat\beta_1^{\mathrm{T}}X)\}$ and $E\{E(Y \mid T = 0, \hat\alpha^{\mathrm{T}}X, \hat\beta_0^{\mathrm{T}}X)\}$ by the sample averages of the fitted values $\hat E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i)$ and $\hat E(Y \mid T = 0, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_0^{\mathrm{T}}X_i)$ over $i = 1, \ldots, n$.
(vi) The proposed RDR estimator is
$$\hat\Delta_{\mathrm{RDR}} = \frac{1}{n}\sum_{i=1}^{n}\hat E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i) - \frac{1}{n}\sum_{i=1}^{n}\hat E(Y \mid T = 0, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_0^{\mathrm{T}}X_i).$$
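The six steps can be sketched in Python as follows. This is a simplified illustration rather than the paper's exact procedure: the index directions come from least-squares fits as stand-ins for SIR, the kernel is a product Gaussian, the bandwidth $n^{-1/6}$ is an illustrative default rather than the data-driven choice of Hu et al. (Citation2012), and the function name `rdr_estimate` is ours.

```python
import numpy as np

def rdr_estimate(X, T, Y, h=None):
    """Sketch of the relaxed doubly robust (RDR) algorithm."""
    n, p = X.shape
    if h is None:
        h = n ** (-1.0 / 6.0)      # illustrative bandwidth for a 2-d kernel
    # Step (i): index directions; simple least-squares stand-ins for SIR.
    alpha, *_ = np.linalg.lstsq(X, T - T.mean(), rcond=None)
    beta1, *_ = np.linalg.lstsq(X[T == 1], Y[T == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(X[T == 0], Y[T == 0], rcond=None)

    def fitted_mean(beta, arm):
        # Step (ii): transform to the bivariate index (alpha'X, beta'X).
        U = np.column_stack([X @ alpha, X @ beta])
        U = (U - U.mean(0)) / U.std(0)             # standardize the indices
        mask = T == arm
        # Steps (iii)-(iv): bivariate Nadaraya-Watson with a Gaussian kernel,
        # fitted on the subjects in the given arm, evaluated at all points.
        d2 = ((U[:, None, :] - U[None, mask, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2.0 * h ** 2))
        return K @ Y[mask] / np.maximum(K.sum(1), 1e-300)

    # Steps (v)-(vi): average the two fitted means over the full sample.
    return float(fitted_mean(beta1, 1).mean() - fitted_mean(beta0, 0).mean())
```

Note that each fitted conditional mean is evaluated at all $n$ points, including those in the opposite arm, which is what makes the averaging in steps (v)-(vi) target the marginal means $E\{Y(1)\}$ and $E\{Y(0)\}$.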
3.2. Theory
To facilitate the presentation of the theory, we define the following conditions.
Condition (i): $\alpha$ is consistently estimated and its estimate $\hat\alpha$ satisfies $\hat\alpha - \alpha = O_p(n^{-1/2})$.
Condition (ii): $\beta_1$ and $\beta_0$ are consistently estimated and their estimates $\hat\beta_1$ and $\hat\beta_0$ satisfy $\hat\beta_1 - \beta_1 = O_p(n^{-1/2})$ and $\hat\beta_0 - \beta_0 = O_p(n^{-1/2})$.
We defer some discussion of how to consistently estimate the parameters $\alpha$, $\beta_1$ and $\beta_0$ to Section 5. In general, interested readers may refer to the literature on semiparametric modelling techniques, including single-index models and dimension reduction techniques.
Other than Assumptions 2.1 and 2.2, our theory also requires the following regularity conditions.
The kernel function $K(\cdot)$ satisfies $\int K(u)\,\mathrm{d}u = 1$, $\int u K(u)\,\mathrm{d}u = 0$, and $\int \|u\|^2 K(u)\,\mathrm{d}u < \infty$.
The density of $(\alpha^{\mathrm{T}}X, \beta_t^{\mathrm{T}}X)$, $t = 1, 0$, is bounded away from 0.
As , the bandwidth matrix such that and , where is the trace of a matrix.
$E(|Y|^k) < \infty$ for some $k > 6$.
Theorem 3.1
Relaxed Double Robustness
Given Assumptions 2.1 and 2.2 and the above regularity conditions, if either Condition (i) or Condition (ii) is satisfied, then the proposed estimator $\hat\Delta_{\mathrm{RDR}}$ is asymptotically consistent; that is, $\hat\Delta_{\mathrm{RDR}} \rightarrow \Delta$ in probability as $n \rightarrow \infty$.
Theorem 3.2
Semiparametric Efficiency
Given Assumptions 2.1 and 2.2 and the above regularity conditions, if both Condition (i) and Condition (ii) are satisfied, then the proposed estimator $\hat\Delta_{\mathrm{RDR}}$ achieves the semiparametric efficiency; that is, $\sqrt{n}(\hat\Delta_{\mathrm{RDR}} - \Delta) \rightarrow N(0, E\{\varphi^{*2}\})$ in distribution, where $\varphi^{*}$ is the EIF for estimating $\Delta$ under the scenario that both the propensity score and the outcome means are single-index models.
The proofs of Theorems 3.1 and 3.2 are contained in the Appendix.
4. Simulation studies
In this section, we conduct comprehensive simulation studies to illustrate the proposed RDR estimator, as well as its comparison with the oracle estimator (Oracle, using all simulated $Y_i(1)$ and $Y_i(0)$ data), the regression-based estimator (REG), the IPW estimator (IPW) and the traditional DR estimator (DR).
We consider the following three settings pertaining to the misspecification of the nuisance functional forms.
Setting 1: neither $\pi(\cdot)$ nor $\mu_t(\cdot)$, $t = 1, 0$, is correctly specified;
Setting 2: only $\pi(\cdot)$ is correctly specified;
Setting 3: only $\mu_t(\cdot)$, $t = 1, 0$, is correctly specified.
We expect that, in every setting, RDR would perform similarly to the Oracle. Because of the constraints on correct model specification, we expect all three methods REG, IPW and DR to be biased in Setting 1, REG to be biased in Setting 2, and IPW to be biased in Setting 3.
In each setting, we consider four cases with sample size n = 1000 and n = 2000 blended with independent and correlated covariates: For Setting 1, we consider the nuisance models where , and . For Setting 2, we consider and the same outcome mean models as Setting 1. For Setting 3, we consider For every specific case, we use the sliced inverse regression (SIR) method to obtain $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_0$ for the proposed RDR estimator. For the traditional REG, IPW and DR methods, we simply fit a logistic regression for $\pi(X)$ and linear regressions for $\mu_t(X)$, $t = 1, 0$. We conduct 500 Monte Carlo simulations and report the empirical bias and empirical mean squared error in Tables , and , respectively.
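The Monte Carlo comparisons follow a generic pattern that can be sketched as below; `gen_data` and the entries of `estimators` are placeholders for the data-generating settings and the five methods compared in the paper, not their actual implementations.

```python
import numpy as np

def monte_carlo(gen_data, estimators, true_ate, B=500, seed=0):
    """Empirical bias and MSE of ATE estimators over B replications.

    gen_data(rng) returns a dataset (X, T, Y); estimators maps method
    names to functions of (X, T, Y) returning an ATE estimate.
    """
    rng = np.random.default_rng(seed)
    errors = {name: [] for name in estimators}
    for _ in range(B):
        X, T, Y = gen_data(rng)
        for name, estimate in estimators.items():
            errors[name].append(estimate(X, T, Y) - true_ate)
    # Summarize each method by its empirical bias and mean squared error.
    return {name: {"bias": float(np.mean(e)), "mse": float(np.mean(np.square(e)))}
            for name, e in errors.items()}
```

For example, passing a randomized design and a difference-in-means estimator recovers a near-zero empirical bias, which is a useful sanity check before plugging in the REG, IPW, DR and RDR methods.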
The results of simulation Setting 1 are shown in Table . We can see that when both models are misspecified, the relaxed doubly robust estimator has the smallest biases for $E\{Y(1)\}$, $E\{Y(0)\}$ and $\Delta$ among all these estimators, whereas the regression, IPW and DR estimators are significantly biased for these estimands. Also, the mean squared errors of the RDR estimator are quite close to those of the oracle estimator under the different scenarios, while those of the other estimators are not.
From Table , we can examine the numerical results for simulation Setting 2. When the regression models are misspecified, the regression estimator has significantly larger biases than the other estimators, while the IPW and DR estimators have small biases, which echoes our expectation. Our RDR estimator achieves smaller biases and MSEs than the regression estimator, and its MSEs are quite close to those of the oracle estimator, indicating that the RDR estimator performs well in this setting.
Finally, we find similar results in Table for simulation Setting 3. When the regression models are correctly specified, the regression and DR estimators both achieve relatively small biases and mean squared errors, while the IPW estimator has large biases and MSEs due to the misspecification of the propensity score model. Our RDR estimator achieves small biases and mean squared errors close to those of the oracle estimator, showing that it performs well when the propensity score model is misspecified.
5. Discussion
In this paper, we propose a relaxed doubly robust (RDR) estimator for estimating the average treatment effect under the scenario where both the propensity score and the outcome means are single-index models. In contrast to the traditional doubly robust (DR) estimation, our new estimator only necessitates partial correctness in model specification. It only requires consistent estimation of the unknown parameter in either the propensity score model or the outcome mean model, or both, to achieve double robustness and semiparametric efficiency, respectively. The correct specification of the unknown functions in these single-index models is no longer needed. This is why we term it ‘relaxed double robustness’.
To implement the proposed RDR estimator, one needs to consistently estimate $\alpha$, $\beta_1$ and $\beta_0$ in a semiparametric modelling framework. This is a stand-alone problem that fortunately has been well studied in the literature. One can either use standard semiparametric methods for single-index models (Ichimura, Citation1993; Klein & Spady, Citation1993) or estimate these unknown parameters in the context of dimension reduction (Cook, Citation2009; Ma & Zhu, Citation2012, Citation2013). In our numerical studies, we choose the standard sliced inverse regression (Li, Citation1991) from the dimension reduction literature to estimate the parameters $\alpha$, $\beta_1$ and $\beta_0$.
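For completeness, here is a compact sketch of sliced inverse regression extracting a single index direction, normalized so that its first element is one, matching the identification convention used above; the function name `sir_direction` is our own, and a production analysis would typically rely on an established SIR implementation.

```python
import numpy as np

def sir_direction(X, Y, n_slices=10):
    """Leading SIR direction (Li, 1991), first element normalized to one."""
    n, p = X.shape
    Xc = X - X.mean(0)
    # Whiten the covariates: Z = (X - mean) Sigma^{-1/2}.
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / n)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt
    # Slice on the order of Y and average the whitened X within each slice.
    slices = np.array_split(np.argsort(Y), n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvector of the slice-mean covariance, back-transformed.
    _, vecs = np.linalg.eigh(M)
    beta = inv_sqrt @ vecs[:, -1]
    return beta / beta[0]   # first element fixed to one for identifiability
```

Dividing by the first element both imposes the identification convention and resolves the arbitrary sign of the eigenvector.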
Finally, as the Associate Editor suggested, we point out that other relaxed doubly robust estimators also exist. For instance, after estimating the unknown parameters $\alpha$, $\beta_1$ and $\beta_0$ exactly as we propose, one may opt to use some one-dimensional nonparametric regression method such as kernels to estimate $\pi(\cdot)$, $\mu_1(\cdot)$ and $\mu_0(\cdot)$, respectively, and plug these into the traditional DR formula. This ultimately leads to an estimator that is also relaxed doubly robust. That estimator and the proposed estimator rest on different ideas. The proposed RDR estimator achieves double robustness by fitting two augmented outcome mean models, $E(Y \mid T = 1, \alpha^{\mathrm{T}}X, \beta_1^{\mathrm{T}}X)$ and $E(Y \mid T = 0, \alpha^{\mathrm{T}}X, \beta_0^{\mathrm{T}}X)$; see the details in steps (iii)-(v) of the algorithm in Section 3.1. By contrast, the aforementioned estimator achieves double robustness because it is implemented directly according to the efficient influence function.
Acknowledgements
The authors would like to thank the Editor, an Associate Editor, and one reviewer for their insightful comments which have helped improve the manuscript substantially.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- Bang, H., & Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4), 962–973. https://doi.org/10.1111/biom.2005.61.issue-4
- Bickel, P. J., Klaassen, C. A. J., Ritov, Y., & Wellner, J. A. (1993). Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press.
- Cheng, P. E. (1994). Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association, 89(425), 81–87. https://doi.org/10.1080/01621459.1994.10476448
- Cook, R. D. (2009). Regression graphics: Ideas for studying regressions through graphics. John Wiley & Sons.
- Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, 66(2), 315–331. https://doi.org/10.2307/2998560
- Hu, Z., Follmann, D. A., & Qin, J. (2012). Semiparametric double balancing score estimation for incomplete data with ignorable missingness. Journal of the American Statistical Association, 107(497), 247–257. https://doi.org/10.1080/01621459.2012.656009
- Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58(1–2), 71–120. https://doi.org/10.1016/0304-4076(93)90114-K
- Imbens, G. W. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and Statistics, 86(1), 4–29. https://doi.org/10.1162/003465304323023651
- Imbens, G. W., & Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
- Klein, R. W., & Spady, R. H. (1993). An efficient semiparametric estimator for binary response models. Econometrica: Journal of the Econometric Society, 61(2), 387–421. https://doi.org/10.2307/2951556
- Li, K.-C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414), 316–327. https://doi.org/10.1080/01621459.1991.10475035
- Lunceford, J. K., & Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23(19), 2937–2960. https://doi.org/10.1002/sim.v23:19
- Ma, Y., & Zhu, L. (2012). A semiparametric approach to dimension reduction. Journal of the American Statistical Association, 107(497), 168–179. https://doi.org/10.1080/01621459.2011.646925
- Ma, Y., & Zhu, L. (2013). A review on dimension reduction. International Statistical Review, 81(1), 134–150. https://doi.org/10.1111/insr.2013.81.issue-1
- Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. English translation of excerpts by Dabrowska, D., & Speed, T. (1990). Statistical Science, 5(4), 465–472.
- Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427), 846–866. https://doi.org/10.1080/01621459.1994.10476818
- Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90(429), 106–121. https://doi.org/10.1080/01621459.1995.10476493
- Rosenbaum, P. R. (2002). Overt bias in observational studies. Springer.
- Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55. https://doi.org/10.1093/biomet/70.1.41
- Rotnitzky, A., Robins, J. M., & Scharfstein, D. O. (1998). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association, 93(444), 1321–1339. https://doi.org/10.1080/01621459.1998.10473795
- Rotnitzky, A., & Vansteelandt, S. (2014). Double-robust methods. In Handbook of Missing Data Methodology (pp. 185–212). CRC Press.
- Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701. https://doi.org/10.1037/h0037350
- Rubin, D. B. (1980). Randomization analysis of experimental data: The Fisher randomization test comment. Journal of the American Statistical Association, 75(371), 591–593.
- Scharfstein, D. O., Rotnitzky, A., & Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448), 1096–1120. https://doi.org/10.1080/01621459.1999.10473862
- Tsiatis, A. A. (2006). Semiparametric theory and missing data. Springer.
Appendix
Proof
Proof of Theorem 3.1
First, we can prove that the first term of $\hat\Delta_{\mathrm{RDR}}$, i.e., $n^{-1}\sum_{i=1}^{n}\hat E(Y \mid T = 1, \hat\alpha^{\mathrm{T}}X_i, \hat\beta_1^{\mathrm{T}}X_i)$, is consistent for $E\{Y(1)\}$ if either of the following conditions is satisfied.
Condition (a): $\pi(\alpha^{\mathrm{T}}X)$ is correctly specified, where $\alpha$ is the true coefficient for the propensity score model;
Condition (b): $\mu_1(\beta_1^{\mathrm{T}}X)$ is correctly specified, where $\beta_1$ is the true coefficient for the single-index model of $Y$ on $X$ given $T = 1$.
If is correctly specified, the propensity score should be some function of with probability 1, which means that could also be written as a function of . Denote this function by f, and we have If is correctly specified, should be some function of with probability 1, which means that could also be written as a function of . Denote this function by g, and we have Note that the following equation holds if is correctly specified (A1) (A1) For the estimation in the algorithm, we estimate by a Nadaraya-Watson kernel regression of on . Under the regularity conditions, we can show that converges in probability to which can be shown equal to under either Condition (a) or Condition (b).
When Condition (a) is satisfied, i.e., is correctly specified, we have shown that with probability 1, whereby we can further simplify the limit above. When Condition (b) is satisfied, i.e., is correctly specified, we have shown that with probability 1. The limit could be further simplified as follows. Thus, is consistent for , i.e., is consistent for when either or is correctly specified.
Similarly, we could also show that is consistent for , i.e., is consistent for when either or is correctly specified.
Combining these two consistencies, we know that is a consistent estimator for Δ when either Condition (i) or Condition (ii) is satisfied.
Note that the proposed estimator is instead of . Next, we want to show that and are asymptotically equivalent to and respectively under certain conditions, whereby the proposed estimator is asymptotically equivalent to , and has the same double robustness.
Define and . Thus, and , where b is a continuous function of .
Define We want to show that is asymptotically equivalent to if for some k>6 and are satisfied.
To prove the claim above, it is enough to prove that is asymptotically equivalent to under the given conditions.
We can first simplify the bivariate kernel function by normalization. Note that if we use , can also preserve the same property as since there exists a continuous function g such that . After the linear transformation, has the identity covariance, which means that we can simplify the bivariate kernel function to Thus, we can rewrite the bivariate kernel functions as and respectively.
Notice that is a function of . By a Taylor expansion, it can be further written as where is on the line segment between and . Since , we have almost surely by the Borel-Cantelli lemma. Thus, almost surely by the triangle inequality. Combined with the optimal bandwidth and the condition that , is . Since k>6, this term goes to zero, which means that is asymptotically equivalent to .
Thus, the kernel regression estimator is consistent for its target if $E(|Y|^k) < \infty$ for some $k > 6$ and the bandwidth conditions are satisfied.
Similarly, to prove the consistency of , we can define and . We can use a similar argument to show that is asymptotically equivalent to if for some k>6 and are satisfied.
Hence, is asymptotically equivalent to under the given conditions, which indicates that is asymptotically consistent if either Condition (i) or Condition (ii) is satisfied.
Proof
Proof of Theorem 3.2
To prove the optimal efficiency for , we need to prove the optimal efficiency for and .
We can first show that if is correctly specified, has an asymptotic normal distribution And then, if is additionally satisfied, we can further prove that achieves optimal efficiency.
To prove that has such an asymptotic distribution, we can split it into three parts, where For , we can show that asymptotically converges to due to the central limit theorem. For , we can first rewrite as When is correctly specified, with probability 1. The denominator could be further written as . Hence, could be rewritten as We can show that with by Theorem 2.1 from Cheng (Citation1994). Thus, by the central limit theorem, we can show that asymptotically converges to . For , we know that and Thus, .
Combining all three parts together, we know that we have the following asymptotic distribution: Note that the asymptotic variance could be rewritten as Additionally, if is correctly specified, we have the equation according to the argument in the proof of Theorem 3.1. From this equation, could be further rewritten as : Thus, the second term could be rewritten according to the equation and the condition that with probability 1. Combining all the equations above, we know that the asymptotic variance could be transformed into the following expression, According to Hahn (Citation1998), this is the minimum asymptotic variance that a doubly robust estimator can achieve. This means that when and are both correctly specified, achieves optimal efficiency.
We can use a similar argument to prove that when and are both correctly specified, achieves optimal efficiency.
Note that the optimal efficiency for and indicates that achieves optimal efficiency for Δ when both Condition (i) and Condition (ii) are satisfied.
According to the proof of Theorem 3.1, we know that the proposed estimator is asymptotically equivalent to under the given conditions. It implies that when both Condition (i) and Condition (ii) are satisfied, also achieves optimal efficiency for Δ.