398
Views
0
CrossRef citations to date
0
Altmetric
Articles

Multiply robust estimation for average treatment effect among treated

&
Pages 29-39 | Received 06 Mar 2023, Accepted 10 Nov 2023, Published online: 15 Dec 2023

Abstract

We propose a multiply robust estimator for the Average Treatment Effect Among the Treated (ATT). The proposed estimation procedure can simultaneously accommodate multiple working models for both the propensity score and the conditional mean of the counterfactual outcome given covariates. In addition, it can explicitly balance a set of user-specified moments of the covariate distributions between the treatment groups. The resulting estimator is consistent if any working model is correctly specified. With the data generating process typically unknown for observational studies, the proposed method provides substantial robustness against possible model misspecifications compared to existing estimators of the ATT. Simulation results show the excellent finite sample performance of the proposed estimator.

1. Introduction

In causal inference, since we only observe what happens to an individual under the treatment condition they actually receive, it is generally impossible to estimate the causal effects for individuals. The causal effects one typically considers involve summary statistics of the individual effects across populations or sub-populations of interest. Two widely considered causal summaries are the average treatment effect (ATE) and the average treatment effect among the treated (ATT). The ATE of a treatment relative to a control is the comparison of the mean outcome which had the entire population been treated versus had the entire population been the control. The ATT is the comparison of the mean outcome under treatment among those who are treated with the mean outcome which had the treated subjects received control instead. A study can estimate both ATE and ATT, but one or the other may be better suited for a particular situation. The ATE may be of more interest if each treatment can potentially be offered to every member of the population. Conversely, if the research question focuses on the effectiveness of an alternative treatment were it to replace the standard treatment, and then the ATT may be of more interest because it measures the relative effectiveness of the two treatment options on the population that is receiving the standard treatment.

There has been a large literature on estimating the ATE and the ATT for observational studies, where a typical consideration is confounding in the sense that individual characteristics are related to both the treatment assignment and the outcome of interest. Propensity score (PS) based methods, where the PS is the probability of receiving a treatment given covariates (Rosenbaum & Rubin, Citation1983), are commonly used to deal with confounding and to achieve balance of covariate distributions between different treatment groups. Such methods include PS matching (e.g., Abadie & Imbens, Citation2006; Rosenbaum & Rubin, Citation1985) and weighting (e.g., Hirano et al., Citation2003). Refer to Imbens and Rubin (Citation2015) and Hernán and Robins (Citation2018) for more details. Parametric modelling of the PS is common, especially when the dimension of covariates is moderate to large. Misspecification of the PS model is usually a major concern as it may lead to substantial estimation bias.

To mitigate the impact of misspecification of the PS model, substantial interests have been given to doubly robust estimators, which involve models for both the PS and the conditional mean of the counterfactual outcome, or outcome regression (OR), and remain consistent if either model is correctly specified. The original doubly robust estimator was constructed through augmented inverse probability weighting (AIPW) in the missing data context (Robins et al., Citation1994). Since then, a large number of doubly robust estimators have been proposed in both missing data and causal inference settings (e.g., Bang & Robins, Citation2005; Cao et al., Citation2009; Han, Citation2012; Kang & Schafer, Citation2007; Qin et al., Citation2008; Qin & Zhang, Citation2007; Rotnitzky et al., Citation2012; Tan, Citation2010; van der Laan & Gruber, Citation2010). When both the PS model and the OR model are wrong, these estimators are in general no longer consistent.

As an improvement over double robustness, in the missing data context, Han and Wang (Citation2013) proposed a multiply robust estimation procedure that can simultaneously accommodate multiple working models for both PS and OR, and the resulting estimators are consistent if any of these models is correctly specified. Since the data generating process for observational studies is typically unknown, it is common that several candidate models all seem reasonable yet none rules out the possibility of others, especially when the dimension of covariates is moderate to large. In such a case, the multiply robust method in Han and Wang (Citation2013) provides a useful tool for data analysis with more protection on estimation consistency. Such a method has attracted considerable interest in both missing data and causal inference research (e.g., Chan & Yam, Citation2014; Chan et al., Citation2016; S. Chen & Haziza, Citation2017; Duan & Yin, Citation2017; Han, Citation2014aCitation2014bCitation2016aCitation2016b; Han et al., Citation2019; Li et al., Citation2020). Especially, Wang (Citation2019) extended the multiply robust method to the estimation of the ATE. It is worth pointing out that the term “ multiply robust” has also been used by other authors in different settings with different meanings. For example, in Molina et al. (Citation2017), it refers to estimation consistency when various combinations of the components of a factorized likelihood are correctly modelled, while in Wang and Tchetgen Tchetgen (Citation2018) and Shi et al. (Citation2020) it refers to estimation consistency being achieved across the union of three different observed data models. We use “ multiply robust” to refer to the property that estimation consistency is achieved if one of the multiple working models for the same quantity is correctly specified.

Despite the success in dealing with missing data problems and in estimating the ATE, multiply robust estimators have not been developed for estimating the ATT. In this paper, we construct such a desirable estimator for ATT. In addition to being multiply robust, the proposed estimator can easily achieve certain level of balancing of covariate distributions between treatment groups. Covariate balancing is a highly desired property for causal effect estimation. For the estimation of ATT, Hainmueller (Citation2012) proposed the entropy balancing (EB) method by imposing a set of balance constraints so that certain moments of covariate distributions for different treatment groups match exactly. Zhao and Percival (Citation2017) found that EB implicitly fits a logistic linear regression model for the PS and a linear regression model for the OR, and when either model is correctly specified the EB estimator is consistent. The estimator we propose preserves the same balance of covariate distributions as EB, and in addition it accounts for multiple models for PS and OR simultaneously so that consistency of the resulting estimator is guaranteed if any one model is correctly specified.

The rest of the paper is organized as follows. Section 2 gives the setup and a brief review of some existing methods. Section 3 presents our proposed multiply robust estimator for the ATT. Simulation studies are provided in Section 4, followed by some discussion in Section 5.

2. Setup and some existing methods

Let A denote the treatment indicator, where A = 1 for the treatment of interest and A = 0 for control. Let Y1 and Y0 denote the counterfactual outcomes under treatment and under control, respectively. Let Y denote the observed outcome in the data, for which we make the consistency assumption that Y=AY1+(1A)Y0. Let X denote the pre-treatment covariates in the dataset, including potential confounders. The potential full data is (A,Y1,Y0,X) whereas the observed data is (A,Y,X). The causal effect we are interested in is the ATT τ=E(Y1Y0A=1), which is also τ1,1τ1,0, where τa,b is the mean outcome for subjects who receive treatment a had they instead received treatment b, i.e., τa,b=E[Yb|A=a].

To ensure the identifiability of τ from the observed data, we make some assumptions following the main literature (e.g., Rosenbaum & Rubin, Citation1983). First, we assume that all potential confounders are included in X, or in other words, the treatment assignment process and the counterfactual outcome are independent given all the covariates measured. Second, we assume that every subject has a positive probability of being assigned to either the treatment or the control group. These two assumptions are formalized as follows.

Assumption 2.1

no unmeasured confounders assumption

A(Y0,Y1)X.

Assumption 2.2

strict positivity

0<σ1<P(A=1X)<σ2<1 with probability one for some positive constants σ1 and σ2.

Let π(X)=P(A=1X) denote the PS for treatment assignment. In the following we give a brief review of some widely used estimators of the ATT. Note that τ1,1=E(Y1A=1) can be consistently estimated by the sample average τˆ1,1=(i=1nAiYi)/(i=1nAi) over the treatment group. Therefore, the estimation of the ATT reduces to the estimation of τ1,0=E(Y0A=1), where the counterfactual outcome Y0 is not observable for individuals in the A = 1 group.

One straightforward way to estimate τ1,0 is to impute Y0 for those individuals in the A = 1 group. Suppose d0(γ) is a parametric regression model for E(Y0X) that is parametrized by γ. Because E(Y0X)=E(Y0X,A=0) from Assumption 1 and Y0 is fully observed in the A = 0 group, γ can be estimated by γˆ based on individuals in the control group alone. Also from Assumption 1, we have E(Y0X)=E(Y0X,A=1), and thus the unobserved Y0 in the treatment group can be imputed by d0(γˆ). Therefore, an outcome regression estimator of τ1,0 is τˆ1,0reg=i=1nAid0i(γˆ)/(i=1nAi).

A widely used alternative method is inverse probability weighting (IPW). From Assumption 1, since f(Y1,Y0,XA=1)=π(X)f(Y1,Y0,X)P(A=1)andf(Y1,Y0,XA=0)={1π(X)}f(Y1,Y0,X)P(A=0), where f() is a generic notation for a probability density function, we have (1) f(Y1,Y0,XA=1)=π(X)1π(X)P(A=0)P(A=1)f(Y1,Y0,XA=0).(1) Therefore, an estimator of τ1,0=E(Y0A=1) can be constructed by properly weighting the observed Y0 in the A = 0 group, and this leads to an IPW estimator (2) τˆ1,0ipw=1s=1n(1As){i:Ai=0}πˆ(Xi)1πˆ(Xi)s=1n(1As)/ns=1nAs/nYi=1s=1nAs{i:Ai=0}πˆ(Xi)1πˆ(Xi)Yi,(2) where πˆ(X) is the estimated value of π(X).

Consistency of τˆ1,0reg and τˆ1,0ipw requires correct modelling of E(Y0X) and π(X), respectively. To improve the robustness against possible model misspecifications, the AIPW method combines the models for E(Y0X) and π(X) so that estimation consistency is guaranteed if either model is correctly specified but not necessarily both. The AIPW estimator for τ1,0 is given as τˆ1,0aipw=τˆ1,0reg+1s=1nAs{i:Ai=0}πˆ(Xi)1πˆ(Xi){Yid0i(γˆ)}. To achieve certain covariate balancing, the EB method (Hainmueller, Citation2012) considers matching covariate moments between the treatment and the control groups. Specifically, let wi be a set of positive weights assigned to the control subjects {i:Ai=0}. The EB method imposes covariate balancing constraints (3) wi>0,{i:Ai=0}wi=1,{i:Ai=0}wih(Xi)=1s=1nAs{i:Ai=1}h(Xi)(3) on wi, where h(X) contains some user-specified moments of X. These constraints ensure that certain moments of X exactly match between the two groups. The EB estimator of τ1,0 is τˆ1,0eb={i:Ai=0}wˆeb,iYi where wˆeb,i minimizes {i:Ai=0}wilogwi subject to the constraints in (Equation3). Although no parametric models are explicitly fitted by the EB method, Zhao and Percival (Citation2017) showed that, implicitly, the EB method fits linear regression models for logit{π(X)} and E(Y0X) with components of h(X) as regressors. The EB estimator τˆ1,0eb is consistent if either model is correctly specified, and thus is doubly robust.

3. The proposed multiply robust estimator for ATT

Our goal is to construct an easy-to-implement estimator of the ATT that is multiply robust and achieves explicit covariate balancing as the EB estimator. Let P={π(j)(α(j)):j=1,,J} denote a set of multiple parametric models for π(X) and D0={d0(k)(γ(k)):k=1,,K} a set of multiple parametric models for E(Y0X), where α(j) and γ(k) are the corresponding parameters. α(j) is typically estimated by αˆ(j) that maximizes the binomial likelihood (4) i=1n{πi(j)(α(j))}Ai{1πi(j)(α(j))}1Ai,(4) and γ(k) is typically estimated by γˆ(k) based on individuals in the control group because E(Y0X)=E(Y0X,A=0) from Assumption 1 and Y0 is fully observed in the control group.

From (Equation1) it is easy to see that, for any function b(X), we have (5) E{b(X)A=1}=E{π(X)1π(X)b(X)A=0}P(A=0)P(A=1).(5) In particular, taking b(X)1 gives 1=E{π(X)1π(X)A=0}P(A=0)P(A=1), and thus (6) E{b(X)A=1}=E{b(X)A=1}E{π(X)1π(X)A=0}P(A=0)P(A=1)=E{π(X)1π(X)E{b(X)A=1}A=0}P(A=0)P(A=1).(6) Subtracting (Equation6) from (Equation5) leads to (7) E{π(X)1π(X)[b(X)E{b(X)A=1}]A=0}=0.(7) Our method starts by constructing an empirical version of (Equation7). Let wi be a set of positive weights assigned on the control subjects with Ai=0 such that {i:Ai=0}wi=1, and then an empirical version of (Equation7) is {i:Ai=0}wi{b(Xi)1s=1nAss=1nAsb(Xs)}=0. To achieve multiple robustness so that τ1,0 is consistently estimated when any one model is correctly specified, we take b(X) to be 1/π(j)(α(j)), j=1,,J, and d0(k)(γ(k)), k=1,,K, and consider the constraints on the wi as (8) wi>0,{i:Ai=0}wi=1,{i:Ai=0}wigˆi(αˆ,γˆ)=0,(8) where αˆ=(αˆ(1),,αˆ(J)), γˆ=(γˆ(1),,γˆ(J)) and gˆ(αˆ,γˆ)=(1π(1)(αˆ(1))1s=1nAss=1nAs1πs(1)(αˆ(1))1π(J)(αˆ(J))1s=1nAss=1nAs1πs(J)(αˆ(J))d0(1)(γˆ(1))1s=1nAss=1nAsd0s(1)(γˆ(1))d0(K)(γˆ(K))1s=1nAss=1nAsd0s(K)(γˆ(K))h(X)1s=1nAss=1nAsh(Xs)). Note that here gˆ(αˆ,γˆ) can also contain components based on taking b(X) to be h(X), a vector of user-specified moments of X. This can be used to achieve certain degree of covariate balancing between the treatment and control groups, similar to the EB method.

Subject to the constraints in (Equation8), we consider the weights wˆmr,i on the subjects in the control group {i:Ai=0} that maximize the empirical likelihood {i:Ai=0}wi and propose to estimate τ1,0 by τˆ1,0mr={i:Ai=0}wˆmr,iYi. Following the standard empirical likelihood technique (e.g., Qin & Lawless, Citation1994), we have wˆmr,i=1i=1n(1Ai)11+ρˆgˆi(αˆ,γˆ),for i satisfying Ai=0, where ρˆ solves (9) {i:Ai=0}gˆi(αˆ,γˆ)1+ρgˆi(αˆ,γˆ)=0.(9) For implementation, directly solving (Equation9) for ρˆ is not the ideal way because, as pointed out in Han (Citation2014a), (Equation9) typically has multiple roots but only one of them makes the wˆmr,i positive. We recommend calculating ρˆ by minimizing F(ρ){i:Ai=0}log{1+ρgˆi(αˆ,γˆ)}, which is the negative antiderivative of the left-hand side of (Equation9). As shown in Han (Citation2014a), this is a convex minimization that always has a unique minimizer when 0 is inside the convex hull of {gˆi(αˆ,γˆ):Ai=0}, which indeed holds at least when n is large because of the moment equality (Equation7) and (Equation8) being an empirical version of (Equation7). The unique minimizer of F(ρ) is ρˆ needed for calculating wˆmr,i and can be easily found by a Newton-Raphson-type algorithm. Refer to J. Chen et al. (Citation2002) and Han (Citation2014a) for such an algorithm.

In the following we provide arguments and explanations for the multiple robustness of τˆ1,0mr. We do not include detailed technical conditions and mathematical proofs for these arguments so that the main ideas are easier to follow. To see the consistency of τˆ1,0mr when P contains a correctly specified model for π(X), say π(1)(α(1)) without loss of generality, let ϑˆ=1s=1nAsi=1nAi1πi(1)(αˆ(1)) and define λˆ in such a way that ρˆ1=(λˆ1+1)/(ϑˆ1) and ρˆ1=λˆ1/(ϑˆ1), where ρˆ1 and λˆ1 are the first component of ρˆ and λˆ, respectively, and ρˆ1 and λˆ1 are vectors of the rest components. Then some simple algebra shows that wˆmr,i=ϑˆ1s=1n(1As)πi(1)(αˆ(1))1πi(1)(αˆ(1))11+λˆπi(1)(αˆ(1))1πi(1)(αˆ(1))gˆi(αˆ,γˆ),for i satisfying Ai=0 and (Equation9) becomes an equation for λˆ {i:Ai=0}πi(1)(αˆ(1))1πi(1)(αˆ(1))gˆi(αˆ,γˆ)1+λˆπi(1)(αˆ(1))1πi(1)(αˆ(1))gˆi(αˆ,γˆ)=0. From the Z-estimator theory (e.g., van der Vaart, Citation1998) and because of the moment equality (Equation7), we must have λˆ=Op(n1/2), which leads to wˆmr,i=ϑˆ1s=1n(1As)πi(1)(αˆ(1))1πi(1)(αˆ(1)){1+Op(n1/2)},for i satisfying Ai=0. In other words, when P contains a correctly specified model for π(X), this correct model is implicitly accounted for by the empirical likelihood procedure to make wˆmr,i the form of the IPW weight as used in (Equation2), and this leads to the consistency of τˆ1,0mr: τˆ1,0mr={i:Ai=0}wˆmr,iYipE[(E{1π(X)A=1}1)π(X)1π(X)YA=0]=E[π(X)1π(X)Y0A=0]P(A=0)P(A=1)=E(Y0A=1), where the second last equality follows from E[1π(X)π(X)A=1]=P(A=0)P(A=1) that can be shown by taking b(X)={1π(X)}/π(X) in (Equation5), and the last equality follows from (Equation1).

A difference between the constraints (Equation8) and those used by Han and Wang (Citation2013) and Wang (Citation2019) for estimating ATE is that in (Equation8) it is 1/π(j)(α(j)) to be balanced whereas in Han and Wang (Citation2013) and Wang (Citation2019) it is π(j)(α(j)). The reason for this difference is that, for consistent estimation of ATT, the calibration weight wˆmr needs to implicitly be of the form π(X)/{1π(X)} as in (Equation2) whereas, for consistent estimation of ATE, the calibration weight needs to implicitly be of the IPW form 1/π(X). From the above algebra it is seen that balancing 1/π(j)(α(j)) in (Equation8) ensures π(X)/{1π(X)} implicitly appear, whereas the algebra in Han and Wang (Citation2013) shows that balancing π(j)(α(j)) ensures 1/π(X) implicitly appear. Thus, consistency of our estimator for ATT would not hold if in (Equation8) 1/π(j)(α(j)) is replaced by π(j)(α(j)).

When D0 contains a correctly specified model for E(Y0X), say d0(1)(γ(1)) without loss of generality, the consistency of τˆ1,0mr follows from {i:Ai=0}wˆmr,iYi={i:Ai=0}wˆmr,i{Yid0i(1)(γˆ(1))}+1s=1nAsi=1nAid0i(1)(γˆ(1))=1s=1n(1As){i:Ai=0}Yid0i(1)(γˆ(1))1+ρˆgˆi(αˆ,γˆ)+E{E(Y0X)A=1}+op(1)=E{Y0E(Y0X)1+ρg(α,γ)A=0}+E{E(Y0X,A=1)A=1}+op(1)=E[E{Y0E(Y0X,A=0)1+ρg(α,γ)X,A=0}A=0]+E(Y0A=1)+op(1)=E(Y0A=1)+op(1), where ρ and g(α,γ) are the probability limits of ρˆ and gˆ(αˆ,γˆ), respectively.

In summary, we have the following result on the multiple robustness property for the proposed estimator of the ATT.

Proposition

If P contains a correctly specified model for π(X) or D0 contains a correctly specified model for E{Y0|X}, then 1i=1nAii=1nAiYi{i:Ai=0}wˆiYipE(Y1|A=1)E(Y0|A=1) as n.

The proposed estimator τˆ1,0mr and the EB estimator τˆ1,0eb are both weighted averages of the observed outcomes for the control group subjects, and both are calibration-type estimators originally considered in survey sampling (Deville & Särndal, Citation1992) that were later extended to missing data analysis and causal inference (e.g., Chan & Yam, Citation2014; S. Chen & Haziza, Citation2017; Han & Wang, Citation2013; Kim, Citation2010; Kim & Park, Citation2010; Qin & Zhang, Citation2007; Tan, Citation2010; Wu & Sitter, Citation2001; Zhang et al., Citation2022). Some of the constraints in (Equation8) based on the h(X) component in gˆ(αˆ,γˆ) are actually the constraints (Equation3) used by the EB method, and thus our proposed method achieves the same degree of covariate balancing between the treatment and the control groups. The other constraints in (Equation8) based on parametric models are for multiple robustness purpose.

Note that the constraints in (Equation8) are for estimating ATT and are determined by (Equation7), which balance the control group with the treatment group. This is different from estimating ATE where the constraints balance the control group with the full sample or the treatment group with the full sample (e.g., Fan et al., Citation2023). Adding the latter type of constraints into (Equation8) would not work, since the weight matching control to treatment is different from the weight matching control to the full sample.

Standard error of the proposed estimator is needed to make inference, such as constructing confidence intervals. Due to the presence of multiple models and the lack of knowledge of which one is correct, deriving the asymptotic distribution of the proposed estimator as an approximation to the finite sample distributions is challenging. Therefore, we recommend using bootstrap to calculate the standard error. The excellent performance of the bootstrap method for multiply robust estimators in missing data context has been demonstrated through comprehensive simulation studies (e.g., Han, Citation2014aCitation2014b), and its effectiveness for the proposed estimator will be shown in the next section.

4. Simulation studies

In this section we conduct simulation studies to evaluate the finite sample performance of the proposed multiply robust estimator of the ATT. The simulation setting mimics that in Han and Wang (Citation2013). The data are generated in the following way: XUniform(2.5,2.5), Y0XN{d0(X),4X2+2}, Y1XN{d1(X),4X2+2} and AXBernoulli{π(X)}, where π(X) is either {1+exp(0.8+0.5X0.3X2)}1 or 1exp[exp{0.5+0.5X0.3exp(X)}] and {d0(X),d1(X)} is either (1+2X+3X2,2+3X+3X2) or {1+2X+3exp(X),2+3X+3exp(X)}. Therefore, we have in total four data generating processes. For each of them, we postulate two propensity score models π(1)(α(1))={1+exp(α1(1)+α2(1)X+α3(1)X2)}1 and π(2)(α(2))=1exp[exp{α1(2)+α2(2)X+α3(2)exp(X)}] and two regression models for d0(X), d0(1)(γ(1))=γ1(1)+γ2(1)X+γ3(1)X2 and d0(2)(γ(2))=γ1(2)+γ2(2)X+γ3(2)exp(X).

Tables  and  contain some simulation results summarized based on 1000 replications for n = 300 and n = 800, respectively. To show the flexibility of the proposed procedure in providing a unified framework for constructing estimators, both our proposed estimators based on each possible combination of models from {π(1)(α(1)),π(2)(α(2)),d0(1)(γ(1)),d0(2)(γ(2))} and the corresponding OR, IPW and/or AIPW estimators, when available, are listed for comparison. Each digit of the four-digit number inside an estimator's name, from left to right, indicates if π(1)(α(1)), π(2)(α(2)), d0(1)(γ(1)) and d0(2)(γ(2)) are used, respectively. For example, MR-1101 denotes our proposed multiply robust estimator based on models π(1)(α(1)), π(2)(α(2)) and d0(2)(γ(2)). Compared to the OR, IPW and AIPW estimators, our proposed estimators using the same parametric models have similar numerical performance. The multiple robustness property is well demonstrated by the negligible biases, especially with n = 800, of the proposed estimators MR-1100, MR-0011, MR-1110, MR-1101, MR-1011, MR-0111 and MR-1111, which are consistent under all four data generating processes. The numerical performance of MR-1100 with a small sample size may be occasionally unstable, as shown by the relatively large bias when n = 300 with data generated based on π(X)={1+exp(0.8+0.5X0.3X2)}1, due to a high correlation between π(1)(αˆ(1)) and π(2)(αˆ(2)), but it improves dramatically as the sample size increases to n = 800.

Table 1. Comparison of different estimators based on 1000 replications and n = 300. Each digit of the four-digit number inside an estimator's name, from left to right, indicates if π(1)(α(1)), π(2)(α(2)), d0(1)(γ(1)) and d0(2)(γ(2)) are used, respectively. The results have been multiplied by 100.

Table 2. Comparison of different estimators based on 1000 replications and n = 800. Each digit of the four-digit number inside an estimator's name, from left to right, indicates if π(1)(α(1)), π(2)(α(2)), d0(1)(γ(1)) and d0(2)(γ(2)) are used, respectively. The results have been multiplied by 100.

The estimator EB-1 by balancing only the first moment of X has a very large bias under all four data generating processes. Replacing the exponential tilting with the empirical likelihood, the estimator MR-1 has a significantly improved performance compared to EB-1 when the data are generated based on π(X)={1+exp(0.8+0.5X0.3X2)}1. By balancing the first two moments of X, the EB method implicitly fits linear regression models for logit{π(X)} and d0(X) with regressors X and X2, and this makes the estimator EB-2 consistent under the first three data generating processes and inconsistent under the last one where π(X)=1exp[exp{0.5+0.5X0.3exp(X)}] and {d0(X),d1(X)}={1+2X+3exp(X),2+3X+3exp(X)}. This theoretical conclusion is well confirmed by inspecting the bias of EB-2. The estimator MR-2 with exponential tilting replaced by empirical likelihood is inconsistent under all four data generating processes.

Table  contains a summary of the performance of the bootstrap method for standard error calculation. Due to similarity of the performance under different settings, we only include the results for π(X)={1+exp(0.8+0.5X0.3X2)}1 and {d0(X),d1(X)}=(1+2X+3X2,2+3X+3X2) with n = 300. The results are summarized based on 1000 replications with the bootstrap resampling size 200. It is seen that the average of the standard errors based on boostrap is very close to the empirical standard error. In addition, for the estimators that are consistent under this data generating process, i.e., MR-1000, MR-0010, MR-1100, MR-1010, MR-1001, MR-0110, MR-0011, MR-1110, MR-1101, MR-1011, MR-0111 and MR-1111, the coverage percentage of the 95% confidence intervals constructed based on bootstrap standard errors is close to the nominal level. Both observations show that the bootstrap method is reliable to calculate the standard errors for the proposed estimators.

Table 3. Performance of the bootstrap method in calculating the standard errors for the proposed estimators in the setting of π(X)={1+exp(0.8+0.5X0.3X2)}1 and {d0(X),d1(X)}=(1+2X+3X2,2+3X+3X2). Results are summarized based on 1000 replications with n = 300 and the bootstrap resampling size is 200. Each digit of the four-digit number inside an estimator's name, from left to right, indicates if π(1)(α(1)), π(2)(α(2)), d0(1)(γ(1)) and d0(2)(γ(2)) are used, respectively.

5. Discussion

In this paper we have proposed a multiply robust estimator for the causal effect ATT. The estimator provides more protection on estimation consistency compared to existing doubly robust estimators. It is worth pointing out that, although the proposed method can simultaneously account for multiple working models, there does not have to be multiple models to apply this method. The construction of constraints is very flexible and the method can be applied when only a single working model is available. Therefore, the proposed method provides a unified framework as an alternative to various existing methods, including the IPW and AIPW methods.

A major difference between the proposed method and the EB method is the objective function being optimized. The EB method minimizes the Shannon entropy {i:Ai=0}wilogwi, or equivalently the exponential tilting function in the empirical likelihood literature (e.g., Newey & Smith, Citation2004), whereas we proposed to maximize the empirical likelihood {i:Ai=0}wi. The advantage of considering an empirical likelihood is manifold. First, it improves the robustness of estimation consistency by using constraints based on parametric working models. Although the same constraints can be used by the EB method as well, a correct model does not make the EB estimator consistent, and consistency of the EB estimator is achieved only when logit{π(X)} or E(Y0X) is a linear model with regressors h(X). Second, by dividing the constraints into two parts, one based on parametric working models and one based on moments of X, the proposed method is more flexible when the dimension of X is large. With π(X) and/or E(Y0X) depending on a high dimensional X, consistency of the EB estimator requires a large number of constraints to include all the regressors for logit{π(X)} and/or E(Y0X), and this may jeopardize the numerical performance in practice. The proposed method, on the contrary, has a separate model building step where complex working models can be built and only the ones that are deemed to be close to the truth are used in the constraints. In this case, the parametric working models help to achieve consistency, and thus the other part of constraints based on moments of X can be chosen exclusively for the goal of achieving covariate balancing. This may considerably reduce the number of constraints compared to the EB method and thus improve the numerical performance. Third, it is well known in the empirical likelihood literature (e.g., Newey & Smith, Citation2004) that the empirical likelihood has smaller higher-order bias, which translates to a better numerical performance under a finite sample size, compared to exponential tilting and other alternatives. As an empirical version of the moment equality (Equation7), the constraints in (Equation8) are legitimate, at least when the sample size is large. Therefore, the ‘model misspecification’ problem that typically exists in the empirical likelihood literature under which the performance of empirical likelihood may be worse than that of the exponential tilting (Schennach, Citation2007) is not of a concern here.

Supplemental material

ECA_jasa.bst

Download (26.3 KB)

JASA_manu.sty

Download (3.3 KB)

jasa_harvard.sty

Download (9.2 KB)

Acknowledgments

We thank the Editor, Associate Editor and a referee for their valuable comments that have helped improve the quality of this paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Abadie, A., & Imbens, G. W. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica, 74(1), 235–267. https://doi.org/10.1111/ecta.2006.74.issue-1
  • Bang, H., & Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4), 962–973. https://doi.org/10.1111/biom.2005.61.issue-4
  • Cao, W., Tsiatis, A. A., & Davidian, M. (2009). Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika, 96, 723–734. https://doi.org/10.1093/biomet/asp033
  • Chan, K. C. G., & Yam, S. C. P. (2014). Oracle, multiple robust and multipurpose calibration in a missing response problem. Statistical Science, 29(3), 380–396.
  • Chan, K. C. G., Yam, S. C. P., & Zhang, Z. (2016). Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(3), 673–700. https://doi.org/10.1111/rssb.12129
  • Chen, J., Sitter, R. R., & Wu, C. (2002). Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys. Biometrika, 89(1), 230–237. https://doi.org/10.1093/biomet/89.1.230
  • Chen, S., & Haziza, D. (2017). Multiply robust imputation procedures for the treatment of item nonresponse in surveys. Biometrika, 104(2), 439–453.
  • Deville, J., & Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418), 376–382. https://doi.org/10.1080/01621459.1992.10475217
  • Duan, X., & Yin, G. (2017). Ensemble approaches to estimating the population mean with missing response. Scandinavian Journal of Statistics, 44(4), 899–917. https://doi.org/10.1111/sjos.v44.4
  • Fan, J., Imai, K., Lee, I., Liu, H., Ning, Y., & Yang, X. (2023). Optimal covariate balancing conditions in propensity score estimation. Journal of Business and Economic Statistics, 41(1), 97–110. https://doi.org/10.1080/07350015.2021.2002159
  • Hainmueller, J. (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1), 25–46. https://doi.org/10.1093/pan/mpr025
  • Han, P. (2012). A note on improving the efficiency of inverse probability weighted estimator using the augmentation term. Statistics and Probability Letters, 82(12), 2221–2228. https://doi.org/10.1016/j.spl.2012.08.005
  • Han, P. (2014a). A further study of the multiply robust estimator in missing data analysis. Journal of Statistical Planning and Inference, 148, 101–110. https://doi.org/10.1016/j.jspi.2013.12.006
  • Han, P. (2014b). Multiply robust estimation in regression analysis with missing data. Journal of the American Statistical Association, 109(507), 1159–1173. https://doi.org/10.1080/01621459.2014.880058
  • Han, P. (2016a). Combining inverse probability weighting and multiple imputation to improve robustness of estimation. Scandinavian Journal of Statistics, 43(1), 246–260. https://doi.org/10.1111/sjos.v43.1
  • Han, P. (2016b). Intrinsic efficiency and multiple robustness in longitudinal studies with drop-out. Biometrika, 103(3), 683–700. https://doi.org/10.1093/biomet/asw024
  • Han, P., Kong, L., Zhao, J., & Zhou, X. (2019). A general framework for quantile estimation with incomplete data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(2), 305–333. https://doi.org/10.1111/rssb.12309
  • Han, P., & Wang, L. (2013). Estimation with missing data: Beyond double robustness. Biometrika, 100(2), 417–430. https://doi.org/10.1093/biomet/ass087
  • Hernán, M. A., & Robins, J. M. (2018). Causal inference. Chapman & Hall/CRC.
  • Hirano, K., Imbens, G. W., & Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71(4), 1161–1189. https://doi.org/10.1111/ecta.2003.71.issue-4
  • Imbens, G. W., & Rubin, D. B. (2015). Causal inference for statistics, social, and biomedical sciences: An introduction. Cambridge University Press.
  • Kang, J. D. Y., & Schafer, J. L. (2007). Demystifying double robustness a comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22(4), 523–539.
  • Kim, J. K. (2010). Calibration estimation using exponential tilting in sample surveys. Survey Methodology, 36(2), 145–155.
  • Kim, J. K., & Park, M. (2010). Calibration estimation in survey sampling. International Statistical Review, 78(1), 21–39. https://doi.org/10.1111/insr.2010.78.issue-1
  • Li, W., Yang, S., & Han, P. (2020). Robust estimation for moment condition models with data missing not at random. Journal of Statistical Planning and Inference, 207, 246–254. https://doi.org/10.1016/j.jspi.2020.01.001
  • Molina, J., Rotnitzky, A., Sued, M., & Robins, J. (2017). Multiple robustness in factorized likelihood models. Biometrika, 104, 561–581. https://doi.org/10.1093/biomet/asx027
  • Newey, W. K., & Smith, R. J. (2004). Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica, 72(1), 219–255. https://doi.org/10.1111/ecta.2004.72.issue-1
  • Qin, J., & Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22(1), 300–325. https://doi.org/10.1214/aos/1176325370
  • Qin, J., Shao, J., & Zhang, B. (2008). Efficient and doubly robust imputation for covariate-dependent missing responses. Journal of the American Statistical Association, 103(482), 797–810. https://doi.org/10.1198/016214508000000238
  • Qin, J., & Zhang, B. (2007). Empirical-likelihood-based inference in missing response problems and its application in observational studies. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(1), 101–122. https://doi.org/10.1111/j.1467-9868.2007.00579.x
  • Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427), 846–866. https://doi.org/10.1080/01621459.1994.10476818
  • Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55. https://doi.org/10.1093/biomet/70.1.41
  • Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39(1), 33–38.
  • Rotnitzky, A., Lei, Q., Sued, M., & Robins, J. M. (2012). Improved double-robust estimation in missing data and causal inference models.. Biometrika, 99, 439–456. https://doi.org/10.1093/biomet/ass013
  • Schennach, S. (2007). Point estimation with exponentially tilted empirical likelihood. Annals of Statistics, 35(2), 634–672. https://doi.org/10.1214/009053606000001208
  • Shi, X., Miao, W., Nelson, J. C., & Tchetgen Tchetgen, E. (2020). Multiply robust causal inference with double-negative control adjustment for categorical unmeasured confounding. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(2), 521–540. https://doi.org/10.1111/rssb.12361
  • Tan, Z. (2010). Bounded, efficient and doubly robust estimation withinverse weighting. Biometrika, 97(3), 661–682. https://doi.org/10.1093/biomet/asq035
  • van der Laan, M. J., & Gruber, S. (2010). Collaborative double robust targeted maximum likelihood estimation. The International Journal of Biostatistics, 6(1), Article 17.
  • van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge University Press.
  • Wang, L. (2019). Multiple robustness estimation in causal inference. Communications in Statistics - Theory and Methods, 48(23), 5701–5718. https://doi.org/10.1080/03610926.2018.1520881
  • Wang, L., & Tchetgen Tchetgen, E. (2018). Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(3), 531–550. https://doi.org/10.1111/rssb.12262
  • Wu, C., & Sitter, R. R. (2001). A model-calibration approach to using complete auxiliary information from survey data. Journal of the American Statistical Association, 96(453), 185–193. https://doi.org/10.1198/016214501750333054
  • Zhang, S., Han, P., & Wu, C. (2022). Calibration techniques encompassing survey sampling, missing data analysis and causal inference. International Statistical Review. https://doi.org/10.1111/insr.12518 .
  • Zhao, Q., & Percival, D. (2017). Entropy balancing is doubly robust. Journal of Causal Inference, 5(1). https://doi.org/10.1515/jci-2016-0010