
Maximum Likelihood Estimation of Hierarchical Linear Models from Incomplete Data: Random Coefficients, Statistical Interactions, and Measurement Error

Pages 112-125 | Received 16 Oct 2022, Accepted 24 Jun 2023, Published online: 19 Sep 2023

Abstract

We consider two-level models where a continuous response R and continuous covariates C are assumed missing at random. Inferences based on maximum likelihood or Bayes are routinely made by estimating their joint normal distribution from observed data Robs and Cobs. However, if the model for R given C includes random coefficients, interactions, or polynomial terms, their joint distribution will be nonstandard. We propose a family of unique factorizations involving selected “provisionally known random effects” u such that h(Robs,Cobs|u) is normally distributed and u is a low-dimensional normal random vector; we approximate h(Robs,Cobs) = ∫ h(Robs,Cobs|u)g(u)du via adaptive Gauss-Hermite quadrature. For polynomial models, the approximation is exact but, in any case, can be made as accurate as required given sufficient computation time. The model incorporates random effects as explanatory variables, reducing bias due to measurement error. By construction, our factorizations solve problems of compatibility among fully conditional distributions that have arisen in Bayesian imputation based on the Gibbs Sampler. We spell out general rules for selecting u, and show that our factorizations can support fully compatible Bayesian methods of imputation using the Gibbs Sampler. Supplementary materials for this article are available online.

1 Introduction

Large-scale surveys and experiments within the social and health sciences frequently meet four conditions that supply the focus of this article. First, the data typically have a hierarchical structure, with respondents nested within local organizational units such as schools and hospitals or repeated measures nested within persons. Second, missing data are pervasive. Third, partially observed covariates may be measured with error. Finally, the covariates of interest may have random coefficients, statistical interactions or polynomial terms.

These characteristics have received some attention in recent methodological research. A popular approach conceives the response variables and partially observed covariates as outcomes within a multivariate, hierarchical linear model (HLM) under the assumption that the data are missing at random (MAR; Rubin Citation1976), an assumption often thought reasonable given a presumably rich set of covariates (Schafer and Yucel Citation2002; Goldstein, Carpenter, and Browne Citation2014). Missing values are then imputed from their posterior predictive distribution using what has been termed a “fully conditional specification” (FCS) via the Gibbs sampler. FCS requires the analyst to impute missing values for each variable subject to missingness, conditional on all other unknowns. This approach has been shown to function well when the process generating the joint distribution of unknowns is reasonably assumed multivariate normal. Under the four conditions just described, however, multivariate normality is not possible even if the separate conditional distributions are normal. A concern involves the bias that can arise from the incompatibility between the multiple normal conditional distributions that generate the imputations and the assumed joint distribution of the observed data (Erler et al. Citation2016; Enders, Mistler, and Keller Citation2016; Enders, Du, and Keller Citation2020).

In this article, we propose to address the problem of compatibility (Arnold and Press Citation1989; Liu et al. Citation2014; Bartlett et al. Citation2015) within the framework of maximum likelihood (ML) estimation under the ignorable missing data assumption that data are MAR and that the parameter spaces for the multivariate HLM and the missing data mechanism are distinct (Rubin Citation1976; Little and Rubin Citation2002).

We will first review currently popular methods of inference for normal-theory multilevel models given incomplete data and show how to modify such approaches so that random effects become explanatory variables, allowing us to model fallible measurement as a source of incomplete data. We will then describe our approach when incompletely observed covariates have random coefficients or polynomial terms, including statistical interactions. If a carefully selected subset of these random effects is conditionally assumed known, the model of interest can plausibly follow a normal-theory specification. The “provisionally known random effects” (PKREs) thus selected can be integrated out of the likelihood using numerical methods. Capitalizing on the invariance properties of ML estimates (MLE), we show that our reparameterized model can be translated back to the original model’s parameter space. Auxiliary variables are introduced to increase the robustness of the MAR assumption. We will illustrate application of the model by studying income inequality and mathematics achievement in U.S. elementary schools. Although we base our case study on MLE, the likelihood factorization we propose can readily be implemented within a Bayesian approach with assurance of compatibility between conditional distributions and the joint distribution of observed data elements. We elucidate general rules for selecting low-dimensional “provisionally constant” random effects for a general class of models involving random coefficients or polynomial terms, including interactions.

Section 2 reviews estimation of normal-theory hierarchical models from data MAR. Section 3 explains how to estimate an analytic hierarchical model with random coefficients and cross-level interaction effects given data MAR via a PKRE. Section 4 describes estimation of the analytic model using auxiliary covariates to strengthen the MAR assumption. Section 5 extends the model with polynomial terms including within-level interactions and spells out rules for selecting provisionally constant random effects. Section 6 illustrates analysis of income inequality in math achievement. Section 7 evaluates our estimators by simulation. Finally, Section 8 discusses the limitations and extensions.

2 Inference for Multivariate Normal HLMs from Incomplete Data Using Random Effects as Predictors

We begin with a review of the normal theory case. Following Schafer and Yucel (Citation2002), our scientific interest focuses on the regression of a response variable R* on covariates C*. The elements of R* and C*, which are continuously measured, are partially observed. Ours is a two-level HLM (Lindley and Smith Citation1972; Dempster, Rubin, and Tsutakawa Citation1981) in which the response variable R* is a characteristic of “level-1 units” (e.g., students) who are clustered within level-2 units (e.g., schools). In contrast, the covariates may be characteristics of either level-1 units or level-2 units. In longitudinal studies, the level-1 units might be repeated occasions of measurements clustered within persons at level 2. Our scientist is primarily interested in the conditional distribution f1(R*|C*) assumed normally distributed. However, to account for the missing values in C*, we propose a normal linear model f2(C*). Because of missing data, we cannot separately estimate the parameters of f1 and f2 without discarding the cases with missing values on any element of R* or C*. It is well known that a procedure that analyzes only the cases with complete data is prone to bias and/or loss of efficiency (Little and Rubin Citation2002). The cost can be particularly high when a missing item of C* varies at level 2, in which case all of the level-1 units in the level-2 unit having missing values are discarded along with that level-2 unit itself.

To make efficient inference possible, and following Schafer and Yucel (Citation2002), we compose each outcome vector Y* = [R* C*T]T and write a multivariate HLM h(Y*)
(1) Y* = X*α + Z*b + r* ∼ N(X*α, V* = Z*ΩZ*T + Σ*)
where b ∼ N(0,Ω) and r* ∼ N(0,Σ*). Here, X* and Z* are composed of completely observed covariates; α is a vector of fixed regression coefficients while b and r* are independent random effects that vary at levels 2 and 1, respectively. We partition the complete data (CD) Y* into components Y* = (Yobs, Ymis). In particular, if Y* is N by 1 but we observe only M (≤ N) elements Yobs = Y of Y*, we construct an M-by-N matrix O in which every row contains a single entry equal to unity, indicating which element of Y* is observed. All other entries in the same row are 0. Our model for the observed Y is therefore
(2) Y = Xα + Zb + r ∼ N(Xα, V = ZΩZT + Σ)
where Y = OY*, X = OX*, Z = OZ*, r = Or* and Σ = OΣ*OT. Assuming data MAR, we can make efficient estimates of the parameters θ = (α, Ω, Σ*) using ML or Bayes inference from the observed data according to (2). The most common method in recent literature is a Bayesian approach based on multiple imputation (MI) that we will consider in the final section of this article (see Section 8). However, Schafer and Yucel (Citation2002) showed how one can obtain MLE using the EM algorithm and use the estimates for MI. Shin and Raudenbush (Citation2007) showed how to recover MLE for the analytic model f1(R*|C*) by constructing model (1) carefully and estimating the model without a need for imputations. This approach allows some components of Y* to vary at level 2.
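To fix ideas, the selection matrix O can be built directly from the missingness pattern. The sketch below is a minimal illustration in R under assumptions (the objects ystar and Xstar are hypothetical), not the authors' code:

    # Build the M-by-N selection matrix O from the observed positions of Y*.
    ystar <- c(2.1, NA, 3.5, NA, 1.7)                  # hypothetical complete-data vector, N = 5
    obs   <- which(!is.na(ystar))                      # positions of the M observed elements
    O     <- diag(length(ystar))[obs, , drop = FALSE]  # one unity per row, all other entries 0
    Xstar <- cbind(1, 1:5)                             # hypothetical complete-data design matrix
    Y <- ystar[obs]                                    # observed outcomes, elementwise equal to O Y*
    X <- O %*% Xstar                                   # observed-data design matrix X = O X*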

2.1 Estimation via the EM Algorithm

The EM algorithm (Dempster, Laird, and Rubin Citation1977) requires evaluation at each iteration m + 1 of the conditional expected CD score given the observed data and parameter estimates at iteration m. To find the CD score, our model for the CD Y* is the multivariate HLM (1). Write model (1) at the level of cluster j and let φ be an arbitrary scalar element of (Ω,Σ*). The CD score equations are well known (see Raudenbush and Bryk Citation2002, chap. 14):
(3) Sα,CD = Σj Xj*T Vj*−1 (Yj* − Xj*α),
Sφ,CD = (1/2) Σj (d vec(Vj*)/dφ)T (Vj*−1 ⊗ Vj*−1) vec[(Yj* − Xj*α)(Yj* − Xj*α)T − Vj*].

The conditional expected score equations given the observed Yj thus depend on the conditional mean and variance of Yj*, which we readily derive from the fact that Yj* | Yj, θ̂(m) ∼ N(Ŷj*(m+1), V̂j*(m+1)) given iteration-m estimates θ̂(m) = (α̂(m), Ω̂(m), Σ̂*(m)) (Shin and Raudenbush Citation2007), where
(4) Ŷj*(m+1) = Xj*α̂(m) + V̂j*(m) OjT V̂j(m)−1 (Yj − Xjα̂(m)),
V̂j*(m+1) = V̂j*(m) − V̂j*(m) OjT V̂j(m)−1 Oj V̂j*(m).
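The E-step formulas in (4) are ordinary conditional-normal calculations. A minimal sketch in R (the function and argument names are hypothetical; Vstar denotes Vj* evaluated at the current θ̂(m)):

    # Conditional mean and variance of the complete-data vector given the observed data, as in (4).
    cond_moments <- function(Y, Xstar, alpha, Vstar, O) {
      V    <- O %*% Vstar %*% t(O)                 # observed-data covariance V_j
      Vinv <- solve(V)
      mu   <- Xstar %*% alpha +
        Vstar %*% t(O) %*% Vinv %*% (Y - O %*% Xstar %*% alpha)
      Sig  <- Vstar - Vstar %*% t(O) %*% Vinv %*% O %*% Vstar
      list(mean = mu, var = Sig)                   # E(Y*|Y) and var(Y*|Y)
    }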

2.2 Example 1: Contextual Effects Model

We apply this framework to the “contextual effects model” (Willms Citation1986) for the study of income inequality in the mathematics achievement of U.S. elementary school children. This model decomposes the association between family income and educational achievement into a within-school component and a between-school component. In its simplest form, the model is typically written as
(5) Rij = γ00 + γ10(Cij − C̄j) + γ01(C̄j − C̄) + u0j + eij
where Rij is a measure of math achievement for student i in school j, Cij is a measure of the income of that child’s parents, C̄j is the sample mean of Cij within school j, and C̄ is the overall sample mean income for i = 1,…,nj and j = 1,…,J. Here γ10 is known as the “within-school” coefficient while γ01 is the “between-school” coefficient; and u0j and eij are independent, normally distributed random effects that vary at levels 2 and 1, respectively. Of interest is the “contextual component” γc = γ01 − γ10 which, if positive, suggests that attending a school with high-income peers predicts elevated achievement net of the contribution of a student’s family income.
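As a point of reference, the conventional complete-data analysis of (5) can be carried out with standard mixed-model software. The sketch below is an illustration under assumptions (a hypothetical data frame dat with columns math, income, and school), not the authors' code; it uses lme4 in R:

    # Conventional (complete-data) fit of the contextual effects model (5).
    library(lme4)
    dat$income_sch   <- ave(dat$income, dat$school)        # school sample mean income, C-bar_j
    dat$income_dev   <- dat$income - dat$income_sch        # within-school deviation, C_ij - C-bar_j
    dat$income_sch_c <- dat$income_sch - mean(dat$income)  # centered school mean, C-bar_j - C-bar
    fit <- lmer(math ~ income_dev + income_sch_c + (1 | school), data = dat)
    gamma_c <- fixef(fit)["income_sch_c"] - fixef(fit)["income_dev"]   # contextual effect

As discussed next, this conventional analysis treats income as fully observed and uses the noisy sample mean C̄j in place of the latent school mean income, which is precisely what the proposed approach avoids.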

Two problems arise in the conventional analysis (Shin and Raudenbush Citation2010). First, past analyses have treated parental income as completely observed when, in fact, most surveys report substantial fractions of missing data on income. Second, the sample mean C̄j will be a noisy proxy for the actual mean income of parents in a school if the sample size per school is modest, as is the case in most U.S. national surveys. Using random effects as explanatory variables within the MAR framework addresses both issues. Reflecting that income is partially observed, we propose the CD model
(6) Rij* = γ00 + γ10ϵij* + γ01νj + u0j + eij*, Cij* = δ + νj + ϵij*
where νj and ϵij* are independent, normally distributed random effects at levels 2 and 1, respectively. The joint distribution of Rij* and Cij* may be written
(7) [Rij*, Cij*]T = [1 0; 0 1][γ00, δ]T + [1 0; 0 1][γ01νj + u0j, νj]T + [γ10ϵij* + eij*, ϵij*]T.

Stacking the equations within level-2 unit j, we have the general form of the model
(8) Yj* = Xj*α + Zj*bj + rj* ∼ N(Xj*α, Vj* = Zj*ΩZj*T + Inj ⊗ Σ*)
for Xj* = Zj* = 1nj ⊗ I2, bj ∼ N(0,Ω) and rj* ∼ N(0, Inj ⊗ Σ*), where we denote by 1m a vector of m unities and by Im an m-by-m identity matrix for a positive integer m. The HLM score equations for θ are familiar (Raudenbush and Bryk Citation2002, chap. 14; Shin and Raudenbush Citation2010) and will therefore not be elaborated here.

2.3 Compatibility

A set of conditional distributions f1(R*|C*,θ1) and f2(C*|R*,θ2) is said to be compatible if there exist a joint distribution {h(Y*|θ): θ ∈ Θ} and surjective maps {tj: Θ → Θj, j = 1, 2} such that for each j, θj ∈ Θj and θ ∈ tj−1(θj) = {θ: tj(θ) = θj}, we have f1(R*|C*,θ1) = h(R*|C*,θ) and f2(C*|R*,θ2) = h(C*|R*,θ) (Liu et al. Citation2014). Assuming a prior p(θ), Schafer and Yucel (Citation2002) developed the Gibbs sampler based on a multivariate normal (MN) h(Y*|θ) that is compatible with an analytic HLM f1(R*|C*,θ1) when C* is linearly associated with R*. The MN h(Y*|θ), however, cannot be compatible with f1(R*|C*,θ1) when C* has nonlinear effects (Kim, Sugar, and Belin Citation2015; Enders, Du, and Keller Citation2020). Goldstein, Carpenter, and Browne (Citation2014) and Enders, Du, and Keller (Citation2020) factored h(Y*|θ) = f1(R*|C*,θ1)f2(C*|θ2) and estimated a compatible HLM f1(R*|C*,θ1) with the nonlinearities by the Gibbs sampler via a Metropolis algorithm. We extend the HLM further to nonlinear effects of C* that include random effects as latent covariates. We estimate h(Y*|θ) efficiently by ML via a PKRE and translate the estimates to the ML estimates of the compatible HLM, as explained in the next section.

3 Coping with Random Coefficients and Interactions

Our scientific focus is again on f1(R*|C*,ν) assumed normal in distribution as in (6). However, the model now includes elements of C* that have nonlinearities including random coefficients, polynomial terms, or interactions. In this case, even if we can reasonably assume that f2(C*) is normal, the joint distribution h(Y*) cannot be normal.

Recent research on the problem of nonlinearities has focused primarily on two widely used methods of analysis of multilevel incomplete data. Perhaps the most popular method of imputation for HLMs under the MAR assumption imputes missing values through a series of sequential univariate regression models (Raghunathan et al. Citation2001), also known as MI by fully conditional specification or FCS (van Buuren et al. Citation2006). However, these conditionals will not be compatible with the joint distribution of interest in the presence of the nonlinearities of interest in this article, as shown by Enders, Mistler, and Keller (Citation2016) and Enders, Du, and Keller (Citation2020). The main alternative approach estimates the joint distribution of the outcome and covariates subject to missingness. Missing values may be imputed from their fully conditional distributions (Liu, Taylor, and Belin Citation2000; Schafer and Yucel Citation2002; Goldstein et al. Citation2009), or the joint model may be estimated directly by maximum likelihood (Shin and Raudenbush Citation2007, Citation2010; Ren and Shin Citation2016). These normal-theory models were not designed to handle the nonlinearities of interest here. By means of a Gibbs sampler via the Metropolis-Hastings algorithm, Goldstein, Carpenter, and Browne (Citation2014) imputed missing values of a response and covariates including interaction and polynomial terms having fixed effects in a multilevel model where covariates and response may be continuous or categorical. Similarly, Erler et al. (Citation2016) took the sequential fully Bayesian approach (Ibrahim, Chen, and Lipsitz Citation2002), which factors the joint distribution of variables MAR, including the outcome, into a series of univariate conditional models, to handle missing values of cluster-level continuous and discrete covariates having fixed effects. Erler et al. (Citation2019) extended the approach to imputing missing values of level-1 covariates having fixed effects. Enders, Du, and Keller (Citation2020) showed, however, that these approaches do not guarantee compatibility when the partially observed covariates have random coefficients; lacking a formal model for the joint distribution of interest, they fall short of ensuring compatibility.

3.1 Factorization of the Likelihood Based on Provisionally Known Random Effects

To cope with nonlinearities induced by partially observed covariates having random coefficients, polynomials, or interaction terms, we model the joint distribution h(Y*|ν)p(ν) induced by the scientific model of interest f1(R*|C*,ν) and the model for the covariates, f2(C*|ν)p(ν). The problem is similar to the problem of estimation of generalized linear mixed models (Hedeker and Gibbons Citation1994; Raudenbush, Yang, and Yosef Citation2000). Using the notation of (2), we must evaluate
(9) h(Y) = ∫ h(Y|b) p(b) db.

The integral just defined does not have closed form in the presence of nonlinearities and must be approximated numerically. To facilitate the approximation, we choose a PKRE, call it u, such that
(10) h(Y|u) = ∫ h(Y|b,u) p(b|u) db

is a normal HLM as discussed in Section 2. The problem of approximation is then to evaluate
(11) h(Y) = ∫ h(Y|u) g(u) du.

The computational challenge is to select u such that the dimension of the analytic integral (10) is maximized while the dimension of numerical approximation (11) is minimized.

3.2 Example 2: A Partially Observed Covariate Having a Random Coefficient

We return to our example of income inequality in U.S. elementary schools. A common finding in multilevel studies of educational achievement is that the relationship between student socioeconomic background and achievement varies from school to school (Raudenbush and Bryk Citation1986). This variation could reflect variation in school organization, composition, and resources. To assess the variation in income inequality within schools, we expand the contextual model (6) to allow for a random coefficient
(12) Rij* = (γ00 + u0j*) + (γ10 + u1j*)ϵij* + eij*, Cij* = δ + νj + ϵij*
where u0j* ∼ N(0,τ00), u1j* ∼ N(0,τ11), νj ∼ N(0,τνν), cov(u0j*,u1j*) = τ01, cov(u0j*,νj) = τ0ν and cov(u1j*,νj) = τ1ν. This is model (6) for u0j* = γ01νj + u0j if τ11 = 0. Let var(uj*) = τ for uj* = [u0j*, u1j*, νj]T. Level-1 random effects eij* ∼ N(0,σ2) and ϵij* ∼ N(0,σcc) are assumed independent of each other and of the level-2 random effects. We denote the parameters of the CD model (12) as θ(12)* = (γ00, γ10, τ, σ2, δ, σcc).

Clearly Rij* cannot be marginally normal in distribution because of the multiplication of the two normal random effects u1j* and ϵij*. Our strategy is to select one of these two random effects to be considered “provisionally known”; we choose u1j* for this purpose because it has lower dimension, varying across schools, than does ϵij*, which varies across students within each school. Therefore, we write
(13) [u0j*, νj]T | u1j* ∼ N([α0|1, αν|1]T u1j*, Ω = [τ00|1 τ0ν|1; τν0|1 τνν|1])
where αk|1 = τk1 τ11−1 and τkk′|1 = τkk′ − αk|1 τ11 αk′|1 for k, k′ = 0, ν. We can therefore write the “provisional” joint model h(Yj*|u1j*) for Yj* = [Y1j*T, …, Ynjj*T]T as
(14) Yj* = Xj*α + Zj*bj + rj* ∼ N(Xj*α, Vj* = Zj*ΩZj*T + Ψj*)

where Xj* = 1nj ⊗ [I2, I2u1j*], α = [γ00, δ, α0|1, αν|1]T, Zj* = 1nj ⊗ I2, bj = [u0j* − α0|1u1j*, νj − αν|1u1j*]T ∼ N(0,Ω) and rj* = [r1j*T, …, rnjj*T]T ∼ N(0, Ψj* = Inj ⊗ Σj*) for
(15) rij* = [1 γ10+u1j*; 0 1][eij*, ϵij*]T,
Σj* = [(γ10+u1j*)2σcc + σ2 (γ10+u1j*)σcc; (γ10+u1j*)σcc σcc].

The CD score is therefore familiar; see (3). To complete the EM algorithm to estimate h(Yj*|u1j*)ϕ(u1j*; 0, τ11), we maximize the likelihood L(θ) = ∏j=1,…,J h(Yj) for
(16) h(Yj) = ∫ h(Yj|u1j*) ϕ(u1j*; 0, τ11) du1j*, h(Yj|u1j*) ∼ N(Xjα, Vj = ZjΩZjT + Ψj)
where h(Yj|u1j*) is from (14), Xj = OjXj*, Zj = OjZj*, and Ψj = OjΨj*OjT is block diagonal with blocks Σij = OijΣj*OijT, Oj being block diagonal in the Oij. We use adaptive Gauss-Hermite Quadrature (AGHQ) to numerically approximate integral (16) (Naylor and Smith Citation1982; Pinheiro and Bates Citation1995; Rabe-Hesketh, Skrondal, and Pickles Citation2002).
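To make the quadrature step concrete, the sketch below approximates the one-dimensional integral in (16) in R. It is a simplified illustration under assumptions: it uses ordinary (non-adaptive) Gauss-Hermite quadrature rather than AGHQ, and loglik_given_u is a hypothetical user-supplied function returning the log of the normal-theory density h(Yj|u1j*) at a given value of the PKRE:

    # Approximate h(Y_j) = integral of h(Y_j | u) * phi(u; 0, tau11) du by Gauss-Hermite quadrature.
    library(statmod)
    gh <- gauss.quad(20, kind = "hermite")          # 20 abscissas for weight function exp(-x^2)
    approx_hYj <- function(loglik_given_u, tau11) {
      u <- sqrt(2 * tau11) * gh$nodes               # change of variable u = sqrt(2 * tau11) * x
      sum(gh$weights / sqrt(pi) * exp(sapply(u, loglik_given_u)))
    }

AGHQ additionally centers and rescales the abscissas at the mode and curvature of h(Yj|u1j*)ϕ(u1j*; 0, τ11) for each cluster, which is what allows a small number of quadrature points to be accurate.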

An additional AGHQ step is needed to complete the E step by evaluating the expectation of a CD score component SCDj of cluster j from (3)
(17) E(SCDj|Yj) = ∫∫ SCDj f(Yj*|Yj, u1j*) g(u1j*|Yj) dYmis,j du1j*
where f(Yj*|Yj, u1j*) has the familiar form of the empirical Bayes posterior normal density with the means and covariance matrix in (4). We use AGHQ to approximate the outer integral with respect to the univariate random effect u1j*; see Appendix C for detail. Because g(u1j*|Yj) is nonstandard, we use Bayes' theorem to find
(18) g(u1j*|Yj) = h(Yj|u1j*) ϕ(u1j*; 0, τ11) / h(Yj).

By the invariance property of MLE, we translate the MLE of the “provisional” parameters θ = (α, Ω, γ10, σcc, σ2, τ11) of model (14) into the MLE θ̂(12)* of model (12) via the one-to-one transformation implied by (13).
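To spell out one direction of that transformation (a sketch implied by (13), in the notation above), the level-2 covariance components of model (12) are recovered from the provisional estimates as
τ̂01 = α̂0|1 τ̂11, τ̂1ν = α̂ν|1 τ̂11,
τ̂00 = τ̂00|1 + α̂0|1^2 τ̂11, τ̂νν = τ̂νν|1 + α̂ν|1^2 τ̂11, τ̂0ν = τ̂0ν|1 + α̂0|1 α̂ν|1 τ̂11,
with τ̂11 and the remaining components (γ̂00, γ̂10, δ̂, σ̂2, σ̂cc) carried over directly.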

3.3 Example 3: Cross-Level Interaction Effects Involving Partially Observed Covariates

Following Lee and Bryk (Citation1989), we wish to extend the contextual effects model in two ways to allow the level-1 covariate: (i) to have random coefficients as in the previous section; and (ii) to interact with the level-2 covariate. We therefore write the model
(19) Rij* = (γ00 + γ01νj + u0j) + (γ10 + γ11νj + u1j)ϵij* + eij*, Cij* = δ + νj + ϵij*
where u0j and u1j are, as before, bivariate normal, but conditional on νj with variances τ00|ν and τ11|ν, respectively, and covariance τ01|ν. Other random effects are as defined in model (12). The parameters are θ(19)* = (γ00, γ01, γ10, γ11, τ00|ν, τ01|ν, τ11|ν, σ2, δ, τνν, σcc).

This model involves two products of normal-theory random effects: νjϵij* and u1jϵij*. Using the logic of the last section, we wish to choose a PKRE such that, conditional on that effect, our CD joint model will be a normal-theory HLM. The question is how to choose this random effect. We might provisionally hold both νj and u1j constant, but we would prefer to minimize the dimension of the provisionally constant random effects so that the computational burden of numerical approximation is minimized. Alternatively, we could provisionally hold ϵij* constant. But while ϵij* is a scalar, it varies across all level-1 units, which could be a very large set. Instead, we choose to provisionally hold constant the scalar level-2 random effect
(20) u1j* = γ11νj + u1j.

In addition, we define u0j* = γ01νj + u0j to represent the model parsimoniously as Rij* = (γ00 + u0j*) + (γ10 + u1j*)ϵij* + eij*. This model is therefore equivalent to model (12), implying (13)–(18) and a one-to-one correspondence between θ(12)* and θ(19)*.

Consequently, the CD model (12) is a one-to-one transformation of the provisional model h(Yij*|u1j*)ϕ(u1j*; 0, τ11) in (14) by distribution (13) and, also, of h(Yij*|νj)ϕ(νj; 0, τνν) = f1(Rij*|Cij*,νj)f2(Cij*|νj)ϕ(νj; 0, τνν) in (19) by (20). We choose to estimate joint model (14) via the PKRE for efficient computation; by the one-to-one correspondence, it is guaranteed to be compatible with the scientific model f1(Rij*|Cij*,νj). Whereas scientific interest focuses on θ(19)*, we will be estimating the parameters θ = (α, Ω, γ10, σcc, σ2, τ11) of the provisional joint model (14). We then exploit the invariance property of MLE again, translating the MLE of θ back to those of θ(19)*.

The standard approach of replacing νj and ϵij* with C̄j* − C̄* and Cij* − C̄j*, respectively, in model (19) produces biased estimation of (γ01, γ11, τ00|ν, τ01|ν, τ11|ν) even if Rij* and Cij* were fully observed; see Appendix A. Model (19) can readily incorporate multiple covariates having random effects and also having multiple cross-level interactions. Moreover, it is straightforward to include covariates having fixed coefficients. This is important given the need to add auxiliary information to strengthen the robustness of the MAR assumption.

4 Auxiliary Covariates

Our focus is on estimation of income inequality in achievement via model (19). Because certain variables such as family income have high missing rates, including auxiliary covariates (e.g., parent occupation and pretest score) that are correlated with the missing values or missingness patterns strengthens the MAR assumption (Collins, Schafer, and Kam Citation2003). We consider two approaches to augmenting the CD model with auxiliary covariates that are themselves partially observed or measured with error. One approach is to assume such covariates to be linearly associated with the outcome and income. Violation of this linearity, however, may produce biased estimation. The other approach is to add them as responses alongside Rij*, thereby allowing them to be nonlinearly associated with the outcome and income. We then transform the MLE of the CD model to those of the nested model (19).

4.1 Linearly Associated Auxiliary Covariates

To augment auxiliary covariates that are linearly associated with the outcome and income, we extend the scalar Cij* of model (12) to a vector Cij* = [Cij*, A1ij*T, A2j*T]T consisting of income Cij* and auxiliary covariates A1ij* at level 1 and A2j* at level 2. We then write the CD model
(21) Rij* = (γ00 + u0j*) + (γ10 + u1j*)ϵij* + γ20Tϵ1ij* + eij*, Cij* = δ + νj + εij*
where δ = [δ, δ1T, δ2T]T, νj = [νj, ν1jT, ν2jT]T and εij* = [ϵij*, ϵ1ij*T, 0T]T for the means [δ1T, δ2T]T and school-specific random effects [ν1jT, ν2jT]T of [A1ij*T, A2j*T]T and the child-specific random effects ϵ1ij* of A1ij*. Other components γ00, γ10, u0j* = γ01Tνj + u0j, u1j* = γ11Tνj + u1j and eij* are as defined in model (19); γ20 contains the linear effects of ϵ1ij*. Again eij* and the stacked level-1 residuals [ϵij*, ϵ1ij*T]T ∼ N(0,Σϵ) are independent of each other and uj* ∼ N(0,τ) for Σϵ = [σcc Σc1; Σ1c Σ11] and uj* = [u0j*, u1j*, νjT]T; let cov(u0j*,νj) = τ0ν, cov(u1j*,νj) = τ1ν and νj ∼ N(0,τνν). Auxiliary predictors (ν1j, ν2j) and ϵ1ij* are linearly associated with the outcome and income at levels 2 and 1, respectively. The parameters are θ(21)* = (γ00, γ10, γ20, τ, σ2, δ, Σϵ).

We again select u1j* as provisionally known. Let Yij* = [Rij*, Cij*, A1ij*T]T and A2j* be of respective lengths p1 and p2. Using model (13) now for f(u0j*, νj|u1j*) and αν|1 = [αν|1, αν1|1T, αν2|1T]T, we find the provisional joint model (14), this time for Yj* = [Y1j*T, …, Ynjj*T, A2j*T]T, where Xj* = diag{[X11j*T, …, X1njj*T]T, X2j*}, α = [α1T, α2T]T, Zj* = diag{1nj ⊗ Ip1, Ip2}, bj = [u0j* − α0|1u1j*, νjT − αν|1Tu1j*]T, rj* = [r1j*T, …, rnjj*T, 0T]T, Ψj* = diag{Inj ⊗ Σj*, 0} for X1ij* = [Ip1, Ip1u1j*], X2j* = [Ip2, Ip2u1j*], α1 = [γ00, δ, δ1T, α0|1, αν|1, αν1|1T]T, α2 = [δ2T, αν2|1T]T, and
(22) rij* = [1 B1jT; 0 Ip1−1][eij*, ϵij*, ϵ1ij*T]T,
Σj* = [B1jTΣϵB1j + σ2 B1jTΣϵ; ΣϵB1j Σϵ]
denoting B1jT = [γ10 + u1j*, γ20T].

We estimate θ(21)* using its one-to-one transformation θ=(α,Ω,γ10,γ20,Σϵ,σ2,τ11) of h(Yj*|u1j*)ϕ(u1j*;0,τ11) by the EM algorithm, computing h(Yj) and E(SCDj|Yj) by AGHQ as before. See Appendix B for the E step. The scalar PKRE u1j* yields efficient computation.

To translate θ(21)* to θ(19)*, we use model (13) to let βkj = γk0 + ukj* and find E(βkj|νj) = γk0 + γk1*νj and cov(βkj, βk′j|νj) = τkk′* = τkk′ − γk1*γk′1*τνν for γk1* = τkν/τνν and k, k′ = 0, 1. We then marginalize the auxiliary ϵ1ij* out to obtain
E(Rij*|ϵij*, νj) = γ00 + γ01*νj + (γ10 + γ11*νj)ϵij* + γ20TE(ϵ1ij*|ϵij*),
var(Rij*|ϵij*, νj) = τ00* + 2τ01*ϵij* + τ11*ϵij*2 + γ20Tvar(ϵ1ij*|ϵij*)γ20 + σ2,
which should be of the form (γ00 + γ01νj) + (γ10 + γ11νj)ϵij* and τ00|ν + 2τ01|νϵij* + τ11|νϵij*2 + σ2, respectively. See Appendix D for detail. We illustrate this approach in Sections 6 and 7.

4.2 Nonlinearly Associated Auxiliary Covariates

The linearity assumption between A1ij* and (Rij*, Cij*) in model (21) may be violated, producing biased estimation of θ(19)*. In that case, we augment A1ij* into the multivariate response Rij* = [Rij*, A1ij*T]T of length r, allowing the auxiliary covariates to be nonlinearly associated with the outcome and income in
(23) Rij* = (γ00 + u0j*) + (γ10 + u1j*)ϵij* + eij*, eij* ∼ N(0,Σe)
and Cij* = [Cij*, A2j*T]T in (21) for conformable vectors γ00 and γ10 of fixed effects and random vectors u0j* and u1j* independent of eij* as before. It is straightforward to find the provisional joint model (14) for Yij* = (Rij*, Cij*) and A2j*, selecting the r-by-1 u1j* to be provisionally known. We estimate the joint model efficiently and then translate the bivariate distribution of Rij* and Cij* to θ(19)*; see Appendix D again for the translation. Numerical approximation is now intensive with respect to the vector u1j*. Finally, some of A1ij* may be linearly associated with the outcome and income while others may not. We illustrate this case in Data Analysis.

With additional known auxiliary covariates, we marginalize them out first, given their expectation and covariance matrix estimated from the sample, before the translation above.

5 Within-Level Interactions and Polynomial Terms

We write a CD model including the level-2 interaction effects γ04 of νjν1j,
(24) Rij* = (γ00 + γ01νj + γ02Tν1j + γ03Tν2j + γ04Tνjν1j + u0j) + γ10Tϵij* + eij*,
and Cij* as in model (21), where u0j ∼ N(0,τ00|ν) and ϵij* has fixed effects γ10 for simplicity. We select the interactive term νj, smaller than the other in dimension, to be provisionally known and find
(25) [u0j, ν1j, ν2j] | νj ∼ N([0, αν1|ννj, αν2|ννj], Ω = [τ00|ν 0 0; 0 τν1ν1|c τν1ν2|c; 0 τν2ν1|c τν2ν2|c]).

Let bj = [u0j, ν1jT − αν1|νTνj, ν2jT − αν2|νTνj]T = [u0j, b1jT, b2jT]T to find the model given νj,
(26) Rij* = XRjTαR + ZRjTbj + γ10Tϵij* + eij* ∼ N(XRjTαR, ZRjTΩZRj + γ10TΣϵγ10 + σ2),
Cij* = δ + νj + ϵij*, A1ij* = δ1 + αν1|ννj + b1j + ϵ1ij*, A2j* = δ2 + αν2|ννj + b2j,
for XRjT = [1, νj, νj2], αR = [γ00, γ01 + γ02Tαν1|ν + γ03Tαν2|ν, γ04Tαν1|ν]T and ZRjT = [1, γ02T + γ04Tνj, γ03T], where Cij* varies within but not between clusters. As in Section 4.1, we stack these equations to find the implied provisional model (14) for Yj* = [Y1j*T, …, Ynjj*T, A2j*T]T, and compute h(Yj), setting SCDj = 1 below, and E(SCDj|Yj) by
(27) h(Yj)E(SCDj|Yj) = ∫ E(SCDj|νj, Yj) h(Yj|νj) ϕ(νj; 0, τcc) dνj
numerically for h(Yj|νj) from the provisional model.

We now consider another CD model including level-1 interaction effects,
(28) Rij* = (γ00 + γ01Tνj + u0j) + γ10ϵij* + (γ20T + γ30Tϵij*)ϵ1ij* + eij*,
and Cij* in model (21), where ϵij* and ϵ1ij* have main (γ10 and γ20) and interaction (γ30) effects. We select the interactive term ϵij*, smaller than ϵ1ij* in dimension, to be provisionally known again, and find ϵ1ij* | ϵij* ∼ N(αϵ1|cϵij*, Σ1|c)

for αϵ1|c = Σ1cσcc−1 and Σ1|c = Σ11 − αϵ1|cσccαϵ1|cT. Given ϵij* provisionally constant, let u0j* = γ01Tνj + u0j and a1ij* = ϵ1ij* − αϵ1|cϵij* to express
(29) Rij* = XRijTαR + u0j* + B1ijTa1ij* + eij* ∼ N(XRijTαR, τ00 + B1ijTΣ1|cB1ij + σ2),
Cij* | ϵij* ∼ N(δ + ϵij*, τcc), A1ij* = δ1 + αϵ1|cϵij* + ν1j + a1ij*, A2j* = δ2 + b2j,
where XRijT = [1, ϵij*, ϵij*2], αR = [γ00, γ10 + γ20Tαϵ1|c, γ30Tαϵ1|c], B1ijT = γ20T + γ30Tϵij*, cov(Rij*, A1ij*|bj, ϵij*) = B1ijTΣ1|c and Cij* now varies between but not within clusters, implying the provisional model (14) given ϵj* = (ϵ1j*, …, ϵnjj*) for bj = [u0j*, νjT]T. We compute h(Yj) and E(SCDj|Yj) by
(30) h(Yj)E(SCDj|Yj) = ∫ E(SCDj|ϵj*, Yj) h(Yj|ϵj*) ϕ(ϵj*; 0, Injσcc) dϵj*
numerically for h(Yj|ϵj*) from the provisional model as before. The numerical integral can be computationally intensive, in particular given large cluster sizes; multivariate Laplace approximation (Pinheiro and Bates Citation1995; Raudenbush, Yang, and Yosef Citation2000) and parallel computation of each cluster may result in efficient computation.

5.1 Rules for Choosing Provisionally Known Random Effects

We now provide general rules for selecting PKREs:

  1. For a level-1 interaction ϵij*ϵ1ij* as in (28), we hold ϵij*, the smaller of the two in dimension, constant. The resulting model, quadratic in ϵij*, minimizes the dimension of AGHQ;

  2. For a level-2 interaction νjν1j, we again hold the term with the smaller dimension constant;

  3. For a three-way interaction ϵij*ϵ1ij*ϵ2ij* at level 1, hold constant the two terms that are smaller than the third in dimension; the same applies to a three-way interaction at level 2;

  4. For cross-level interactions νjϵij* as in model (21), hold u1j*=γ11Tνj+u1j constant;

  5. For the cluster-specific effects u1j* of ϵij*, we hold u1j* constant;

  6. Finally, for a model involving several of these effects, hold constant the union of the PKREs chosen for each.

Our CD model in each case includes a scientific model of interest and induces a provisional joint model (14). Because the models are one-to-one transformations of each other, the scientific model is guaranteed to be compatible with the joint model we estimate.

6 Data Analysis

Rising income inequality in the United States and other nations has recently attracted substantial attention (Piketty Citation2014). A key question involves the consequence of such inequality for equality of opportunity among children. Following past research, we decompose the association between family income and educational achievement into a contextual component and a child-specific component (Firebaugh Citation1978; Willms Citation1986; Lee and Bryk Citation1989). The contextual component reflects the fact that elementary schools in the United States are quite segregated based on family income. Such segregation reflects and may reinforce residential segregation as a function of family income. Two children having the same family income might differ in educational achievement as a result of their experience in low-income versus high-income schools. The individual component reflects socioeconomic inequality within schools. Children attending the same school who differ with respect to family income may tend to differ with respect to their achievement. However, the magnitude of this within-school disparity may vary from school to school (Raudenbush and Bryk Citation1986; Lee and Bryk Citation1989). The contextual effects model (Willms Citation1986) supports the composition of inequality in achievement that we seek.

First, we decompose family income for child i in school j into between-school and within-school components as in model (19):
ln(incomeij) = Cij* = δ + νj + ϵij*,
mathSij = Rij* = (γ00 + γ01νj + u0j) + (γ10 + γ11νj + u1j)ϵij* + eij*
for the mean of log-income δ, the school-specific deviation from the mean νj ∼ N(0,τνν), and the child-specific component ϵij* ∼ N(0,σcc). A child’s mathematics achievement in spring 1999 (mathS) depends on these components via the model (19) for mathSij = Rij*. The parameters are θ(19)* = (γ00, γ01, γ10, γ11, τ00|ν, τ01|ν, τ11|ν, σ2, δ, τνν, σcc).

We choose math achievement as our outcome because of its importance in predicting educational attainment and adult earnings (Nomi and Raudenbush Citation2016; Rivera-Batiz Citation1982). In model (19), γ01 is the between-school gradient, reflecting the expected difference in Rij* associated with a unit difference in school mean income; γ10 reflects the average within-school gradient. However, the within-school gradient may depend on school mean income, an interaction effect represented by γ11, and this gradient may also vary randomly over schools as represented by u1j ∼ N(0,τ11|ν). School-mean achievement is γ00 and, conditional on income, varies randomly over schools, u0j ∼ N(0,τ00|ν). If the within-school gradients were constant (γ11 = τ11 = 0), the overall linear coefficient for income would be
E(Rij*|Cij* = c+1) − E(Rij*|Cij* = c) = ργ01 + (1−ρ)γ10,
where ρ = τcc/(τcc + σcc) can be regarded as an index of school segregation as a function of income. Define γc = γ01 − γ10 as the “contextual effect” (Willms Citation1986), the expected difference in math achievement between two students with the same family income who attend two schools that differ by one unit in school mean income. A nation’s income gradient would then be ργc + γ10, which increases with the within-school segregation based on income ρ, the contextual coefficient γc, and the within-school gradient γ10. This simple relationship will not hold if the within-school gradients vary over schools, and one purpose of our analysis is to test that proposition.
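As a toy numerical illustration of this decomposition (hypothetical values, not the ECLS estimates), suppose the between- and within-school gradients and the income variance components were as below; the overall income gradient then combines them through the segregation index ρ:

    # Hypothetical illustration of the overall income gradient rho * gamma_c + gamma_10.
    tau_cc  <- 0.30; sigma_cc <- 0.70        # hypothetical between- and within-school income variances
    gamma01 <- 0.50; gamma10  <- 0.20        # hypothetical between- and within-school gradients
    rho     <- tau_cc / (tau_cc + sigma_cc)  # income segregation index = 0.30
    gamma_c <- gamma01 - gamma10             # contextual effect = 0.30
    rho * gamma_c + gamma10                  # overall gradient = 0.29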

To do so, we use data from 21,211 children attending kindergarten in 1,018 schools as of fall 1998, a nationally representative sample known as the Early Childhood Longitudinal Study of 1998 (“ECLS”) that is publicly available at https://nces.ed.gov/ecls; see Table 1. Only 8% of the math achievement data are missing. However, family income data are missing for 32% of the sample, a finding that is quite typical in surveys of educational achievement. Fortunately, ECLS (Tourangeau et al. Citation2009) provides data on auxiliary variables, including the maximum occupational status score of parents (occupation), missing for only 5% of the cases, as well as math achievement in fall 1998 (mathF), which is strongly predictive of math achievement in spring 1999 (mathS). In all, we have four auxiliary variables, correlated with income, the outcome, or missing patterns.

Table 1 Each variable for analysis with mean (standard deviation (SD), missing %).

In this article, we take convergence to ML to mean that the square root of the summed squared differences between θ̂ at two consecutive iterations is less than 10^−4. We estimate the model for covariates Cij* using all observed values by ML via the EM algorithm (Shin and Raudenbush Citation2007) and a scientific model for Rij* given the covariates and their sample cluster and overall means by complete case analysis, and transform the estimates to the initial values θ̂ of the joint model. We carry out complete case analysis in R (R Core Team 2017), estimate θ on a Dell XPS laptop with the 11th generation Intel(R) Core(TM) i9-11900H processor at 2.50GHz and 64 GB RAM, and test hypotheses at level α = 0.05.
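In code, that stopping rule is a one-liner; the sketch below is an illustration (the function name is hypothetical), not the authors' implementation:

    # Declare convergence when the parameter vector changes by less than 1e-4 in Euclidean norm.
    converged <- function(theta_new, theta_old, tol = 1e-4) {
      sqrt(sum((theta_new - theta_old)^2)) < tol
    }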

6.1 Linearly Associated Auxiliary Covariates

Recall from Section 4 that we have two strategies. Following Section 4.1, we model the auxiliary covariates linearly associated with the outcome and income in model (21) where A1ij* is a vector of mathF, occupation, and age in months at assessment of spring 1999 (age) and A2j* is the square root of kindergarten enrollment (enrollment) by the Box-Cox transformation. Consequently, θ(21)*=(γ00,γ10,γ20,τ,σ2,δ,Σϵ) consists of 10 fixed effects (3-by-1 γ20 and 5-by-1 δ) and 39 variances and covariances (7-by-7 τ and 4-by-4 Σϵ).

We standardized each variable, except income, to have mean 0 and variance 1 for interpretation. Estimation of θ(21)* with a provisionally known u1j* and 20 abscissas converged quickly to ML, in 9 iterations and 13 sec. The transformed estimates θ̂(19)* and standard errors (SEs), multiplied by 100, are listed under “EM-AGHQ I” in Table 2. The school-mean and within-school components of family income are positively associated with math achievement while their interaction effect is insignificant. The within-school income effects appear to vary at most modestly across schools, with the variance estimate τ̂11|ν = 0.19 less than the associated SE 0.25. Based on ln(τ̂11|ν) ∼ N[ln(τ11|ν), var(τ̂11|ν)/τ11|ν2], we find a large-sample 95% confidence interval (CI) for τ11|ν of (0.01, 2.62), which is near zero.
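That interval follows from the delta method on the log scale; a minimal sketch in R using the reported (rounded, x100-scale) estimate and SE:

    # Large-sample 95% CI for a variance component via ln(tau-hat) ~ N(ln(tau), var(tau-hat)/tau^2).
    tau_hat <- 0.19; se_tau <- 0.25
    exp(log(tau_hat) + c(-1, 1) * qnorm(0.975) * se_tau / tau_hat)
    # roughly (0.01, 2.5); differences from the reported (0.01, 2.62) reflect rounding of the inputs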

Table 2 Estimates × 100 (standard errors × 100) of model (19) by EM-AGHQ I and II.

To test this model against the null hypothesis that γ11 = 0 and τ11|ν = 0, we estimated the null model, the multivariate normal distribution of the linearly associated (Rij*, Cij*, A1ij*, A2j*), efficiently by the EM algorithm (Shin and Raudenbush Citation2007, Citation2010). The null model consisted of 42 parameters comprising 6 fixed intercepts and 6-by-6 level-2 and 5-by-5 level-1 variance-covariance matrices, and converged to log ML −107370.60. Compared to the log ML of θ(21)* displayed in the last row of Table 2, the likelihood ratio test statistic comparing the two joint models is 19.20 with 7 degrees of freedom, giving a conservative p-value < 0.01 (Stram and Lee Citation1994). Therefore, we infer that the outcome is nonlinearly associated with income. Lastly, estimation using 10 abscissas produced the same estimates under EM-AGHQ I.
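The nominal chi-square p-value for that statistic can be checked directly (the boundary null makes the test conservative, so the true p-value is smaller still):

    # Naive chi-square reference for the likelihood ratio statistic reported above.
    pchisq(19.20, df = 7, lower.tail = FALSE)   # about 0.008, below 0.01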

6.2 Nonlinearly Associated Auxiliary Covariates

Preliminary analysis indicates that mathF and occupation may be nonlinearly associated with the outcome and income. To test this hypothesis, following Section 4.2, we model the multivariate responses Rij* = (mathS, mathF, occupation) and A1ij* = age in model (21):
(31) Rij* = (γ00 + u0j*) + (γ10 + u1j*)ϵij* + γ20ϵ1ij* + eij*
and Cij* = [Cij*, A1ij*, A2j*]T as before, for 3-by-1 vectors γ00, γ10 and γ20 of fixed effects and random vectors u0j*, u1j* and eij* ∼ N(0,Σe). Therefore, τ is 9-by-9, Σe 3-by-3 and Σϵ 2-by-2; the CD model comprises 12 fixed effects and 54 variances and covariances. With the 3-by-1 u1j* provisionally known, we estimated the joint model using 10 abscissas per dimension, which converged to ML in 734 iterations and 928 min. The log ML of θ(31)* is shown in the bottom row of Table 2 under EM-AGHQ II. The LRT statistic to test H0: model (21) versus H1: model (31) is 367.93. The conservative LRT with 17 degrees of freedom (Stram and Lee Citation1994) produces a p-value near 0, rejecting the null in favor of mathF or occupation being nonlinearly associated with the outcome and income.

The translated estimates θ̂(19)* and SEs are listed under EM-AGHQ II in Table 2. Compared to those under EM-AGHQ I, the main effect of within-school income is larger; furthermore, the interaction effect is significant, and so is the random effect of income by the Wald test, which produces a 95% CI for τ11 now distant from zero. We conclude that the linearity assumption associated with (21) is violated, attenuating the main, interaction, and random effects of within-school income ϵij*, which are confounded with the auxiliary covariates. As a result, EM-AGHQ II produces a smaller σ̂2 = 73.74 and thus explains more outcome variability within schools than does EM-AGHQ I.

6.3 Known Auxiliary Covariates

Either model (21) or (31) may be extended to control for known auxiliary covariates such as race ethnicity and gender at level 1 and school location and sector at level 2. The joint model may be estimated given the provisionally known u1j* again, and translated to θ̂(19)* and SEs. See Appendices B and D for detail.

7 Simulation Study

We focus on ML estimation of the scientific HLM (19) after simulating outcome Rij* and income Cij* from a joint model conditional on auxiliary covariates within which the HLM is nested. The goal is to compare our estimators (EM-AGHQ) with those by four methods: (a) the benchmark method (BM) given νj and ϵij*; (b) complete-case analysis (CC) given C¯j*C¯* and Cij*C¯j* instead; (c) MLE on MI (Shin and Raudenbush Citation2007, Citation2010); and (d) the Gibbs sampler (GS) of Enders, Du, and Keller (Citation2020) implemented in software Blimp (Keller and Enders Citation2021). BM is based on complete data while others are based on data MAR. Therefore, a good method will produce estimates near the BM counterparts. BM and CC estimate the scientific model by the lme4 package (Bates et al. Citation2015) in R. MLE on MI uses C programs to estimate incompatible MHLM (8) by ML and impute missing values including latent school mean incomes 20 times, more than did past multilevel missing data analyses (Schafer and Yucel Citation2002; Shin and Raudenbush Citation2007, Citation2013; Enders, Du, and Keller Citation2020), from their predictive distribution given observed data implied by the MHLM at ML (Shin and Raudenbush Citation2007, Citation2010); and estimates the HLM given the MI by lme4. GS estimates the joint model, simultaneously generating 20 imputations of missing values excluding latent school mean incomes, by Blimp and, then, the HLM (19) given the MI by Blimp again. EM-AGHQ estimates the joint model by our C program and translates the estimates to the desired MLE in R.

We simulate the ECLS Rij* and Cij* closely in terms of sample sizes, correlations and missing rates. Specifically, for n = 20 children in each of J = 1000 schools, we simulate: (i) known auxiliary covariates X21j ∼ Bernoulli(0.3) and X1ij ∼ Bernoulli(0.45) equal to 1 (0) for a private (public) school and a minority (white) child, respectively, where X1ij varies within, but not between, schools for simplicity; (ii) random intercepts and slope β0j = γ00TX2j + u0j*, β1j = γ10TX2j + u1j* and βCj = δ00TX2j + νj from N(1 + X21j, 1) with covariances 0.8 for X2j = [1, X21j]T; (iii) independent eij*, ϵij* ∼ N(0,10) to simulate the CD joint model
(32) Rij* = β0j + β1j(Cij* − βCj) + γ20X1ij + eij*, Cij* = βCj + δ10X1ij + ϵij*.

The simulated parameters in θ consist of γ00T = γ10T = δ00T = [1, 1], γ20 = δ10 = 1, variances τ00 = τ11 = τνν = 1 and σcc = σ2 = 10, and covariances τ01 = τ0ν = τ1ν = 0.8. We marginalize X1ij and X21j out given their simulated expectations and variances, and translate θ to the scientific model in column two of Table 3, as explained in Appendix D.
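For concreteness, one replicate of the complete-data generating process can be sketched in R as follows (an illustration with hypothetical object names, not the authors' C program):

    # Simulate one complete dataset from the CD joint model (32).
    set.seed(1)
    J <- 1000; n <- 20
    school <- rep(1:J, each = n)
    X21 <- rbinom(J, 1, 0.30)                     # school-level covariate: private school
    X1  <- rbinom(J * n, 1, 0.45)                 # child-level covariate: minority child
    Tau <- matrix(0.8, 3, 3); diag(Tau) <- 1      # covariance of (u0*, u1*, nu)
    u   <- MASS::mvrnorm(J, mu = rep(0, 3), Sigma = Tau)
    beta0 <- 1 + X21 + u[, 1]                     # random intercept beta_0j
    beta1 <- 1 + X21 + u[, 2]                     # random income slope beta_1j
    betaC <- 1 + X21 + u[, 3]                     # latent school mean income beta_Cj
    eps <- rnorm(J * n, 0, sqrt(10)); e <- rnorm(J * n, 0, sqrt(10))
    C <- betaC[school] + 1 * X1 + eps                                      # income, delta_10 = 1
    R <- beta0[school] + beta1[school] * (C - betaC[school]) + 1 * X1 + e  # outcome, gamma_20 = 1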

Table 3 Scientific model (19) estimated by BM, CC, MLE on MI, GS and EM-AGHQ.

Next, we simulate the ECLS missing rates closely by
(33) logit(pij) = ϕ1X1ij + ϕ2(1 − X1ij) + zj, zj ∼ N(0,1)
given the known level-1 covariate X1ij: missing values drawn from Bernoulli(pij) are MAR. Because the mechanism does not provide information about the simulated model (32), the parameter spaces of the missing data mechanism and joint model are also distinct. We simulate higher missing rates for minority than white students by ϕ1 > ϕ2: ϕ1 = −0.2 > ϕ2 = −1.2 for income Cij*, with a 35% missing rate (46% for X1ij = 1, 27% for X1ij = 0); and ϕ1 = −2 > ϕ2 = −3 for the response Rij*, with an 11% missing rate (16% for X1ij = 1, 7% for X1ij = 0) on average.
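Continuing the simulation sketch above, the MAR mechanism (33) for income can be generated as follows (again an illustration, not the authors' code):

    # Impose MAR missingness on income via (33), with higher rates for minority children.
    phi1 <- -0.2; phi2 <- -1.2
    z <- rnorm(J)                                              # school-level random effect z_j
    p_mis <- plogis(phi1 * X1 + phi2 * (1 - X1) + z[school])   # P(income missing)
    C_obs <- ifelse(rbinom(J * n, 1, p_mis) == 1, NA, C)       # observed income under MAR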

We repeated simulating data and estimating the scientific model by the approaches 500 times to compute the % bias, average estimated SE (ASE), empirical estimate of the true SE (ESE) over samples and coverage probability (coverage) of each estimator in the next five columns. Each cell or estimate occupies two rows: % bias (ASE) in the first, and ESE and coverage in the next row. The lme4 package is unable to produce ESE and coverage of a variance or covariance estimate. The BM estimates are of course very accurate and precise with 0.13% bias, small ASE close to ESE, and good coverages near the nominal 0.95 in column three.

The CC estimates in column four, however, are biased despite the large sample sizes. The standard deviations (SDs) τ00|ν and τ11|ν are 60% and 39% biased upward while the intercept γ00, interaction effect γ11 and covariance τ01|ν are 8%, 42%, and 29% biased downward, respectively. Only the estimates of γ01, γ10, and σ2 are comparable in accuracy to those by BM as (A1) reveals in Appendix A. The coverages are low with a zero coverage for γ11. Finally, the uncertainty associated with the estimator of a cluster-level effect γ01 seems underestimated by ASE smaller than ESE.

MLE on MI generates incompatible MI based on the MHLM (8), without consideration of the interaction effect and PKRE, thereby producing the biased estimates in column five, which do not seem better than the CC estimates. Fixed effects and the covariance are biased downward, and SDs upward. SEs are close to, but coverages lower than, the CC counterparts.

In contrast, GS, which does not use a PKRE, produces estimates in column six nearly as accurate as the BM counterparts, except the SD of the random slope, which is biased upward by 2.69%, while EM-AGHQ yields all estimates in the last column as accurate or almost as accurate as the BM estimates. Overall, both approaches produce estimates slightly less precise than the BM estimates; ASE and ESE appear slightly larger than those by BM, reflecting extra uncertainty due to latent covariates and missing values.

Computation

We used 20 abscissas to estimate the joint model by EM-AGHQ. Given data MAR, the estimation converged 450 times, taking 101.6 iterations on average and 189 iterations at maximum, but in the remaining 50 replications (10%) had not converged by the 300th iteration, at which point it was stopped and the current estimates were retained. This does not appear to be a weakness of our approach, as the convergence issue also occurred for the BM and CC estimations, each producing 50 or more warnings of a model failing to converge. The convergence issue seems partly due to high missing rates but few covariates to explain missing values and patterns. In our experience thus far, the convergence rates seem positively associated with more covariates or abscissas. For example, this simulation using 10 abscissas resulted in practically identical estimates, but lowered the convergence rate given data MAR.

Blimp estimates 21 models per simulated dataset: joint model (32) and the HLM given each of 20 imputations. We set 20,000 burn-in and 10,000 post burn-in iterations to estimate the joint model and impute MI, and 5000 burn-in and 5000 post burn-in iterations to estimate the HLM given the MI. These settings are based on preliminary analysis of five simulated datasets that produced the potential scale reduction statistics of all estimates of each model lower than or near 1.1 to imply a reasonable convergence to posterior distributions (Gelman and Rubin Citation1992).

8 Discussion

In this article, we have considered how to estimate a two-level hierarchical linear model (HLM) efficiently where a continuous response R* and continuous covariates C* may be MAR and C* may have interactive, polynomial or randomly varying effects. Nonlinearities of C* imply a nonstandard joint model h(R*,C*) = h(Y*) where Y* = (Y, Ymis) for observed Y and missing Ymis. The key idea is to introduce a unique factorization of the joint model involving “provisionally known” random effects (PKREs) u such that the observed joint model h(Y|u) = ∫ h(Y|ν,u) g(ν|u) dν is an analytically tractable multivariate normal (MN) theory HLM with respect to a high-dimensional random vector ν. We computed the likelihood h(Y) = ∫ h(Y|u) g(u) du numerically with respect to a low-dimensional u by means of adaptive Gauss-Hermite quadrature (AGHQ). The HLM involved random effects as predictors, reducing bias due to measurement error. The joint model h(Y*|u)g(u) induced by the HLM is guaranteed to be compatible with the HLM. We suggested general rules for selecting the PKREs in a way that minimizes the dimension of AGHQ. Although useful for the HLMs considered in this article, they are yet to be extended to other models, for example, for discrete outcomes. We hope that our work will spur research on estimation via PKREs.

The nonlinearities of multiple covariates, multiple outcomes and/or the presence of partially observed discrete variables will increase the dimension of PKREs and, thus, the expense of numerical integration by AGHQ. In that case, integration via multivariate Laplace approximation may contribute to efficient computation (Pinheiro and Bates Citation1995; Raudenbush, Yang, and Yosef Citation2000).

Further research may address the problem of highly correlated random effects at the cluster level. One strategy would introduce shared random effects to cope with the “curse of dimensionality” by AGHQ as well as the multicollinearity (Miyazaki and Frank Citation2006; Sun et al. Citation2023). In addition, parallel computation of numerical integrals for groups of or single clusters will reduce per-iteration computation time while application of the parameter-extended EM algorithm (Liu, Rubin, and Wu Citation1998) may reduce the number of iterations to converge.

In related research, Rockwood (Citation2020) estimated a multilevel structural equations model by ML, integrating linear random effects conditional on nonlinear random effects analytically and, then, nonlinear effects numerically by Gaussian quadrature. In our analysis, both the outcome and predictors were quite severely missing. To simulate the analysis closely, we simulated a MAR mechanism due to a known auxiliary predictor. We leave to the near future the important extension to a MAR mechanism that depends on the fully observed values of the outcome (Grund, Lüdtke, and Robitzsch Citation2021), or to other mechanisms that depend on its missing values.

It is possible to extend and automate our program so that a user specifies an analytic HLM and the program determines the PKREs from a set of rules given the HLM. To that end, we need to develop a more general set of rules, for example, involving discrete covariates MAR.

Often, MI of a binary predictor MAR is carried out efficiently under multivariate normality (Schafer Citation1997; Grund, Lüdtke, and Robitzsch Citation2018). Such MI will, however, be incompatible with an HLM having nonlinear effects of the predictor and thus cannot always guarantee unbiased estimation of the HLM. We are currently extending our ML approach via the PKRE idea to ensure compatibility with, and thus unbiased estimation of, an HLM having nonlinear effects of categorical predictors.

Extension of our approach to MI via Bayesian methods may increase the robustness of findings and is straightforward. In particular, our MN joint model h(Y*,ν|u;θ)=h(Ymis,Y,ν|u;θ) given the PKRE u implies estimation of θ by the Gibbs sampler. The sampler will impute (Ymis,ν,u,θ) from their posteriors compatible with the joint model h(Y*,ν|u,θ)g(u|θ)p(θ) for a reasonably assumed prior p(θ) by drawing: (i) Ymis and ν from MN h(Ymis,ν|Y,u,θ); (ii) u from nonstandard g(u|Y*,ν,θ)=h(Y*,ν|u,θ)g(u|θ)/h(Y*,ν|θ), for example, by importance sampling via Markov chain Monte Carlo integration of h(Y*,ν|θ)=E[h(Y*,ν|u,θ)] that samples u from a normal prior g(u|θ); and (iii) θ from a standard posterior p(θ|Y*,u,ν) (Schafer and Yucel Citation2002). A potential virtue of a PKRE is to minimize the dimension of sampling the PKRE from a nonstandard posterior by importance sampling. We find it important to solve the measurement error problem by including level-2 random effects as latent covariates. To that end, we also explain how the Gibbs sampler without consideration of a PKRE (Goldstein et al. Citation2014; Enders, Du, and Keller Citation2020) may be modified to be compatible with our scientific model conditional on a latent covariate ν in Appendix E. A valuable future study is to compare the proposed Gibbs sampler estimators with existing estimators of a more sophisticated HLM, for example, involving multiple nonlinear effects or outcomes.

Supplementary Materials

The supplementary files contain R and C executable code, data sets, and initial values needed to reproduce the results, and may be downloaded. See README_supplementarymaterials.txt for instructions.

Supplemental material

README_supplementarymaterials.txt

Download Text (5.3 KB)

ShinRaudenbushComputation.zip

Download Zip (1.9 MB)

Acknowledgments

We thank two anonymous reviewers and an associate editor for their helpful comments, Craig Enders and Brian Keller for providing Blimp simulation codes, and Dongho Shin for helping Blimp simulation in R environment.

Disclosure Statement

The authors report there are no competing interests to declare.

Additional information

Funding

The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D210022. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.

References

  • Arnold, B. C., and Press, S. J. (1989), “Compatible Conditional Distributions,” Journal of the American Statistical Association, 84, 152–156. DOI: 10.1080/01621459.1989.10478750.
  • Bartlett, J. W., Seaman, S. R., White, I. R., and Carpenter, J. R. (2015), “Multiple Imputation of Covariates by Fully Conditional Specification: Accommodating the Substantive Model,” Statistical Methods in Medical Research, 24, 462–487. DOI: 10.1177/0962280214521348.
  • Bates, D., Maechler, M., Bolker, B., and Walker, S. (2015), “Fitting Linear Mixed-Effects Models Using lme4,” Journal of Statistical Software, 67, 1–48. DOI: 10.18637/jss.v067.i01.
  • Carlin, B.P. and Louis, T.A. (2009), Bayesian Methods for Data Analysis. 3rd ed. Boca Raton, FL: CRC Press.
  • Collins, L. M., Schafer, J. L., and Kam, C. (2003), “A Comparison of Inclusive and Restrictive Strategies in Modern Missing Data Procedures,” Psychological Methods, 6, 330–351. DOI: 10.1037/1082-989X.6.4.330.
  • Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood From Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B, 39, 1–38. DOI: 10.1111/j.2517-6161.1977.tb01600.x.
  • Dempster, A. P., Rubin, D. B., and Tsutakawa, R. K. (1981), “Estimation in Covariance Components Models,” Journal of the American Statistical Association, 76, 341–353. DOI: 10.1080/01621459.1981.10477653.
  • Enders, C. K., Mistler, S. A., and Keller, B. T. (2016), “Multilevel Multiple Imputation: A Review and Evaluation of Joint Modeling and Chained Equations Imputation,” Psychological Methods, 21, 222–240. DOI: 10.1037/met0000063.
  • Enders, C. K., Du, H., and Keller, B. T. (2020), “A Model-based Imputation Procedure for Multilevel Regression Models with Random Coefficients, Interaction Effects, and Nonlinear Terms,” Psychological Methods, 25, 88–112. DOI: 10.1037/met0000228.
  • Erler, N. S., Rizopoulos, D., Rosmalen, J., Jaddoe, V. W., Franco, O. H., and Lesaffre, E. M. (2016), “Dealing with Missing Covariates in Epidemiologic Studies: A Comparison between Multiple Imputation and a Full Bayesian Approach,” Statistics in Medicine, 35, 2955–2974. DOI: 10.1002/sim.6944.
  • Erler, N. S., Rizopoulos, D., Jaddoe, V. W., Franco, O. H., and Lesaffre, E. M. (2019), “Bayesian Imputation of Time-Varying Covariates in Linear Mixed Models,” Statistical Methods in Medical Research, 28, 555–568. DOI: 10.1177/0962280217730851.
  • Firebaugh, G. (1978), “A Rule for Inferring Individual-Level Relationships from Aggregate Data,” American Sociological Review, 43, 557–572. DOI: 10.2307/2094779.
  • Gelman, A., and Rubin, D. B. (1992), “Inference from Iterative Simulation Using Multiple Sequences,” Statistical Science, 7, 457–472. DOI: 10.1214/ss/1177011136.
  • Goldstein, H., Carpenter, J., Kenward, M., and Levin, K. (2009), “Multilevel Models with Multivariate Mixed Response Types,” Statistical Modelling, 9, 173–197. DOI: 10.1177/1471082X0800900301.
  • Goldstein, H., Carpenter, J. R., and Browne, W. J. (2014), “Fitting Multilevel Multivariate Models with Missing Data in Responses and Covariates that may Include Interactions and Non-linear Terms,” Journal of the Royal Statistical Society, Series A, 177, 553–564. DOI: 10.1111/rssa.12022.
  • Grund, S., Lüdtke, O., and Robitzsch, A. (2018), “Multiple Imputation of Missing Data for Multilevel Models: Simulations and Recommendations,” Organizational Research Methods, 21, 111–149. DOI: 10.1177/1094428117703686.
  • Grund, S., Lüdtke, O., and Robitzsch, A. (2021), “Multiple Imputation of Missing Data in Multilevel Models with the R Package mdmb: A Flexible Sequential Modeling Approach,” Behavior Research Methods, 53, 2631–2649.
  • Hedeker, D., and Gibbons, R. D. (1994), “A Random-Effects Ordinal Regression Model for Multilevel Analysis,” Biometrics, 50, 933–944.
  • Ibrahim, J. G., Chen, M. H., and Lipsitz, S. R. (2002), “Bayesian Methods for Generalized Linear Models with Covariates Missing at Random,” Canadian Journal of Statistics/Revue Canadienne De Statistique, 30, 55–78. DOI: 10.2307/3315865.
  • Keller, B. T., and Enders, C. K. (2021), Blimp user’s guide (Version 3). Available at www.appliedmissingdata.com/multilevel-imputation.html
  • Kim, S., Sugar, C. A., and Belin, T. R. (2015), “Evaluating Model-based Imputation Methods for Missing Covariates in Regression Models with Interactions,” Statistics in Medicine, 34, 1876–1888. DOI: 10.1002/sim.6435.
  • Lee, V. E., and Bryk, A. S. (1989), “A Multilevel Model of the Social Distribution of High School Achievement,” Sociology of Education, 62, 172–192. DOI: 10.2307/2112866.
  • Lindley, D. V., and Smith, A. F. M. (1972), “Bayes Estimates for the Linear Model,” Journal of the Royal Statistical Society, Series B, 34, 1–41. DOI: 10.1111/j.2517-6161.1972.tb00885.x.
  • Little, R. J. A., and Rubin, D. B. (2002), Statistical Analysis with Missing Data, New York: Wiley.
  • Liu, C., Rubin, D. B., and Wu, Y. (1998), “Parameter Expansion to Accelerate EM: The PX-EM Algorithm,” Biometrika, 85, 755–770. DOI: 10.1093/biomet/85.4.755.
  • Liu, M., Taylor, J. M. G., and Belin, T. R. (2000), “Multiple Imputation and Posterior Simulation for Multivariate Missing Data in Longitudinal Studies,” Biometrics, 56, 1157–1163. DOI: 10.1111/j.0006-341x.2000.01157.x.
  • Liu, J., Gelman, A., Hill, J., Su, Y., and Kropko, J. (2014), “On the Stationary Distribution of Iterative Imputations,” Biometrika, 101, 155–173. DOI: 10.1093/biomet/ast044.
  • Miyazaki, Y., and Frank, K. A. (2006), “A Hierarchical Linear Model with Factor Analysis Structure at Level 2,” Journal of Educational and Behavioral Statistics, 31, 125–156. DOI: 10.3102/10769986031002125.
  • Naylor, J. C., and Smith, A. F. M. (1982), “Applications of a Method for the Efficient Computation of Posterior Distributions,” Applied Statistics, 31, 214–225. DOI: 10.2307/2347995.
  • Nomi, T., and Raudenbush, S. W. (2016), “Making a Success of ‘Algebra for All’: The Impact of Extended Instructional Time and Classroom Peer Skill in Chicago,” Educational Evaluation and Policy Analysis, 38, 431–451. DOI: 10.3102/0162373716643756.
  • Olsen, M. K., and Schafer, J. L. (2001), “A Two-Part Random-Effects Model for Semicontinuous Longitudinal Data,” Journal of the American Statistical Association, 96, 730–745. DOI: 10.1198/016214501753168389.
  • Piketty, T. (2014), Capital in the 21st Century, Cambridge, MA: Harvard University Press.
  • Pinheiro, J. C., and Bates, D. M. (1995), “Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model,” Journal of Computational and Graphical Statistics, 4, 12–35. DOI: 10.2307/1390625.
  • R Core Team (2017), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
  • Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2002), “Reliable Estimation of Generalized Linear Mixed Models using Adaptive Quadrature,” The Stata Journal, 2, 1–21. DOI: 10.1177/1536867X0200200101.
  • Raghunathan, T., Lepkowski, J., Van Hoewyk, J., and Solenberger, P. (2001), “A Multivariate Technique for Multiply Imputing Missing Values Using a Sequence of Regression Models,” Survey Methodology, 27, 85–95.
  • Raudenbush, S. W., and Bryk A. S. (1986), “Hierarchical Model for Studying School Effects,” Sociology of Education, 59, 1–17. DOI: 10.2307/2112482.
  • Raudenbush, S. W., Yang, M., and Yosef, M. (2000), “Maximum Likelihood for Generalized Linear Models with Nested Random Effects via High-Order, Multivariate Laplace Approximation,” Journal of Computational and Graphical Statistics, 9, 141–157. DOI: 10.2307/1390617.
  • Raudenbush, S. W., and Bryk, A. S. (2002), Hierarchical Linear Models, Newbury Park, CA: Sage.
  • Ren, C., and Shin, Y. (2016), “Longitudinal Latent Variable Models Given Incompletely Observed Biomarkers and Covariates,” Statistics in Medicine, 35, 4729–4745. DOI: 10.1002/sim.7022.
  • Rivera-Batiz, F. L. (1982), “International Migration, Non-traded Goods and Economic Welfare in the Source Country,” Journal of Development Economics, 11, 81–90. DOI: 10.1016/0304-3878(82)90043-8.
  • Rockwood, N. J. (2020), “Maximum Likelihood Estimation of Multilevel Structural Equation Models with Random Slopes for Latent Covariates,” Psychometrika, 85, 275–300. DOI: 10.1007/s11336-020-09702-9.
  • Rubin, D. B. (1976), “Inference and Missing Data,” Biometrika, 63, 581–592. DOI: 10.1093/biomet/63.3.581.
  • Rubin, D. B. (1987), Multiple Imputation for Nonresponse in Surveys, New York: Wiley.
  • Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, London: Chapman & Hall.
  • Schafer, J. L., and Yucel, R. M. (2002), “Computational Strategies for Multivariate Linear Mixed-Effects Models with Missing Values,” Journal of Computational and Graphical Statistics, 11, 437–457. DOI: 10.1198/106186002760180608.
  • Shin, Y., and Raudenbush, S. W. (2007), “Just-Identified Versus Over-Identified Two-Level Hierarchical Linear Models with Missing Data,” Biometrics, 63, 1262–1268. DOI: 10.1111/j.1541-0420.2007.00818.x.
  • Shin, Y., and Raudenbush, S. W. (2010), “A Latent Cluster Mean Approach to The Contextual Effects Model with Missing Data,” Journal of Educational and Behavioral Statistics, 35, 26–53.
  • Shin, Y., and Raudenbush, S. W. (2013), “Efficient Analysis of Q-Level Nested Hierarchical General Linear Models Given Ignorable Missing Data,” The International Journal of Biostatistics, 9(1), 109–133. DOI: 10.1515/ijb-2012-0048.
  • Stram, D. O., and Lee, J. (1994), “Variance Components Testing in the Longitudinal Mixed Effects Model,” Biometrics, 50, 1171–1177. DOI: 10.2307/2533455.
  • Sun, X., Shin, Y., Lafata, J. E., and Raudenbush, S. W. (2023), “Variability in Causal Effects and Noncompliance in a Multisite Trial: Estimation of a Bivariate Hierarchical Generalized Linear Model,” submitted.
  • Tourangeau, K., Nord, C., Lê, T., Sorongon, A. G., and Najarian, M. (2009), Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K), Combined User’s Manual for the ECLS-K Eighth-Grade and K-8 Full Sample Data Files and Electronic Codebooks (NCES 2009-004), Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education.
  • van Buuren, S., Brand, J., Groothuis-Oudshoorn, C., and Rubin, D. (2006), “Fully Conditional Specification in Multivariate Imputation,” Journal of Statistical Computation and Simulation, 76, 1049–1064. DOI: 10.1080/10629360600810434.
  • Willms, J. D. (1986), “Social Class Segregation and its Relationship to Pupils’ Examination Results in Scotland,” American Sociological Review, 51, 224–241. DOI: 10.2307/2095518.

Appendix A

Problem of Bias in Estimating Model (19)

Suppose $R_{ij}^*=R_{ij}$ and $C_{ij}^*=C_{ij}$ are fully observed and set $\delta=0$ to simplify notation; let $\beta_{kj}=\gamma_{k0}+u_{kj}^*\sim N(\gamma_{k0},\tau_{kk})$ with $\mathrm{cov}(u_{0j}^*,u_{1j}^*)=\tau_{01}$ for $k=0,1$ in model (19). Because $\bar{C}_{\cdot j}$ and $\beta_j=(\beta_{0j},\beta_{1j},\nu_j)$ are independent of $C_{ij}-\bar{C}_{\cdot j}$, and $\epsilon_{ij}^*\mid C_{ij}-\bar{C}_{\cdot j}\sim N(C_{ij}-\bar{C}_{\cdot j},\sigma_{cc}/n_j)$,
\[
\begin{bmatrix} R_{ij}\\ \bar{C}_{\cdot j}\end{bmatrix}\Big|\,\beta_j,\,C_{ij}-\bar{C}_{\cdot j}\;\sim\; N\!\left(\begin{bmatrix}\beta_{0j}+\beta_{1j}(C_{ij}-\bar{C}_{\cdot j})\\ \nu_j\end{bmatrix},\;\begin{bmatrix}\beta_{1j}^2\sigma_{cc}/n_j+\sigma^2 & \beta_{1j}\sigma_{cc}/n_j\\ \beta_{1j}\sigma_{cc}/n_j & \sigma_{cc}/n_j\end{bmatrix}\right),
\]
implying a mixed model $R_{ij}\mid C_{ij}-\bar{C}_{\cdot j}\sim N[\gamma_{00}+\gamma_{10}(C_{ij}-\bar{C}_{\cdot j}),\,\mathrm{var}(R_{ij}\mid C_{ij}-\bar{C}_{\cdot j})]$ with
\[
\begin{aligned}
\mathrm{var}(R_{ij}\mid C_{ij}-\bar{C}_{\cdot j})&=\tau_{00}+2\tau_{01}(C_{ij}-\bar{C}_{\cdot j})+\tau_{11}(C_{ij}-\bar{C}_{\cdot j})^2+(\tau_{11}+\gamma_{10}^2)\sigma_{cc}/n_j+\sigma^2,\\
\mathrm{cov}(R_{ij},\bar{C}_{\cdot j}\mid C_{ij}-\bar{C}_{\cdot j})&=\gamma_{01}\tau_{\nu\nu}+\gamma_{11}\tau_{\nu\nu}(C_{ij}-\bar{C}_{\cdot j})+\gamma_{10}\sigma_{cc}/n_j.
\end{aligned}
\]
Let $\lambda_j=\tau_{\nu\nu}/(\tau_{\nu\nu}+\sigma_{cc}/n_j)$ be the reliability of $\bar{C}_{\cdot j}$ as an error-prone measure of $\nu_j$ (Raudenbush and Bryk 2002). The implied $R_{ij}\mid C_{ij}-\bar{C}_{\cdot j},\bar{C}_{\cdot j}\sim N(\mu_{ij},V_{ij})$ has
\[
\begin{aligned}
\mu_{ij}&=\gamma_{00}+\bigl[\gamma_{01}-(1-\lambda_j)(\gamma_{01}-\gamma_{10})\bigr]\bar{C}_{\cdot j}+\gamma_{10}(C_{ij}-\bar{C}_{\cdot j})+\lambda_j\gamma_{11}\bar{C}_{\cdot j}(C_{ij}-\bar{C}_{\cdot j}),\\
V_{ij}&=\sigma^2+\bigl[\tau_{00|\nu}+(1-\lambda_j)(\gamma_{01}-\gamma_{10})^2\tau_{\nu\nu}+(\tau_{11|\nu}+\gamma_{11}^2\tau_{\nu\nu})\sigma_{cc}/n_j\bigr]\\
&\quad+2\bigl[\tau_{01|\nu}+(1-\lambda_j)(\gamma_{01}-\gamma_{10})\gamma_{11}\tau_{\nu\nu}\bigr](C_{ij}-\bar{C}_{\cdot j})+\bigl[\tau_{11|\nu}+(1-\lambda_j)\gamma_{11}^2\tau_{\nu\nu}\bigr](C_{ij}-\bar{C}_{\cdot j})^2.
\end{aligned}\tag{A1}
\]

The bias terms are complicated functions of the cluster sizes $n_j$ and the parameters, but they are revealing in the balanced case $n_j=n$, where $\lambda_j=\lambda$. The coefficient $\lambda\gamma_{11}$ of the interaction $\bar{C}_{\cdot j}(C_{ij}-\bar{C}_{\cdot j})$ is biased downward by $(1-\lambda)\gamma_{11}$, which in turn introduces biases $(1-\lambda)(\gamma_{01}-\gamma_{10})\gamma_{11}\tau_{\nu\nu}$ and $(1-\lambda)\gamma_{11}^2\tau_{\nu\nu}$ in the estimation of $\tau_{01|\nu}$ and $\tau_{11|\nu}$, respectively. Likewise, the main effect of $\bar{C}_{\cdot j}$ has bias term $(1-\lambda)(\gamma_{01}-\gamma_{10})$, which propagates biases $(1-\lambda)(\gamma_{01}-\gamma_{10})^2\tau_{\nu\nu}$ and $(1-\lambda)(\gamma_{01}-\gamma_{10})\gamma_{11}\tau_{\nu\nu}$ into the estimation of $\tau_{00|\nu}$ and $\tau_{01|\nu}$, respectively. Estimation of $\tau_{00|\nu}$ incurs an additional upward bias $(\tau_{11|\nu}+\gamma_{11}^2\tau_{\nu\nu})\sigma_{cc}/n$ from the error-prone measure $\bar{C}_{\cdot j}$ of $\nu_j$. Consequently, this approach yields biased estimates of $(\gamma_{01},\gamma_{11},\tau_{00|\nu},\tau_{01|\nu},\tau_{11|\nu})$; in particular, the estimate of $\gamma_{11}$ is biased downward, whereas those of $\tau_{00|\nu}$ and $\tau_{11|\nu}$ are biased upward.

Two special cases are of interest. When $\gamma_{11}=0$, the estimators of $\mathrm{cov}(\beta_{0j},\beta_{1j}\mid\bar{C}_{\cdot j})=\tau_{01|\nu}$ and $\mathrm{var}(\beta_{1j}\mid\bar{C}_{\cdot j})=\tau_{11|\nu}$ become unbiased, and the estimator of $\tau_{00|\nu}$ becomes less biased. As the cluster sizes $n_j\to\infty$, $\lambda_j\to 1$ and $\bar{C}_{\cdot j}\to\nu_j$ by the law of large numbers, and all bias terms tend to zero.
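For intuition, the following small numerical illustration (with arbitrary parameter values) computes the reliability λ and the resulting attenuation of the interaction coefficient in the balanced case n_j = n.

# Attenuation of the cross-level interaction in the balanced case (arbitrary values).
tau_nu  <- 0.50                                  # tau_nu_nu = var(nu_j)
sig_cc  <- 1.00                                  # sigma_cc, level-1 variance of C
n       <- 10                                    # common cluster size
gamma11 <- 0.30                                  # true interaction coefficient
lambda  <- tau_nu / (tau_nu + sig_cc / n)        # reliability of the cluster mean
c(lambda            = lambda,
  true_gamma11      = gamma11,
  expected_estimate = lambda * gamma11,          # interaction coefficient implied by (A1)
  downward_bias     = (1 - lambda) * gamma11)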

Appendix B

The E Step for Estimation of Model (14)

Let $A_{1ij}^*=[C_{ij}^*\ A_{1ij}^{*T}]^T$, $Y_{ij}^*=[R_{ij}^*\ A_{1ij}^{*T}]^T$ be $p_1\times 1$, $A_{2j}^*$ be $p_2\times 1$, and $\nu_{1j}=[\nu_j\ \nu_{1j}^T]^T$. A reasonably general CD model given known covariates $X_{1ij}$ at level 1 and $X_{2j}$ at level 2 is
\[
\begin{aligned}
R_{ij}^*&=\gamma_{00}^TX_{2j}+u_{0j}^*+B_{1j}^T(A_{1ij}^*-\Delta_{00}X_{2j}-\nu_{1j})+\gamma_{30}^TX_{1ij}+e_{ij}^*,\\
A_{1ij}^*&=\Delta_{00}X_{2j}+\Delta_{10}X_{1ij}+\nu_{1j}+\epsilon_{ij}^*,\qquad A_{2j}^*=\Delta_{2}X_{2j}+\nu_{2j},
\end{aligned}\tag{B1}
\]
for $B_{1j}^T=[\gamma_{10}^TX_{2j}+u_{1j}^*\ \ \gamma_{20}^T]$, $\Delta_{00}=[\delta_{001}\ \Delta_{002}^T]^T$, and $\Delta_{10}=[\delta_{101}\ \Delta_{102}^T]^T$. Denote $\nu_{0j}=[u_{0j}^*\ \nu_{1j}^T]^T$ to separate all other level-2 random effects $\nu_j^*=[\nu_{0j}^T\ \nu_{2j}^T]^T$ from $u_{1j}^*$, and let
\[
\mathrm{var}(\nu_j^*)=\begin{bmatrix}T_{00}&T_{02}\\T_{20}&T_{22}\end{bmatrix},\qquad \mathrm{cov}(\nu_j^*,u_{1j}^*)=\begin{bmatrix}T_{01}\\T_{21}\end{bmatrix},\qquad \mathrm{var}(\nu_j^*\mid u_{1j}^*)=\begin{bmatrix}T_{00|1}&T_{02|1}\\T_{20|1}&T_{22|1}\end{bmatrix}.
\]
Define the matrix $O_{2j}$ that selects the observed values $A_{2j}=O_{2j}A_{2j}^*$ from $A_{2j}^*$, so that $\mathrm{var}(A_{2j}\mid u_{1j}^*)=O_{2j}T_{22|1}O_{2j}^T=T_{22|1j}$ and $\mathrm{cov}(\nu_{aj},\nu_{bj}\mid u_{1j}^*,A_{2j})=\Omega_{abj}=T_{ab|1}-T_{a2|1}O_{2j}^TT_{22|1j}^{-1}O_{2j}T_{2b|1}$ for $a,b=0,2$. The likelihood $L(\theta)=\prod_j\int h(Y_j\mid u_{1j}^*)\,\phi(u_{1j}^*\mid 0,\tau_{11})\,du_{1j}^*$ has the key component
\[
\begin{aligned}
h(Y_j\mid u_{1j}^*)\;\propto\;&\Bigl(|\Omega_{00j}|^{-1}|\Delta_j|^{-1}|T_{22|1j}|^{-1}\textstyle\prod_i|\Sigma_{ij}|^{-1}\Bigr)^{1/2}\\
&\times\exp\Bigl\{-\tfrac12\Bigl[\textstyle\sum_i e_{o1ij}^T\Sigma_{ij}^{-1}e_{o1ij}
-\sum_i e_{o1ij}^T\Sigma_{ij}^{-1}O_{ij}\Delta_j^{-1}\bigl(\sum_i O_{ij}^T\Sigma_{ij}^{-1}e_{o1ij}+2\Omega_{00j}^{-1}T_{02|1}O_{2j}^TT_{22|1j}^{-1}e_{o2j}\bigr)\\
&\qquad\qquad+e_{o2j}^T\bigl(T_{22|1j}^{-1}O_{2j}T_{20|1}\bigl(\Omega_{00j}^{-1}-\Omega_{00j}^{-1}\Delta_j^{-1}\Omega_{00j}^{-1}\bigr)T_{02|1}O_{2j}^TT_{22|1j}^{-1}+T_{22|1j}^{-1}\bigr)e_{o2j}\Bigr]\Bigr\}
\end{aligned}\tag{B2}
\]
for $e_{o1ij}=O_{ij}(d_{ij}^*-T_{01}\tau_{11}^{-1}u_{1j}^*)$, $e_{o2j}=O_{2j}(A_{2j}^*-\Delta_2X_{2j}-T_{21}\tau_{11}^{-1}u_{1j}^*)$, and $\Delta_j=\sum_{i=1}^{n_j}A_{OOij}+\Omega_{00j}^{-1}$, where $A_{OOij}=O_{ij}^T\Sigma_{ij}^{-1}O_{ij}$ and
\[
d_{ij}^*=\begin{bmatrix}R_{ij}^*\\A_{1ij}^*\end{bmatrix}-\begin{bmatrix}\gamma_{00}^T\\ \Delta_{00}\end{bmatrix}X_{2j}-\begin{bmatrix}1&B_{1j}^T\\0&I_{p_1-1}\end{bmatrix}\begin{bmatrix}\gamma_{30}^T\\ \Delta_{10}\end{bmatrix}X_{1ij}.
\]

Define $E(A)=E(A\mid u_{1j}^*,Y_j)$, $V(A)=\mathrm{var}(A\mid u_{1j}^*,Y_j)$, and $C(A,B)=\mathrm{cov}(A,B\mid u_{1j}^*,Y_j)$. The density $f(\nu_{0j},\nu_{2j},r_{ij}^*\mid u_{1j}^*,Y_j)$ is multivariate normal with
\[
\begin{aligned}
E(\nu_{0j})&=T_{01}\tau_{11}^{-1}u_{1j}^*+\Delta_j^{-1}\Bigl(\textstyle\sum_i O_{ij}^T\Sigma_{ij}^{-1}e_{o1ij}+\Omega_{00j}^{-1}T_{02|1}O_{2j}^TT_{22|1j}^{-1}e_{o2j}\Bigr),\\
E(\nu_{2j})&=T_{21}\tau_{11}^{-1}u_{1j}^*+T_{22|1}O_{2j}^TT_{22|1j}^{-1}e_{o2j}+\Omega_{20j}\Omega_{00j}^{-1}\bigl[E(\nu_{0j})-T_{01}\tau_{11}^{-1}u_{1j}^*-T_{02|1}O_{2j}^TT_{22|1j}^{-1}e_{o2j}\bigr],\\
E(r_{ij}^*)&=\Sigma_j^*A_{OOij}\bigl[d_{ij}^*-E(\nu_{0j})\bigr],\qquad V(r_{ij}^*)=\Sigma_j^*-\Sigma_j^*\bigl(A_{OOij}-A_{OOij}\Delta_j^{-1}A_{OOij}\bigr)\Sigma_j^*,\\
V(\nu_{0j})&=\Delta_j^{-1},\qquad C(\nu_{0j},\nu_{2j})=\Delta_j^{-1}\Omega_{00j}^{-1}\Omega_{02j},\qquad C(\nu_{0j},r_{ij}^*)=-\Delta_j^{-1}A_{OOij}\Sigma_j^*,\\
V(\nu_{2j})&=\Omega_{22j}-\Omega_{20j}\bigl(\Omega_{00j}^{-1}-\Omega_{00j}^{-1}\Delta_j^{-1}\Omega_{00j}^{-1}\bigr)\Omega_{02j},\qquad C(\nu_{2j},r_{ij}^*)=\Omega_{20j}\Omega_{00j}^{-1}C(\nu_{0j},r_{ij}^*).
\end{aligned}
\]

Let $\tilde{A}_{1ij}^*=A_{1ij}^*-\Delta_{002}X_{2j}-\nu_{1j}$, $\delta_{10}=\mathrm{vec}(\Delta_{10}^T)$, and $\beta_j=(I_{p_1+1+p_2}\otimes X_{2j}^T)\gamma_\beta+u_j^*$ for $\gamma_\beta=\mathrm{vec}[\gamma_{00}\ \gamma_{10}\ \Delta_{00}^T\ \Delta_2^T]$ and $u_j^*=[u_{0j}^*\ u_{1j}^*\ \nu_j^{T}]^T$. The expected CD MLEs are
\[
\begin{aligned}
\hat\gamma_{20}&=\gamma_{20}+\Bigl(\textstyle\sum_j E\bigl[\sum_i E(\tilde{A}_{1ij}^*\tilde{A}_{1ij}^{*T})\mid Y_j\bigr]\Bigr)^{-1}\sum_j E\bigl[\sum_i E(e_{ij}^*\tilde{A}_{1ij}^*)\mid Y_j\bigr],\\
\hat\gamma_{30}&=\gamma_{30}+\Bigl(\textstyle\sum_j\sum_i X_{1ij}X_{1ij}^T\Bigr)^{-1}\sum_j\sum_i X_{1ij}E\bigl[E(e_{ij}^*)\mid Y_j\bigr],\\
\hat\delta_{10}&=\delta_{10}+\mathrm{vec}\Bigl[\Bigl(\textstyle\sum_j\sum_i E\bigl[E(\epsilon_{ij}^*)\mid Y_j\bigr]X_{1ij}^T\Bigr)\Bigl(\textstyle\sum_j\sum_i X_{1ij}X_{1ij}^T\Bigr)^{-1}\Bigr],\\
\hat\gamma_\beta&=\gamma_\beta+\mathrm{vec}\Bigl[\Bigl(\textstyle\sum_j E\bigl[E(u_j^*)\mid Y_j\bigr]X_{2j}^T\Bigr)\Bigl(\textstyle\sum_j X_{2j}X_{2j}^T\Bigr)^{-1}\Bigr],\\
\hat\sigma^2&=\textstyle\sum_j E\bigl[\sum_i E(e_{ij}^{*2})\mid Y_j\bigr]/N,\qquad \hat\Sigma_\epsilon=\sum_j E\bigl[\sum_i E(\epsilon_{ij}^*\epsilon_{ij}^{*T})\mid Y_j\bigr]/N,\qquad \hat\tau=\sum_j E\bigl[E(u_j^*u_j^{*T})\mid Y_j\bigr]/J,
\end{aligned}
\]
given $\theta$, where $E(e_{ij}^*)=B_{1j}^{*T}E(r_{ij}^*)$ and $E(e_{ij}^{*2})=B_{1j}^{*T}E(r_{ij}^*r_{ij}^{*T})B_{1j}^*$ for $B_{1j}^{*T}=[1\ \ -B_{1j}^T]$.

Appendix C

Numerical Integration by AGHQ

Let $f(u_{1j}^*)=h(Y_j\mid u_{1j}^*)\,\phi(u_{1j}^*;0,\tau_{11})$ be a function of $u_{1j}^*$. Given $\tilde u_{1j}^*=E(u_{1j}^*\mid Y_j)$, $V_{u1j}=\mathrm{var}(u_{1j}^*\mid Y_j)=L_{u1j}^2/2$, $Q$-point weights $(w_1,\dots,w_Q)$, and abscissas $(a_1,\dots,a_Q)$,
\[
h(Y_j)=\int\frac{f(u_{1j}^*)}{\phi(u_{1j}^*;\tilde u_{1j}^*,V_{u1j})}\,\phi(u_{1j}^*;\tilde u_{1j}^*,V_{u1j})\,du_{1j}^*\;\approx\;L_{u1j}\sum_{k=1}^{Q}w_k\,e^{a_k^2}f(z_{kj}),\tag{C1}
\]
for $z_{kj}=L_{u1j}a_k+\tilde u_{1j}^*$. The posterior $g(u_{1j}^*\mid Y_j)=f(u_{1j}^*)/h(Y_j)$ is approximately $\phi(u_{1j}^*;\tilde u_{1j}^*,V_{u1j})$ for large cluster sizes $n_j$ by the Bayesian central limit theorem, so that the ratio $f(u_{1j}^*)/\phi(u_{1j}^*;\tilde u_{1j}^*,V_{u1j})$ is well approximated by a low-degree polynomial and $h(Y_j)$ is therefore well approximated. The approximation is exact if $f(u_{1j}^*)$ is a polynomial of degree $2Q-1$ in $u_{1j}^*$ (Pinheiro and Bates 1995; Rabe-Hesketh, Skrondal, and Pickles 2002; Carlin and Louis 2009). Likewise, for closed-form $E(S_{CDj}\mid u_{1j}^*,Y_j)$,
\[
E(S_{CDj}\mid Y_j)=\int E(S_{CDj}\mid u_{1j}^*,Y_j)\,g(u_{1j}^*\mid Y_j)\,du_{1j}^*\;\approx\;\frac{L_{u1j}}{h(Y_j)}\sum_{k=1}^{Q}E(S_{CDj}\mid z_{kj},Y_j)\,w_k\,e^{a_k^2}f(z_{kj}).\tag{C2}
\]
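To make (C1) concrete, the R sketch below approximates h(Y_j) for a single cluster, assuming a user-supplied integrand f(u) = h(Y_j | u) φ(u; 0, τ11) and the posterior summaries ũ_1j* and V_u1j; statmod::gauss.quad is used only to obtain the Gauss-Hermite abscissas and weights, and all names are illustrative.

# Adaptive Gauss-Hermite approximation of h(Y_j) as in (C1) (illustrative sketch).
aghq_hYj <- function(f, u_tilde, V_u, Q = 9) {
  gh <- statmod::gauss.quad(Q, kind = "hermite")   # abscissas a_k and weights w_k
  L  <- sqrt(2 * V_u)                              # V_u = L^2 / 2
  z  <- L * gh$nodes + u_tilde                     # z_kj = L * a_k + u_tilde
  L * sum(gh$weights * exp(gh$nodes^2) * vapply(z, f, numeric(1)))
}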

Let $Y_j^*=(Y_j,Y_{misj})$, and let $\phi_\tau$ and $\phi_\Sigma$ be vectors of the distinct elements of $\tau$ and $\Sigma_\epsilon$, respectively. The log-likelihood $l=\sum_j l_j$ and score $S=\sum_j S_j$ have summands
\[
l_j=\ln h(Y_j)=\ln\int g_j\,dY_{misj}\,d\nu_{0j}\,du_{1j}^*,\qquad S_j=\frac{\partial l_j}{\partial\theta}=E\Bigl[E\Bigl(\frac{\partial\ln g_j}{\partial\theta}\Bigr)\Bigm| Y_j\Bigr]
\]

for $g_j=\prod_i f(R_{ij}^*\mid A_{1ij}^*,u_j^*;\gamma_{20},\gamma_{30},\sigma^2)\,f(A_{1ij}^*\mid u_j^*;\delta_{10},\phi_\Sigma)\,\phi(u_j^*;\gamma_\beta,\tau)$ from (B1). Let $E=\partial\,\mathrm{vec}\,\tau/\partial\phi_\tau^T$ and $F=\partial\,\mathrm{vec}\,\Sigma_\epsilon/\partial\phi_\Sigma^T$. The vector $E(\partial\ln g_j/\partial\theta)$ stacks
\[
\begin{aligned}
E\Bigl(\frac{\partial\ln g_j}{\partial\gamma_{20}}\Bigr)&=\sigma^{-2}\sum_i E(e_{ij}^*\tilde A_{1ij}^*),\qquad
E\Bigl(\frac{\partial\ln g_j}{\partial\gamma_{30}}\Bigr)=\sigma^{-2}\sum_i E(e_{ij}^*)X_{1ij},\qquad
E\Bigl(\frac{\partial\ln g_j}{\partial\delta_{10}}\Bigr)=\mathrm{vec}\Bigl(\sum_i X_{1ij}E(\epsilon_{ij}^{*T})\Sigma_\epsilon^{-1}\Bigr),\\
E\Bigl(\frac{\partial\ln g_j}{\partial\sigma^2}\Bigr)&=\tfrac12 A_{\sigma j},\qquad
E\Bigl(\frac{\partial\ln g_j}{\partial\phi_\Sigma}\Bigr)=\tfrac12 F^TA_{\Sigma j},\qquad
E\Bigl(\frac{\partial\ln g_j}{\partial\gamma_\beta}\Bigr)=\mathrm{vec}\bigl(X_{2j}E(u_j^{*T})\tau^{-1}\bigr),\\
E\Bigl(\frac{\partial\ln g_j}{\partial\phi_\tau}\Bigr)&=\tfrac12 E^T\mathrm{vec}\bigl[\tau^{-1}E(u_j^*u_j^{*T})\tau^{-1}-\tau^{-1}\bigr]
\end{aligned}
\]
for $A_{\sigma j}=\sigma^{-4}\sum_i E(e_{ij}^{*2})-n_j\sigma^{-2}$ and $A_{\Sigma j}=\mathrm{vec}\bigl(\Sigma_\epsilon^{-1}\sum_i E(\epsilon_{ij}^*\epsilon_{ij}^{*T})\Sigma_\epsilon^{-1}-n_j\Sigma_\epsilon^{-1}\bigr)$. We also compute $S_j$ by AGHQ to obtain $\mathrm{var}(\hat\theta)\approx\bigl(\sum_j S_jS_j^T\bigr)^{-1}$ (Hedeker and Gibbons 1994; Raudenbush, Yang, and Yosef 2000; Olsen and Schafer 2001). Section 7 shows that the approximation is good for the sample sizes analyzed in this article.
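Assuming the per-cluster scores S_j have been evaluated by AGHQ and stacked as the rows of a J x p matrix (the name S_mat is illustrative), the variance approximation above is a one-liner:

# Approximate var(theta_hat) = (sum_j S_j S_j^T)^{-1} from stacked scores (illustrative sketch).
var_from_scores <- function(S_mat) solve(crossprod(S_mat))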

Appendix D

Translating Model (B1) to $\hat\theta_{(19)}^*$

Define $\beta_{kj}=\gamma_{k0}^TX_{2j}+u_{kj}^*$, $\beta_{Cj}=\delta_{001}^TX_{2j}+\nu_j$, $\mathrm{cov}(u_{kj}^*,u_{k'j}^*\mid X_{2j})=\tau_{kk'}$, $\mathrm{cov}(u_{kj}^*,\nu_j\mid X_{2j})=\tau_{k\nu}$, and $\mathrm{var}(\nu_j\mid X_{2j})=\tau_{\nu\nu}$ for $k,k'=0,1$. With $X_{2j}$ marginalized out,
\[
\beta_j=\begin{bmatrix}\beta_{0j}\\\beta_{1j}\\\beta_{Cj}\end{bmatrix}\sim N\!\left(\begin{bmatrix}\gamma_{00}^T\\\gamma_{10}^T\\\delta_{001}^T\end{bmatrix}E_{X2},\;
\begin{bmatrix}t_{00}&t_{01}&t_{0\nu}\\\cdot&t_{11}&t_{1\nu}\\\cdot&\cdot&t_{\nu\nu}\end{bmatrix}=\begin{bmatrix}\tau_{00}&\tau_{01}&\tau_{0\nu}\\\cdot&\tau_{11}&\tau_{1\nu}\\\cdot&\cdot&\tau_{\nu\nu}\end{bmatrix}+\begin{bmatrix}\gamma_{00}^T\\\gamma_{10}^T\\\delta_{001}^T\end{bmatrix}V_{X2}\begin{bmatrix}\gamma_{00}&\gamma_{10}&\delta_{001}\end{bmatrix}\right)
\]
for $E(X_{2j})=E_{X2}$ and $\mathrm{var}(X_{2j})=V_{X2}$. Let $\tilde\beta_{Cj}=\beta_{Cj}-\delta_{001}^TE_{X2}\sim N(0,t_{\nu\nu})$ to find
\[
\beta_{kj}\mid\tilde\beta_{Cj}\;\sim\;N\bigl(\gamma_{k0}^*+\gamma_{k1}^*\tilde\beta_{Cj},\;\tau_{kk}^*\bigr)\tag{D1}
\]
for $\gamma_{k0}^*=\gamma_{k0}^TE_{X2}$, $\gamma_{k1}^*=t_{k\nu}/t_{\nu\nu}$, and $\mathrm{cov}(\beta_{kj},\beta_{k'j}\mid\tilde\beta_{Cj})=\tau_{kk'}^*=t_{kk'}-\gamma_{k1}^*t_{\nu\nu}\gamma_{k'1}^*$.

Within cluster $j$ given $u_j^*$, denote $\tilde A_{1ij}^*=A_{1ij}^*-\Delta_{00}X_{2j}-\nu_{1j}=\Delta_{10}X_{1ij}+\epsilon_{ij}$. Marginalizing $X_{1ij}$ out using $E_{X1}=E(X_{1ij})$ and $V_{X1}=\mathrm{var}(X_{1ij})$, we find $f(Y_{ij}^*\mid u_j^*)$ from
\[
\tilde A_{1ij}^*=\Delta_{10}E_{X1}+\tilde\epsilon_{ij}^*,\qquad R_{ij}=\beta_{0j}+(B_{1j}^T\Delta_{10}+\gamma_{30}^T)E_{X1}+B_{1j}^T\tilde\epsilon_{ij}^*+\tilde e_{ij}^*
\]
for $[\tilde e_{ij}^*\ \tilde\epsilon_{ij}^{*T}]^T\sim N\!\left(0,\begin{bmatrix}\sigma_{ee}^*&\Sigma_{e\epsilon}^*\\\Sigma_{\epsilon e}^*&\Sigma_{\epsilon\epsilon}^*\end{bmatrix}\right)$, where $\sigma_{ee}^*=\gamma_{30}^TV_{X1}\gamma_{30}+\sigma^2$, $\Sigma_{\epsilon e}^*=\begin{bmatrix}\sigma_{ce}^*\\\Sigma_{1e}^*\end{bmatrix}=\begin{bmatrix}\delta_{101}^T\\\Delta_{102}\end{bmatrix}V_{X1}\gamma_{30}$, and
\[
\Sigma_{\epsilon\epsilon}^*=\begin{bmatrix}\sigma_{cc}^*&\Sigma_{c1}^*\\\Sigma_{1c}^*&\Sigma_{11}^*\end{bmatrix}=\begin{bmatrix}\delta_{101}^TV_{X1}\delta_{101}+\sigma_{cc}&\delta_{101}^TV_{X1}\Delta_{102}^T+\Sigma_{c1}\\\Delta_{102}V_{X1}\delta_{101}+\Sigma_{1c}&\Delta_{102}V_{X1}\Delta_{102}^T+\Sigma_{11}\end{bmatrix}.
\]
Consequently, $R_{ij}\mid\beta_j,\epsilon_{ij}\sim N[E(R_{ij}^*\mid\beta_j,\epsilon_{ij}),\sigma^{2*}]$ for
\[
\begin{aligned}
E(R_{ij}^*\mid\beta_j,\epsilon_{ij})&=\beta_{0j}+(B_{1j}^T\Delta_{10}+\gamma_{30}^T)E_{X1}+\bigl(\beta_{1j}+\gamma_{20}^T\Sigma_{1c}^*/\sigma_{cc}^*+\sigma_{ec}^*/\sigma_{cc}^*\bigr)\epsilon_{ij},\\
\sigma^{2*}&=\gamma_{20}^T\bigl(\Sigma_{11}^*-\Sigma_{1c}^*\Sigma_{c1}^*/\sigma_{cc}^*\bigr)\gamma_{20}+2\gamma_{20}^T\bigl(\Sigma_{1e}^*-\Sigma_{1c}^*\sigma_{ce}^*/\sigma_{cc}^*\bigr)+\bigl(\sigma_{ee}^*-\sigma_{ce}^{*2}/\sigma_{cc}^*\bigr).
\end{aligned}\tag{D2}
\]

As explained in Section 4.1, Equations (D1)–(D2) result in the mixed model (19) with
\[
\begin{aligned}
\gamma_{00}&=\gamma_{00}^*+(\gamma_{10}^*\delta_{101}^T+\gamma_{20}^T\Delta_{102}+\gamma_{30}^T)E_{X1},\qquad \gamma_{01}=\gamma_{01}^*+\gamma_{11}^*\delta_{101}^TE_{X1},\\
\gamma_{10}&=\gamma_{10}^*+\gamma_{20}^T\Sigma_{1c}^*/\sigma_{cc}^*+\sigma_{ec}^*/\sigma_{cc}^*,\qquad \gamma_{11}=\gamma_{11}^*,\qquad \sigma^2=\sigma^{2*},\\
\tau_{00|\nu}&=\tau_{00}^*+2\tau_{01}^*\delta_{101}^TE_{X1}+\tau_{11}^*(\delta_{101}^TE_{X1})^2,\qquad \tau_{01|\nu}=\tau_{01}^*+\tau_{11}^*\delta_{101}^TE_{X1},\qquad \tau_{11|\nu}=\tau_{11}^*.
\end{aligned}\tag{D3}
\]

Estimating $(E_{X1},V_{X1},E_{X2},V_{X2})$ from the sample, we obtain $\hat\theta_{(19)}^*$ above and compute $\mathrm{var}(\hat\theta_{(19)}^*)$ by the delta method. The translations (D3) simplify if $E_{X1}=0$ (e.g., $E_{X1}=E(X_{1ij}-\bar X_{1j})$). In Section 6, we used the translations with $X_{2j}=1$ and $X_{1ij}=0$.
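The delta-method step can be sketched as follows, assuming a function translate() that implements the mapping (D3) from the estimated parameters of model (B1) to θ̂(19)*, with estimate theta_hat and covariance matrix V_theta; numDeriv::jacobian supplies a numerical Jacobian, and all names are illustrative.

# Delta-method covariance of the translated estimates (illustrative sketch).
delta_var <- function(translate, theta_hat, V_theta) {
  J <- numDeriv::jacobian(translate, theta_hat)   # Jacobian of the translation (D3)
  J %*% V_theta %*% t(J)
}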

Appendix E

Compatible Gibbs Sampler without a PKRE

Without the benefit of a PKRE, a Bayesian joint distribution based on (19) is $f(R_{ij}^*\mid C_{ij}^*,\nu_j,u_{0j},u_{1j},\theta)\,f(u_{0j},u_{1j}\mid\theta)\,f(C_{ij}^*\mid\nu_j,\theta)\,\phi(\nu_j;0,\tau_{\nu\nu})\,p(\theta)$ for a prior $p(\theta)$ and $\theta=\theta_{(19)}^*$. Our scientific model is the analytic integral $f_1(R_{ij}^*\mid C_{ij}^*,\nu_j,\theta)=\int f(R_{ij}^*\mid C_{ij}^*,\nu_j,u_{0j},u_{1j},\theta)\,f(u_{0j},u_{1j}\mid\theta)\,du_{0j}\,du_{1j}$. The Gibbs sampler of Enders, Du, and Keller (2020) may be modified to be compatible with our scientific model conditional on the latent covariate $\nu_j$ by sampling (i) $\nu_j$ from the compatible posterior
\[
p(\nu_j\mid\cdot)=\frac{\prod_i f(R_{ij}^*\mid C_{ij}^*,\nu_j,u_{0j},u_{1j},\theta)\,f(C_{ij}^*\mid\nu_j,\theta)\;\phi(\nu_j;0,\tau_{\nu\nu})}{\int\prod_i f(R_{ij}^*\mid C_{ij}^*,\nu_j,u_{0j},u_{1j},\theta)\,f(C_{ij}^*\mid\nu_j,\theta)\;\phi(\nu_j;0,\tau_{\nu\nu})\,d\nu_j},
\]
with the denominator approximated by MCMC integration, and (ii) a missing $C_{ij}^*$ from the compatible normal posterior $p(C_{ij}^*\mid\cdot)\propto f(R_{ij}^*\mid C_{ij}^*,\nu_j,u_{0j},u_{1j},\theta)\,f(C_{ij}^*\mid\nu_j,\theta)$ with
\[
E(C_{ij}^*\mid\cdot)=\delta+\nu_j+\frac{\beta_{1j}\sigma_{cc}}{\beta_{1j}^2\sigma_{cc}+\sigma^2}\,(R_{ij}^*-\beta_{0j}),\qquad \mathrm{var}(C_{ij}^*\mid\cdot)=\frac{\sigma_{cc}\,\sigma^2}{\beta_{1j}^2\sigma_{cc}+\sigma^2}
\]
by (15), for $\beta_{0j}=\gamma_{00}+\gamma_{01}\nu_j+u_{0j}$ and $\beta_{1j}=\gamma_{10}+\gamma_{11}\nu_j+u_{1j}$.
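As an illustration of step (ii), the sketch below makes one draw of a missing C_ij* from its compatible normal posterior, assuming all inputs are the current values of the Gibbs cycle; the function name and arguments are illustrative.

# One draw of a missing C_ij* from its compatible normal posterior (illustrative sketch).
draw_Cij <- function(R_ij, delta, nu_j, u0_j, u1_j,
                     gamma00, gamma01, gamma10, gamma11, sigma_cc, sigma2) {
  beta0 <- gamma00 + gamma01 * nu_j + u0_j
  beta1 <- gamma10 + gamma11 * nu_j + u1_j
  m <- delta + nu_j + beta1 * sigma_cc / (beta1^2 * sigma_cc + sigma2) * (R_ij - beta0)
  v <- sigma_cc * sigma2 / (beta1^2 * sigma_cc + sigma2)
  rnorm(1, mean = m, sd = sqrt(v))
}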