Methodological Studies

Item Response Theory Models for Difference-in-Difference Estimates (and Whether They Are Worth the Trouble)

Pages 391-421 | Received 25 Aug 2022, Accepted 09 Mar 2023, Published online: 24 Apr 2023

Abstract

When randomized controlled trials are not possible, quasi-experimental methods often represent the gold standard. One such method is difference-in-difference (DiD), which compares changes in outcomes before and after treatment across groups to estimate a causal effect. DiD researchers often run fairly exhaustive robustness checks to ensure that the assumptions of the design are met. However, far less attention is typically paid to how item responses from the outcome measure are scored. For example, surveys are often scored by summing the item responses to produce sum scores, and achievement tests often rely on scores produced by test vendors, which frequently employ a unidimensional item response theory (IRT) scoring model that implicitly assumes control and treatment participants are exchangeable (i.e., that they come from the same distribution). In this study, several IRT models that parallel the DiD design in terms of groups and timepoints are presented, and their performance is examined. Results indicate that using a scoring approach that parallels the DiD study design reduces bias and improves power, though these approaches can also lead to inflated Type I error rates.
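
For reference, the canonical two-group, two-period form of the estimator described above (standard textbook notation, not reproduced from the article) is

\hat{\tau}_{DiD} = (\bar{Y}_{T,post} - \bar{Y}_{T,pre}) - (\bar{Y}_{C,post} - \bar{Y}_{C,pre}),

where \bar{Y}_{G,t} is the mean outcome (e.g., a sum score or IRT score) for the treatment (T) or control (C) group at each timepoint. Under the parallel-trends assumption, \hat{\tau}_{DiD} identifies the average treatment effect on the treated; the article's question is how the scoring model used to produce the Y values affects this estimate.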

Notes

1 Another reason is that more complex, multidimensional models often require a Bayesian scoring approach because maximum likelihood estimation (MLE) is less feasible (Vector Psychometric Group, 2021).

2 Readers may wonder why we did not also fit the Rasch model, which assumes all discrimination parameters are equal and, given how the data were generated, could also represent a measurement model misspecification. The main reason is that sum scores violate the same assumption, namely that all items should receive the same weight. This similarity is one reason the sum score is a sufficient statistic for the Rasch model. While a Rasch model used in conjunction with EAP scoring would introduce a shrinkage issue (discussed later in the study) that could affect results differently than sum scores do, similar shrinkage occurs under a unidimensional 2PL model. Thus, the Rasch model's contribution to the present study is not sufficiently different from that of sum scores, or of the 2PL model, to merit inclusion.
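
To make the equal-weighting point concrete (standard IRT notation, not drawn from the article): the 2PL model specifies

P(X_{ij} = 1 | \theta_i) = \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}},

where a_j and b_j are the discrimination and difficulty of item j and \theta_i is the latent trait of person i. The Rasch model constrains a_j to be equal across items (commonly a_j = 1), in which case the raw sum \sum_j X_{ij} is a sufficient statistic for \theta_i; an unweighted sum score implicitly imposes the same equal-weighting constraint.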

3 This equation was modified slightly depending on how different timepoints and groups were treated in the IRT model.

4 For parsimony, the figure shows only the consistent discrimination parameter and variable difficulty conditions at a sample size of 1,000; in general, treatment effect recovery was not sensitive to those conditions.

5 In the figure, one may observe that the Type I error rate with a true gain of 0 SDs and true scores is only 1%, rather than the nominal 5%. To investigate this issue, additional simulations were conducted: 10,000 true-score replications with a sample size of 5,000. When the post-test means based on true scores, as well as sum scores, are compared using a t-test (with a true difference of zero), the Type I error rate is almost exactly .05.
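
As an illustration of the check described in this note, a minimal simulation along these lines might look as follows (a sketch in Python; the even sample-size split and seed are assumptions, and this is not the study's actual code):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2023)
    n_reps = 10_000        # replications, as in the note
    n_per_group = 2_500    # assumes the 5,000 sample is split evenly
    alpha = 0.05
    rejections = 0

    for _ in range(n_reps):
        # Post-test "true scores" for two groups with a true difference of zero
        control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        treatment = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        _, p_value = stats.ttest_ind(treatment, control)
        rejections += p_value < alpha

    # With a true difference of zero, the rejection rate should sit near .05
    print(f"Empirical Type I error rate: {rejections / n_reps:.3f}")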

6 While the present study used 50 students per cluster, schools in ECLS-K would often have as few as 20 students per cluster.

7 Technically, Whitney and Candelaria (2017) used mean scores, not sum scores.
