Methodological Studies

Item Response Theory Models for Difference-in-Difference Estimates (and Whether They Are Worth the Trouble)

Pages 391-421 | Received 25 Aug 2022, Accepted 09 Mar 2023, Published online: 24 Apr 2023
 

Abstract

When randomized controlled trials are not possible, quasi-experimental methods often represent the gold standard. One quasi-experimental method is difference-in-difference (DiD), which compares changes in outcomes before and after treatment across groups to estimate a causal effect. DiD researchers often use fairly exhaustive robustness checks to make sure the assumptions of the DiD design are met. However, less thought is often given to how item responses from the outcome measure are scored. For example, surveys are often scored by summing the item responses to produce sum scores, and achievement tests often rely on scores produced by test vendors, which frequently employ a unidimensional item response theory (IRT) scoring model that implicitly assumes control and treatment participants are exchangeable (i.e., that they come from the same distribution). In this study, several IRT models that parallel the DiD design in terms of groups and timepoints are presented, and their performance is examined. Results indicate that using a scoring approach that parallels the DiD study design reduces bias and improves power, though these approaches can also lead to increased Type I error rates.
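
As a concrete illustration of the two-group, two-timepoint DiD setup described above, the Python sketch below estimates the treatment effect as the coefficient on the group-by-time interaction in a regression of person-level scores (e.g., sum scores or IRT theta estimates). The data, effect sizes, and variable names are hypothetical stand-ins, not the article's simulation design.

# Minimal DiD sketch with simulated person-level scores (illustrative only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000  # persons per group and timepoint (hypothetical)

df = pd.DataFrame({
    "treat": np.repeat([0, 1], n * 2),          # 0 = control, 1 = treatment
    "post":  np.tile(np.repeat([0, 1], n), 2),  # 0 = pre-test, 1 = post-test
})
# Treatment group gains an extra 0.2 SD at post-test on top of a common trend.
df["score"] = (0.3 * df["treat"] + 0.5 * df["post"]
               + 0.2 * df["treat"] * df["post"]
               + rng.normal(0, 1, len(df)))

# The coefficient on treat:post is the DiD estimate of the treatment effect.
fit = smf.ols("score ~ treat * post", data=df).fit()
print(fit.params["treat:post"], fit.bse["treat:post"])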

Notes

1 Another reason is that more complex, multidimensional models often require the use of a Bayesian scoring approach because MLE is less feasible (Vector Psychometric Group, 2021).

2 Readers may wonder why we did not also fit the Rasch model, which assumes all discrimination parameters are equal and, given how the data were generated, could also potentially represent measurement model misspecification. The main reason is that sum scores rest on the same assumption, namely that all items should be given the same weight. This similarity is one reason the sum score is a sufficient statistic for the Rasch model. While a Rasch model used in conjunction with EAP scoring would introduce a shrinkage issue (discussed later in the study) that could affect results differently from sum scores, similar shrinkage occurs when using a unidimensional 2PL model. Thus, the Rasch model's contribution to the present study is not sufficiently different from that of sum scores or the 2PL model to merit inclusion.
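
The sufficiency point in this note can be illustrated with a small numerical check: under a Rasch model (equal discriminations), two response patterns with the same sum score receive the same EAP estimate, whereas under a 2PL with varying discriminations they do not. The item parameters, prior, and quadrature grid below are illustrative assumptions, not values from the study.

# Illustrative check that the sum score is sufficient under the Rasch model.
import numpy as np
from scipy.stats import norm

def eap(pattern, a, b, nodes=np.linspace(-4, 4, 121)):
    """EAP theta for a binary response pattern under a 2PL with a standard normal prior."""
    p = 1 / (1 + np.exp(-a * (nodes[:, None] - b)))      # P(correct | theta) per item
    like = np.prod(np.where(pattern, p, 1 - p), axis=1)  # likelihood at each quadrature node
    post = like * norm.pdf(nodes)                        # unnormalized posterior
    return np.sum(nodes * post) / np.sum(post)

b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])                # hypothetical difficulties
rasch_a = np.ones(5)                                     # Rasch: equal discriminations
twopl_a = np.array([0.6, 0.9, 1.2, 1.5, 1.8])            # 2PL: varying discriminations

# Two different response patterns with the same sum score (3 correct).
pat1 = np.array([1, 1, 1, 0, 0])
pat2 = np.array([0, 0, 1, 1, 1])

print(eap(pat1, rasch_a, b), eap(pat2, rasch_a, b))      # identical under Rasch
print(eap(pat1, twopl_a, b), eap(pat2, twopl_a, b))      # differ under the 2PL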

3 This equation was modified slightly depending on how different timepoints and groups were treated in the IRT model.

4 For parsimony, the figure shows only the consistent discrimination parameter and variable difficulty conditions at a sample size of 1,000; in general, treatment effect recovery was not sensitive to those conditions.

5 In the figure, one may observe that the Type I error rate with a true gain of 0 SDs and using true scores is only 1%, compared to the anticipated 5%. To investigate this issue, additional simulations were conducted: 10,000 true-score replications with a sample size of 5,000 were added. When the post-test means based on true scores, as well as sum scores, are compared using a t-test (with a true difference of zero), the Type I error rate is almost exactly .05.
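
A rough sketch of this kind of check is shown below; the score distributions, sample size, and number of replications are stand-ins rather than the article's exact simulation settings.

# With a true post-test difference of zero, a two-sample t-test should reject about 5% of the time.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_reps, n_per_group = 10_000, 5_000
rejections = 0
for _ in range(n_reps):
    control = rng.normal(0, 1, n_per_group)    # scores with no treatment effect
    treatment = rng.normal(0, 1, n_per_group)
    if ttest_ind(control, treatment).pvalue < .05:
        rejections += 1
print(rejections / n_reps)  # should be close to .05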

6 While the present study used 50 students per cluster, schools in ECLS-K would often have as few as 20 students per cluster.

7 Technically, Whitney and Candelaria (2017) used mean scores, not sum scores.
