Methodological Studies

Item Response Theory Models for Difference-in-Difference Estimates (and Whether They Are Worth the Trouble)

Pages 391–421 | Received 25 Aug 2022, Accepted 09 Mar 2023, Published online: 24 Apr 2023

References

  • Bauer, D. J., & Curran, P. J. (2016). The discrepancy between measurement and modeling in longitudinal data analysis. In J. R. Harring, L. M. Stapleton, & S. N. Beretvas (Eds.), Advances in multilevel modeling for educational research (pp. 3–38). Information Age Publishing.
  • Bauer, D. J., Howard, A. L., Baldasaro, R. E., Curran, P. J., Hussong, A. M., Chassin, L., & Zucker, R. A. (2013). A trifactor model for integrating ratings across multiple informants. Psychological Methods, 18(4), 475–493. https://doi.org/10.1037/a0032475
  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. https://doi.org/10.1007/BF02293801
  • Cai, L. (2010a). Metropolis-Hastings Robbins-Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3), 307–335. https://doi.org/10.3102/1076998609353115
  • Cai, L. (2010b). A two-tier full-information item factor analysis model with applications. Psychometrika, 75(4), 581–612. https://doi.org/10.1007/s11336-010-9178-0
  • Cai, L. (2020). flexMIRT version 3: Flexible multilevel multidimensional item analysis and test scoring. Vector Psychometric Group.
  • Cai, L., & Houts, C. R. (2021). Longitudinal analysis of patient-reported outcomes in clinical trials: Applications of multilevel and multidimensional item response theory. Psychometrika, 86(3), 754–777. https://doi.org/10.1007/s11336-021-09777-y
  • Cai, L., Choi, K., & Kuhfeld, M. (2016). On the role of multilevel item response models in multi-site evaluation studies for serious games. In H. F. O’Neil, E. L. Baker, & R. Perez (Eds.), Using games and simulations for teaching and assessment. Taylor & Francis.
  • Camilli, G., Vargas, S., Ryan, S., & Barnett, W. S. (2010). Meta-analysis of the effects of early education interventions on cognitive and social development. Teachers College Record, 112(3), 579–620. https://doi.org/10.1177/016146811011200303
  • Curran, P. J., Cole, V., Bauer, D. J., Hussong, A. M., & Gottfredson, N. (2016). Improving factor score estimation through the use of observed background characteristics. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 827–844. https://doi.org/10.1080/10705511.2016.1220839
  • Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8(4), 370–378. https://doi.org/10.1177/1948550617693063
  • Gorter, R., Fox, J. P., Apeldoorn, A., & Twisk, J. (2016). Measurement model choice influenced randomized controlled trial results. Journal of Clinical Epidemiology, 79, 140–149. https://doi.org/10.1016/j.jclinepi.2016.06.011
  • Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning group-randomized trials in education. Educational Evaluation and Policy Analysis, 29(1), 60–87. https://doi.org/10.3102/0162373707299706
  • Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Research Paper. MET Project. Bill & Melinda Gates Foundation.
  • Kuhfeld, M., & Soland, J. (2022). Avoiding bias from sum scores in growth estimates: An examination of IRT-based approaches to scoring longitudinal survey responses. Psychological Methods, 27(2), 234–260. https://doi.org/10.1037/met0000367
  • Lechner, M. (2010). The estimation of causal effects by difference-in-difference methods. Foundations and Trends® in Econometrics, 4(3), 165–224. https://doi.org/10.1561/0800000014
  • Lindley, D. V., & Smith, A. F. M. (1972). Bayes estimates for the linear model. Journal of the Royal Statistical Society: Series B (Methodological), 34(1), 1–18. https://doi.org/10.1111/j.2517-6161.1972.tb00885.x
  • McNeish, D., & Wolf, M. G. (2020). Thinking twice about sum scores. Behavior Research Methods, 52(6), 2287–2305. https://doi.org/10.3758/s13428-020-01398-0
  • Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177–196. https://doi.org/10.1007/BF02294457
  • Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of Educational Statistics, 17(2), 131–154. https://doi.org/10.2307/1165166
  • Pollack, J. M., Najarian, M., Rock, D. A., & Atkins-Burnett, S. (2005). Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K). Psychometric Report for the Fifth Grade. NCES 2006-036. National Center for Education Statistics.
  • Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87(3), 517–529. https://doi.org/10.1037/0021-9010.87.3.517
  • Rhemtulla, M., van Bork, R., & Borsboom, D. (2020). Worse than measurement error: Consequences of inappropriate latent variable measurement models. Psychological Methods, 25(1), 30–45. https://doi.org/10.1037/met0000220
  • Sahin, A., & Anil, D. (2017). The effects of test length and sample size on item parameters in item response theory. Educational Sciences: Theory & Practice, 17(1), 321–335.
  • Soland, J. (2017). Is teacher value added a matter of scale? The practical consequences of treating an ordinal scale as interval for estimation of teacher effects. Applied Measurement in Education, 30(1), 52–70. https://doi.org/10.1080/08957347.2016.1247844
  • Soland, J. (2021). Is measurement noninvariance a threat to inferences drawn from randomized control trials? Evidence from empirical and simulation studies. Applied Psychological Measurement, 45(5), 346–360. https://doi.org/10.1177/01466216211013102
  • Soland, J. (2022). Evidence that selecting an appropriate item response theory-based approach to scoring surveys can help avoid biased treatment effect estimates. Educational and Psychological Measurement, 82(2), 376–403. https://doi.org/10.1177/00131644211007551
  • Soland, J., Johnson, A., & Talbert, E. (2022). Regression discontinuity designs in a latent variable framework. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000453
  • Soland, J., & Kuhfeld, M. (2021). Do response styles affect estimates of growth on social-emotional constructs? Evidence from four years of longitudinal survey scores. Multivariate Behavioral Research, 56(6), 853–873. https://doi.org/10.1080/00273171.2020.1778440
  • Soland, J., Kuhfeld, M., & Edwards, K. (2022). How survey scoring decisions can influence your study’s results: A trip through the IRT looking glass. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000506
  • Thissen, D., & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 85–152). Routledge.
  • Tourangeau, K., Nord, C., Lê, T., Sorongon, A. G., Hagedorn, M. C., Daly, P., & Najarian, M. (2015). Early Childhood Longitudinal Study, Kindergarten Class of 2010–11 (ECLS-K: 2011). User’s Manual for the ECLS-K: 2011 Kindergarten Data File and Electronic Codebook, Public Version. NCES 2015-074. National Center for Education Statistics.
  • Vector Psychometric Group. (2021). flexMIRT frequently asked questions. https://vpgcentral.com/software/flexmirt/support/frequently-asked-questions/
  • Whitney, C. R., & Candelaria, C. A. (2017). The effects of No Child Left Behind on children’s socioemotional outcomes. AERA Open, 3(3). https://doi.org/10.1177/2332858417726324
  • Willoughby, M. T., Blair, C. B., Wirth, R. J., & Greenberg, M. (2012). The measurement of executive function at age 5: Psychometric properties and relationship to academic achievement. Psychological Assessment, 24(1), 226–239. https://doi.org/10.1037/a0025361
  • Wolf, B., & Harbatkin, E. (2022). Making sense of effect sizes: Systematic differences in intervention effect sizes by outcome measure type. Journal of Research on Educational Effectiveness, 16(1), 134–161. https://doi.org/10.1080/19345747.2022.2071364
  • Wolf, R. (2021). Average differences in effect sizes by outcome measure type. https://files.eric.ed.gov/fulltext/ED610568.pdf
  • Yeager, D. S., & Walton, G. M. (2011). Social-psychological interventions in education: They’re not magic. Review of Educational Research, 81(2), 267–301. https://doi.org/10.3102/0034654311405999
  • Zopluoğlu, C. (2012). A cross-national comparison of intra-class correlation coefficient in educational achievement outcomes. Journal of Measurement and Evaluation in Education and Psychology, 3(1), 242–278.
