Methodological Studies

Item Response Theory Models for Difference-in-Difference Estimates (and Whether They Are Worth the Trouble)

Pages 391–421 | Received 25 Aug 2022, Accepted 09 Mar 2023, Published online: 24 Apr 2023

References

  • Bauer, D. J., & Curran, P. J. (2016). The discrepancy between measurement and modeling in longitudinal data analysis. In J. R. Harring, L. M. Stapleton, & S. N. Beretvas (Eds.), Advances in multilevel modeling for educational research (pp. 3–38). Information Age Publishing.
  • Bauer, D. J., Howard, A. L., Baldasaro, R. E., Curran, P. J., Hussong, A. M., Chassin, L., & Zucker, R. A. (2013). A trifactor model for integrating ratings across multiple informants. Psychological Methods, 18(4), 475–493. https://doi.org/10.1037/a0032475
  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. https://doi.org/10.1007/BF02293801
  • Cai, L. (2010a). Metropolis-Hastings Robbins-Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3), 307–335. https://doi.org/10.3102/1076998609353115
  • Cai, L. (2010b). A two-tier full-information item factor analysis model with applications. Psychometrika, 75(4), 581–612. https://doi.org/10.1007/s11336-010-9178-0
  • Cai, L. (2020). flexMIRT version 3: Flexible multilevel multidimensional item analysis and test scoring. Vector Psychometric Group.
  • Cai, L., & Houts, C. R. (2021). Longitudinal analysis of patient-reported outcomes in clinical trials: Applications of multilevel and multidimensional item response theory. Psychometrika, 86(3), 754–777. https://doi.org/10.1007/s11336-021-09777-y
  • Cai, L., Choi, K., & Kuhfeld, M. (2016). On the role of multilevel item response models in multi-site evaluation studies for serious games. In H. F. O’Neil, E. L. Baker, & R. Perez (Eds.), Using games and simulations for teaching and assessment. Taylor & Francis.
  • Camilli, G., Vargas, S., Ryan, S., & Barnett, W. S. (2010). Meta-analysis of the effects of early education interventions on cognitive and social development. Teachers College Record, 112(3), 579–620. https://doi.org/10.1177/016146811011200303
  • Curran, P. J., Cole, V., Bauer, D. J., Hussong, A. M., & Gottfredson, N. (2016). Improving factor score estimation through the use of observed background characteristics. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 827–844. https://doi.org/10.1080/10705511.2016.1220839
  • Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8(4), 370–378. https://doi.org/10.1177/1948550617693063
  • Gorter, R., Fox, J. P., Apeldoorn, A., & Twisk, J. (2016). Measurement model choice influenced randomized controlled trial results. Journal of Clinical Epidemiology, 79, 140–149. https://doi.org/10.1016/j.jclinepi.2016.06.011
  • Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning group-randomized trials in education. Educational Evaluation and Policy Analysis, 29(1), 60–87. https://doi.org/10.3102/0162373707299706
  • Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Research Paper. MET Project. Bill & Melinda Gates Foundation.
  • Kuhfeld, M., & Soland, J. (2022). Avoiding bias from sum scores in growth estimates: An examination of IRT-based approaches to scoring longitudinal survey responses. Psychological Methods, 27(2), 234–260. https://doi.org/10.1037/met0000367
  • Lechner, M. (2010). The estimation of causal effects by difference-in-difference methods. Foundations and Trends® in Econometrics, 4(3), 165–224. https://doi.org/10.1561/0800000014
  • Lindley, D. V., & Smith, A. F. M. (1972). Bayes estimates for the linear model. Journal of the Royal Statistical Society: Series B (Methodological), 34(1), 1–18. https://doi.org/10.1111/j.2517-6161.1972.tb00885.x
  • McNeish, D., & Wolf, M. G. (2020). Thinking twice about sum scores. Behavior Research Methods, 52(6), 2287–2305. https://doi.org/10.3758/s13428-020-01398-0
  • Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177–196. https://doi.org/10.1007/BF02294457
  • Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of Educational Statistics, 17(2), 131–154. https://doi.org/10.2307/1165166
  • Pollack, J. M., Najarian, M., Rock, D. A., & Atkins-Burnett, S. (2005). Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K). Psychometric Report for the Fifth Grade. NCES 2006-036. National Center for Education Statistics.
  • Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87(3), 517–529. https://doi.org/10.1037/0021-9010.87.3.517
  • Rhemtulla, M., van Bork, R., & Borsboom, D. (2020). Worse than measurement error: Consequences of inappropriate latent variable measurement models. Psychological Methods, 25(1), 30–45. https://doi.org/10.1037/met0000220
  • Sahin, A., & Anil, D. (2017). The effects of test length and sample size on item parameters in item response theory. Educational Sciences: Theory & Practice, 17(1), 321–335.
  • Soland, J. (2017). Is teacher value added a matter of scale? The practical consequences of treating an ordinal scale as interval for estimation of teacher effects. Applied Measurement in Education, 30(1), 52–70. https://doi.org/10.1080/08957347.2016.1247844
  • Soland, J. (2021). Is measurement noninvariance a threat to inferences drawn from randomized control trials? Evidence from empirical and simulation studies. Applied Psychological Measurement, 45(5), 346–360. https://doi.org/10.1177/01466216211013102
  • Soland, J. (2022). Evidence that selecting an appropriate item response theory-based approach to scoring surveys can help avoid biased treatment effect estimates. Educational and Psychological Measurement, 82(2), 376–403. https://doi.org/10.1177/00131644211007551
  • Soland, J., Johnson, A., & Talbert, E. (2022). Regression discontinuity designs in a latent variable framework. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000453
  • Soland, J., & Kuhfeld, M. (2021). Do response styles affect estimates of growth on social-emotional constructs? Evidence from four years of longitudinal survey scores. Multivariate Behavioral Research, 56(6), 853–873. https://doi.org/10.1080/00273171.2020.1778440
  • Soland, J., Kuhfeld, M., & Edwards, K. (2022). How survey scoring decisions can influence your study’s results: A trip through the IRT looking glass. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000506
  • Thissen, D., & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 85–152). Routledge.
  • Tourangeau, K., Nord, C., Lê, T., Sorongon, A. G., Hagedorn, M. C., Daly, P., & Najarian, M. (2015). Early Childhood Longitudinal Study, Kindergarten Class of 2010–11 (ECLS-K: 2011). User’s Manual for the ECLS-K: 2011 Kindergarten Data File and Electronic Codebook, Public Version. NCES 2015-074. National Center for Education Statistics.
  • Vector Psychometric Group. (2021). flexMIRT frequently asked questions. https://vpgcentral.com/software/flexmirt/support/frequently-asked-questions/
  • Whitney, C. R., & Candelaria, C. A. (2017). The effects of No Child Left Behind on children’s socioemotional outcomes. AERA Open, 3(3). https://doi.org/10.1177/2332858417726324
  • Willoughby, M. T., Blair, C. B., Wirth, R. J., & Greenberg, M. (2012). The measurement of executive function at age 5: Psychometric properties and relationship to academic achievement. Psychological Assessment, 24(1), 226–239. https://doi.org/10.1037/a0025361
  • Wolf, B., & Harbatkin, E. (2022). Making sense of effect sizes: Systematic differences in intervention effect sizes by outcome measure type. Journal of Research on Educational Effectiveness, 16(1), 134–161. https://doi.org/10.1080/19345747.2022.2071364
  • Wolf, R. (2021). Average differences in effect sizes by outcome measure type. https://files.eric.ed.gov/fulltext/ED610568.pdf
  • Yeager, D. S., & Walton, G. M. (2011). Social-psychological interventions in education: They’re not magic. Review of Educational Research, 81(2), 267–301. https://doi.org/10.3102/0034654311405999
  • Zopluoğlu, C. (2012). A cross-national comparison of intra-class correlation coefficient in educational achievement outcomes. Journal of Measurement and Evaluation in Education and Psychology, 3(1), 242–278.
