An Examination of the Linking Error Currently Used in PISA

