Research Article

“C”ing the light – assessing code comprehension in novice programmers using C code patterns

Received 12 Jun 2023, Accepted 07 Feb 2024, Published online: 15 Feb 2024

References

  • Alexandrowicz, R. W. (2011). Statistical and practical significance of the likelihood ratio test of the linear logistic test model versus the Rasch model. Educational Research and Evaluation, 17(5), 335–350. https://doi.org/10.1080/13803611.2011.630522
  • Alexandrowicz, R. W. (2022a). GMX: Extended graphical model checks. A versatile replacement of the plotGOF() function of eRm. Psychological Test and Assessment Modeling, 64(3), 215–225.
  • Alexandrowicz, R. W. (2022b). GMX: Extended graphical model checks of RM/PCM/RSM for multi-group splits [Computer software manual]. https://osf.io/2ryd8/ (R package version 0.8-1)
  • Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38(1), 123–140. https://doi.org/10.1007/BF02291180
  • Anderson, S., Sommerhoff, D., Schurig, M., Ufer, S., & Gebhardt, M. (2022). Developing learning progress monitoring tests using difficulty-generating item characteristics: An example for basic arithmetic operations in primary schools. Journal for Educational Research Online, 2022(1), 122–146. https://doi.org/10.31244/jero.2022.01.06
  • Ayala, R. J. D. (2022). The theory and practice of item response theory (2nd ed.). Guilford Publications.
  • Baghaei, P., & Hohensinn, C. (2017). A method of Q-Matrix validation for the linear logistic test model. Frontiers in Psychology, 8, 897. https://doi.org/10.3389/fpsyg.2017.00897
  • Bauer, J., Siegmund, J., Peitek, N., Hofmeister, J. C., & Apel, S. (2019). Indentation: Simply a matter of style or support for program comprehension? In 2019 IEEE/ACM 27th international conference on program comprehension (ICPC). IEEE. https://doi.org/10.1109/icpc.2019.00033
  • Bergersen, G. R., Sjøberg, D. I., & Dybå, T. (2014). Construction and validation of an instrument for measuring programming skill. IEEE Transactions on Software Engineering, 40(12), 1163–1184. https://doi.org/10.1109/TSE.2014.2348997
  • Berges, M., & Hubwieser, P. (2015). Evaluation of source code with item response theory. In Proceedings of the 2015 ACM conference on innovation and technology in computer science education. ACM. https://doi.org/10.1145/2729094.2742619
  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord, & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Addison-Wesley.
  • Bockmon, R., Cooper, S., Gratch, J., & Dorodchi, M. (2019). (Re)validating cognitive introductory computing instruments. In Proceedings of the 50th ACM technical symposium on computer science education. ACM. https://doi.org/10.1145/3287324.3287372
  • Brun, Y., Lin, T., Somerville, J. E., Myers, E., & Ebner, N. C. (2023). Blindspots in python and java APIs result in vulnerable code. ACM Transactions on Software Engineering and Methodology, 32(3), 76. https://doi.org/10.1145/3571850
  • Brusilovsky, P., & Sosnovsky, S. (2005). Individualized exercises for self-assessment of programming knowledge: An evaluation of quizpack. Journal on Educational Resources in Computing, 5(3), 6–es. https://doi.org/10.1145/1163405.1163411
  • Bureau of Labor Statistics, U.S. Department of Labor. (2022). Occupational Outlook Handbook. Retrieved September 13, 2022, from https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm
  • Buse, R. P., & Weimer, W. R. (2008). A metric for software readability. In Proceedings of the 2008 international symposium on software testing and analysis. ACM. https://doi.org/10.1145/1390630.1390647
  • Christensen, K. B., Bjorner, J. B., Kreiner, S., & Petersen, J. H. (2002). Testing unidimensionality in polytomous Rasch models. Psychometrika, 67(4), 563–574. https://doi.org/10.1007/BF02295131
  • Commons, M. L., Trudeau, E. J., Stein, S. A., Richards, F. A., & Krause, S. R. (1998). Hierarchical complexity of tasks shows the existence of developmental stages. Developmental Review, 18(3), 237–278. https://doi.org/10.1006/drev.1998.0467
  • Cooper, A., & Petrides, K. V. (2010). A psychometric analysis of the trait emotional intelligence questionnaire–short form (TEIQue–SF) using item response theory. Journal of Personality Assessment, 92(5), 449–457. https://doi.org/10.1080/00223891.2010.497426
  • Dancik, G., & Kumar, A. (2003). A tutor for counter-controlled loop concepts and its evaluation. 33rd Annual Frontiers in Education (FIE 2003), 1, T3C–7. https://doi.org/10.1109/FIE.2003.1263331
  • Davidson, M. J., Wortzman, B., Ko, A. J., & Li, M. (2021). Investigating item bias in a CS1 exam with differential item functioning. In Proceedings of the 52nd ACM technical symposium on computer science education. ACM. https://doi.org/10.1145/3408877.3432397
  • De Boeck, P., & Wilson, M. (2004). Explanatory item response models. Springer New York. https://doi.org/10.1007/978-1-4757-3990-9
  • DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects of the big five. Journal of Personality and Social Psychology, 93(5), 880–896. https://doi.org/10.1037/0022-3514.93.5.880
  • Dolado, J. J., Harman, M., Otero, M. C., & Hu, L. (2003). An empirical investigation of the influence of a type of side effects on program comprehension. IEEE Transactions on Software Engineering, 29(7), 665–670. https://doi.org/10.1109/TSE.2003.1214329
  • Duran, R., Rybicki, J.-M., Sorva, J., & Hellas, A. (2019). Exploring the value of student self-evaluation in introductory programming. In Proceedings of the 2019 ACM conference on international computing education research. ACM. https://doi.org/10.1145/3291279.3339407
  • Duran, R., Sorva, J., & Leite, S. (2018). Towards an analysis of program complexity from a cognitive perspective. In Proceedings of the 2018 ACM conference on international computing education research. ACM. https://doi.org/10.1145/3230977.3230986
  • Elshoff, J. L., & Marcotty, M. (1982). Improving computer program readability to aid modification. Communications of the ACM, 25(8), 512–521. https://doi.org/10.1145/358589.358596
  • Feitelson, D. G. (2023). From code complexity metrics to program comprehension. Communications of the ACM, 66(5), 52–61. https://doi.org/10.1145/3546576
  • Fischer, G. H. (1995a). Derivations of the Rasch model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 15–38). Springer.
  • Fischer, G. H. (1995b). The linear logistic test model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 131–155). Springer New York. https://doi.org/10.1007/978-1-4612-4230-7_8
  • Fischer, G. H., & Formann, A. K. (1982). Some applications of logistic latent trait models with linear constraints on the parameters. Applied Psychological Measurement, 6(4), 397–416. https://doi.org/10.1177/014662168200600403
  • Fischer, G. H., & Ponocny, I. (1994). An extension of the partial credit model with an application to the measurement of change. Psychometrika, 59(2), 177–192. https://doi.org/10.1007/BF02295182
  • Fowler, M. (2018). Refactoring: Improving the design of existing code (2nd ed.). Pearson International.
  • Glas, C. A. W., & Verhelst, N. D. (1995). Testing the Rasch model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 69–95). Springer New York. https://doi.org/10.1007/978-1-4612-4230-7_5
  • Gopstein, D., Fayard, A.-L., Apel, S., & Cappos, J. (2020). Thinking aloud about confusing code: A qualitative investigation of program comprehension and atoms of confusion. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event USA. https://doi.org/10.1145/3368089.3409714
  • Gopstein, D., Iannacone, J., Yan, Y., DeLong, L. A., Zhuang, Y., Yeh, M. K.-C., & Cappos, J. (2017). Understanding misunderstandings in source code. In Proceedings of the 2017 11th joint meeting on foundations of software engineering, Paderborn, Germany. https://doi.org/10.1145/3106237.3106264
  • Gopstein, D., Zhou, H. H., Frankl, P., & Cappos, J. (2018). Prevalence of confusing code in software projects: Atoms of confusion in the wild. In Proceedings of the 15th international conference on mining software repositories (pp. 281–291). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3196398.3196432
  • Herman, G. L., Zilles, C., & Loui, M. C. (2014). A psychometric evaluation of the digital logic concept inventory. Computer Science Education, 24(4), 277–303. https://doi.org/10.1080/08993408.2014.970781
  • Hofmeister, J. C., Siegmund, J., & Holt, D. V. (2019). Shorter identifier names take longer to comprehend. Empirical Software Engineering, 24(1), 417–443. https://doi.org/10.1007/s10664-018-9621-x
  • Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. https://doi.org/10.1007/BF02289447
  • Izu, C., Schulte, C., Aggarwal, A., Cutts, Q., Duran, R., Gutica, M., Heinemann, B., Kraemer, E., Lonati, V., Mirolo, C., & Weeda, R. (2019). Fostering program comprehension in novice programmers – learning activities and learning trajectories. In Proceedings of the working group reports on innovation and technology in computer science education. ACM. https://doi.org/10.1145/3344429.3372501
  • Jones, D. M. (2006). Developer beliefs about binary operator precedence. Open Standards. Retrieved 2023-10-23, from https://www.open-std.org/jtc1/sc22/wg23/docs/ISO-IECJTC1-SC22-WG23N0036-parenart.pdf
  • Kernighan, B. W., & Pike, R. (1999). The practice of programming (1st ed.). Addison-Wesley Professional.
  • Lister, R., Adams, E., Fitzgerald, S., Fone, W., Hamer, J., Lindholm, M., McCartney, R., Moström, J. E., Sanders, K., Seppälä, O., Simon, B., & Thomas, L. (2004). A multi-national study of reading and tracing skills in novice programmers. ACM SIGCSE Bulletin, 36(4), 119–150. https://doi.org/10.1145/1041624.1041673
  • Lobb, R., & Harlow, J. (2016). Coderunner: A tool for assessing computer programming skills. ACM Inroads, 7(1), 47–51. https://doi.org/10.1145/2810041
  • Luxton-Reilly, A., Becker, B. A., Cao, Y., McDermott, R., Mirolo, C., Mühling, A., Petersen, A., Sanders, K., & Whalley, J. (2017). Developing assessments to determine mastery of programming fundamentals. In Proceedings of the 2017 ITiCSE conference on working group reports. ACM. https://doi.org/10.1145/3174781.3174784
  • Mair, P., & Hatzinger, R. (2007a). CML based estimation of extended Rasch models with the eRm package in R. Psychology Science, 49(1), 26–43.
  • Mair, P., & Hatzinger, R. (2007b). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9). https://doi.org/10.18637/jss.v020.i09
  • Mair, P., Hatzinger, R., & Maier, M. J. (2021). eRm: Extended Rasch modeling [Computer software manual]. https://cran.r-project.org/package=eRm (R package version 1.0-2)
  • Margulieux, L., Ketenci, T. A., & Decker, A. (2019). Review of measurements used in computing education research and suggestions for increasing standardization. Computer Science Education, 29(1), 49–78. https://doi.org/10.1080/08993408.2018.1562145
  • Marshall, L., & Webber, J. (2000). Gotos considered harmful and other programmers’ taboos. In Proceedings of the 12th annual workshop of the psychology of programming interest group (PPIG) (pp. 171–177). Memoria. https://ppig.org/files/2000-PPIG-12th-marshall.pdf
  • McDaniel, M. A., & Fisher, R. P. (1991). Tests and test feedback as learning sources. Contemporary Educational Psychology, 16(2), 192–201. https://doi.org/10.1016/0361-476X(91)90037-L
  • McKeithen, K. B., Reitman, J. S., Rueter, H. H., & Hirtle, S. C. (1981). Knowledge organization and skill differences in computer programmers. Cognitive Psychology, 13(3), 307–325. https://doi.org/10.1016/0010-0285(81)90012-8
  • Minelli, R., Mocci, A., & Lanza, M. (2015). I know what you did last summer – an investigation of how developers spend their time. In 2015 IEEE 23rd international conference on program comprehension. IEEE. https://doi.org/10.1109/icpc.2015.12
  • Molenaar, I. W. (1995a). Estimation of item parameters. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 39–51). Springer. https://doi.org/10.1007/978-1-4612-4230-7
  • Molenaar, I. W. (1995b). Some background for item response theory and the Rasch model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 3–14). Springer. https://doi.org/10.1007/978-1-4612-4230-7
  • Parker, M. C., Guzdial, M., & Engleman, S. (2016). Replication, validation, and use of a language independent CS1 knowledge assessment. In Proceedings of the 2016 ACM conference on international computing education research. ACM. https://doi.org/10.1145/2960310.2960316
  • Ponocny, I. (2001). Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika, 66(3), 437–459. https://doi.org/10.1007/BF02294444
  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Nielsen & Lydiche.
  • R Core Team. (2021). R: A language and environment for statistical computing [Computer software manual]. https://www.R-project.org/ (R version 4.2.0)
  • Rigby, L., Denny, P., & Luxton-Reilly, A. (2020). A miss is as good as a mile: Off-by-one errors and arrays in an introductory programming course. In Proceedings of the twenty-second Australasian computing education conference (pp. 31–38). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3373165.3373169
  • Roehm, T., Tiarks, R., Koschke, R., & Maalej, W. (2012). How do professional developers comprehend software? In Proceedings of the 34th international conference on software engineering (ICSE), Zurich, Switzerland (pp. 255–265). IEEE. https://doi.org/10.1109/icse.2012.6227188
  • Scheiblechner, H. (1971). CML-parameter-estimation in a generalized multifactorial version of Rasch’s probabilistic measurement model with two categories of answers. Department of Psychology, University of Vienna.
  • Scheiblechner, H. (1972). Das Lernen und Lösen komplexer Denkaufgaben [Learning and solving of complex reasoning items]. Zeitschrift für Experimentelle und Angewandte Psychologie, 19, 476–506.
  • Schulte, C. (2008). Block model: An educational model of program comprehension as a tool for a scholarly approach to teaching. In Proceedings of the Fourth International Workshop on Computing Education Research. ACM. https://doi.org/10.1145/1404520.1404535
  • Shaft, T. M., & Vessey, I. (1995). The relevance of application domain knowledge: The case of computer program comprehension. Information Systems Research, 6(3), 286–299. Retrieved 2023-09-27, from http://www.jstor.org/stable/23010878
  • Soloway, E. (1986). Learning to program = learning to construct mechanisms and explanations. Communications of the ACM, 29(9), 850–858. https://doi.org/10.1145/6592.6594
  • Soloway, E., & Ehrlich, K. (1984). Empirical studies of programming knowledge. IEEE Transactions on Software Engineering, SE-10(5), 595–609. https://doi.org/10.1109/tse.1984.5010283
  • Stefik, A., & Siebert, S. (2013). An empirical investigation into programming language syntax. ACM Transactions on Computing Education, 13(4), 1–40. https://doi.org/10.1145/2534973
  • Tashtoush, Y., Odat, Z., Alsmadi, I., & Yatim, M. (2013). Impact of programming features on code readability. International Journal of Software Engineering and Its Applications, 7(6), 441–458. https://doi.org/10.14257/ijseia.2013.7.6.38
  • Tew, A. E., & Guzdial, M. (2011). The FCS1. In Proceedings of the 42nd ACM technical symposium on computer science education. ACM. https://doi.org/10.1145/1953163.1953200
  • van der Linden, W. J. (2016). Handbook of item response theory. Chapman and Hall/CRC. https://doi.org/10.1201/9781315374512
  • van der Linden, W. J., & Glas, C. A. (2010). Elements of adaptive testing. Springer New York. https://doi.org/10.1007/978-0-387-85461-8
  • Vollmeyer, R., & Rheinberg, F. (2005). A surprising effect of feedback on learning. Learning and Instruction, 15(6), 589–602. https://doi.org/10.1016/j.learninstruc.2005.08.001
  • Xia, X., Bao, L., Lo, D., Xing, Z., Hassan, A. E., & Li, S. (2018). Measuring program comprehension: A large-scale field study with professionals. IEEE Transactions on Software Engineering, 44(10), 951–976. https://doi.org/10.1109/tse.2017.2734091
  • Xie, B., Davidson, M. J., Li, M., & Ko, A. J. (2019). An item response theory evaluation of a language-independent CS1 knowledge assessment. In Proceedings of the 50th ACM technical symposium on computer science education. ACM. https://doi.org/10.1145/3287324.3287370
  • Zhuang, Y., Yan, Y., DeLong, L. A., & Yeh, M. K. (2023). Do developer perceptions have borders? Comparing C code responses across continents. Software Quality Journal. https://doi.org/10.1007/s11219-023-09654-0