Research Article

Text Complexity of Chinese Elementary School Textbooks: Analysis of Text Linguistic Features Using Machine Learning Algorithms


