Scientific Studies of Reading Volume 28, 2024 - Issue 3

Research Article

Text Complexity of Chinese Elementary School Textbooks: Analysis of Text Linguistic Features Using Machine Learning Algorithms

Miaomiao Liu (a), Yixun Li (b; https://orcid.org/0000-0003-2193-6126), Yongqiang Su (a), Hong Li (a; corresponding author; https://orcid.org/0000-0002-8569-9468)

(a) Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Institute of Children’s Reading and Learning, Faculty of Psychology, Beijing Normal University, Haidian, Beijing, China

(b) Department of Early Childhood Education, The Education University of Hong Kong, Hong Kong, China

Pages 235-255 | Published online: 14 Aug 2023

https://doi.org/10.1080/10888438.2023.2244620

ABSTRACT

Purpose

This study sought to (1) identify the linguistic features that are important for Chinese text complexity, using a theory-based and systematic approach, and (2) examine how feature sets and algorithms affect the performance of Chinese text complexity models.

Method

Texts from Chinese language arts textbooks for Grades 1 to 6 in Mainland China (N = 1,478) were analyzed. The predictor variables were 265 linguistic features of the texts: 154 lexical features and 111 sentence and discourse features. The outcome variable was the complexity level of each text, measured on a one-semester scale with 12 levels in total (two semesters per grade).
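The arithmetic behind the 12-level outcome can be sketched as follows. This is a minimal illustration, assuming levels are numbered consecutively from 1 (Grade 1, first semester) to 12 (Grade 6, second semester); the function name `semester_level` is hypothetical and the numbering is an assumption, not a detail stated in the article.

```python
GRADES = 6
SEMESTERS_PER_GRADE = 2


def semester_level(grade, semester):
    """Map (grade, semester) to a level on an assumed 1..12 scale."""
    if not 1 <= grade <= GRADES:
        raise ValueError("grade must be between 1 and 6")
    if not 1 <= semester <= SEMESTERS_PER_GRADE:
        raise ValueError("semester must be 1 or 2")
    return (grade - 1) * SEMESTERS_PER_GRADE + semester


# Enumerate every grade/semester pair in textbook order.
levels = [semester_level(g, s)
          for g in range(1, GRADES + 1)
          for s in range(1, SEMESTERS_PER_GRADE + 1)]
```

Under this numbering, a text from the first semester of Grade 4 would sit at level 7.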

Results

Features in the categories of character and word frequency, character and word semantic features, lexical diversity, part-of-speech syntactic categories, and referential cohesion were found to be the most important. Using the important features identified, we found that text complexity models with features at all levels outperformed those with features at only one level. Models using the two machine learning algorithms (Random Forest Regression and Support Vector Regression) outperformed those using Linear Regression.

Conclusion

This work clarifies important linguistic features for Chinese text complexity, and points to the necessity of considering features across levels and using machine learning algorithms in future text complexity research.

Acknowledgments

We thank Hailey Gibbs at the University of Maryland, College Park, for her kind help with proofreading.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1. There are two scripts in the modern Chinese language, the Traditional Chinese script used in Hong Kong, Taiwan, and Macau, and the Simplified Chinese script mainly used in Mainland China. Although visually distinct, the two scripts carry the characteristics of the Chinese writing system in the same manner. Thus, we found it feasible to consider findings from both scripts in the context of text complexity research.

2. We used regression models on the consideration that the complexity levels of texts increase continuously throughout elementary school, without a clear boundary between two adjacent semester levels, as claimed in Phani et al. (2019).

3. We acknowledge that using absolute accuracy to evaluate regression models may not be appropriate (François & Miltsakaki, 2012). We include absolute accuracy here only to compare our results with previous Chinese text complexity studies, some of which reported only the absolute accuracy of their models (Sung et al., 2016; Tseng et al., 2019; Wu et al., 2020). We used a rounding method to convert continuous estimated values to categorical levels; for example, an estimated value between 3.5 and 4.4 was considered a complexity level of 4, following previous practice (François & Miltsakaki, 2012).
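The rounding rule in Note 3 can be sketched as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, and clamping out-of-range estimates to the 1..12 scale is an assumption the note does not state. Half-up rounding is written out with `math.floor` because Python's built-in `round` rounds halves to the nearest even integer, which would send 4.5 to 4 rather than 5.

```python
import math


def to_level(estimate, lowest=1, highest=12):
    """Round a continuous estimate half-up to a categorical level.

    Values from 3.5 up to (but not including) 4.5 map to level 4.
    Clamping to [lowest, highest] is an assumption made for this sketch.
    """
    level = math.floor(estimate + 0.5)
    return max(lowest, min(highest, level))


def absolute_accuracy(estimates, true_levels):
    """Share of texts whose rounded estimate equals the true level."""
    hits = sum(to_level(e) == t for e, t in zip(estimates, true_levels))
    return hits / len(true_levels)


# Made-up estimates and labels, for illustration only.
example_accuracy = absolute_accuracy([3.6, 4.4, 7.9, 11.2], [4, 4, 7, 11])
```

In the made-up example, three of the four rounded estimates match their labels, so `example_accuracy` is 0.75.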

4. We employed 5-fold cross-validation, and thus there were five data points for each evaluation index (e.g., R²) under each of the nine conditions (the combinations of three feature sets and three algorithms).
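The design in Note 4 can be sketched as follows: three feature sets crossed with three algorithms give nine conditions, and five folds give five values of each evaluation index per condition. This is a hedged illustration in plain Python. The feature-set labels, the shuffled split, the seed, and the helper names `k_fold_indices` and `r_squared` are assumptions made for the example, and no model is actually fitted here.

```python
import itertools
import random

FEATURE_SETS = ["lexical", "sentence_discourse", "all_levels"]  # assumed labels
ALGORITHMS = ["LR", "RFR", "SVR"]  # Linear, Random Forest, Support Vector
N_TEXTS = 1478
K = 5


def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint, near-equal held-out folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def r_squared(true_values, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(true_values) / len(true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, predictions))
    ss_tot = sum((t - mean) ** 2 for t in true_values)
    return 1 - ss_res / ss_tot


conditions = list(itertools.product(FEATURE_SETS, ALGORITHMS))
folds = k_fold_indices(N_TEXTS, K)

# One evaluation data point per (feature set, algorithm, fold).
data_points = [(fs, algo, fold_id)
               for fs, algo in conditions
               for fold_id in range(K)]
```

With 1,478 texts, three folds hold 296 texts and two hold 295. Each of the nine conditions contributes five values per index, or 45 in total.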

5. Both of our models would have achieved an absolute accuracy of .76 if we had used a two-grade-level scale like the existing models (.59–.64, Wu et al., 2020). Our models would have achieved an absolute accuracy of .49 (RFR) and .51 (SVR) if we had used a one-grade-level scale like the existing models (.44–.72, Sung et al., 2016; .49–.76, Tseng et al., 2019).
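The effect of scale granularity described in Note 5 can be sketched as below. The article does not say how semester levels were collapsed, so this example assumes that levels 1 and 2 form Grade 1, levels 3 and 4 form Grade 2, and so on, and that two-grade bands pair Grades 1 and 2, 3 and 4, and 5 and 6. The function names and the data are hypothetical.

```python
import math


def to_semester_level(estimate):
    """Round half-up and clamp to the 1..12 semester scale (assumed)."""
    return max(1, min(12, math.floor(estimate + 0.5)))


def to_grade(level):
    """Collapse a semester level (1..12) to a grade (1..6)."""
    return (level + 1) // 2


def to_two_grade_band(level):
    """Collapse a semester level (1..12) to a two-grade band (1..3)."""
    return (to_grade(level) + 1) // 2


def accuracy(estimates, true_levels, collapse=lambda level: level):
    """Absolute accuracy after mapping both sides to a coarser scale."""
    hits = sum(collapse(to_semester_level(e)) == collapse(t)
               for e, t in zip(estimates, true_levels))
    return hits / len(true_levels)


# Made-up estimates and labels, for illustration only.
estimates = [2.6, 5.4, 8.2, 11.7]
true_levels = [2, 6, 7, 12]

semester_acc = accuracy(estimates, true_levels)
grade_acc = accuracy(estimates, true_levels, to_grade)
band_acc = accuracy(estimates, true_levels, to_two_grade_band)
```

On this toy data the same predictions score 0.25 on the semester scale, 0.75 on the grade scale, and 1.0 on the two-grade scale. Accuracies reported on different scales are therefore not directly comparable.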

Phani, S., Lahiri, S., & Biswas, A. (2019). Readability analysis of Bengali literary texts. Journal of Quantitative Linguistics, 26(4), 287–305. https://doi.org/10.1080/09296174.2018.1499456

François, T., & Miltsakaki, E. (2012). Do NLP and machine learning improve traditional readability formulas? Proceedings of the First Workshop on Predicting and Improving Text Readability for Target Reader Populations, 49–57. https://doi.org/10.5555/2390916.2390925

Sung, Y. T., Chen, J. L., Cha, J. H., Tseng, H. C., Chang, T. H., & Chang, K. E. (2015). Constructing and validating readability models: The method of integrating multilevel linguistic features with machine learning. Behavior Research Methods, 47(2), 340–354. https://doi.org/10.3758/s13428-014-0459-x

Tseng, H. C., Chen, B., Chang, T. H., & Sung, Y. T. (2019). Integrating LSA-based hierarchical conceptual space and machine learning methods for leveling the readability of domain-specific texts. Natural Language Engineering, 25(3), 331–361. https://doi.org/10.1017/S1351324919000093

Wu, S. Y., Yu, D., & Jiang, X. (2020). 汉语文本可读性特征体系构建和效度验证 [Development of linguistic features system for Chinese text readability assessment and its validity verification]. 世界汉语教学, 1, 81–97. https://doi.org/10.13724/j.cnki.ctiw.20200103.007

Additional information

Funding

This research was supported by grants from the Ministry of Education of the People’s Republic of China [17YJA190009] to Hong Li. The writing of this paper was partially supported by a Seed Funding Grant at The Education University of Hong Kong [RG 37/2021-2022 R] to Yixun Li.
