ABSTRACT
This study evaluates 22 lexical complexity measures that represent the three constructs of density, diversity and sophistication. The selection of these measures stems from an extensive review of the SLA linguistics literature. All measures were subjected to qualitative screening for indicators/predictors of lexical proficiency/development and criterion validity based on the body of scholarship. This study’s measure-testing process begins by dividing the selected measures into two groups, similarly calculated and dissimilarly calculated, based on their quantification methods and the results of correlation tests. Using a specialized corpus of postgraduate academic texts, a Structural Factor Analysis (SFA) comprising a Confirmatory Factor Analysis (CFA) and Exploratory Factor Analysis (EFA) is then conducted. The purpose of SFA is to 1) verify and examine the lexical classifications proposed in the literature, 2) evaluate the relationship between various lexical constructs and their representative measures, 3) identify the indices that best represent each construct and 4) detect possible new structures/dimensions. Based on the analysis of the corpus, the study discusses the construct-distinctiveness of lexical complexity constructs, as well as strong indicators of each conceptual/mathematical group among the measures. Finally, a unique and smaller set of measures representative of each construct is suggested for future studies that require measure selection.
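As an illustration of the correlation-based grouping step described above, the following is a minimal sketch in Python. It is not the authors' procedure, which was carried out in R. The measure names, the toy scores and the 0.9 threshold are all hypothetical and are used only to show how pairs of similarly behaving measures might be flagged.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def highly_correlated_pairs(scores, threshold=0.9):
    """Return (measure, measure, r) for pairs with |r| >= threshold.

    The threshold is an assumption for illustration, not a value from the study.
    """
    pairs = []
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical per-text scores for three made-up measures (five texts each).
scores = {
    "measure_a": [0.50, 0.55, 0.60, 0.65, 0.70],
    "measure_b": [1.00, 1.10, 1.20, 1.30, 1.40],  # linear function of measure_a
    "measure_c": [0.30, 0.10, 0.40, 0.20, 0.25],  # unrelated to the others
}

pairs = highly_correlated_pairs(scores)
print(pairs)
```

In this toy data only measure_a and measure_b are flagged, which is the kind of pair that could be treated as similarly calculated before a factor analysis is run.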
Acknowledgments
We would like to thank the two anonymous reviewers for their valuable suggestions and comments.
Disclosure statement
No potential conflict of interest was reported by the author(s).
CRediT authorship contribution statement
Maryam Nasseri: Conceptualization, Data curation, Methodology, Data analysis and evaluation of findings, Project administration, Visualization, Writing: original draft, Writing: critical review & editing, Funding acquisition.
Philip McCarthy: Measure-selection, Writing: critical review & editing, Funding acquisition.
Notes
1. The lexical sophistication measures in LCA-AW are filtered through the BAWE (British Academic Written English) corpus, specifically its most frequently used academic-writing words in linguistics and language studies, as well as through general English frequency word lists based on the BNC (British National Corpus) or the ANC (American National Corpus).
2. LCA-AW and TAALED calculate the indices based on lemma forms while Coh-Metrix calculates the vocd-D index based on word forms. In the latter case, lemmatized files can be used as the input to Coh-Metrix.
3. The R packages used in this study include psych (version 1.8.12, Revelle, 2018), lavaan (version 0.5–18, Rosseel, 2012) and corrplot (version 0.84, Wei & Simko, 2017).
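To illustrate the point in Note 2, the sketch below shows how counting types over lemma forms rather than word forms can change a diversity score. It uses a plain type-token ratio as a simple stand-in, not vocd-D, and the lemma map and sentence are hypothetical toy data rather than output from LCA-AW, TAALED or Coh-Metrix.

```python
def type_token_ratio(tokens):
    """Number of distinct types divided by number of tokens."""
    return len(set(tokens)) / len(tokens)


def lemmatize(tokens, lemma_map):
    """Replace each token with its lemma when the toy map lists one."""
    return [lemma_map.get(token, token) for token in tokens]


# Hypothetical lemma map; a real study would use a proper lemmatizer.
TOY_LEMMAS = {
    "analyses": "analysis",
    "measures": "measure",
    "measured": "measure",
}

tokens = "the analysis measures what other analyses measured".split()
lemmas = lemmatize(tokens, TOY_LEMMAS)

ttr_word_forms = type_token_ratio(tokens)  # 7 types over 7 tokens
ttr_lemmas = type_token_ratio(lemmas)      # 5 types over 7 tokens

print(f"word forms: {ttr_word_forms:.3f}")
print(f"lemmas:     {ttr_lemmas:.3f}")
```

Because inflected variants collapse into one lemma, the lemma-based score is lower here, which is why comparing indices across tools is safer when all of them receive the same (for example, lemmatized) input.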
Additional information
Notes on contributors
Maryam Nasseri
Maryam Nasseri received her doctoral degree from the University of Birmingham (UK), where she worked on the application of statistical modelling, NLP, and corpus linguistics methods to lexical and syntactic complexity. She has received multiple awards and grants, including the ISLE 2020 grant for syntactic complexification in academic texts and the AUS 2023-26 research grant for statistical modelling and designing software for lexical proficiency grading of academic writing. She has published in journals such as System, Journal of English for Academic Purposes (JEAP), and Assessing Writing, and has reviewed multiple articles and books for Taylor & Francis, Assessing Writing, and Journal of Language and Education (JLE).
Philip McCarthy
Philip McCarthy is an Associate Professor and discourse scientist specializing in software design and corpus analysis. His major interest is analyzing the English writings of students. His articles have been published in journals such as Discourse Processes, The Modern Language Journal, Written Communication, and Applied Psycholinguistics. McCarthy has been a teacher for 30 years, working in locations such as Turkiye, Japan, Britain, the US and the UAE. He is currently the principal investigator of a project on lexical proficiency grading of academic writing funded by the American University of Sharjah (AUS).