Research Article

The Structural Complexity of Chinese Words and Its Relationship with Word Frequency

Pages 231-256 | Published online: 06 Jul 2023
 

ABSTRACT

The morphological synergetic model has yet to be fully tested in typical analytic languages. Quantifying Chinese morphology and its relationship with word frequency can help construct and test this model for Chinese. Using the Lancaster Corpus of Mandarin Chinese, this study proposes a method for quantifying the structural complexity of Chinese words (SCW) based on Kolmogorov complexity and then examines the interrelation between SCW and word frequency. Results show that, among the seven structural types of Chinese words, words whose morphemes are combined through multiple assembling ways generally have a higher SCW than words whose morphemes are combined through a single assembling way, although derivational affixes have a significant effect on SCW. The higher the SCW, the lower the word frequency. Given the combined effects of morpheme properties, $y = Ax^{-b}e^{-cx}$ describes this inverse relationship better than $y = Ax^{-b}$. In the other direction, the higher the word frequency, the lower the SCW. Delayed negative feedback causes small-scale fluctuations, but $y = Ax^{-b}e^{-cx}$ still describes the overall interaction between the two variables effectively. In terms of the underlying mechanism, word frequency changes first and thereby causes changes in word structure; in turn, for communicative effectiveness, word structure becomes more complex so as to carry more meaning, which then influences word frequency.
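The abstract compares $y = Ax^{-b}$ with $y = Ax^{-b}e^{-cx}$ but does not say here how the parameters were estimated. One possible approach, assumed purely for illustration, is to linearize both functions by taking logarithms, since $\ln y = \ln A - b\ln x - cx$ is linear in $(\ln A, b, c)$, and then fit them by ordinary least squares. The plain-Python sketch below does this; the function names are hypothetical, the data are synthetic, and nothing here reproduces the study's results.

```python
import math


def solve(m, v):
    """Solve the square system m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(m)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)


def fit_power(xs, ys):
    """Fit y = A x^(-b) through ln y = ln A - b ln x; returns (A, b)."""
    rows = [[1.0, -math.log(x)] for x in xs]
    ln_a, b = least_squares(rows, [math.log(y) for y in ys])
    return math.exp(ln_a), b


def fit_power_exp(xs, ys):
    """Fit y = A x^(-b) e^(-cx) through ln y = ln A - b ln x - c x; returns (A, b, c)."""
    rows = [[1.0, -math.log(x), -x] for x in xs]
    ln_a, b, c = least_squares(rows, [math.log(y) for y in ys])
    return math.exp(ln_a), b, c


def r_squared(ys, preds):
    """Coefficient of determination on the original (not log) scale."""
    mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic, noise-free data generated from the three-parameter model.
    xs = list(range(1, 21))
    ys = [50 * x ** -0.8 * math.exp(-0.1 * x) for x in xs]

    a1, b1 = fit_power(xs, ys)
    a2, b2, c2 = fit_power_exp(xs, ys)
    r2_pow = r_squared(ys, [a1 * x ** -b1 for x in xs])
    r2_exp = r_squared(ys, [a2 * x ** -b2 * math.exp(-c2 * x) for x in xs])
    print(f"y = Ax^-b:        A={a1:.3f}, b={b1:.3f}, R^2={r2_pow:.4f}")
    print(f"y = Ax^-b e^-cx:  A={a2:.3f}, b={b2:.3f}, c={c2:.3f}, R^2={r2_exp:.4f}")
```

Fitting on the log scale weights relative rather than absolute errors; a nonlinear least-squares fit on the original scale (for example SciPy's `curve_fit`) would generally give somewhat different estimates on noisy data.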

Acknowledgments

We would like to thank the anonymous reviewers for their constructive criticism and suggestions, which led to improvements of the paper.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. Chao (1968) included decompounds among complex compounds, but because this study characterizes word structure by the way morphemes are combined, decompounds are treated as a separate category.

2. Question 1 has no corresponding hypothesis (so there is no Hypothesis 1). Hypothesis 2 here corresponds to Question 2, and Hypothesis 3 below corresponds to Question 3.

3. The software bzip2 was used to compress sequences in the present study.

4. Step 4 is repeated 100 times (i = 1, 2, …, 100) in Step 6. $S_2^1$ denotes the sequence obtained at Step 4 in the first iteration (i = 1).

5. "·" marks a morpheme boundary within a word, and "/" marks a word boundary.
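Notes 3 to 5 indicate that sequences were compressed with bzip2, that one step was repeated 100 times, and that "·" and "/" mark morpheme and word boundaries. The sketch below shows, under stated assumptions, how a compression-based estimate of Kolmogorov complexity can be computed on text in that notation with Python's standard `bz2` module. The distortion step (shuffling morpheme order within each word), the size ratio used as the score, and all function names are hypothetical illustrations; they are not the study's Steps 1 to 6, which are not reproduced in this section.

```python
import bz2
import random


def parse(text):
    """Split a segmented string into words, each a list of morphemes.
    '/' marks word boundaries and '·' marks morpheme boundaries (Note 5)."""
    return [w.split("·") for w in text.split("/") if w]


def compressed_size(s):
    """bzip2-compressed length in bytes: a practical upper-bound proxy for Kolmogorov complexity."""
    return len(bz2.compress(s.encode("utf-8"), compresslevel=9))


def distort(words, rng):
    """Hypothetical distortion step (an assumption, not the study's procedure):
    shuffle morpheme order within each word and re-join in the same notation."""
    out = []
    for morphemes in words:
        m = list(morphemes)
        rng.shuffle(m)
        out.append("·".join(m))
    return "/".join(out)


def complexity_ratio(text, runs=100, seed=0):
    """Hypothetical score: mean compressed size over `runs` distorted copies divided by
    the compressed size of the undistorted text. Values above 1 mean distortion made
    the text harder to compress, i.e. morpheme order carried exploitable regularity."""
    rng = random.Random(seed)
    words = parse(text)
    base = compressed_size("/".join("·".join(w) for w in words))
    sizes = [compressed_size(distort(words, rng)) for _ in range(runs)]
    return sum(sizes) / runs / base


if __name__ == "__main__":
    sample = "电·脑/学·生/电·脑/学·生/人/山/水/"
    print(parse(sample))
    print(complexity_ratio(sample, runs=100))
```

On very short strings bzip2's fixed header overhead dominates the compressed size, so a ratio computed this way is only informative on corpus-sized input. A text consisting only of monomorphemic words yields a ratio of exactly 1, because the shuffle leaves it unchanged.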

Additional information

Funding

This work is partly supported by the MOE Project of Key Research Institute of Humanities and Social Sciences at Universities in China (22JJD740018), and the Fundamental Research Funds for the Central Universities and the Research Funds of Beijing Language and Culture University (22YCX194).
