ABSTRACT
The morphological synergetic model has yet to be fully tested in typical analytic languages. Quantifying Chinese morphology and its relationship with word frequency can help construct and test the morphological synergetic model in Chinese. Based on the Lancaster Corpus of Mandarin Chinese, this study proposes a method for quantifying the structural complexity of Chinese words using Kolmogorov complexity, and then examines the interrelation between the structural complexity of words (SCW) and word frequency. Results show that, among the seven structural types of Chinese words, the SCW of words formed by combining morphemes in multiple assembling ways is generally higher than that of words formed in a single assembling way, although derivational affixes affect SCW significantly. The higher the SCW, the lower the word frequency. Given the combined effects of morpheme properties, $y = Ax^{-b}e^{-cx}$ describes this inverse relationship better than $y = Ax^{-b}$. Conversely, the higher the word frequency, the lower the SCW. Delayed negative feedback causes small-scale fluctuations, but $y = Ax^{-b}e^{-cx}$ can effectively describe the overall interaction between the two. In terms of the internal mechanism, word frequency changes first, which causes changes in word structure; in turn, for communication effectiveness, the structure of words becomes more complex to carry more meaning, which influences word frequency.
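To make the comparison between the two candidate functions concrete, the sketch below takes logarithms of both. It assumes the flattened exponents in the abstract read as $y = Ax^{-b}$ and $y = Ax^{-b}e^{-cx}$, with $A > 0$ and $x > 0$; the abstract does not state which of SCW and word frequency plays the role of $x$ in each direction, so the symbols are left generic.

```latex
\begin{align}
  y &= A x^{-b} e^{-cx}
    &&\Longrightarrow&
  \ln y &= \ln A - b \ln x - c x, \\
  y &= A x^{-b}
    &&\Longrightarrow&
  \ln y &= \ln A - b \ln x
    \qquad (\text{the special case } c = 0), \\
  \frac{d \ln y}{d \ln x} &= -b - c x .
\end{align}
```

Under this reading, the pure power law is a straight line with slope $-b$ on log-log axes, whereas the extended function has a slope that changes with $x$, which is what lets it follow curvature that the power law alone cannot.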
Acknowledgments
We would like to thank the anonymous reviewers for their constructive criticism and suggestions, which led to improvements of the paper.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. Chao (1968) included decompounds in complex compounds, but this study classifies word structure by the way morphemes are combined, so decompounds are treated as a separate category.
2. Question 1 has no corresponding hypothesis (so there is no Hypothesis 1). Hypothesis 2 here corresponds to Question 2, and Hypothesis 3 below corresponds to Question 3.
3. The software bzip2 was used to compress sequences in the present study.
4. Step 4 will be repeated several times (i = 1, 2, …, 100) in Step 6. S21 indicates the sequence obtained at Step 4 for the first time.
5. · indicates the morpheme boundary within the word, and / indicates the word boundary.
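Note 3 states that bzip2 was used to compress sequences. The following is a minimal sketch of the general idea behind compression-based estimates of Kolmogorov complexity, not the study's actual procedure: the function names and the two sample sequences are hypothetical, and the steps referred to in Note 4 are not reproduced here. It uses Python's standard `bz2` module, which implements the same compression format.

```python
import bz2
import random


def compressed_size(sequence: str) -> int:
    """Return the size in bytes of the bzip2-compressed sequence.

    The compressed size is only an upper-bound proxy for Kolmogorov
    complexity, which itself is not computable.
    """
    return len(bz2.compress(sequence.encode("utf-8")))


def compression_ratio(sequence: str) -> float:
    """Return compressed size divided by raw size (lower = more regular)."""
    raw = sequence.encode("utf-8")
    return len(bz2.compress(raw)) / len(raw)


# Hypothetical inputs: a highly regular sequence and an irregular one
# of the same length (1000 characters each).
regular = "ab" * 500
rng = random.Random(0)  # fixed seed so the example is reproducible
irregular = "".join(
    rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(1000)
)

print("regular:  ", compressed_size(regular), "bytes")
print("irregular:", compressed_size(irregular), "bytes")
```

The regular sequence compresses to far fewer bytes than the irregular one, which is the property that lets compressed size stand in for structural complexity. For very short inputs the fixed overhead of the bzip2 format dominates, so comparisons of this kind are only meaningful on longer sequences.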