Research Article

Word Length in Chinese: The Menzerath-Altmann Law is Valid After All


ABSTRACT

According to the Menzerath-Altmann law, longer language constructs consist, on average, of shorter constituents. The law is most often studied at the level of words and syllables (the mean syllable length decreases as word length increases). Its validity at this level has been corroborated in several languages. However, it has been claimed that Chinese is an exception to the Menzerath-Altmann law. We show that the law is valid if word types are considered, whereas word tokens behave differently. This difference can be explained by the fact that the Zipf law of abbreviation holds not only for words but also for syllables (shorter syllables are used more frequently).
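The type/token contrast drawn in the abstract can be made concrete with a small computation. The Python sketch below is illustrative only and is not the authors' pipeline: it assumes, hypothetically, that words are already segmented and romanized as hyphen-separated syllables, and it counts letters per syllable as a crude stand-in for phonemes. The function name `mal_profile` and the toy tokens are invented for this illustration and are not data from the article.

```python
from collections import defaultdict


def mal_profile(words):
    """Return {word length in syllables: mean syllable length}.

    Hypothetical input format: each word is a string of hyphen-separated
    syllables, and syllable length is approximated by its letter count.
    Every item in `words` is counted once, so a token list weights frequent
    words more heavily than the set of distinct word types does.
    """
    letters = defaultdict(int)    # word length -> total letters in its syllables
    syllables = defaultdict(int)  # word length -> total number of syllables
    for word in words:
        parts = word.split("-")
        n = len(parts)
        letters[n] += sum(len(p) for p in parts)
        syllables[n] += n
    return {n: letters[n] / syllables[n] for n in sorted(letters)}


# Invented toy data: one short, very frequent monosyllable dominates the tokens.
tokens = ["de"] * 7 + ["shuang", "peng-you"]
print(mal_profile(set(tokens)))  # types:  {1: 4.0, 2: 3.5}
print(mal_profile(tokens))       # tokens: {1: 2.5, 2: 3.5}
```

In this invented example the type-based means decrease with word length while the token-based means do not, because the repeated short syllable pulls down the token mean for one-syllable words. It is only a miniature of the frequency effect the abstract attributes to the Zipf law of abbreviation for syllables, not a reproduction of the article's results.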

Acknowledgments

The work was supported by the European Regional Development Fund Project “Sinophone Borderlands – Interaction at the Edges”, CZ.02.1.01/0.0/0.0/16_019/0000791 (T. Motalová), VEGA 2/0096/21 (J. Mačutek), APVV-21-0216 (J. Mačutek), and the Operational Programme Integrated Infrastructure (OPII) for the project 313011BWH2: “InoCHF – Research and development in the field of innovative technologies in the management of patients with CHF”, co-financed by the European Regional Development Fund (J. Mačutek).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1. A more general formula with an additional parameter $c$, $y(x) = a x^{b} e^{cx}$, is sometimes used; see, e.g., Mačutek et al. (2019).

2. The MAL has also found applications outside human language, e.g. in music (Boroda & Altmann, 1991), animal communication (Gustison et al., 2016), and genome structure (Ferrer-I-Cancho et al., 2014). The ‘common denominator’ of these fields is that they study information flow (in a very general sense).

3. Syllable length was measured in moras, not in phonemes.

4. In some of the papers cited in this paragraph, syllable length is measured in graphemes rather than phonemes. In languages with shallow orthographies, the two measures yield quite similar mean syllable lengths (Coulmas, 2002).

5. Erization is the addition of the r-suffix (儿) to a syllable, e.g. 花 huā becomes 花儿 huār (‘flower’). Moreover, Chinese has a few isolated exceptions in the form of polysyllabic characters: Qiu (2000, pp. 26, 406) mentions 瓩 qiānwǎ ‘kilowatt’, 浬 hǎilǐ ‘nautical mile’, and 哩 yīnglǐ ‘English mile’ (none of these words occurs in our language material).

6. Xin Han-Da cidian – Das neue Chinesisch-Deutsche Wörterbuch, 1985. Commercial Press, Beijing.

7. In fact, one can speak of phonological words here; see e.g. Hall (1999) or Zsiga (2013, pp. 342–346). Thus, this approach can be considered a study of the MAL at the word level, albeit from a slightly different perspective.

8. Lengths of stress units ranged from 1 to 18 syllables, and lengths of rhythmic segments from 1 to 7 syllables (Ščigulinská & Schusterová, 2014, pp. 70–72, 77).

9. Kovaľová and Schusterová (2016, pp. 122–133) reported lengths of stress units between 1 and 21 syllables, similarly to Rothe-Neves et al. (2017, p. 6), who reported lengths of utterances between 2 and 29 syllables. On the other hand, Geršić and Altmann (1980, pp. 115–123) tested the law on word lengths of only up to 5 syllables.

10. https://www.fon.hum.uva.nl/praat/ (accessed 1 June 2023).

11. Recall that Stave et al. (2021) study the relation between word length in morphemes and the mean morpheme length in graphemes.

12. https://www.wordproject.org/ (accessed 1 June 2023).

13. International Biblical Association. Wordproject®: Sheng Jing: Xīnyuē Quán Shū [Holy Bible. New Testament]. Available at https://www.wordproject.org/bibles/pn/index.htm (accessed 1 June 2023).

14. International Biblical Association. Wordproject®: 圣经. 新约全书 [Holy Bible. New Testament]. Available at https://www.wordproject.org/bibles/gb_cat/index.htm (accessed 1 June 2023).

15. Available at https://github.com/tsroten/pynlpir (accessed 1 June 2023).

16. Available at https://github.com/NLPIR-team/NLPIR (accessed 1 June 2023).

18. Available at https://github.com/mozillazg/python-pinyin (accessed 23 July 2023).

19. http://www.nlreg.com (accessed June 2023).

20. Naturally, this requirement is another rule of thumb. See e.g. Mačutek and Rovenchak (2011) and Mačutek et al. (2021) for similar but slightly different approaches to the problem of word length categories with too low frequencies.

21. If, e.g., we measure word length in syllables, lengths 1 to 5 each occur more than 10 times, length 6 has frequency 12, and length 7 has frequency 1, we pool the last two lengths into one category. The weighted mean word length in this category is $\frac{12 \times 6 + 1 \times 7}{12 + 1} = \frac{79}{13} \approx 6.08$; see data in .

22. We also obtained comparable results for the relation between word length and the mean syllable length for Pīnyīn Rìjì Duǎnwén, a diary written by Zhang Qiling (available at http://www.pinyin.info/readings/pinyin_riji_duanwen.html, accessed 1 June 2023), and for a sample containing Press reportage (text category A) and Science academic prose (text category J) from The Lancaster Corpus of Mandarin Chinese (McEnery et al., 2003). Similarly to and , the mean syllable length tends to decrease, with a slight increase for the longest words.

23. We also obtained comparable results for the relation between word length in Chinese characters and the mean character size in components and in strokes, respectively, for the short story 我为什么要结婚 [Why do I want to get married] from the short story collection 黄昏里的男孩 [The boy in the dusk] by Yu Hua (2012), as well as for a sample containing Press reportage (text category A) and Science academic prose (text category J) from The Lancaster Corpus of Mandarin Chinese (McEnery et al., 2003).

24. Words consisting of one, two, or three syllables make up 99.7% of all word tokens in the Chinese translation of the New Testament, see .

25. Given the wide scope of the least effort principle (see Zipf, 1949), easier-to-pronounce tones probably occur more frequently (see Zhang, 2002). Tone characteristics can also interact with other word properties, e.g. longer words can have a higher proportion of simpler tones than shorter ones.

26. According to Berdicevskis (2021, p. 27), ‘clauses are not repeated in languages often enough to enable frequency estimates’.
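Note 1 mentions the extended form $y(x) = a x^{b} e^{cx}$. One simple way to estimate its parameters, sketched below in Python under stated assumptions, is ordinary least squares on the log scale, because $\ln y = \ln a + b \ln x + c x$ is linear in $\ln a$, $b$ and $c$. This is a swapped-in technique for illustration and not necessarily how the parameters were estimated in the article (note 19 cites the website of NLREG, a nonlinear regression program); log-scale least squares also weights residuals differently from nonlinear fitting on the original scale. The function name `fit_mal` is hypothetical, and the sketch assumes positive $y$ values and at least three distinct $x$ values.

```python
import math


def fit_mal(xs, ys):
    """Fit y = a * x**b * exp(c*x) by least squares on ln y (hypothetical helper).

    The model is linearised as ln y = ln a + b*ln x + c*x and the 3x3 normal
    equations are solved with Gaussian elimination; returns (a, b, c).
    """
    rows = [(1.0, math.log(x), float(x)) for x in xs]  # design matrix rows
    z = [math.log(y) for y in ys]                       # log-transformed targets
    # Normal equations (X^T X) beta = X^T z.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(3)]
    # Forward elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        v[col], v[piv] = v[piv], v[col]
        for k in range(col + 1, 3):
            f = m[k][col] / m[col][col]
            for j in range(col, 3):
                m[k][j] -= f * m[col][j]
            v[k] -= f * v[col]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (v[i] - s) / m[i][i]
    ln_a, b, c = beta
    return math.exp(ln_a), b, c


# Synthetic, noise-free check (not data from the article).
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [4.0 * x ** -0.25 * math.exp(0.03 * x) for x in xs]
a, b, c = fit_mal(xs, ys)
print(round(a, 3), round(b, 3), round(c, 3))  # 4.0 -0.25 0.03
```

On noise-free synthetic data the sketch recovers the generating parameters; on real word length data the estimates would depend on the fitting criterion, which is why the choice between log-scale and nonlinear least squares matters.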
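The pooling step illustrated in note 21 can be sketched in Python as follows. This is a minimal sketch under assumptions of our own: it supposes that sparse categories occur only at the long end of the word length distribution and that a category needs a frequency of at least 10 (`pool_tail` and its `threshold` parameter are hypothetical names, not taken from the article). Starting from the longest length, categories are merged until the pooled frequency reaches the threshold, and the pooled category is represented by its frequency-weighted mean length.

```python
def pool_tail(freqs, threshold=10):
    """Pool the longest, rare word-length categories into one (hypothetical helper).

    freqs: non-empty dict mapping word length -> frequency.
    Returns a list of (length, frequency) pairs; the last pair is the pooled
    tail, whose length is the frequency-weighted mean of the merged lengths.
    """
    lengths = sorted(freqs)
    tail = [lengths.pop()]  # start from the longest length
    # Absorb shorter neighbours while the pooled frequency is below threshold.
    while sum(freqs[x] for x in tail) < threshold and lengths:
        tail.append(lengths.pop())
    tail_freq = sum(freqs[x] for x in tail)
    tail_mean = sum(x * freqs[x] for x in tail) / tail_freq
    return [(x, freqs[x]) for x in lengths] + [(tail_mean, tail_freq)]


# Frequencies of lengths 1-5 are invented; lengths 6 and 7 follow note 21.
freqs = {1: 500, 2: 300, 3: 80, 4: 40, 5: 15, 6: 12, 7: 1}
*kept, (mean_len, pooled_freq) = pool_tail(freqs)
print(kept)                             # [(1, 500), (2, 300), (3, 80), (4, 40), (5, 15)]
print(round(mean_len, 2), pooled_freq)  # 6.08 13
```

Working backwards from the longest length keeps the well-populated short categories untouched; a sparse category in the middle of the distribution would need a different rule, which this sketch does not attempt.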

Additional information

Funding

This work was supported by the Agentúra na Podporu Výskumu a Vývoja [APVV-21-0216]; European Regional Development Fund [CZ.02.1.01/0.0/0.0/16_019/0000791]; Operational Programme Integrated Infrastructure (OPII) [313011BWH2]; Vedecká Grantová Agentúra MŠVVaŠ SR a SAV [2/0096/21].
