Research Article

Word Length in Chinese: The Menzerath-Altmann Law is Valid After All

Pages 304-321 | Published online: 06 Nov 2023
 

ABSTRACT

According to the Menzerath-Altmann law, longer language constructs consist, on average, of shorter constituents. The law is most often studied at the level of words and syllables (the mean syllable length decreases with increasing word length). Its validity at this level has been corroborated in several languages. However, it has been claimed that Chinese is an exception to the Menzerath-Altmann law. We show that the law is valid if word types are considered, while the behaviour of word tokens is different. This difference can be explained by the fact that the Zipf law of abbreviation is valid not only for words but also for syllables (shorter syllables are used more frequently).
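
The distinction between word types and word tokens can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `mean_syllable_length` is hypothetical, the six "words" are toy placeholders rather than real transcriptions, and each character of a syllable string is assumed to stand for one segment.

```python
from collections import defaultdict

def mean_syllable_length(words):
    """Map word length (in syllables) to mean syllable length (in segments).

    `words` is an iterable of tuples of syllable strings; one character of a
    syllable string is assumed to represent one segment.
    """
    totals = defaultdict(lambda: [0, 0])  # word length -> [segments, syllables]
    for syllables in words:
        n = len(syllables)
        totals[n][0] += sum(len(s) for s in syllables)
        totals[n][1] += n
    return {n: seg / syl for n, (seg, syl) in sorted(totals.items())}

# Toy data (hypothetical): the short word "ta" is repeated three times.
tokens = [("ta",), ("ta",), ("ta",), ("kan",), ("ma", "li"), ("kan", "to")]
types = set(tokens)  # every distinct word is counted once

print(mean_syllable_length(tokens))  # {1: 2.25, 2: 2.25}
print(mean_syllable_length(types))   # {1: 2.5, 2: 2.25}
```

In this toy sample the frequent short syllable pulls the token mean for one-syllable words down, so the decrease visible for types disappears for tokens. The numbers are purely illustrative and say nothing about the actual data.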

Acknowledgments

The work was supported by the European Regional Development Fund project “Sinophone Borderlands – Interaction at the Edges”, CZ.02.1.01/0.0/0.0/16_019/0000791 (T. Motalová), VEGA 2/0096/21 (J. Mačutek), APVV-21-0216 (J. Mačutek), and the Operational Programme Integrated Infrastructure (OPII) for the project 313011BWH2: “InoCHF – Research and development in the field of innovative technologies in the management of patients with CHF”, co-financed by the European Regional Development Fund (J. Mačutek).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1. A more general formula with an additional parameter c, $y(x) = a x^{b} e^{cx}$, is sometimes used; see e.g. Mačutek et al. (2019).

2. The MAL has also found its place in research areas outside human language, e.g. music (Boroda & Altmann, 1991), animal communication (Gustison et al., 2016), and genome structure (Ferrer-I-Cancho et al., 2014). The ‘common denominator’ of these branches of science is that they study information flow (in a very general sense).

3. Syllable length was measured in moras, not in phonemes.

4. In some of the papers cited in this paragraph, the mean syllable length is expressed in the number of graphemes rather than phonemes. The mean syllable length is quite similar for both choices in languages with shallow orthographies (Coulmas, 2002).

5. Erization is the addition of the r-suffix (儿) to a syllable, e.g. 花 huā becomes 花儿 huār (‘flower’). Moreover, there are a few singular exceptions of polysyllabic characters in Chinese. Qiu (2000, pp. 26, 406) mentions 瓩 qiānwǎ ‘kilowatt’, 浬 hǎilǐ ‘nautical mile’, and 哩 yīnglǐ ‘English mile’ (none of these words occurs in our language material).

6. Xin Han-Da cidian – Das neue Chinesisch-Deutsche Wörterbuch, 1985. Commercial Press, Beijing.

7. In fact, one can speak about phonological words here, see e.g. Hall (1999) or Zsiga (2013, pp. 342–346). Thus, this approach can be considered a study of the MAL on the level of words, albeit from a slightly different perspective.

8. Lengths of stress units ranged between 1 and 18 syllables, while lengths of rhythmic segments ranged between 1 and 7 syllables (Ščigulinská & Schusterová, 2014, pp. 70–72, 77).

9. Kovaľová and Schusterová (2016, pp. 122–133) reported lengths of stress units between 1 and 21 syllables; similarly, Rothe-Neves et al. (2017, p. 6) reported lengths of utterances between 2 and 29 syllables. On the other hand, Geršić and Altmann (1980, pp. 115–123) tested the law on word lengths only up to 5 syllables.

10. https://www.fon.hum.uva.nl/praat/ (accessed 1 June 2023).

11. Recall that Stave et al. (2021) study the relation between word length in morphemes and the mean morpheme length in graphemes.

12. https://www.wordproject.org/ (accessed 1 June 2023).

13. International Biblical Association. Wordproject®: Sheng Jing: Xīnyuē Quán Shū [Holy Bible. New Testament]. Available at https://www.wordproject.org/bibles/pn/index.htm (accessed 1 June 2023).

14. International Biblical Association. Wordproject®: 圣经. 新约全书 [Holy Bible. New Testament]. Available at https://www.wordproject.org/bibles/gb_cat/index.htm (accessed 1 June 2023).

15. Available at https://github.com/tsroten/pynlpir (accessed 1 June 2023).

16. Available at https://github.com/NLPIR-team/NLPIR (accessed 1 June 2023).

18. Available at https://github.com/mozillazg/python-pinyin (accessed 23 July 2023).

19. http://www.nlreg.com (accessed June 2023).

20. Naturally, this requirement is another rule of thumb. See e.g. Mačutek and Rovenchak (2011) and Mačutek et al. (2021) for similar, but slightly different approaches to the problem of word length categories with too low frequencies.

21. If, e.g., we measure word length in syllables, and lengths from 1 to 5 occur more than 10 times, length 6 has frequency 12, and length 7 has frequency 1, we pool the last two lengths into one category. The weighted mean word length in this category is $\frac{12 \times 6 + 1 \times 7}{12 + 1} \approx 6.08$; see data in .

22. We also obtained comparable results for the relation between word length and the mean syllable length for Pīnyīn Rìjì Duǎnwén, a diary written by Zhang Qiling (available at http://www.pinyin.info/readings/pinyin_riji_duanwen.html, accessed 1 June 2023), and for a sample containing Press reportage (text category A) and Science academic prose (text category J) from The Lancaster Corpus of Mandarin Chinese (McEnery et al., 2003). Similarly to and , there is a decreasing tendency of the mean syllable length, with a slight increase for the longest words.

23. We also obtained comparable results for the relation between word length in Chinese characters and the mean character size in components and strokes, respectively, for the short story 我为什么要结婚 [Why do I want to get married] (from the short story collection 黄昏里的男孩 [The boy in the dusk]) written by Yu Hua (2012), as well as for a sample containing Press reportage (text category A) and Science academic prose (text category J) from The Lancaster Corpus of Mandarin Chinese (McEnery et al., 2003).

24. Words consisting of one, two, and three syllables make 99.7% of all word tokens in the Chinese translation of the New Testament, see .

25. Given the wide scope of the least effort principle (see Zipf, 1949), easier-to-pronounce tones probably occur more frequently (see Zhang, 2002). Tone characteristics can also interact with other word properties, e.g. longer words can have a higher proportion of simpler tones than shorter ones.

26. According to Berdicevskis (2021, p. 27), ‘clauses are not repeated in languages often enough to enable frequency estimates’.
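
The pooling step described in note 21 can be sketched as follows. This is an illustrative reading of the note, not the authors' code: the function name `pool_tail`, the threshold parameter `min_count`, and the frequencies chosen for lengths 1 to 5 are assumptions; only the frequencies 12 and 1 for lengths 6 and 7 come from the note.

```python
def pool_tail(freqs, min_count=10):
    """Pool the longest word-length categories until the pooled one is frequent enough.

    `freqs` maps word length to frequency. Categories are merged from the
    longest length downwards until the pooled frequency reaches `min_count`
    (an assumed threshold). Returns a list of (length, frequency) pairs in
    which the last pair holds the frequency-weighted mean length of the
    pooled category.
    """
    lengths = sorted(freqs)
    pooled_freq = 0
    pooled_sum = 0
    while lengths and pooled_freq < min_count:
        length = lengths.pop()  # take the longest remaining length
        pooled_freq += freqs[length]
        pooled_sum += length * freqs[length]
    result = [(length, freqs[length]) for length in lengths]
    if pooled_freq:
        result.append((pooled_sum / pooled_freq, pooled_freq))
    return result

# Frequencies for lengths 1-5 are hypothetical; lengths 6 and 7 follow note 21.
example = {1: 500, 2: 400, 3: 100, 4: 40, 5: 20, 6: 12, 7: 1}
mean_length, frequency = pool_tail(example)[-1]
print(round(mean_length, 2), frequency)  # 6.08 13
```

If the longest length already meets the threshold, the loop stops after one step and the categories are returned unchanged.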

Additional information

Funding

This work was supported by the Agentúra na Podporu Výskumu a Vývoja [APVV-21-0216]; European Regional Development Fund [CZ.02.1.01/0.0/0.0/16_019/0000791]; Operational Programme Integrated Infrastructure (OPII) [313011BWH2]; Vedecká Grantová Agentúra MŠVVaŠ SR a SAV [2/0096/21].
