Research Article

Lexical Features and Psychological States: A Quantitative Linguistic Approach
