Research Article

Utilizing Machine Learning Techniques for Classifying Translated and Non-Translated Corporate Annual Reports

Article: 2340393 | Received 19 Nov 2023, Accepted 29 Mar 2024, Published online: 10 Apr 2024
