Research Article

Improved Word Segmentation System for Chinese Criminal Judgment Documents

Article: 2297524 | Received 04 Oct 2023, Accepted 15 Dec 2023, Published online: 21 Dec 2023
