Abstract
As a foundational task in natural language processing, text classification is widely used in information retrieval, public opinion analysis, and related tasks. Chinese short texts have sparse features, which lowers classification accuracy. To address this problem, this paper proposes a Chinese short text classification method based on Character Frequency Sub-word Enhancement (CFSE). First, the initial Chinese-character sequence is mapped to a corresponding Character Frequency Sub-word (CFS) sequence using global character¹ frequency information. Second, a BiLSTM-Att network processes the CFS sequence to extract relationship features among the data, while ERNIE extracts semantic features from the initial Chinese-character sequence. Finally, the two kinds of features are fused and fed into a text classifier to obtain the classification results. Experimental results show that the proposed method improves the classification accuracy of Chinese short texts.
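The first step, mapping each character to a token derived from global character frequency, can be illustrated with a minimal Python sketch. The abstract does not define how a CFS is constructed, so the bucketing rule, the thresholds, the token names, and the function names `build_frequency_table` and `to_cfs_sequence` are all hypothetical choices made only for illustration; the actual CFS mapping may differ.

```python
from collections import Counter


def build_frequency_table(corpus):
    """Count how often each character occurs across the whole corpus."""
    counts = Counter()
    for text in corpus:
        counts.update(text)  # a string is iterated character by character
    return counts


def to_cfs_sequence(text, counts, thresholds=(1, 3)):
    """Map each character to a frequency-bucket token.

    Assumption: three buckets split at two illustrative thresholds.
    This is NOT the paper's CFS definition, only a stand-in for it.
    """
    low, high = thresholds
    tokens = []
    for ch in text:
        freq = counts.get(ch, 0)  # unseen characters count as frequency 0
        if freq <= low:
            tokens.append("RARE")
        elif freq <= high:
            tokens.append("MID")
        else:
            tokens.append("FREQ")
    return tokens


# Toy corpus, invented for this example.
corpus = ["今天天气好", "天气预报", "好天气"]
counts = build_frequency_table(corpus)
cfs = to_cfs_sequence("今天天气好", counts)
```

In this sketch the output sequence has the same length as the input, so it could be fed to a sequence model in parallel with the original characters. Whether the method in this paper keeps that one-to-one alignment is not stated in the abstract.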
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1. In this paper, "character" refers to a single Chinese character.