Research Article

Hierarchical multi-instance multi-label learning for Chinese patent text classification

Article: 2295818 | Received 09 Apr 2023, Accepted 12 Dec 2023, Published online: 03 Jan 2024
 

Abstract

To further improve the accuracy of Chinese patent classification, this paper proposes a model that builds on the structure of patents, takes patent claims as its subject, and uses multi-instance multi-label learning as its main method. First, the patent claims are divided into multiple independent texts, using the claim sequence number as the splitting token. For each patent, the claims are regarded as multiple instances, and the corresponding International Patent Classification (IPC) codes serve as its multiple labels. Next, the concept of secondary_label is introduced following the composition rules of the IPC, and the relationships between instances and multiple secondary_labels are mined through fully-connected layers. To capture more comprehensive semantic information about the instances, BiGRU and self-attention are employed to enhance semantics and reduce information loss during training. Finally, max-pooling operations are used to obtain the predicted categories of a patent from the captured relationships between instances and labels at different hierarchical levels. Experimental results on the '2017 Chinese patent dataset' demonstrate that the multi-instance multi-label approach can effectively mine deeper relationships between patents and labels in classification tasks. As a result, the model significantly improves the accuracy of patent text classification.
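
The first and last steps described above (splitting claims into instances, then max-pooling instance-level scores into patent-level labels) can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expression, the function names, the 0.5 threshold, and the example scores and label codes are all assumptions made for illustration, and the per-claim scores are taken as given rather than produced by the BiGRU and self-attention encoder.

```python
import re

# Assumed claim marker: an integer at the start of the text or after whitespace,
# followed by a period (ASCII or full-width) or an ideographic comma.
# A real claims field may need a stricter pattern, since this one would also
# split on a sentence that happens to end with " 1." in the middle of a claim.
CLAIM_MARKER = re.compile(r"(?:^|\s)(\d+)[.．、]\s*")


def split_claims(claims_text):
    """Split a claims section into one text per claim (one instance each)."""
    parts = CLAIM_MARKER.split(claims_text)
    # With one capture group, re.split returns
    # [prefix, number, body, number, body, ...]; keep only the bodies.
    bodies = parts[2::2]
    return [body.strip() for body in bodies if body.strip()]


def max_pool_bag(instance_scores):
    """Bag-level score per label: the maximum over all instances."""
    return [max(column) for column in zip(*instance_scores)]


def predict_labels(instance_scores, label_names, threshold=0.5):
    """Return every label whose max-pooled score reaches the threshold."""
    bag_scores = max_pool_bag(instance_scores)
    return [name for name, score in zip(label_names, bag_scores)
            if score >= threshold]


# Hypothetical example: two claims scored against three illustrative labels.
claims = split_claims(
    "1. A device comprising a sensor. "
    "2. The device of claim 1, wherein the sensor is optical."
)
scores = [
    [0.1, 0.8, 0.2],  # made-up scores for claim 1
    [0.7, 0.3, 0.1],  # made-up scores for claim 2
]
labels = ["G06F", "H04L", "A61K"]
predicted = predict_labels(scores, labels)
```

Taking the maximum over instances reflects the usual multi-instance assumption that a patent carries a label if at least one of its claims supports it, so a single strongly matching claim is enough to assign that label.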

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Natural Science Foundation of China (Grant number 62076006), the University Synergy Innovation Program of Anhui Province (Grant number GXXT-2021-008), the Opening Foundation of State Key Laboratory of Cognitive Intelligence (Grant number COGOS-2023HE02), and the University Natural Science Research Project of Anhui Province (Grant number 2023AH050846).