Research Article

A deep neural network model for Chinese toponym matching with geographic pre-training model

Article: 2353111 | Received 06 Dec 2023, Accepted 04 May 2024, Published online: 13 May 2024
 

ABSTRACT

Many tasks in geographical information retrieval and geographical information science require toponym matching, that is, aligning toponyms that share a common referent. String similarity approaches struggle with unofficial and/or historical variants of the same toponym. Current state-of-the-art supervised machine learning approaches rely on labeled samples, and they do not adequately handle character replacements arising from transliteration or from historical shifts in linguistic and cultural norms. To address these issues, this paper proposes a matching approach based on a deep neural network model built on a geographic language representation model, GeoBERT (geographic Bidirectional Encoder Representations from Transformers, BERT). The model combines GeoBERT with an extended, generalized Enhanced Sequential Inference Model architecture and integrates multiple features to improve the accuracy and robustness of toponym matching. We evaluate the proposed method on three large datasets. The results show that our approach outperforms the individual similarity metrics used in previous studies.
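
To make the limitation of string similarity concrete, the following is a minimal sketch rather than the method proposed in this paper. It assumes Python's standard `difflib.SequenceMatcher` as a stand-in for a character-level similarity metric. The helper name `char_similarity` and the toponym pairs are illustrative choices and are not taken from the paper's datasets. 北平 (Beiping) is a historical name of 北京 (Beijing), whereas 南京 (Nanjing) is a different city.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (hypothetical helper for illustration)."""
    return SequenceMatcher(None, a, b).ratio()


# Historical variant of the same place: Beijing vs. Beiping.
same_place = char_similarity("北京", "北平")

# Two different cities that happen to share one character: Beijing vs. Nanjing.
different_place = char_similarity("北京", "南京")

# Both pairs share exactly one of two characters, so both scores are 0.5.
print(same_place, different_place)
```

Because the two scores are identical, no threshold on this metric alone can accept the historical variant while rejecting the different city, which is the kind of case that motivates a learned matching model.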

Acknowledgments

We would like to express our great appreciation to the editors and two anonymous reviewers for constructive comments that helped improve the manuscript.

Disclosure statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Additional information

Funding

This study was financially supported by the National Key R&D Program of China (No. 2022YFB3904200, 2022YFF0711601), the Natural Science Foundation of China (No. 42301492), the Open Fund of Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04), and the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014).