Research Article

A deep neural network model for Chinese toponym matching with geographic pre-training model

Article: 2353111 | Received 06 Dec 2023, Accepted 04 May 2024, Published online: 13 May 2024

References

  • Alsudais, A., W. Alotaibi, and F. Alomary. 2022. “Similarities Between Arabic Dialects: Investigating Geographical Proximity.” Information Processing & Management 59 (1): 102770. https://doi.org/10.1016/j.ipm.2021.102770
  • Amir, A., Y. Aumann, G. Benson, A. Levy, O. Lipsky, E. Porat, S. Skiena, and U. Vishne. 2009. “Pattern Matching with Address Errors: Rearrangement Distances.” Journal of Computer and System Sciences 75 (6): 359–370. https://doi.org/10.1016/j.jcss.2009.03.001
  • Bergstra, J., and Y. Bengio. 2012. “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research 13 (2): 281–305.
  • Berkhin, P., M. R. Evans, F. Teodorescu, W. Wu, and D. Yankov. 2015. “A New Approach to Geocoding: BingGC.” In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, edited by Mohamed Ali and Yan Huang, 1–10. New York: Association for Computing Machinery.
  • Buckles, B., J. Buckley, and F. E. Petry. 1994. “Architecture of FAME: Fuzzy Address Matching Environment.” In Proceedings of 1994 IEEE 3rd International Fuzzy Systems Conference, edited by Nicole McFarlane, 308–312. Orlando, FL, USA: IEEE.
  • Cao, S., W. Lu, J. Zhou, and X. Li. 2018. “cw2vec: Learning Chinese Word Embeddings with Stroke n-Gram Information.” In Proceedings of the AAAI Conference on Artificial Intelligence, 5053–5061. Palo Alto, California: AAAI Press.
  • Chen, J., J. Chen, X. She, J. Mao, and G. Chen. 2021. “Deep Contrast Learning Approach for Address Semantic Matching.” Applied Sciences 11 (16): 7608. https://doi.org/10.3390/app11167608
  • Chen, Q., X. Zhu, Z. Ling, S. Wei, H. Jiang, and D. Inkpen. 2016. “Enhanced LSTM for Natural Language Inference.” arXiv preprint arXiv:1609.06038.
  • Cheng, J., L. Dong, and M. Lapata. 2016. “Long Short-term Memory-networks for Machine Reading.” arXiv preprint arXiv:1601.06733.
  • Cheng, R., J. Liao, and J. Chen. 2022. “Quickly Locating POIs in Large Datasets from Descriptions Based on Improved Address Matching and Compact Qualitative Representations.” Transactions in GIS 26 (1): 129–154. https://doi.org/10.1111/tgis.12838
  • Comber, S., and D. Arribas-Bel. 2019. “Machine Learning Innovations in Address Matching: A Practical Comparison of word2vec and CRFs.” Transactions in GIS 23 (2): 334–348. https://doi.org/10.1111/tgis.12522
  • Devlin, J., M. W. Chang, K. Lee, and K. Toutanova. 2018. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arXiv preprint arXiv:1810.04805.
  • Eidoon, Z., N. Yazdani, and F. Oroumchian. 2008. “Ontology Matching Using Vector Space.” In Advances in Information Retrieval: 30th European Conference on IR Research, edited by Craig Macdonald, Iadh Ounis, Vassilis Plachouras, and Ian Ruthven, 472–481. Berlin, Heidelberg: Springer.
  • Fan, Y., L. Pang, J. Hou, J. Guo, Y. Lan, and X. Cheng. 2017. “MatchZoo: A Toolkit for Deep Text Matching.” arXiv preprint arXiv:1707.07270.
  • Hochreiter, S., and J. Schmidhuber. 1997. “Long Short-term Memory.” Neural Computation 9 (8): 1735–1780.
  • Hu, X., H. S. Al-Olimat, J. Kersten, M. Wiegmann, F. Klan, Y. Sun, and H. Fan. 2022b. “GazPNE: Annotation-Free Deep Learning for Place Name Extraction from Microblogs Leveraging Gazetteer and Synthetic Data by Rules.” International Journal of Geographical Information Science 36 (2): 310–337. https://doi.org/10.1080/13658816.2021.1947507
  • Hu, X., Y. Hu, B. Resch, and J. Kersten. 2023a. “Geographic Information Extraction from Texts (GeoExT).” In European Conference on Information Retrieval, edited by Jaap Kamps and Lorraine Goeuriot, 398–404. Cham: Springer Nature Switzerland.
  • Hu, X., Y. Sun, J. Kersten, Z. Zhou, F. Klan, and H. Fan. 2023b. “How Can Voting Mechanisms Improve the Robustness and Generalizability of Toponym Disambiguation?” International Journal of Applied Earth Observation and Geoinformation 117: 103191. https://doi.org/10.1016/j.jag.2023.103191
  • Hu, X., Z. Zhou, H. Li, Y. Hu, F. Gu, J. Kersten, H. Fan, and F. Klan. 2022a. “Location Reference Recognition from Texts: A Survey and Comparison.” arXiv preprint arXiv:2207.01683.
  • Hu, X., Z. Zhou, Y. Sun, J. Kersten, F. Klan, H. Fan, and M. Wiegmann. 2022c. “GazPNE2: A General Place Name Extractor for Microblogs Fusing Gazetteers and Pretrained Transformer Models.” IEEE Internet of Things Journal 9 (17): 16259–16271. https://doi.org/10.1109/JIOT.2022.3150967
  • Huang, P. S., X. He, J. Gao, L. Deng, A. Acero, and L. Heck. 2013. “Learning Deep Structured Semantic Models for Web Search Using Clickthrough Data.” In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, edited by Qi He and Arun Iyengar, 2333–2338. New York: Association for Computing Machinery.
  • Jaccard, P. 1908. “Nouvelles Recherches sur la Distribution Florale.” Bull. Soc. Vaud. Sci. Nat. 44: 223–270.
  • Koumarelas, I., A. Kroschk, C. Mosley, and F. Naumann. 2018. “Experience: Enhancing Address Matching with Geocoding and Similarity Measure Selection.” Journal of Data and Information Quality (JDIQ) 10 (2): 1–16. https://doi.org/10.1145/3232852
  • Lan, Z., M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut. 2019. “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.” arXiv preprint arXiv:1909.11942.
  • Levenshtein, V. I. 1966. “Binary Codes Capable of Correcting Deletions, Insertions, and Reversals.” Soviet Physics Doklady 10 (8): 707–710.
  • Li, J., B. Chiu, S. Feng, and H. Wang. 2020a. “Few-shot Named Entity Recognition via Meta-Learning.” IEEE Transactions on Knowledge and Data Engineering 34 (9): 4245–4256. https://doi.org/10.1109/TKDE.2020.3038670
  • Li, J., S. Feng, and B. Chiu. 2023. “Few-shot Relation Extraction with Dual Graph Neural Network Interaction.” IEEE Transactions on Neural Networks and Learning Systems 1–13.
  • Li, J., P. Han, X. Ren, J. Hu, L. Chen, and S. Shang. 2021. “Sequence Labeling with Meta-Learning.” IEEE Transactions on Knowledge and Data Engineering 35 (3): 3072–3086.
  • Li, F., Y. Lu, X. Mao, J. Duan, and X. Liu. 2022. “Multi-task Deep Learning Model Based on Hierarchical Relations of Address Elements for Semantic Address Matching.” Neural Computing and Applications 34 (11): 8919–8931. https://doi.org/10.1007/s00521-022-06914-1
  • Li, J., S. Shang, and L. Chen. 2020b. “Domain Generalization for Named Entity Boundary Detection via Metalearning.” IEEE Transactions on Neural Networks and Learning Systems 32 (9): 3819–3830. https://doi.org/10.1109/TNNLS.2020.3015912
  • Li, L., W. Wang, B. He, and Y. Zhang. 2018. “A Hybrid Method for Chinese Address Segmentation.” International Journal of Geographical Information Science 32 (1): 30–48. https://doi.org/10.1080/13658816.2017.1379084
  • Li, D., S. Wang, and Z. Mei. 2010. “Approximate Address Matching.” In 2010 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, edited by Leopoldo G. Franquelo, 264–269. Fukuoka, Japan: IEEE.
  • Lin, Y., M. Kang, Y. Wu, Q. Du, and T. Liu. 2020. “A Deep Learning Architecture for Semantic Address Matching.” International Journal of Geographical Information Science 34 (3): 559–576.
  • Liu, Y., M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. “RoBERTa: A Robustly Optimized BERT Pretraining Approach.” arXiv preprint arXiv:1907.11692.
  • Ma, X. 2022. “Knowledge Graph Construction and Application in Geosciences: A Review.” Computers & Geosciences 161: 105082. https://doi.org/10.1016/j.cageo.2022.105082
  • Mauro, N., L. Ardissono, and M. Lucenteforte. 2020. “Faceted Search of Heterogeneous Geographic Information for Dynamic Map Projection.” Information Processing & Management 57 (4): 102257. https://doi.org/10.1016/j.ipm.2020.102257
  • Moreau, E., F. Yvon, and O. Cappé. 2008. “Robust Similarity Measures for Named Entities Matching.” In Proceedings of the 22nd International Conference on Computational Linguistics, edited by Donia Scott and Hans Uszkoreit, 593–600. Manchester, UK: Association for Computational Linguistics.
  • Moura, T. H., C. A. Davis Jr, and F. T. Fonseca. 2017. “Reference Data Enhancement for Geographic Information Retrieval Using Linked Data.” Transactions in GIS 21 (4): 683–700. https://doi.org/10.1111/tgis.12238
  • Parikh, A. P., O. Täckström, D. Das, and J. Uszkoreit. 2016. “A Decomposable Attention Model for Natural Language Inference.” arXiv preprint arXiv:1606.01933.
  • Qin, T., F. Ren, T. Hu, J. Liu, R. Li, and Q. Du. 2016. “Using an Optimized Chinese Address Matching Method to Develop a Geocoding Service: A Case Study of Shenzhen, China.” ISPRS International Journal of Geo-Information 5 (5): 65. https://doi.org/10.3390/ijgi5050065
  • Qiu, Q., Z. Xie, K. Ma, Z. Chen, and L. Tao. 2022b. “Spatially Oriented Convolutional Neural Network for Spatial Relation Extraction from Natural Language Texts.” Transactions in GIS 26 (2): 839–866. https://doi.org/10.1111/tgis.12887
  • Qiu, Q., Z. Xie, K. Ma, L. Tao, and S. Zheng. 2023. “NeuroSPE: A Neuro-net Spatial Relation Extractor for Natural Language Text Fusing Gazetteers and Pretrained Models.” Transactions in GIS 27 (5): 1526–1549. https://doi.org/10.1111/tgis.13086
  • Qiu, Q., Z. Xie, S. Wang, Y. Zhu, H. Lv, and K. Sun. 2022a. “ChineseTR: A Weakly Supervised Toponym Recognition Architecture Based on Automatic Training Data Generator and Deep Neural Network.” Transactions in GIS 26 (3): 1256–1279. https://doi.org/10.1111/tgis.12902
  • Recchia, G., and M. Louwerse. 2013. “A Comparison of String Similarity Measures for Toponym Matching.”
  • Sanh, V., L. Debut, J. Chaumond, and T. Wolf. 2019. “DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter.” arXiv preprint arXiv:1910.01108.
  • Santos, J., I. Anastácio, and B. Martins. 2015. “Using Machine Learning Methods for Disambiguating Place References in Textual Documents.” GeoJournal 80 (3): 375–392. https://doi.org/10.1007/s10708-014-9553-y
  • Santos, R., P. Murrieta-Flores, P. Calado, and B. Martins. 2018b. “Toponym Matching Through Deep Neural Networks.” International Journal of Geographical Information Science 32 (2): 324–348. https://doi.org/10.1080/13658816.2017.1390119
  • Santos, R., P. Murrieta-Flores, and B. Martins. 2018a. “Learning to Combine Multiple String Similarity Metrics for Effective Toponym Matching.” International Journal of Digital Earth 11 (9): 913–938. https://doi.org/10.1080/17538947.2017.1371253
  • Shan, S., Z. Li, Q. Yang, A. Liu, L. Zhao, G. Liu, and Z. Chen. 2020. “Geographical Address Representation Learning for Address Matching.” World Wide Web 23 (3): 2005–2022. https://doi.org/10.1007/s11280-020-00782-2
  • Su, T. R., and H. Y. Lee. 2017. “Learning Chinese Word Representations from Glyphs of Characters.” arXiv preprint arXiv:1708.04755.
  • Sun, Z., A. G. Qiu, J. Zhao, F. Zhang, Y. Zhao, and L. Wang. 2013. “Technology of Fuzzy Chinese-Geocoding Method.” In 2013 International Conference on Information Science and Cloud Computing, edited by W. Dale Blair, 7–12. Guangzhou, China: IEEE.
  • Varol, C., and C. Bayrak. 2012. “Hybrid Matching Algorithm for Personal Names.” Journal of Data and Information Quality (JDIQ) 3 (4): 1–18. https://doi.org/10.1145/2348828.2348830
  • Wang, Z., W. Hamza, and R. Florian. 2017. “Bilateral Multi-perspective Matching for Natural Language Sentences.” arXiv preprint arXiv:1702.03814.
  • Winkler, W. E. 1990. “String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage.”
  • Xu, L., R. Mao, C. Zhang, Y. Wang, X. Zheng, X. Xue, and F. Xia. 2022. “Deep Transfer Learning Model for Semantic Address Matching.” Applied Sciences 12 (19): 10110. https://doi.org/10.3390/app121910110
  • Yang, Z., Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le. 2019. “XLNet: Generalized Autoregressive Pretraining for Language Understanding.” Advances in Neural Information Processing Systems 32.
  • Yin, W., H. Schütze, B. Xiang, and B. Zhou. 2016a. “ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs.” Transactions of the Association for Computational Linguistics 4: 259–272. https://doi.org/10.1162/tacl_a_00097
  • Yin, R., Q. Wang, P. Li, R. Li, and B. Wang. 2016b. “Multi-granularity Chinese Word Embedding.” In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, edited by Jian Su and Kevin Duh, 981–986. Austin, Texas: Association for Computational Linguistics.
  • Yu, J., X. Jian, H. Xin, and Y. Song. 2017. “Joint Embeddings of Chinese Words, Characters, and Fine-Grained Subcharacter Components.” In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, edited by Martha Palmer, Rebecca Hwa, and Sebastian Riedel, 286–291. Copenhagen, Denmark: Association for Computational Linguistics.
  • Zhang, X., Y. Huang, C. Zhang, and P. Ye. 2022. “Geoscience Knowledge Graph (GeoKG): Development, Construction and Challenges.” Transactions in GIS 26 (6): 2480–2494. https://doi.org/10.1111/tgis.12985
  • Zhang, H., F. Ren, H. Li, R. Yang, S. Zhang, and Q. Du. 2020. “Recognition Method of New Address Elements in Chinese Address Matching Based on Deep Learning.” ISPRS International Journal of Geo-Information 9 (12): 745. https://doi.org/10.3390/ijgi9120745