Research Article

AugGKG: a grid-augmented geographic knowledge graph representation and spatio-temporal query model

Pages 4934-4957 | Received 16 Aug 2023, Accepted 28 Nov 2023, Published online: 11 Dec 2023

ABSTRACT

As an emerging knowledge representation model in the domain of knowledge graphs, the geographic knowledge graph can take full advantage of semantic, spatial and temporal information to facilitate answering spatio-temporal questions and completing relations. However, the representation of geographic knowledge graphs still has issues such as the difficulty of unified modelling of heterogeneous spatio-temporal data, a weak ability to answer spatio-temporal queries for dynamic multiobjective problems, and low efficiency of graph querying. This paper presents a grid-augmented geographic knowledge graph (AugGKG) based on the GeoSOT global subdivision grid model and a time slice subgraph architecture. AugGKG discretely normalizes the spatio-temporal data of the graph, which involves five types of nodes and two types of relations. By using the geo-hidden layer of the graph and geocoding algebraic operations, the AugGKG can quickly answer complex multiobjective spatio-temporal queries and complete implicit spatio-temporal relations. Comparative experiments against existing geographic knowledge graphs (YAGO, GeoKG and GEKG) verified the advantages of AugGKG in terms of uniformity of accuracy, completeness, and efficiency. Hence, AugGKG is expected to serve as an innovative and robust geographic knowledge graph that can perform fast computation and relation completion for complex spatio-temporal queries in future geospatial question answering applications.

1. Introduction

The knowledge graph (KG) is a new paradigm for the representation, query, and fusion of highly heterogeneous data (Janowicz et al. 2022; Li et al. 2023). A knowledge graph is a multi-relational network that represents entities as nodes and semantic relations as various types of edges (Shen, Zhang, and Cheng 2022; Zhang et al. 2020). Each graph unit takes the form of a triplet (Subject, Predicate, Object), or SPO for short, where S and O are two entities and P is the relation that the fact describes; knowledge completion calculates or predicts the missing S, P or O of a triplet (Wang et al. 2018). As an emerging domain-specific knowledge representation model, the geographic knowledge graph (GKG) acts as a bridge between artificial intelligence and geographic knowledge (Jiang et al. 2018). Interlinking geographic objects within knowledge graphs enables semantic, temporal, and spatial information to be integrated to facilitate and enhance spatio-temporal question answering and knowledge completion (Tempelmeier and Demidova 2021).

Currently, GKG architectures generally adopt the top-down approach (Yin et al. 2022). Using geographic ontologies as a framework, a GKG is a more complex representation of geographic knowledge obtained by associating various nodes through semantic and spatio-temporal information (Ma 2022). Existing GKGs often extend the traditional knowledge graph by adding triplets with spatio-temporal entities or relations to query and represent spatio-temporal knowledge (Huang et al. 2019). Similar to databases, a GKG can answer basic spatio-temporal queries and output results through the query statements used for retrieval in graph databases (Ding et al. 2022; Sun and Sarwat 2018). Hoffart et al. proposed Yet Another Great Ontology 2 (YAGO2), in which each entity can be annotated with time and space information (Hoffart et al. 2013). YAGO2 is an enhancement of YAGO and a preliminary study of adding spatio-temporal information to knowledge graphs. The Geographic-Formalized Knowledge Graph (GeoKG) focuses on knowledge of time and space and expresses the changes in each geo-object (Wang et al. 2019). Recently, Zheng et al. presented a geographic evolutionary knowledge graph (GEKG), which establishes a multi-layered knowledge structure (Zheng et al. 2021). However, in complex spatio-temporal scenarios, existing geographic knowledge graphs still perform poorly (Yan 2019), mainly because of the following three difficulties.

The first challenge for geographic knowledge graphs is the unified modelling of heterogeneous spatio-temporal data. In geographic knowledge graphs, determining how to represent spatio-temporal data is a key issue (Du et al. 2021). The representations of geographic entities, such as vehicles, storms, and mountains, are highly inconsistent and diverse, and they are difficult to summarize in real-world applications (Zuheros et al. 2019). Ordinary graphs use semantic relations to define spatio-temporal information and ontologies, and the relations are often nonunique (Kokla and Guilbert 2020). The accuracy of different references and the consistency of the units also seriously affect the querying of spatio-temporal data in a knowledge graph (Janowicz et al. 2020; Wiemann and Bernard 2015). In short, the heterogeneity and ambiguity of spatio-temporal data make knowledge disambiguation difficult (Hu 2018).

The second challenge is that methods for answering spatio-temporal questions about dynamic multiobjectives, that is, queries over a large number of entities in a time-varying environment, are still lacking (Chen et al. 2022). Although there has been some progress in general semantic question answering, existing knowledge graphs still have difficulty querying triplets that contain geographic objects and require complex spatial and temporal operations (Mai et al. 2021). Most knowledge graph models that take space into consideration rely primarily on geographic location tags, but these models inevitably lose complex geographic information such as relative distance, direction, and topological relations (Hamzei, Winter, and Tomko 2021). Thus, current knowledge graphs can only query and retrieve the simple spatio-temporal and semantic relations stored in the graph (Bai et al. 2023; Ge et al. 2022). Multiobjective and multiconditional spatio-temporal questions, such as ‘Which entity southeast of ship A is closest to it at 1:00-3:00 pm?’, are difficult to answer without completing the corresponding relations (Hamzei, Winter, and Tomko 2021).

The third challenge is improving graph query efficiency (Alam, Torgo, and Bifet 2022). The speed of spatial and temporal retrieval is important when querying large numbers of geographic objects (Bao et al. 2023). Existing geographic knowledge graphs use latitude and longitude methods for querying (Zhou et al. 2021). However, as the size of the spatio-temporal data increases, the number of relations expands exponentially. Hence, the efficiency of spatial queries using the ordinary approach is greatly reduced (He, Chu, and Li 2017). Recently, some discrete global grid system (DGGS) models have been developed, which have demonstrated significant advantages in the spatio-temporal query indexing of massive geographic data (Li et al. 2019; Qu et al. 2020). We expect a combination of knowledge graphs and DGGSs to provide an effective solution for geospatial environment modelling and query answering.

Therefore, to address the above three issues, this paper presents a grid-augmented geographic knowledge graph (AugGKG) based on the Geographical coordinate Subdivision grid with One-dimensional integral coding on $2^n$-Tree (GeoSOT) model (Han et al. 2022) and a time slice subgraph architecture. AugGKG standardizes spatio-temporal graph data in complex geographical scenarios and involves five types of nodes and two types of relations. Through the geo-hidden layer, the AugGKG can quickly answer the computational reasoning problems posed by multiobjective spatio-temporal queries. In experiments comparing AugGKG with other geographic knowledge graphs, we demonstrate its advantages in completeness, efficiency, and uniformity of accuracy.

2. Grid-augmented geographic knowledge graph representation

The structure of AugGKG is shown in Figure 1. Following a top-down design, the AugGKG is composed of 5 node elements (Locgrid, Entity, Heading, Event, Attribute) and 2 relation types (ExplicitRelation and ImplicitRelation), which together provide a unified representation of geographic knowledge. AugGKG uses a global subdivision grid model called GeoSOT as the foundation for modelling spatio-temporal data. According to the design in Figure 1, through the GeoSOT spatial subdivision and time slice design, AugGKG can assign spatio-temporal knowledge to each geo-object in the graph without consuming large amounts of storage for graph relations. AugGKG also proposes the geo-hidden layer of the graph, which is an H32×m×n tensor with spatio-temporal computing capabilities arranged in GeoSOT code order; thus, AugGKG can quickly compute and query complex multiobjective spatio-temporal questions and complete implicit spatio-temporal relations.

Figure 1. Research components of AugGKG.


2.1. Spatio-temporal grid data modelling based on GeoSOT

This paper uses GeoSOT, a DGGS architecture, for accurate and standardized coding of spatio-temporal data (Li, McGrath, and Stefanakis 2021). The Geographical coordinate Subdivision grid with One-dimensional integral coding on $2^n$-Tree (GeoSOT) is a global 2D subdivision grid with 32 levels, constituting a multi-level quadtree subdivision structure from the earth plane (level 0) to the 1.5 cm scale (level 32) (Han et al. 2021). GeoSOT offers multiple levels and regional uniqueness; thus, it can accurately and finely define spatio-temporal data in geographic knowledge graphs. Notably, the basic components of a DGGS are cells or zones, and those of GeoSOT are grids; the two concepts are essentially equivalent (Li et al. 2019).

GeoSOT adopts a binary 1D code, which can efficiently perform neighbourhood location, orientation and distance calculations at multiple levels (Cheng et al. 2016; Hou et al. 2021). Hence, spatial algebraic computations between GeoSOT grids can replace complex logical reasoning in knowledge graphs that rely on logical edges, and the result of grid calculation reasoning is naturally spatio-temporal knowledge. GeoSOT grid codes have the advantage of binary operations and support parallelized processing, which can improve the efficiency of knowledge graph retrieval and inference on large-scale data.

In this paper, spatio-temporal data are mainly used to represent entities and events in the AugGKG. Spatio-temporal data are encoded with GeoSOT grids, which form locgrids that can represent the spatial information of the dataset. Time information can be uniquely and discretely represented by time slices based on the GeoSOT time subdivision model, which will be introduced in Section 2.2.1. As the nodes of the AugGKG, the locgrid is connected with the entities and events and is associated with the environmental information within the grid space.

The spatio-temporal grid data modelling process is shown in Algorithm 1. After the AugGKG data have been gridded into the collection GL across the 32 levels, the final key step is selecting the appropriate GeoSOT level L for knowledge representation. In the AugGKG, we usually select the GeoSOT level that enables every spatio-temporal data value to be connected with a unique locgrid, thus avoiding redundancy in spatio-temporal knowledge queries.
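The level-selection step can be sketched as follows. This is only an illustration under stated assumptions: `cell_index` is a hypothetical stand-in that splits the latitude and longitude ranges into equal intervals and is not the GeoSOT encoding, and `select_level` implements one possible reading of the selection rule, namely the coarsest level at which distinct points fall into distinct cells.

```python
from typing import Iterable, Tuple

MAX_LEVEL = 32  # the text describes a grid with 32 subdivision levels


def cell_index(lat: float, lon: float, level: int) -> Tuple[int, int]:
    """Return (row, col) of the cell containing a point at `level`.

    Stand-in grid, NOT GeoSOT: the latitude and longitude ranges are each
    split into 2**level equal intervals.
    """
    n = 1 << level
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    return row, col


def select_level(points: Iterable[Tuple[float, float]],
                 max_level: int = MAX_LEVEL) -> int:
    """Coarsest level at which all distinct points occupy distinct cells."""
    distinct = set(points)
    for level in range(1, max_level + 1):
        cells = {cell_index(lat, lon, level) for lat, lon in distinct}
        if len(cells) == len(distinct):
            return level
    return max_level
```

Scanning from coarse to fine keeps the chosen level as low as the data allow, which limits the number of locgrid nodes that have to be created.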

2.2. Grid-augmented knowledge representation

In this section, the grid-augmented knowledge representation is defined. After spatio-temporal grid data modelling based on GeoSOT, the AugGKG can be defined by creating subknowledge graphs on each time slice. Then, the node and edge elements of the AugGKG can be disassembled.

2.2.1. Time slices

For continuous temporal representation, time is usually expressed as a real-number point on the time axis, which expresses the instantaneous time. A time slice can be regarded as a discrete time representation model, which corresponds to a segment on the time axis. Any time occurring in the time slice tk is regarded as simultaneous in the model. Thus, we can construct a knowledge graph on each time slice to perform the entity relation analysis on time series. Time slices have diversified coding methods. For convenience in using time slices, this paper uses GeoSOT-T, a multiscale and unified discrete-time coding system, to encode time slices (Qian et al. 2019). A time slice is represented in .

Let the time slice collection $T$ be represented by Eq. 1:

$$T = \{t_1, t_2, \ldots, t_k, \ldots, t_n\}, \quad t_k \cap t_{k+1} = \varnothing \tag{1}$$

where $t_k$ is the unit time slice. The interval $\Delta t = t_{k+1} - t_k$ of the time slice is determined by the actual scene associated with the graph. When the time series of data in the scenario are closely spaced, the time interval $\Delta t$ needs to be smaller. However, we also need to consider the storage and calculation cost of building a knowledge graph on each time slice. Each node in the AugGKG is connected with a time slice as a special kind of node. The AugGKG can be defined by Eq. 2:

$$\begin{cases} \mathit{AugGKG} = \{KG_1, KG_2, \ldots, KG_k, \ldots, KG_n\} \\ KG_k = \{\mathit{Nodes}, \mathit{Edges} \mid t = t_k\} \end{cases} \tag{2}$$

Here, $KG_k$ is the subknowledge graph under time slice $t_k$, and every $KG_k$ is composed of nodes and edges. In this way, by arranging a knowledge graph $KG_k$ on each time slice in chronological order, the basic framework of the AugGKG is constructed. In Sections 2.2.2 and 2.2.3, we introduce the composition of the nodes and edges of the AugGKG.
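A minimal sketch of the time slice partition and the per-slice subgraphs is given below. It assumes half-open slices of fixed length, so that consecutive slices do not overlap, and uses 0-based slice indices; the function names and the dictionary layout are illustrative assumptions and do not reproduce GeoSOT-T coding.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple


def slice_index(t: datetime, start: datetime, delta: timedelta) -> int:
    """0-based index k of the half-open slice [start + k*delta, start + (k+1)*delta)."""
    return int((t - start) // delta)


def build_subgraphs(records: Iterable[Tuple[str, datetime]],
                    start: datetime,
                    delta: timedelta) -> Dict[int, Dict[str, Set]]:
    """Group observed node names into one subgraph per time slice."""
    subgraphs = defaultdict(lambda: {"nodes": set(), "edges": set()})
    for name, t in records:
        k = slice_index(t, start, delta)
        subgraphs[k]["nodes"].add(name)
    return dict(subgraphs)
```

With a 2 h interval over a single day, the indices run from 0 to 11, that is, twelve subgraphs.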

2.2.2. Nodes

The AugGKG contains 5 different node elements: Locgrid, Entity, Heading, Event, and Attribute. Their distributions on each time slice are shown in Figure 2.

Figure 2. Element distribution of an AugGKG at a single time slice.


(1) Locgrid

Locgrid represents the code of a spatial region, and the structure of Locgrid can be given by Eq. 3:

$$\mathit{Locgrid} = \{\mathit{Locgrid}_{code} \mid \mathit{Locgrid}_{level} \in KG_k,\ level \in \{1, 2, \ldots, 32\}\} \tag{3}$$

In the AugGKG, Locgrid uses the GeoSOT encoding scheme from Section 2.1. $\mathit{Locgrid}_{code}$ represents the code name of this spatial grid, and the smaller the level is, the larger the spatial extent to which $\mathit{Locgrid}_{code}$ maps. Locgrid uniquely identifies the spatial extent and connects with the entities, events and environmental attributes within the grid where it is located, avoiding duplicate descriptions of the same basic location types and reference locations. At the same time, Locgrid participates in the spatio-temporal query and computation between nodes as the hidden-layer parameter of the spatio-temporal relation, which will be illustrated in Section 3.3.1.

(2) Entity

Entity describes an independent geographic object that can be clearly separated from other entities, e.g. ships, buildings, and mountains. Entity is the core node of the AugGKG, and most of the spatio-temporal relations are queried and calculated from Entity. Entity can be given by Eq. 4:

$$\mathit{Entity} = \{\mathit{Entity}_{name}, \mathit{Entity}_{class}, \mathit{Entity}_{state} \mid \mathit{Entity} \in KG_k\} \tag{4}$$

where $\mathit{Entity}_{name}$ indicates the unique identifier of the entity, $\mathit{Entity}_{class}$ shows the category of the entity, and $\mathit{Entity}_{state}$ is either static or dynamic, which represents whether the Entity can move. A dynamic Entity is associated with a Heading. In contrast to the entities of a traditional knowledge graph, the AugGKG has the advantage of being able to analyse the spatial relations of dynamic entities such as aircraft or ships.

(3) Heading

Heading represents the forward direction of a dynamic Event or Entity, such as ships, cars, and planes, at the current time slice, and it is given by Eq. 5:

$$\mathit{Heading} = \{\mathit{Heading}_{direction} \mid \mathit{Heading} \in KG_k\} \tag{5}$$

where $\mathit{Heading}_{direction}$ is the true heading of the dynamic object, which refers to the included angle between the true north line and the heading line. The true heading is based on the true north line (0°) and is measured clockwise to the heading line, and its range is 0° to 360°. Heading can be used to analyse the destination or aim of the associated object; a static Entity has no Heading connected to it.

(4) Event

Event represents a natural or artificial phenomenon that greatly affects the scenario. Event can be expressed by Eq. 6:

$$\mathit{Event} = \{\mathit{Event}_{name}, \mathit{Event}_{threat}, \mathit{Event}_{threshold} \mid \mathit{Event} \in KG_k\} \tag{6}$$

where $\mathit{Event}_{name}$ is the unique identifier of this Event. $\mathit{Event}_{threat}$ indicates the event type based on the threat level. Generally, there are three types of events: (1) negative events, such as volcanic eruptions and tornadoes; (2) emergency events, e.g. earthquakes and local wars; and (3) positive events, such as contests and prizes. $\mathit{Event}_{threshold}$ represents the farthest distance that the effect of the event will reach.

(5) Attribute

Attribute describes the feature information of a node, which includes environmental and semantic information. Each Locgrid, Entity, and Event may have attributes associated with it. Attribute can be described by Eq. 7:

$$\mathit{Attribute} = \{\mathit{Attribute}_{value}, \mathit{Attribute}_{class} \mid \mathit{Attribute} \in KG_k\} \tag{7}$$

All the feature descriptions belong to attribute nodes, such as size, level, and pressure. Attribute represents an inherent or variable property of a node. For example, when it is associated with an entity, $\mathit{Attribute}_{class}$ could be Birthplace with an inherent value, and $\mathit{Attribute}_{value}$ could be Raffles. On the other hand, the Attribute of a Locgrid is often variable, such as seawater temperature as $\mathit{Attribute}_{class}$ and 25°C as $\mathit{Attribute}_{value}$.
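The five node types can be pictured as simple records. The sketch below is an illustration only: the field names follow the components named in the definitions above, while the Python types, the string values for the state, and the validation checks are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locgrid:
    code: int    # grid code (integer form assumed for illustration)
    level: int   # subdivision level, 1..32

    def __post_init__(self) -> None:
        if not 1 <= self.level <= 32:
            raise ValueError("level must be between 1 and 32")


@dataclass(frozen=True)
class Entity:
    name: str          # unique identifier
    entity_class: str  # category, e.g. "ship"
    state: str         # "static" or "dynamic"

    def __post_init__(self) -> None:
        if self.state not in ("static", "dynamic"):
            raise ValueError("state must be 'static' or 'dynamic'")


@dataclass(frozen=True)
class Heading:
    direction: float  # degrees, clockwise from true north

    def __post_init__(self) -> None:
        if not 0.0 <= self.direction <= 360.0:
            raise ValueError("direction must lie in [0, 360]")


@dataclass(frozen=True)
class Event:
    name: str         # unique identifier
    threat: str       # event type based on threat level
    threshold: float  # farthest distance the event affects


@dataclass(frozen=True)
class Attribute:
    value: str
    attribute_class: str
```

Frozen dataclasses are hashable, so such records could be used directly as members of the node set of a per-slice subgraph.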

2.2.3. Edges

In the AugGKG, nodes are connected by edges to represent their relations. The essence of a relation is a mapping from one domain to another. A triplet of nodes and edges forms the fundamental element of the knowledge graph, which can be shown as Eq. 8:

$$\langle \mathit{Subject}, \mathit{Predicate}, \mathit{Object} \rangle \tag{8}$$

Similar to a sentence, Subject and Object are nodes of the knowledge graph, and Predicate identifies the edge representing the semantic or spatio-temporal relation between the two nodes. The edge relations of the AugGKG can be divided into ExplicitRelations and ImplicitRelations, as shown in Figure 3.

Figure 3. Explicit relation and implicit relation.


(1) ExplicitRelation

ExplicitRelations are triplets that are stored in a graph database without the need for reasoning. ExplicitRelations can directly represent simple spatio-temporal or semantic relations between nodes.

(a) Temporal relations between nodes

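A temporal relation reflects changes in Entity, Event, and Heading in adjacent time slices, which can be shown in Eq. 9:

$$\begin{cases} \mathit{Entity}^{temporal}_{relation} = \langle \mathit{Entity}_i \in KG_{k-1},\ \mathit{timechange},\ \mathit{Entity}_i \in KG_k \rangle \\ \mathit{Event}^{temporal}_{relation} = \langle \mathit{Event}_i \in KG_{k-1},\ \mathit{timechange},\ \mathit{Event}_i \in KG_k \rangle \\ \mathit{Heading}^{temporal}_{relation} = \langle \mathit{Heading}_i \in KG_{k-1},\ \mathit{timechange},\ \mathit{Heading}_i \in KG_k \rangle \end{cases} \tag{9}$$

(b) Spatial location relations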

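In the AugGKG, geographic elements are within the spatial scope of the GeoSOT code. A geographic element is described by associating it with the spatial code Locgrid. The different types of relations are identified in Eq. 10:

$$\begin{cases} \mathit{EntityLocgrid}_{relation} = \langle \mathit{Entity}_i \in KG_k,\ \mathit{Locatedin},\ \mathit{Locgrid}_j \in KG_k \rangle \\ \mathit{EventLocgrid}_{relation} = \langle \mathit{Event}_i \in KG_k,\ \mathit{Locatedin},\ \mathit{Locgrid}_j \in KG_k \rangle \end{cases} \tag{10}$$

(c) Semantic relations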

There are various semantic relations between Entities, which could be used for logical reasoning or conventional knowledge query. In the AugGKG, the semantic relations between Entities can also be connected across time slices, as shown in Eq. 11:

$$\mathit{Entity}^{semantic}_{relation} = \langle \mathit{Entity}_i \in KG_k,\ \mathit{semanticrelation},\ \mathit{Entity}_j \in KG_k \text{ or } KG_q \rangle \tag{11}$$

In addition, Entity, Event, and Locgrid connect to Attribute, and the triplet of their semantic relations is represented as Eq. S1.

(2) ImplicitRelation

Taking a knowledge graph of 1000 entities with 8 time slices as an example, if we want to query the distance of an entity with respect to other entities at all time slices in the traditional way, the number of triplets to be queried could amount to nearly 4 million. As the numbers of entities and events increase, the number of triplets to be queried grows rapidly. The main purpose of setting up ImplicitRelations is to reduce the overhead of relational storage, to increase the breadth of knowledge queries and to manage a geographic knowledge graph with a larger amount of geospatial data.
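As a rough check of the figure of nearly 4 million, assume that it counts one stored distance triplet per unordered pair of entities in each time slice (an assumption; the text does not state how the count is obtained). The arithmetic then works out as:

$$\binom{1000}{2} \times 8 = \frac{1000 \times 999}{2} \times 8 = 499{,}500 \times 8 = 3{,}996{,}000$$

Under this reading, the count for $N$ entities and $n$ time slices is $n \cdot N(N-1)/2$, so doubling the number of entities roughly quadruples the number of triplets.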

Therefore, ImplicitRelations are initially hidden in the geo-hidden layer of Locgrid. When the corresponding calculation is inferred from the knowledge query, the associated Locgrid parameter is activated to achieve real-time completion of ImplicitRelations. The geo-hidden layer and how to complete ImplicitRelations within it will be introduced in Section 2.3.

(a) Relation between an Entity and its Heading

The relation between a dynamic Entity and its Heading in the current time slice can help users predict the future intent of the Entity, and the relation can also be used as an auxiliary condition to check the entity's trajectory. The relation can be expressed as Eq. 12:

$$\mathit{EntityHeading}_{relation} = \langle (\mathit{Entity}_{name}(i) \mid \mathit{Entity}_i \in KG_k),\ \mathit{isLedby},\ (\mathit{Heading}_{direction}(i) \mid \mathit{Heading}_i \in KG_k) \rangle \tag{12}$$

If the dynamic Entity in $KG_k$ has an associated Heading, it will be connected directly. However, if the Heading of the Entity does not exist, the Heading should be calculated according to Eq. S2.

(b) Spatialrelation between an Entity and Entity

The current knowledge query methods for graphs can only obtain the spatial relations stored in the graph database, for example, ‘Where is the location of Peking University?’ and ‘Where is ship A sailing at 7:23?’ However, more complicated questions, such as ‘What are the distance and heading of ship A relative to ship B, and are they adjacent?’, cannot be answered, as a knowledge graph cannot store all the spatial relations between Entities. The AugGKG can represent all kinds of spatial relations through the Locgrid associated with the Entities. As intermediate nodes for entity spatial relation queries, a Locgrid can be directly connected with other Locgrid nodes by inference, and spatial relation completion is performed in real time with grid algebraic computations. In this paper, the Spatialrelation between an Entity and Entity is expressed as Eq. 13:

$$\mathit{EntityEntity}_{relation} = \langle \mathit{Entity}_i \in KG_k,\ \mathit{Spatialrelation},\ \mathit{Entity}_j \in KG_k \rangle \tag{13}$$

The Spatialrelations for which queries can be computed are shown in Table 1 below.

Table 1. Representation of spatial implicit relations between entities.

(c) Influence between an Event and Entity

Similar to the relations between Entities, the relation between Event and Entity in the AugGKG is shown by Eq. 14:

$$\mathit{EventEntity}_{relation} = \langle \mathit{Event}_i \in KG_k,\ \mathit{Influence},\ \mathit{Entity}_j \in KG_k \rangle \tag{14}$$

where Influence is determined by the types of $\mathit{Event}_{threat}$ and Entity, as shown in Table 2.

Table 2. Representation of various types of implicit relations between events and entities.

2.3. Geo-hidden layer of a graph

This paper proposes the geo-hidden layer of a graph, which is the Locgrid tensor H32×m×n with spatio-temporal computing capabilities arranged in GeoSOT coding order, as shown in Figure 4. In the geo-hidden layer, H32×m×n contains spatial information and acts as a pipeline for spatio-temporal transformation. By inputting the Locgrid information associated with the nodes into H32×m×n, the spatio-temporal ImplicitRelation result can be output and completed.

Figure 4. Geo-hidden layer of a graph.


In the geo-hidden layer, the implicit spatio-temporal calculation mainly includes a distance calculation, topological relation calculation, and direction calculation. Given two locgrids named LocgridA and LocgridB, their spatio-temporal calculations in the geo-hidden layer are expressed with coding algebraic operations as follows:

2.3.1. Distance calculation

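$\mathit{Distance}_{AB}$ in H32×m×n is calculated by Eq. 15, and $\mathit{fspan}$ is given by Eq. 16:

$$\begin{cases} d_{grid}(\mathit{Locgrid}_A, \mathit{Locgrid}_B) = \sqrt{\mathit{fspan}_{lon}^2 + \mathit{fspan}_{lat}^2} \\ \mathit{Distance}_{AB} = d_{True}(\mathit{Locgrid}_A, \mathit{Locgrid}_B) = d_{grid} \times level_{scale} \end{cases} \tag{15}$$

$$\begin{cases} \mathit{fspan}_{lon} = \begin{cases} (\mathit{Locgrid}_{lonA} - \mathit{Locgrid}_{lonB}) \gg (32 - level), & g_1 = g_2 \\ (\mathit{Locgrid}_{lonA} + \mathit{Locgrid}_{lonB}) \gg (32 - level), & g_1 \neq g_2 \end{cases} \\ \mathit{fspan}_{lat} = \begin{cases} (\mathit{Locgrid}_{latA} - \mathit{Locgrid}_{latB}) \gg (32 - level), & g_1 = g_2 \\ (\mathit{Locgrid}_{latA} + \mathit{Locgrid}_{latB}) \gg (32 - level), & g_1 \neq g_2 \end{cases} \end{cases} \tag{16}$$

where $level_{scale}$ is the grid size for this level. $g_1$ and $g_2$ are the first codes of the corresponding $\mathit{Locgrid}_A$ and $\mathit{Locgrid}_B$ dimensions, respectively; $g_1 = g_2$ indicates that the nodes are located in the same hemisphere, and $g_1 \neq g_2$ indicates that they are located in different hemispheres.

2.3.2. Topological relation calculation

Topological relations include overlapped, adjacent, and nonconsecutive relations, which are also determined by the hidden layer distance $d_{grid}$. Thus, the topological relation $\mathit{Topology}_{AB}$ can be described by Eq. 17:

$$\mathit{Topology}_{AB} = \begin{cases} \text{overlapped}, & d_{grid} = 0 \\ \text{adjacent}, & d_{grid} = 1 \text{ or } \sqrt{2} \\ \text{nonconsecutive}, & d_{grid} > \sqrt{2} \end{cases} \tag{17}$$

where, when $d_{grid} = 1$, the nodes are edge adjacent, and when $d_{grid} = \sqrt{2}$, the nodes are corner adjacent.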
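The distance and topology calculations can be sketched on integer cell offsets. The sketch assumes that both locgrids are already expressed as (longitude index, latitude index) pairs at the same level and in the same hemisphere, so the code-shifting step and the cross-hemisphere case are left out; the function names and the `level_scale` argument are illustrative. The topology test compares squared offsets, so that edge-adjacent cells (one cell apart along one axis) and corner-adjacent cells (one cell apart along both axes) are detected without floating-point comparison.

```python
import math
from typing import Tuple

Cell = Tuple[int, int]  # (lon index, lat index) at a common level (assumed form)


def grid_distance(a: Cell, b: Cell) -> float:
    """Euclidean distance between two cells, measured in cell units."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def true_distance(a: Cell, b: Cell, level_scale: float) -> float:
    """Grid distance scaled by the cell size of the working level."""
    return grid_distance(a, b) * level_scale


def topology(a: Cell, b: Cell) -> str:
    """Classify two cells as overlapped, adjacent or nonconsecutive."""
    squared = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    if squared == 0:
        return "overlapped"
    if squared <= 2:  # 1: edge adjacent, 2: corner adjacent
        return "adjacent"
    return "nonconsecutive"
```

For example, with a cell size of about 1 km, `true_distance((0, 0), (3, 4), 1.0)` evaluates to 5.0, that is, roughly 5 km.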

2.3.3. Direction calculation

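Eq. 18 is the direction calculation in the geo-hidden layer H32×m×n, and Eq. 16 shows the representation of $\mathit{fspan}$.

$$\mathit{Direction}_{AB \in KG_k} = \begin{cases} \arccos \dfrac{|\mathit{fspan}(\mathit{Locgrid}_{lonA}, \mathit{Locgrid}_{lonB})|}{\sqrt{\mathit{fspan}(\mathit{Locgrid}_{lonA}, \mathit{Locgrid}_{lonB})^2 + \mathit{fspan}(\mathit{Locgrid}_{latA}, \mathit{Locgrid}_{latB})^2}}, & \text{if } \mathit{fspan}_{lon} > 0,\ \mathit{fspan}_{lat} > 0 \\ 90 + \arcsin \dfrac{|\mathit{fspan}(\mathit{Locgrid}_{lonA}, \mathit{Locgrid}_{lonB})|}{\sqrt{\mathit{fspan}(\mathit{Locgrid}_{lonA}, \mathit{Locgrid}_{lonB})^2 + \mathit{fspan}(\mathit{Locgrid}_{latA}, \mathit{Locgrid}_{latB})^2}}, & \text{if } \mathit{fspan}_{lon} > 0,\ \mathit{fspan}_{lat} < 0 \\ 180 + \arccos \dfrac{|\mathit{fspan}(\mathit{Locgrid}_{lonA}, \mathit{Locgrid}_{lonB})|}{\sqrt{\mathit{fspan}(\mathit{Locgrid}_{lonA}, \mathit{Locgrid}_{lonB})^2 + \mathit{fspan}(\mathit{Locgrid}_{latA}, \mathit{Locgrid}_{latB})^2}}, & \text{if } \mathit{fspan}_{lon} < 0,\ \mathit{fspan}_{lat} < 0 \\ 270 + \arcsin \dfrac{|\mathit{fspan}(\mathit{Locgrid}_{lonA}, \mathit{Locgrid}_{lonB})|}{\sqrt{\mathit{fspan}(\mathit{Locgrid}_{lonA}, \mathit{Locgrid}_{lonB})^2 + \mathit{fspan}(\mathit{Locgrid}_{latA}, \mathit{Locgrid}_{latB})^2}}, & \text{if } \mathit{fspan}_{lon} < 0,\ \mathit{fspan}_{lat} > 0 \end{cases} \tag{18}$$

In this way, through the grid coding of spatio-temporal data based on GeoSOT, the AugGKG establishes subgraphs with time slices as units, achieves the geographic knowledge representation of nodes and edges, and completes complex spatio-temporal relations with the geo-hidden layer. Next, we introduce how to query and calculate various types of spatio-temporal knowledge in the AugGKG.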

3. Spatio-temporal knowledge query and calculation

In this section, the knowledge query and calculation method of the AugGKG is described. Knowledge queries are classified into three types: simple queries, regular spatio-temporal queries, and complex spatio-temporal deductional queries.

3.1. Simple query

Simple queries mainly reason about existing semantic relations in the knowledge graph, such as entity attributes and affiliations. A simple query is a basic knowledge question and answer; hence, AugGKG's simple query method is similar to those of other knowledge graphs. Simple queries include single-hop queries and multihop queries, which can be expressed as shown in Table 3.

Table 3. Simple single-hop and multihop queries.

3.2. Regular spatio-temporal query

A regular spatio-temporal query is mainly able to determine the fundamental spatio-temporal relation of each node in the scene. At present, some geographic knowledge graphs, such as GeoKG and GEKG, can perform queries on some existing spatio-temporal relations in graph databases. However, AugGKG can perform regular spatio-temporal queries with higher efficiency based on the global subdivision grid. An efficiency comparison of a regular spatio-temporal query in different geographic knowledge graphs will be performed in Section 4.3.2.

Regular spatio-temporal queries include spatio-temporal index queries and spatio-temporal range queries. In a spatio-temporal index query, the query triplet contains spatio-temporal elements (e.g. locgrid, time slice, time change). Spatio-temporal range queries mainly output all nodes in a certain spatio-temporal range. Due to the binary encoding advantage of the GeoSOT grid, the spatio-temporal range query in the AugGKG first determines the time slices to be queried in the knowledge graph, calculates all the Locgrids in the spatial range, and then queries all the nodes associated with the Locgrids.
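The three steps of a spatio-temporal range query (pick the time slices, enumerate the locgrids in the spatial range, collect the nodes attached to those locgrids) can be sketched as below. The in-memory `index` dictionary keyed by (slice index, cell), the rectangular cell range and the function name are assumptions made for illustration; they do not represent the graph database or the GeoSOT code arithmetic used in the paper.

```python
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]                 # (row, col) at the working level
Index = Dict[Tuple[int, Cell], Set[str]]  # (slice k, cell) -> node names


def range_query(index: Index,
                slices: Iterable[int],
                row_range: Tuple[int, int],
                col_range: Tuple[int, int]) -> Set[str]:
    """Return all nodes located in the given cell rectangle and time slices.

    Both ranges are inclusive (low, high) pairs of cell indices.
    """
    hits: Set[str] = set()
    for k in slices:                                   # step 1: time slices
        for row in range(row_range[0], row_range[1] + 1):
            for col in range(col_range[0], col_range[1] + 1):
                # steps 2 and 3: enumerate cells, collect attached nodes
                hits |= index.get((k, (row, col)), set())
    return hits
```

Because the lookup is keyed by cell rather than by coordinates, no per-node distance or containment test is needed once the cells in range are known.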

Examples of regular spatio-temporal queries are shown in Table 4. If we want to output a geographic location, Locgrid can be decoded to a readable latitude and longitude string as the final answer. However, a regular spatio-temporal query is unable to reason about ImplicitRelations, which is addressed by the complex spatio-temporal deductional query of the AugGKG as described below.

Table 4. Spatio-temporal index query and spatio-temporal range query, which are regular spatio-temporal queries.

3.3. Complex spatio-temporal deductional query

In the AugGKG, a complex spatio-temporal deductional query can answer dynamic multiobjective spatio-temporal questions and complete various ImplicitRelations. This paper performs a sentence-by-sentence query, and the specific complex spatio-temporal deductional query method is shown in Figure 5. We use a simple query or regular spatio-temporal query to search explicit relations, while for implicit relations that are not stored, the spatio-temporal calculation and completion are processed in the geo-hidden layer H32×m×n, which is introduced in Section 2.3.

Figure 5. The complex spatio-temporal deductional query method.


Through the process in Figure 5 and the ImplicitRelation calculation, a complex spatio-temporal deductional query can be completed. Next, we take a practical query as an example to show the application of the geo-hidden layer.

For the question ‘Between 0:00 and 2:00 on June 4, which was the nearest southeastern ship to ship A?’, Figure 6 shows how to handle this complex deductive question in the AugGKG. Based on the coding algebraic operations of the geo-hidden layer, the spatio-temporal ImplicitRelations in the query language can be computed and completed.

Figure 6. Example flow of a complex spatio-temporal deductional query in AugGKG.


Meanwhile, in the complex spatio-temporal deductional query, the computed ImplicitRelations are stored in the AugGKG after completion. These stored spatio-temporal relations have a higher probability of being queried again in the scenario. To improve the efficiency of the query, the computed locgrid-pairs are identified as active parameters in the geo-hidden layer H32×m×n, where they pop up. The popping up of the geo-hidden layer parameter is shown in Figure 7.

Figure 7. Popping up of the geo-hidden layer parameter in the AugGKG.


If a spatio-temporal ImplicitRelation is queried that is associated with the active locgrid-pair but for different geo-nodes, we simply need to recall the results that have already been stored in the geo-hidden layer H32×m×n and output them without further calculations.
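The reuse of active locgrid-pairs behaves like a cache keyed by the pair of grids rather than by the nodes that occupy them. A minimal sketch, assuming a plain dictionary as the store and a caller-supplied `compute` function (both hypothetical; the paper's tensor layout is not reproduced here):

```python
import math
from typing import Callable, Dict, Hashable, Tuple


class ActivePairCache:
    """Remember relation results for locgrid pairs that were already computed."""

    def __init__(self, compute: Callable[[Hashable, Hashable], float]) -> None:
        self._compute = compute
        self._active: Dict[Tuple[Hashable, Hashable], float] = {}
        self.computations = 0  # how many times `compute` actually ran

    def relation(self, grid_a: Hashable, grid_b: Hashable) -> float:
        key = (grid_a, grid_b)  # ordered, so direction-like relations stay valid
        if key not in self._active:
            self._active[key] = self._compute(grid_a, grid_b)
            self.computations += 1
        return self._active[key]


def cell_distance(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Example relation: Euclidean distance in cell units."""
    return math.hypot(a[0] - b[0], a[1] - b[1])
```

Two different ships that happen to occupy the same pair of cells would share one cached result, which is the effect described for queries on different geo-nodes with the same active locgrid-pair.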

4. Experiment

In this section, based on a specific spatio-temporal scenario and relevant datasets, this paper will construct multiple geographic knowledge graphs, including an AugGKG and other general models (YAGO, GeoKG and GEKG). Through the comparisons and evaluations of three knowledge queries, AugGKG shows great superiority in knowledge representation and complex spatio-temporal deduction queries.

4.1. Description of the data and scenes

This study involved spatio-temporal data of an experimental scene from 0:00 to 24:00 on 4 June 2019, which included 3577 geo-entity data values, such as entity name, class, attributes, time, latitude, longitude, and heading (if applicable). Table 5 describes the structure of the experimental dataset. Various environmental elements trigger geographic events, such as tsunamis, earthquakes, and bonus missions, which could affect the entities in the scene.

Table 5. Structure of the experimental datasets.

4.2. Knowledge graph constructions

4.2.1. Existing models (YAGO, GeoKG, GEKG)

Based on the above scene data, existing geographic knowledge graph models such as YAGO, GeoKG and GEKG were constructed. As shown in Figure 8(a), YAGO uses three types of nodes: entity, attribute, and time, connected by relations. In YAGO, both entity and attribute connections need to be additionally associated with time, and the time needs to follow the updated attribute. In Figure 8(b), the GeoKG defines five types of nodes, that is, location, time, attribute, state, and object nodes, and three kinds of edges, namely relationships, changes, and element links, where the key is to connect the states of each geo-object. As shown in Figure 8(c), the GEKG proposes four types of nodes and two types of edges, and the evolutionary information of the elements is presented through a hierarchical cubical graph structure. At the same time, the computing architecture of YAGO, GeoKG and GEKG is basically the same as that of ordinary graph databases; given two of the subject, predicate and object of a triplet, they all query the remaining element stored in the graph.

Figure 8. Existing geographic knowledge representation framework with experimental data. (a) YAGO. (b) GeoKG. (c) GEKG

4.2.2. AugGKG

Based on the GeoSOT grid model, the AugGKG discretized the space and time domains, as shown in Figure 9. Given a time slice interval of 2 h, twelve subgraphs were created at the 16th GeoSOT level, which corresponds to a grid size of approximately 1 km. The resulting AugGKG graph database contained 10,764 nodes and 39,390 explicit relations, while the implicit spatio-temporal relations were embedded in the geo-hidden layer H32×m×n.
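To make the discretization concrete, the following sketch maps a timestamp to one of the twelve 2 h slices and a coordinate to a hierarchical cell code. The `cell_code` function is a simplified quadtree subdivision of the latitude/longitude rectangle and is an assumption made for illustration; it is not the GeoSOT encoding, which defines its own extent and code layout.

```python
from datetime import datetime

SLICE_HOURS = 2  # time slice interval stated in the text


def slices_per_day(slice_hours: int = SLICE_HOURS) -> int:
    """Number of time slice subgraphs needed to cover 24 hours."""
    return 24 // slice_hours


def time_slice_index(t: datetime, slice_hours: int = SLICE_HOURS) -> int:
    """Index (0-based) of the time slice that contains timestamp t."""
    return t.hour // slice_hours


def cell_code(lat: float, lon: float, level: int) -> str:
    """Simplified quadtree cell code with one base-4 digit per level.

    Illustrative stand-in only, NOT the GeoSOT code.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    digits = []
    for _ in range(level):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        digit = 0
        if lat >= lat_mid:      # upper half sets bit 1
            digit |= 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if lon >= lon_mid:      # right half sets bit 0
            digit |= 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        digits.append(str(digit))
    return "".join(digits)


# A hypothetical observation at 13:30 on the experiment date.
slice_id = time_slice_index(datetime(2019, 6, 4, 13, 30))
code_16 = cell_code(39.9, 116.4, 16)
code_8 = cell_code(39.9, 116.4, 8)
```

Because each level only appends a digit, the code of a coarse cell is a prefix of the codes of all cells nested inside it, so containment tests reduce to string or integer comparisons rather than geometric computation.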

Figure 9. Geographic knowledge representation framework in the AugGKG with experimental data.

4.3. Comparisons of three knowledge queries

Based on AugGKG and the other existing graph models, this paper compares the results of three types of knowledge queries using Neo4j, a relatively mature graph data management platform. The types of questions in the experiment and the answers to the questions in each graph model are shown in Table 6.

Table 6. Answers to each question in YAGO, GeoKG, GEKG, and AugGKG.

This paper also analyses the limitations of existing geographic knowledge graphs from the three perspectives of completeness, efficiency and uniformity of accuracy, corresponding to the three challenges described in Section 1, and the advantages of the AugGKG are demonstrated accordingly.

4.3.1. Completeness

Completeness corresponds to the quality of ‘answering the graph question’, which mainly refers to the degree of correctness and completeness of the knowledge graph answer. The completeness of the four graph models is shown in Table 7. AugGKG gives better answers for all three types of queries, and only AugGKG is able to fully answer spatio-temporal range queries. In particular, for complex spatio-temporal deductive queries, AugGKG is able to complete the spatio-temporal computation and the spatio-temporal ImplicitRelations with detailed query results, which cannot be achieved by existing knowledge graph models.

Table 7. Completeness of geographic knowledge graphs in answering three types of query questions.

4.3.2. Efficiency

Efficiency corresponds to the ‘graph spatio-temporal query efficiency’. AugGKG uses the GeoSOT grid for spatio-temporal queries, while existing knowledge graphs use traditional latitude and longitude structures. Changing only the environment modelling method, we carried out a spatio-temporal data query and a spatial K-nearest neighbours (KNN) query on the AugGKG and on the traditional geographic knowledge graphs. The query time comparisons can be found in Figure 10. Owing to the efficient GeoSOT grid coding algebra, and especially for a large data volume of 1.0×10^8 data points, the spatio-temporal query retrieval speed of AugGKG improved by more than 15 times, while the KNN query speed of AugGKG increased by 3 times.
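The intuition for why grid-coded data can be queried quickly is that a range or KNN query only needs to visit the cells near the query instead of scanning every coordinate. The sketch below illustrates this with a uniform planar grid held in a dictionary. The `GridIndex` class, its method names, and the sample points are assumptions made for illustration; it stands in for, and does not reproduce, the GeoSOT grid coding algebra used in the experiment.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform planar grid index (illustrative stand-in, not GeoSOT)."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (ix, iy) -> [(name, x, y), ...]
        self.size = 0

    def _cell(self, x: float, y: float) -> tuple[int, int]:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, name: str, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((name, x, y))
        self.size += 1

    def range_query(self, xmin, ymin, xmax, ymax) -> list[str]:
        """Visit only the cells overlapping the box, then filter exactly."""
        ix0, iy0 = self._cell(xmin, ymin)
        ix1, iy1 = self._cell(xmax, ymax)
        hits = []
        for ix in range(ix0, ix1 + 1):
            for iy in range(iy0, iy1 + 1):
                for name, x, y in self.cells.get((ix, iy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return sorted(hits)

    def knn(self, x: float, y: float, k: int) -> list[str]:
        """Expand square rings of cells around the query cell.

        After ring r has been scanned, every unscanned point lies at least
        r * cell_size away, so the search stops once the k-th candidate is
        no farther than that bound.
        """
        cx, cy = self._cell(x, y)
        found = []  # (distance, name)
        r = 0
        while len(found) < self.size:
            for ix in range(cx - r, cx + r + 1):
                for iy in range(cy - r, cy + r + 1):
                    if max(abs(ix - cx), abs(iy - cy)) != r:
                        continue  # skip cells scanned in earlier rings
                    for name, px, py in self.cells.get((ix, iy), []):
                        found.append((math.hypot(px - x, py - y), name))
            found.sort()
            if len(found) >= k and found[k - 1][0] <= r * self.cell_size:
                break
            r += 1
        return [name for _, name in found[:k]]


# Hypothetical points on a 1-unit grid.
index = GridIndex(cell_size=1.0)
for name, x, y in [("a", 0.5, 0.5), ("b", 1.5, 0.5), ("c", 5.2, 5.1), ("d", 0.4, 2.6)]:
    index.insert(name, x, y)

in_box = index.range_query(0.0, 0.0, 2.0, 1.0)
nearest_two = index.knn(0.6, 0.6, 2)
```

The work done by both queries depends on how many cells are visited rather than on the total number of stored points, which is consistent with the larger speed-ups reported at large data volumes.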

Figure 10. Comparisons of the spatio-temporal query efficiency using AugGKG and other knowledge graphs. (a) Query of spatio-temporal data in a defined polygon area; (b) query of spatial K-nearest neighbours (KNN) with a data volume of 1.0×10^8.

4.3.3. Uniformity of accuracy

The uniformity of accuracy in the graph represents the degree of ‘unified modelling of heterogeneous spatio-temporal data’. The accuracy of geographic knowledge graphs mainly considers the spatio-temporal resolution of the query results. The accuracy of existing geographic knowledge graphs depends on the resolution of the spatio-temporal data in the graph database, which will lead to resolution inconsistency if heterogeneous datasets are heavily involved.

In contrast, the accuracy of the AugGKG is determined by the grid subdivision level: all spatio-temporal data are geocoded to a common standard, which greatly reduces the difficulty of cleaning multi-source heterogeneous data. In practical applications, the AugGKG generally selects the coarsest grid level (that is, the largest cells) that still guarantees basic accuracy and recognizability, thus reducing the storage size of the graph and improving query efficiency. To improve the accuracy of the query results, one can simply increase the grid level and the time slice density.
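The trade-off between resolution and storage can be made explicit with a rough calculation. The sketch below assumes that each additional subdivision level halves the cell edge, as in a quadtree, and anchors the scale at the value given in Section 4.2.2 (level 16, approximately 1 km). The halving rule and the function names are assumptions for illustration; exact GeoSOT cell sizes should be taken from the GeoSOT specification.

```python
REFERENCE_LEVEL = 16       # level used in the experiment
REFERENCE_CELL_KM = 1.0    # approximate cell size at that level, from the text


def approx_cell_size_km(level: int) -> float:
    """Approximate cell edge, assuming the edge halves with every level."""
    return REFERENCE_CELL_KM * 2.0 ** (REFERENCE_LEVEL - level)


def relative_cell_count(level: int) -> float:
    """Cells needed to cover a fixed 2D area, relative to the reference level."""
    return 4.0 ** (level - REFERENCE_LEVEL)


def slices_per_day(interval_hours: float) -> float:
    """Number of time slice subgraphs per day for a given slice interval."""
    return 24.0 / interval_hours


# Moving two levels finer: cells shrink to a quarter of the edge length,
# while 16 times as many cells are needed to cover the same area.
finer_size = approx_cell_size_km(18)
finer_count = relative_cell_count(18)
```

Under these assumptions, each extra level roughly quadruples the number of cells covering a given area, and halving the time slice interval doubles the number of subgraphs, which is why a coarser level is preferred whenever it meets the accuracy requirement.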

5. Discussions

In the comparative experiments of Section 4, the AugGKG and the other existing geographic knowledge graphs were queried with different questions, and the advantages and disadvantages of these graphs were evaluated from three perspectives, that is, completeness, efficiency, and uniformity of accuracy, as shown in Table 8.

Table 8. Comparisons of AugGKG, YAGO, GeoKG and GEKG in terms of completeness, efficiency and uniformity of accuracy.

As shown in Table 8, YAGO and GeoKG can only perform simple semantic queries and some regular spatio-temporal queries, and GeoKG also suffers from high redundancy. The GEKG improved the temporal range query performance, but it still cannot answer complex spatio-temporal deductive questions. At the same time, the spatio-temporal query efficiencies of the ordinary knowledge graphs, which are determined by the resolution of the spatio-temporal data, are all lower than that of the AugGKG. Moreover, when heterogeneous data from multiple sources are heavily involved, nonuniform accuracy in traditional graphs is inevitable.

Compared with other models, the AugGKG improves the knowledge representation model through subdivision grids, which has the obvious advantages of completeness and efficiency in answering spatio-temporal questions, especially spatio-temporal range queries and complex spatio-temporal deductive queries. The AugGKG also has the flexibility to change the grid subdivision level according to the task requirements, thereby ensuring a unified resolution of the spatio-temporal data, which also reduces the difficulty of the calculations.

6. Conclusions

Existing geographic knowledge graphs face difficulties in the unified modelling of heterogeneous spatio-temporal data, answering spatio-temporal questions for dynamic multiobjectives, and improving graph query efficiency. Accordingly, the grid-augmented geographic knowledge graph (AugGKG) is proposed in this paper. Based on the GeoSOT grid and time slice subgraph architecture, AugGKG discretely normalizes the spatio-temporal data of the graph, and the spatio-temporal data are all encoded with GeoSOT grids, which form an effective and efficient basis for spatio-temporal knowledge query computation.

In addition to performing simple semantic queries and regular spatio-temporal queries, the AugGKG can, through the geo-hidden layer of the graph, perform complex spatio-temporal deductive queries such as dynamic multiobjective spatio-temporal questions. AugGKG can calculate and complete the implicit spatio-temporal relations in the knowledge graph through grid coding algebra and can obtain active geo-parameters in the hidden layer, thus improving the query efficiency of similar questions. A comparative query experiment was also conducted on the AugGKG and existing geographic knowledge graphs (YAGO, GeoKG, and GEKG). The experiment on three knowledge query types indicates that the AugGKG is more uniform in accuracy, more complete, and more efficient than the other knowledge graphs, thereby addressing three difficulties that the other models could not solve. Thus, AugGKG is expected to be regarded as an innovative and robust geographic knowledge graph.

AugGKG still has some limitations; for example, the hierarchy of time slice subdivision needs to be further refined, and the possible increase in the number of triplets should be considered. In future studies, we will improve the AugGKG by building an embedding concatenation network, so that AugGKG can jointly learn spatio-temporal grid information and graph knowledge information to predict and reason about the trends and locations of spatio-temporal entities, and ultimately conduct intention reasoning at the system level.

Acknowledgments

The authors thank the editors and anonymous reviewers for their insightful comments and constructive suggestions.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The datasets and code are available at the following link: https://doi.org/10.18170/DVN/YPNYDY.

Additional information

Funding

This work was supported by the Songshan Laboratory Projects [grant number 221100211000-03], Excellent Youth Fund of Natural Science Foundation of Henan Province [grant number 212300410096], National Natural Science Foundation of China [grant numbers 62076249, 42201513], National Defense Basic Scientific Research Program of China [grant number 2022-JCJQ-JJ-0287], and Natural Science Foundation of Shandong Province [grant number ZR202209130044].

References

  • Alam, Md Mahbub, Luis Torgo, and Albert Bifet. 2022. “A Survey on Spatio-Temporal Data Analytics Systems.” ACM Computing Surveys 54 (10s): 1–38. https://doi.org/10.1145/3507904.
  • Bai, Luyi, Wenting Yu, Die Chai, Wenjun Zhao, and Mingzhuo Chen. 2023. “Temporal Knowledge Graphs Reasoning with Iterative Guidance by Temporal Logical Rules.” Information Sciences 621: 22–35. https://doi.org/10.1016/j.ins.2022.11.096.
  • Bao, Yi, Zhou Huang, Xuri Gong, Yuyang Zhang, Ganmin Yin, and Han Wang. 2023. “Optimizing Segmented Trajectory Data Storage with HBase for Improved Spatio-Temporal Query Efficiency.” International Journal of Digital Earth 16 (1): 1124–1143. https://doi.org/10.1080/17538947.2023.2192979.
  • Chen, Ziyang, Xiang Zhao, Jinzhi Liao, Xinyi Li, and Evangelos Kanoulas. 2022. “Temporal Knowledge Graph Question Answering Via Subgraph Reasoning.” Knowledge-Based Systems 251: 109134. https://doi.org/10.1016/j.knosys.2022.109134.
  • Cheng, Chengqi, Xiaochong Tong, Bo Chen, and Weixin Zhai. 2016. “A Subdivision Method to Unify the Existing Latitude and Longitude Grids.” ISPRS International Journal of Geo-Information 5 (9): 161. https://doi.org/10.3390/ijgi5090161.
  • Ding, Yulin, Zhaowen Xu, Qing Zhu, Hankan Li, Yan Luo, Ying Bao, Lingjun Tang, and Sen Zeng. 2022. “Integrated Data-Model-Knowledge Representation for Natural Resource Entities.” International Journal of Digital Earth 15 (1): 653–678. https://doi.org/10.1080/17538947.2022.2047802.
  • Du, Jiaxin, Shaohua Wang, Xinyue Ye, Diana S. Sinton, and Karen Kemp. 2021. “GIS-KG: Building a Large-Scale Hierarchical Knowledge Graph for Geographic Information Science.” International Journal of Geographical Information Science 36 (5): 873–897. https://doi.org/10.1080/13658816.2021.2005795.
  • Ge, Xingtong, Yi Yang, Jiahui Chen, Weichao Li, Zhisheng Huang, Wenyue Zhang, and Ling Peng. 2022. “Disaster Prediction Knowledge Graph Based on Multi-Source Spatio-Temporal Information.” Remote Sensing 14 (5): 1214. https://doi.org/10.3390/rs14051214.
  • Hamzei, Ehsan, Stephan Winter, and Martin Tomko. 2021. “Templates of Generic Geographic Information for Answering Where-Questions.” International Journal of Geographical Information Science 36 (1): 188–214. https://doi.org/10.1080/13658816.2020.1869977.
  • Han, Bing, Tengteng Qu, Zili Huang, Qiangyu Wang, and Xinlong Pan. 2021. “Emergency Airport Site Selection Using Global Subdivision Grids.” Big Earth Data 6 (3): 276–293. https://doi.org/10.1080/20964471.2021.1996866.
  • Han, Bing, Tengteng Qu, Xiaochong Tong, Jie Jiang, Sisi Zlatanova, Haipeng Wang, and Chengqi Cheng. 2022. “Grid-optimized UAV Indoor Path Planning Algorithms in a Complex Environment.” International Journal of Applied Earth Observation and Geoinformation 111: 102857. https://doi.org/10.1016/j.jag.2022.102857.
  • He, S., L. Chu, and X. Li. 2017. “Spatial Query Processing for Location Based Application on Hbase.” Paper Presented at the 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), 10-12 March 2017.
  • Hoffart, Johannes, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. 2013. “YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia.” Artificial Intelligence 194: 28–61. https://doi.org/10.1016/j.artint.2012.06.001.
  • Hou, Kaihua, Chengqi Cheng, Bo Chen, Chi Zhang, Liesong He, Li Meng, and Shuang Li. 2021. “A Set of Integral Grid-Coding Algebraic Operations Based on GeoSOT-3D.” ISPRS International Journal of Geo-Information 10 (7): 489. https://doi.org/10.3390/ijgi10070489.
  • Hu, Yingjie. 2018. “Geospatial Semantics.” In Comprehensive Geographic Information Systems, edited by Bo Huang, 80–94. Knoxville: Elsevier. ISBN 9780128047934. https://doi.org/10.1016/B978-0-12-409548-9.09597-X.
  • Huang, Yi, May Yuan, Yehua Sheng, Xiangqiang Min, and Yuwei Cao. 2019. “Using Geographic Ontologies and Geo-Characterization to Represent Geographic Scenarios.” ISPRS International Journal of Geo-Information 8 (12): 566. https://doi.org/10.3390/ijgi8120566.
  • Janowicz, Krzysztof, Song Gao, Grant McKenzie, Yingjie Hu, and Budhendra Bhaduri. 2020. “GeoAI: Spatially Explicit Artificial Intelligence Techniques for Geographic Knowledge Discovery and Beyond.” International Journal of Geographical Information Science 34 (4): 625–636. https://doi.org/10.1080/13658816.2019.1684500.
  • Janowicz, Krzysztof, Pascal Hitzler, Wenwen Li, Dean Rehberger, Mark Schildhauer, Rui Zhu, Cogan Shimizu, et al. 2022. “Know, Know Where, KnowWhereGraph: A Densely Connected, Cross-Domain Knowledge Graph and Geo-Enrichment Service Stack for Applications in Environmental Intelligence.” AI Magazine 43 (1): 30–39. https://doi.org/10.1002/aaai.12043.
  • Jiang, Bingchuan, Gang Wan, Jian Xu, Feng Li, and Huiqi Wen. 2018. “Geographic Knowledge Graph Building Extracted from Multi-Sourced Heterogeneous Data.” Acta Geodaetica et Cartographica 47 (8): 1051–1061. https://doi.org/10.11947/j.AGCS.2018.20180113.
  • Kokla, Margarita, and Eric Guilbert. 2020. “A Review of Geospatial Semantic Information Modeling and Elicitation Approaches.” ISPRS International Journal of Geo-Information 9 (3): 146. https://doi.org/10.3390/ijgi9030146.
  • Li, Jing, Haiyan Liu, Jia Li, Xiaohui Chen, and Zekun Tao. 2023. “A Knowledge-Based Approach for Estimating the Distribution of Urban Mixed Land Use.” International Journal of Digital Earth 16 (1): 965–987. https://doi.org/10.1080/17538947.2023.2184512.
  • Li, Mingke, Heather McGrath, and Emmanuel Stefanakis. 2021. “Integration of Heterogeneous Terrain Data Into Discrete Global Grid Systems.” Cartography and Geographic Information Science 48 (6): 546–564. https://doi.org/10.1080/15230406.2021.1966648.
  • Li, Shuang, Guoliang Pu, Chengqi Cheng, and Bo Chen. 2019. “Method for Managing and Querying geo-Spatial Data Using a Grid-Code-Array Spatial Index.” Earth Science Informatics 12 (2): 173–181. https://doi.org/10.1007/s12145-018-0362-6.
  • Ma, Xiaogang. 2022. “Knowledge Graph Construction and Application in Geosciences: A Review.” Computers & Geosciences 161: 105082. https://doi.org/10.1016/j.cageo.2022.105082.
  • Mai, Gengchen, Krzysztof Janowicz, Rui Zhu, Ling Cai, and Ni Lao. 2021. “Geographic Question Answering: Challenges, Uniqueness, Classification, and Future Directions.” AGILE: GIScience Series 2: 1–21. https://doi.org/10.5194/agile-giss-2-8-2021.
  • Qian, Chunyao, Chao Yi, Chengqi Cheng, Guoliang Pu, Xiaofeng Wei, and Huangchuang Zhang. 2019. “GeoSOT-Based Spatiotemporal Index of Massive Trajectory Data.” ISPRS International Journal of Geo-Information 8 (6): 284. https://doi.org/10.3390/ijgi8060284.
  • Qu, Tengteng, Lizhe Wang, Jian Yu, Jining Yan, Guilin Xu, Meng Li, Chengqi Cheng, Kaihua Hou, and Bo Chen. 2020. “STGI: A Spatio-Temporal Grid Index Model for Marine Big Data.” Big Earth Data 4 (4): 435–450. https://doi.org/10.1080/20964471.2020.1844933.
  • Shen, Tong, Fu Zhang, and Jingwei Cheng. 2022. “A Comprehensive Overview of Knowledge Graph Completion.” Knowledge-Based Systems 255: 109597. https://doi.org/10.1016/j.knosys.2022.109597.
  • Sun, Yuhan, and Mohamed Sarwat. 2018. “A Generic Database Indexing Framework for Large-Scale Geographic Knowledge Graphs.” Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 289–298.
  • Tempelmeier, Nicolas, and Elena Demidova. 2021. “Linking OpenStreetMap with Knowledge Graphs — Link Discovery for Schema-Agnostic Volunteered Geographic Information.” Future Generation Computer Systems 116: 349–364. https://doi.org/10.1016/j.future.2020.11.003.
  • Wang, Peifeng, Jialong Han, Chenliang Li, and Rong Pan. 2018. “Logic Attention Based Neighborhood Aggregation for Inductive Knowledge Graph Embedding.” Proceedings of the AAAI Conference on Artificial Intelligence 33 (01): 7152–7159. https://doi.org/10.1609/aaai.v33i01.33017152.
  • Wang, Shu, Xueying Zhang, Peng Ye, Mi Du, Yanxu Lu, and Haonan Xue. 2019. “Geographic Knowledge Graph (GeoKG): A Formalized Geographic Knowledge Representation.” ISPRS International Journal of Geo-Information 8 (4): 184. https://doi.org/10.3390/ijgi8040184.
  • Wiemann, Stefan, and Lars Bernard. 2015. “Spatial Data Fusion in Spatial Data Infrastructures Using Linked Data.” International Journal of Geographical Information Science 30 (4): 613–636. https://doi.org/10.1080/13658816.2015.1084420.
  • Yan, Bo. 2019. “Geographic Knowledge Graph Summarization.” Ph.D. diss., University of California, Santa Barbara.
  • Yin, Chuan, Binyu Zhang, Wanzeng Liu, Mingyi Du, Nana Luo, Xi Zhai, and Tu Ba. 2022. “Geographic Knowledge Graph Attribute Normalization: Improving the Accuracy by Fusing Optimal Granularity Clustering and Co-Occurrence Analysis.” ISPRS International Journal of Geo-Information 11 (7): 360. https://doi.org/10.3390/ijgi11070360.
  • Zhang, Yunhao, Jun Zhu, Qing Zhu, Yakun Xie, Weilian Li, Lin Fu, Junxiao Zhang, and Jianmei Tan. 2020. “The Construction of Personalized Virtual Landslide Disaster Environments Based on Knowledge Graphs and Deep Neural Networks.” International Journal of Digital Earth 13 (12): 1637–1655. https://doi.org/10.1080/17538947.2020.1773950.
  • Zheng, Kun, Ming Hui Xie, Jin Biao Zhang, Juan Xie, and Shu Hao Xia. 2021. “A Knowledge Representation Model Based on the Geographic Spatiotemporal Process.” International Journal of Geographical Information Science 36 (4): 674–691. https://doi.org/10.1080/13658816.2021.1962527.
  • Zhou, Chenghu, Hua Wang, Chengshan Wang, Zengqian Hou, Zhiming Zheng, Shuzhong Shen, Qiuming Cheng, et al. 2021. “Geoscience Knowledge Graph in the Big Data Era.” Science China Earth Sciences 64 (7): 1105–1114. https://doi.org/10.1007/s11430-020-9750-4.
  • Zuheros, Cristina, Siham Tabik, Ana Valdivia, Eugenio Martínez-Cámara, and Francisco Herrera. 2019. “Deep Recurrent Neural Network for Geographical Entities Disambiguation on Social Media Data.” Knowledge-Based Systems 173: 117–127. https://doi.org/10.1016/j.knosys.2019.02.030.