472
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Measuring the similarity between shapes of buildings using graph edit distance

ORCID Icon, , ORCID Icon, &
Article: 2310749 | Received 17 Oct 2023, Accepted 22 Jan 2024, Published online: 01 Feb 2024

ABSTRACT

Building data play a crucial role in geographic information science, and building shape similarity can be used for spatial inquiry, cartographic generalization, map updating, data quality assessment, and other spatial data mining applications. However, conventional shape representation and measurement methods use geometric and statistical measures and may not effectively capture the subtle differences among building shapes. Geospatial artificial intelligence methods often overlook nuanced differences in building contours, posing a challenge when precise fine-grained measurements are required. Furthermore, they have low interpretability. This study proposes a graph edit distance-based method for measuring the similarity between building shapes. It uses geometrics characteristics to construct a graph, matches and associates similar regional features between two building contours, and defines edit costs based on geometric and positional differences that align with the Gestalt principle of visual similarity. It converts the optimal edit path problem to the maximum weighted cluster solution problem and utilizes a heuristic approach to achieve the exact optimal solution within a moderately suitable timeframe. The experimental results demonstrated that the proposed method identifies detailed differences between building contours, providing accurate similarity measurement with high interpretability.

1. Introduction

Similarity measurement is a central construct in human cognition (Medin, Goldstone, and Gentner Citation1993). Humans use similarity to store and retrieve information and compare different entities. With rapid advancements in spatial information data acquisition technologies, similarity measurement techniques are being widely used in geographic information systems (GIS), especially for data retrieval (Ai et al. Citation2013), cartographic generalization (Yan Citation2015; Yan, Ai, and Zhang Citation2017), and map updating (Yang et al. Citation2018).

Building data play a crucial role in spatial analyses (Biljecki et al. Citation2021). Measuring and analyzing buildings provide theoretical foundations and guidance on urban development planning (Rehan Citation2014), statistical prediction of anthropogenic activities (Ma et al. Citation2019), map navigation (Zhang et al. Citation2013), and other spatial data mining applications (Du, Shu, and Feng Citation2016; He et al. Citation2020). Compared with natural objects, building possess distinct visual characteristics (such as orthogonality, symmetry, regionality, and pattern). Thus, shape similarity is more commonly used for matching, retrieving, and updating building data instead of semantic (Schwering Citation2008; Ballatore, Wilson, and Bertolotto Citation2013) or topological similarity, (Chen, Xu, and Wu Citation2015).

Conventional shape representation methods primarily use region- or contour-based approaches to describe shapes. However, these methods, which use geometric and statistical measures, may not effectively capture subtle differences among different building shapes. In recent years, emerging geospatial artificial intelligence (GeoAI) methods, particularly graphs neural networks (GNN), have substantially improved both accuracy and efficiency of algorithm in shape coding. However, these methods may not consider detailed contour information in complex building contours, leading to limitations in tasks that require fine-grained similarity measurements, such as map quality evaluation (Fan et al. Citation2014; Xu et al. Citation2017a). In additionally, measuring shape similarity using GNN has low interpretability.

This study proposes a novel method for measuring shape similarity based on the graph edit distance (GED) method, which provides accurate and highly interpretable measurements of shape similarity. The proposed method transforms a building contour into a graph model and defines the edit costs among primitives based on geometric and positional differences. It simplifies optimal edit path determination and uses a heuristic approach to achieve an exact optimal solution instead of an approximate one within a moderately suitable timeframe, ensuring that the edit costs of the optimal edit path accurately describe the differences between two shapes of different objects, which can further reflect shape similarity. During the determination of the optimal edit path, the algorithm associates primitives with visual similarities in the two building contours, which aligns with the Gestalt principle of visual similarity (Yu et al. Citation2014; Cetinkaya, Basaraner, and Burghardt Citation2015; Xu et al. Citation2017b). The edit costs between two matched primitives intuitively reflect their visual differences, thus demonstrating the high interpretability of the proposed method.

The remaining study can be categorized as follows. Section 2 briefly reviews studies on shape representation and similarity measurements. Section 3 elaborates on the proposed methods. Section 4 presents the experiments and compares the proposed method with other popular methods. Section 5 analyzes the interpretability of the proposed method, addresses its current limitations, and provides a potential improvement in its approach. Finally, Section 6 concludes the paper with a with suggestions for future research.

2. Related work

2.1. Conventional building shape representation and similarity measurement methods

Conventional techniques for shape representation can be categorized into two groups: contour- and region-based methods (Zhang and Lu Citation2004). This classification is based on whether the shape attributes are extracted from contours or the entire shape region.

Region-based methods use all pixels or grid within a shape region to obtain shape representation, such as describe shapes (Teague Citation1980), shape matrices (Goshtasby Citation1985), convex hulls (Hu Citation1962), media axes (Pavlidis Citation1980), reversed sketches (Huang et al. Citation2018), and the multiscale features & grid context descriptors (Fan, Zhao, and Li Citation2021). However, these representation methods encounter frequent challenges during building shape measurements because humans assess the similarity among building shapes primarily by their contour features rather than interior features (Zhang and Lu Citation2004). Furthermore, building shapes are commonly stored as vector data in maps and align easily using contour-based representation methods.

Contour-based techniques can be further categorized into global and structural approaches. Global contour techniques typically compute multidimensional numeric feature vectors using shape boundary information, such as shape signatures (Zhang and Lu Citation2002), boundary moments (Sonka, Hlavac, and Boyle Citation1993; Fu et al. Citation2018), Fourier descriptor (Ai et al. Citation2013; Chen, Feng, and Wu Citation2015) and triangle area representation (Yang, Wei, and Yu Citation2018). Matching among different shapes is a straightforward process. It is usually performed to measure distance using a metric distance, such as Euclidean (Danielsson Citation1980) or Hausdorff (Huttenlocher, Klanderman, and Rucklidge Citation1993) distance. Structural contour techniques break shapes into boundary segments, which have different procedures for selecting and organizing primitives to describe shapes among different priorities. This design performs partial matching without an exact difference measurement. Examples of these techniques include the chain-code histogram (Iivarinen and Visa Citation1996), chain of vectors (Mehrotra and Gary Citation1995), turning function (Arkin et al. Citation1991), curvature decomposition and curve fitting (Pavlidis Citation2012).

Generally, both global and structural methods are useful in partial inquiry or matching and can be combined to improve performance (Chen et al. Citation2021; Xu et al. Citation2021). However, both methods have some limitations. Global approaches typically omit detailed information. For example, a compact description is a prior option when using Fourier method, but this description provides inaccurate detailed information, especially when describing the corner points of building contours, which are vital in visual cognition. Therefore, they must be combined with other descriptors to create practical shape descriptors for obtaining detailed information (Chen et al. Citation2019; Xu et al. Citation2017b). In contrast, structural approaches analyze detailed information and enable partial matching between primitives of shapes, but their computational process is more complex because of the intricate matching in the calculation process. Most structural approaches use sub-graph matching to find an optimal solution, which is often impractical because of its high computational cost. Therefore, most methods seek an approximate solution instead of an exact solution to obtain high efficiency, which may not always guarantee the best match.

2.2. Building shape similarity measurement using GeoAI-based methods

Deep learning technology has opened new possibilities for studying GeoAI-based methods. In geoscience, GeoAI-based methods are being used to extract geographical features and recognize real geographical entities (Steiniger et al. Citation2008; García Balboa and Ariza López Citation2008; Ma et al. Citation2016; Touya and Dumont Citation2017; Sester, Feng, and Thiemann Citation2018; Courtial et al. Citation2020; Du et al. Citation2022a). Studies have demonstrated the potential of GNNs in shape representation and similarity measurement (Yan and Yang Citation2022; Yu et al. Citation2023), shape simplification (Liu et al. Citation2021; Du et al. Citation2022b; Zhou, Fu, and Weibel Citation2023), geographic entities selection (Zheng et al. Citation2021) and patterns classification (Yan et al. Citation2019; Hu et al. Citation2022; Li, Yan, and Lu Citation2023a). The maturity of the GNN technology has bridged the gap between deep learning and vector data and forged a new path for shape representation and similarity measurement.

However, two potential issues need to be addressed in current GeoAI-based methods. First, representing shape from the vector data to graphs should be careful and delicate because these predefined features primarily determine the quality of the results. In most studies on building shape coding using GNN (Yan et al. Citation2021; Liu et al. Citation2021; Hu et al. Citation2022), fixed number of vertices are selected along the boundary, which is slightly hasty and may cause the loss of delicate information regarding the building contour nodes. These limitations maybe owing to the nascent stage of early GNN technologies and the lack of guidance from classical methods that rely on graph model representations. Second, the GNN has a low interpretability during the extraction of shape features, particularly during the graph spectral convolution approach (Shuman, Ricaud, and Vandergheynst Citation2016), which is not conducive to identifying potential problems in the network and makes improving the robustness of the model difficult.

3. Method

This study proposes a graph-based shape representation method that describes building shapes by extracting the regional characteristics of building contours. It breaks the shape boundary into segments as vertices of a graph to construct primitives using regional geometry characteristics. This method defines the edit costs between primitives of two building shapes based on geometric and positional differences and uses a heuristic approach to obtain the optimal edit path that minimizes the total edit cost. This cost represents the difference between two building shapes and can reflect shape similarity. The smaller the shape difference, the more similar the shapes are.

3.1. Similarity measurement between graphs using GED

GED is a powerful method for graph matching and similarity measurement (Bunke and Allermann Citation1983; Sanfeliu and Fu Citation1983). Owing to its exceptional error-tolerance performance and applicability to a wide class of graphs, it is one of the most mainstream graph similarity comparison methods that is being widely used in pattern analysis and recognition (Gao et al. Citation2010), such as biometric identification (Lajevardi et al. Citation2013), cancerous tissue detection (Ozdemir and Gunduz-Demir Citation2013), handwriting recognition (Fischer et al. Citation2013), scenario retrieval (Wessel et al. Citation2011) and and a few applications in the geosciences field, primarily for road network (Liu et al. Citation2015) and trajectory matching (Koide, Xiao, and Ishikawa Citation2020).

GED is an extension of the well-known string edit distance (Wagner and Fischer Citation1974) to the domain of graphs. It uses a set of graph-editing operations to transform one graph into another. The cost of each operation is determined by a cost function, which depends on the current edit operation. The total cost of the set represents the distance between the two graphs. The larger the distance, the greater the difference and the lower the similarity.

Given a graph G=(V,E,LV,LE), where V is a set of vertices and E is a set of edges such as EV×V. LV and LE are label sets of graph vertices and edges, respectively, recording the categories or attributes of vertices and edges. There are three basic operations for graph editing: insertion, deletion, and substitution for vertices and edges. Let u and v be two different vertices. If ε indicates no vertex; then, insert vertex u by εu, delete vertex u by uε, and substitute vertex v for u by uv. The other three editing operations for the edges were similar to those for the vertices. illustrates editing operations between two graphs that have two different type vertices and edges.

Figure 1. An explanation of graph edit operations (from G1 to G2).

Figure 1. An explanation of graph edit operations (from G1 to G2).

Given graph G1=(V1,E1,LV1,LE1) and graph G2=(V2,E2,LV2,LE2), the ordered set of editing operations from G1 to G2 comprises a complete editing path λ=(e1,e2,,ei,en), where ei denotes the ith editing operation, and n is the total number of operations. Let the editing cost of the edit operation ei be c(ei); then, the GED of the path λ is: (1) Distance(G1,G2)λ=eiλc(ei)(1) There are multiple edit pathsλγ between (G1,G2), and the optimal edit path is the one with the smallest total edit cost, representing the distance between the two graphs. The shorter the distance, the more similar the two graphs are. The edit distance Distance(G1,G2) is: (2) Distance(G1,G2)=minλγDistance(G1,G2)λ(2)

3.2. Conversion of shapes into graphs

Building contours comprise points and lines, which makes them ideal for natural conversion into graphs with the nodes and connections representing vertices and edges, respectively. A cyclic graph is constructed by converting corner points and boundary lines into vertices and edges, respectively. A cyclic graph can be considered as a chain starting with any vertex and ending with cohesively connected vertices. It does not contain tree-like subgraphs, which makes it easy to perform the edit operation and discover the optimal edit path.

The primary objective during graph construction is to extract the geometric characteristics for shape representation and similarity measurement. We use the angle of the corner points to convey local information. Based on the Gestalt principles, we use the angle of the corner point and the length of two parts of building contour which connect two adjacent corner points to represent regional information for structural cognition. Both local and regional information were used for labeling the vertices in the graph.

To accurately and unambiguously represent the shape and avoid ambiguity caused by the use of the intersection angle, the rotation angle was chosen to represent the local information of the corner points (). Then label L(v) of vertex v of the graph of a building is. (3) L(v)=[R(v),lprev(v),lnext(v)](3) where lprev(v) is the length of the boundary between current vertex v and the previous vertex (clockwise direction) and lnext(v) is the length of the boundary between current vertex v and the next vertex. R(v)[0,2π) is the angle of rotation between the two vectors corresponding to the two boundaries of the building contour. L(v) represents the structural cognitive information of the building corner point corresponding to vertex v in graph G.

Figure 2. Local information expression of the corner points.

Figure 2. Local information expression of the corner points.

As the local and regional information of the building corner points is stored in vertex labels, the edges in the graph represent only the relationships between adjacent vertices. An unweighted cyclic graph was constructed using an arbitrary starting vertex ().

Figure 3. Conversion of building corner points shapes into shape graphs (v0is an arbitrary start vertex of the graph).

Figure 3. Conversion of building corner points shapes into shape graphs (v0is an arbitrary start vertex of the graph).

3.3. Calculation of building shapes similarity

There are two key issues during the calculation of shape similarity using GED: defining the costs of different editing operations and searching for the optimal editing path.

Considering that the regional characteristics of a building are stored in the vertex labels of a graph, this study only edited the graph vertices. As vertex insertion is a dual-to-vertex deletion, this study defined only two types of edit costs: vertex substitution and deletion. Assuming that the process of editing G1 to G2 comprises the following steps: ‘delete vertices from G1,’ ‘substitute vertices from G1 with vertices from G2,’ and ‘insert vertices from G2,’ the editing cost is equivalent to ‘delete vertices from G1,’ ‘delete vertices from G2,’ and ‘substitute vertices from G1 with vertices from G2 (or vice versa).’

Unlike other structure-based similarity measurement methods that seek approximate solutions, we aimed for an exact solution to determine the optimal editing path and the corresponding editing cost. Consequently, we transformed the problem into a maximum-weight clique problem by constructing a vertex association graph to obtain a set of vertex pairs for G1 and G2where the sum of the costs of vertex substitution was minimized. This study used an improved Local Strong Configuration Checking (LSCC) model to obtain an exact solution.

illustrates the basic framework of the proposed method. The definition of the vertex edit cost and the method for obtaining the optimal editing path are elaborated in the subsequent sections.

Figure 4. Framework of the GED method. (a) illustrates a set of examples of original and target graphs; (b) represents multiple graph edit paths and their corresponding edit costs based on edit cost definition; (c) shows the construction of a vertex association graph using the above information, and the determination of its maximum-weight clique (a subgraph where each vertex is adjacent) through LSCC; and (d) illustrates the vertex editing scenario represented by the maximum-weight clique (optimal edit path), and the corresponding graph edit distance.

Figure 4. Framework of the GED method. (a) illustrates a set of examples of original and target graphs; (b) represents multiple graph edit paths and their corresponding edit costs based on edit cost definition; (c) shows the construction of a vertex association graph using the above information, and the determination of its maximum-weight clique (a subgraph where each vertex is adjacent) through LSCC; and (d) illustrates the vertex editing scenario represented by the maximum-weight clique (optimal edit path), and the corresponding graph edit distance.

3.3.1. Definition of vertex edit cost

The cost function of the edit operation differs according to the discipline, objective, and graph. The rationality and adequacy of the definition of the cost function determine the effectiveness of the editing distance. Assuming that one building contour can be generated by stretching another, the geometric distortion during stretching represents the difference between the two contours.

In this study, we defined two edit operations: vertex deletion and vertex substitution (S). All graph vertices were categorized into three sets in an edit path (from G1 to G2 or G2 to G1): two sets of vertices to be deleted (D1 and D2) and the set of vertex pairs to be substituted (SV1×V2). Let du and dv be the costs of deleting vertices u and v from G1 and G2, respectively. Let suv be the cost of substituting vertex u with vertex v (label substitution). Then, the optimal editing distance can be rewritten as follows: (4) Distance(G1,G2)=min(uD1du+vD2dv+(u,v)Ssuv)(4) The ‘stretching operation-based definition’ does not imply that stretching operation will occur in the GED calculation. Instead, this approach is used to assign label values to each vertex. The primary reason for defining the edit cost is to guarantee alternately appropriate costs that can help the GED algorithm make wise decisions. When searching for the optimal edit path, the algorithm tends to substitute similar vertices, instead of deleting them.

3.3.1.1. Vertex substitution operation cost

During vertex substitution, the geometric differences between the two vertices in the graph, which are represented by the vertex labels and the positional differences, should be considered. As we used an arbitrary vertex as the starting point during the graph construction process, we represented the graph as a set of chains to ensure that the replacement process did not fall into an infinite loop. For graph G with n vertices, we constructed chains in both the clockwise and counterclockwise directions considering each vertex as the starting point. The corresponding set of chains is {C}={C1,C2,,C2n}. Substitution operation cost s(u,v) for replacing vertex u in G1 with vertex v in G2 depends on geometric GeomDis(u|G1,v|G2) and positional PosDis(u|C1i,v|C2j) differences of the vertex, where i and j represent the chain indices. (5) s(u,v)=GeomDis(u|G1,v|G2)×PosDis(u|C1i,v|C2j)(5) Geometric difference GeomDis(u|G1,v|G2) represents the difference between two triangles, each formed by the information covered by the labels of vertices u and v. GeomDis(u|G1,v|G2) was calculated as follow: (6) GeomDis(u|G1,v|G2)=ScDis(u|G1,v|G2)×[StrDis(u|G1,v|G2)+AngDis(u|G1,v|G2)](6) where ScDis(u|G1,v|G2), StrDis(u|G1,v|G2), and AngDis(u|G1,v|G2) represent the distortion among different phases. The calculation is performed in the following steps:

Let up and vq be the vertices of G1 and G2, respectively. up1 and up+1 are the two adjacent vertices of up. vq1, vq+1 are the two adjacent vertices of vq. Move up1 and vq1 to the same position, and then rotate the structure to make up1, vq1, up+1, and vq+1 collinear on datum line l. Note the triangle constructed by up1, up, up+1 as T1, and constructed by vq1, vq, vq+1 as T2 ().

Figure 5. ‘Stretching’ of the regional characteristics from G1 to G2.

Figure 5. ‘Stretching’ of the regional characteristics from G1 to G2.

Only when T1 and T2 have the same base, is it reliable to make them similar by stretching point up. Therefore, it is necessary to scale the triangles for ensuring that they share the same base. Scale up T1 to T1 till point up+1 coincides with vq+1. The distortion in this operation is noted as ScDis(u|G1,v|G2).

Considering that a triangle can be determined using one side, the height on that side, and the opposite angle of that side, this study designed two parameters, StrDis(u|G1,v|G2) and AngDis(u|G1,v|G2), to describe the distortion caused by stretching up to vq.

ScDis(u|G1,v|G2) is determined using the ratio of the base of triangle T1 and T1: (7) ScDis(u|G1,v|G2)=loge[argmax(bT1bT1,bT1bT1)]+1(7) StrDis(u|G1,v|G2) is the structure difference determined using the bases and the heights of the two triangles (Jiang et al. Citation2010; Tian et al. Citation2017). Let b to be the base of T1 and T2, and h1, h2 are the height of triangles. (8) StrDis(u|G1,v|G2)=δ|(1bh1+b)τ(1bh2+b)|(8) Where δ is the ratio of the total length of a corner point to the shape’s perimeter. τ is a parameter with value 1 or −1, which based on whether up and vq are on the same side of the baseline b. (9) δ=(lprev(v)+lnext(v))uV(lprev(v)+lnext(v))(9) (10) τ={1,ifupandvqonthesamesideofbaselineb1,otherwise(10) Let Δθ be the difference between two angles opposite to the base, then AngDis(u|G1,v|G2) is: (11) AngDis(u|G1,v|G2)=Δθπ(11) To calculate the similarity between the cycle graphs using GED, the graph was represented by a set of directed chains (explained in Section 3.3.2). Thus, every vertex in a chain had a sequence position representing the positional difference. If the two vertices have a low GeomDis(u|G1,v|G2)but an obvious positional difference, GED will set a comparatively high substation cost.

Let pos(u|C1i) and pos(v|C2j) be the relative position of vertices u and v in the chains C1i and C2j, respectively, which are obtained by adding labels lnext(u) and lnext(v) of vertices that precede the current vertex. This is the total length of the contour from the chain's starting point to the current point in the corresponding building contour, which is shown in . Then, the PosDis(u|C1i,v|C2j) is given by: (12) PosDis(u|C1i,v|C2j)=e|pos(u|C1i)pos(v|C2j)|(12)

Figure 6. An example for calculating the relative position.

Figure 6. An example for calculating the relative position.

3.3.1.2. Vertex deletion operation cost

Defining the deletion cost of a vertex is a major challenge for GED. Because substitution is more meticulous than deletion, the cost of deletion is usually set to 1, whereas the cost of substitution is relatively higher. However, this approach may not be suitable for measuring shape similarity, as removing a corner point from the contours can have a considerable impact. Despite this, the substitution cost should be higher in this study, but not too much for most pairs of vertices. For vertices u and v on G1 and G2, if their geometric or positional differences on chains C1i and C2j are too large, the cost of substituting vertices u and v may be higher than that of deleting the two vertices. In such case, the proposed method deletes both vertices rather than performing an irrational substitution operation.

Consequently, the deletion cost was designed based on the geometric distortion to extrude the corner of the shape to the baseline. Then the deletion cost for deleting vertex v is: (13) dv=δ(1bh+b)(13) where δ is the ratio parameter, b is the base of the triangle constructed using the corner and adjacent points, and h is the height of the base.

3.3.2. Search for the optimal edit path

This study aimed to determine the optimal edit path that represents the distance between the two graphs. According to EquationEquation (4), the set of vertices to be substituted represents an edit path. Then, the EquationEquation (4) can be rewritten as. (14) Distance(G1,G2)=min[uD1du+vD2dv+(u,v)S(du+dvsuv)]=uV1du+vV2dvmax[(u,v)S(du+dvsuv)]=uV1du+vV2dvmax(u,v)S[μ(S)](14) Because uV1du and vV2dv are constant and independent of λ, the optimal edit distance between the two graphs depends on the maximum value of the utility μ(S)=(u,v)S(du+dvsuv). Thus, the optimal edit path is determined when the optimal substitution set (u,v)S is calculated.

Furthermore, determining the minimum edit distance between graphs can be transformed into searching for the maximal weighted clique of the association graph (Torsello and Hancock Citation2004; Torsello, Robles-Kelly, and Hancock Citation2007). In this method, an association graph is constructed by obtaining both G1 and G2 with vertices deletion and substitution operations. A clique of the association graph can represent a substitution set (u,v)S, so the edit-distance problem is equivalent to searching for a clique of an association graph that maximizes the total utility μ(S).

Hence, the optimal edit path is obtained in two steps: constructing an association graph and determining the maximal weighted clique, which are explained as follows.

3.3.2.1. Construction of an association graph

For an association graph, the maximal weighted clique of it represents the optimal substitution set S that determines the minimum edit distance of a pair of chains. Let {C1} and {C2} are the chain sets of G1 and G2 respectively. i and j represent the chain indices. The optimal edit path can be determined by traversing combinations of chains in the two sets with the minimum value. (15) Distance(G1,G2)=minC1i{C1},C2j{C2}Distance(C1i,C2j)(15) A transitive closure should be constructed before constructing an association graph. For directed chain C=(V,EC), transitive closure Ω(C)=(V,E^C) has the same vertex V and directed edges. In a transitive closure, all vertices are connected to the remaining vertices; therefore, the deletion of any vertex will not break the connectivity. When deletion occurs, the vertex and adjacent edge are deleted in both the chain and the associated transitive closure ().

Figure 7. (a) Conversion of a graph into a transitive closure where all the vertices are connected to the others. (b) When deletion is performed on the transitive closure, the connectivity of the original graph still exists.

Figure 7. (a) Conversion of a graph into a transitive closure where all the vertices are connected to the others. (b) When deletion is performed on the transitive closure, the connectivity of the original graph still exists.

Let Ω(C1i)=(V1,E^C1i) and Ω(C2j)=(V2,E^C2j) be two transitive closures of C1i and C2j. GA=(VA,EA,W) is the association graph of Ω(C1i) and Ω(C2j), which has vertices set VA=V1×V2 equivalent to the Cartesian products of vertices of Ω(C1i) and Ω(C2j). In the association graph GA, the vertex means a pair of vertices in Ω(C1i) and Ω(C2j) operating the substitution, and wWis the weight of the vertex: (16) w(u,v)=du+dvsuv(16) The edges represent the mutual constraints between pairs of vertices. Specifically, let (ui,vj) and (up,vq) be the two vertices in GA; then, there is a directed edge ((ui,vj),(up,vq)) in EA only when there are edges (ui,up) and (vj,vq) in E^C1i and E^C2j, respectively. For better demonstration, arrange the vertices in GA as a matrix (), in which directed edges only exist between one vertex and the other vertices on the lower right side.

Figure 8. An example of the structure of an association graph. The dashed arrows and solid arrows indicate a possible edit path and directed edges in the graph, respectively. (a) is an association graph GA of Ω(C1i) and Ω(C2j) in a matrix-like expression. (b) illustrates the directed edges between vertices in GA. Directed edges only exist between one vertex and other vertices on its lower right side. (c) is a clique of GA and represents an edit path. The selected vertices of GA (a clique) are substituted, whereas the left vertices in two transitive closures are deleted.

Figure 8. An example of the structure of an association graph. The dashed arrows and solid arrows indicate a possible edit path and directed edges in the graph, respectively. (a) is an association graph GA of Ω(C1i) and Ω(C2j) in a matrix-like expression. (b) illustrates the directed edges between vertices in GA. Directed edges only exist between one vertex and other vertices on its lower right side. (c) is a clique of GA and represents an edit path. The selected vertices of GA (a clique) are substituted, whereas the left vertices in two transitive closures are deleted.

In , a set of connected arrows means a clique of GA, and it also can be regarded as an edit path with substitution vertices set and deletion vertices set. When the arrow goes through the vertex (up,vq) of GA, the label of up in Ω(C1i) is updated to the label of vq after the substitution operation. If the arrow jumps over column q in the matrix, the vertex vq will be deleted from Ω(C2j).

3.3.2.2. Optimal path based on the maximum weighted clique

A clique in a graph is a complete or fully connected subgraph, and a maximum weighted clique is a clique with maximum total weight among all the cliques. When setting a weight w(u,v) to the vertices of GA, the search for the optimal edit path is transformed into obtaining the maximum weighted clique of GA.

The maximum weighted clique problem is a typical non-deterministic polynomial that requires considerable time to solve. LSCC is a heuristic algorithm (Wang, Cai, and Yin Citation2016) for determining the maximum clique by performing the following operations: initialize, add, swap, and drop. The algorithm defines the check region for the local maximum clique. When the region extends to the entire graph, it determines the maximum clique of the given graph.

Considering that association graph GA has evident structural patterns, this study modifies the LSCC algorithm to improve the efficiency of solving the maximum weighted clique problem as follows.

A matrix-like graph expression was used for an enhanced elaboration. Substitution set S0=[(u0,v0),(u1,v1),,(uk,vk)] and a corresponding clique Clq0 were initialized. Then, the total weight of clique was equivalent to utility μ(S0) as shown in EquationEquation (17). (17) W(Clq0)=μ(S0)=(u,v)S0(du+dvsuv)(17) Subsequently, the vertices were swapped as follows:

  • The vertices were swapped from bottom to top and row by row. For the kth row, the candidate set contains the vertices on the right side of the current vertex and has a larger weight. After checking the candidate set's weight and swapping in the kth row, clique Clqk and substitution set Sk were recorded as the kth local optimal solution (the local optimal solution was recorded only once for each row). Subsequently, we jumped to the (k1)th row and continued swapping until the swap operations were completed in every row. The final recorded local optimal solution represented the maximum weighted clique Clq.

  • When checking the vertices in the set, if the weight is larger than the current vertex weight and the vertex position lies to the left of the vertex in the next row (the last row is not considered), the vertex was swapped with the candidate, and the candidate set was updated.

  • If a vertex whose weight was larger than the current vertex was present, but it was located directly above or to the right of the vertex in the next row, current vertex (u,v) and total weight W were recorded. Subsequently, a new substitution set S was initialized based on the current vertex (extending to the lower right by one row and column), and the swap row was relocated to the last for swapping. When the swapping of the (u+1)th row was finished, total weight Wwas recorded. If WW, the current clique was recorded as a temporary optimal solution Stemp, and swapping was continued from (u,v). After swapping all the candidates in this row, the current clique was compared with all Stemp, and the one with the largest total weight was recorded as local optimal solution Sk.

shows an example of determining the maximum weighted clique of an association graph. Each subfigure represented substitution sets (cliques) in k-steps and the optimal solution in former steps. (a) shows the initial substitution set and the followed swap process in the last row of the matrix. (b) and (c) illustrate how a new substitution set was initialized and updated using different k values. When the start vertex was changed ( (d)), more columns representing transfer closures were added. k= 0 implied a change in the starting point of matching substitution. Additional association graphs were constructed by adding more columns. Therefore, this example illustrates association graphs constructed using a transitive closure from G1and all transitive closures from G2, with had the same direction and the maximum weight clique in these graphs. The final editing path was obtained by performing the above algorithm with backward and same transitive closures from G1and G2, respectively. Finally, the maximum weight clique was obtained, i.e. Clq=[(u0,v0),(u1,v2),(u2,v3),(u3,v4)]. The optimal edit path was obtained as λ=[(u0v0),(u1v2),(u2v3),(u3v4),(v1ε)].

Figure 9. Determination of maximum weighted clique in the matrix-like association graph.

Figure 9. Determination of maximum weighted clique in the matrix-like association graph.

4. Experiments and results

We designed three experiments to validate the proposed method. The experiments include qualitative and quantitative comparisons with traditional shape measurement methods and GeoAI-based shape measurement methods. Two different datasets were constructed to assess the cognitive ability of the proposed method, and all building contours were standardized.

4.1. Shape matching experiment

In this study, three metrics were used to quantitatively assess shape cognition ability in the shape match task: first tier (FT), second tier (ST), and discounted cumulative gain (DCG) (Shilane et al. Citation2004). The FT and the ST metrics refer to the percentage of shapes within the top K matches that belong to the same template type. For each shape class comprising N members, K=N1 and K=2×(N1) for FT and ST, respectively. In all cases, an ideal matching result lead to a score of 100%, and higher values indicated better matches. DCG is a statistic that weighs more correct matches near the front of the ranked similarity list than the latter part of the same list. For each shape, similarities to the other shapes were determined and sorted in descending order. The more advanced the sorted shape, the stronger the relevance or irrelevance. The DCG ranged from 0 to 1, with higher values indicating better performance.

The experiment dataset is a labeled building shapes dataset () obtained from Yan et al. (Citation2021). It comprised 10 typical standard shapes, including E-, T-, and U-shape. Each type comprised 500 shapes, which were used as training samples for 5,000 samples.

Table 1. Examples of 10 building shapes type.

To quantitatively assess the proposed method, this study compared it with three conventional methods: two contour-based approaches, namely the Fourier descriptor (FD) (Ai et al. Citation2013) and turning function (TF) (Arkin et al. Citation1991), and one region-based method, namely multiscale features and grid context descriptors (MF&GCD) (Fan, Zhao, and Li Citation2021). In addition, the proposed method was also compared with two novel methods: graph convolutional autoencoder (GCAE) (Yan et al. Citation2021) and LineStringNet (Li, Yan, and Lu Citation2023b). Considering the remarkable impact of the rotating operation on the results when using certain methods, we compared the performance of shape similarity measurements for both rotated (marked by the letter R) and unrotated buildings.

The proposed GED method exhibited rotational invariance and outperformed conventional methods, particularly for unrotated buildings (). Both the GED and TF methods traversed all contour points to obtain the optimal value; however, the GED method calculated the cost using a more reasonable way, which explains its superior performance. The FD method used compact descriptions to reduce the computational complexity, which rendered it insensitive to continuous characteristics, thereby resulting in inaccurate details. Although the MF&GCD method also has rotation invariance, its precision was unsatisfactory and substantially affected by the grid size. Furthermore, the region-based method did not exhibit superior performance when using vector data. indicates that the proposed GED method was comparable to the GCAE method. Although the GCAE method was sensitive to the orientation of the buildings, it outperforms the rotating building data. The LineStringNet method exhibited excellent performance in the shape matching experiment; however, its generalization ability should be further examined owing to the supervised training process used in this study.

Table 2. Performance of shape similarity measurements using different methods. Higher number indicates better performance.

The precision-recall curves of different methods for classifying the shape type by averaging the 10 standard shapes are depicted in , which aligns with the previous analysis.

Figure 10. Precision–recall curves of different methods. (a) Compares to the conventional methods. (b) Compares to the GeoAI-based methods.

Figure 10. Precision–recall curves of different methods. (a) Compares to the conventional methods. (b) Compares to the GeoAI-based methods.

Further analysis was conducted on different shape type matching using the proposed GED method. illustrates a matching result showing the matching details for the10 standard shapes. Relatively complex shapes, such as E-, H-, and Z-shapes, exhibited better performance, whereas the most incorrect matches were observed in the L-, I-, and O-shapes.

Figure 11. Image of matching result obtained using the GED method. The blue and red blocks (bottom left to top right diagonally) represent correct and incorrect matches, respectively. The deeper the color, the more matches there are between the shape pairs.

Figure 11. Image of matching result obtained using the GED method. The blue and red blocks (bottom left to top right diagonally) represent correct and incorrect matches, respectively. The deeper the color, the more matches there are between the shape pairs.

It is noteworthy that there is a significant misclassification of I-shapes as O-shapes. This may be because prior knowledge can influence sample labeling, marking shapes with an aspect ratio close to 1 as O-shapes and the rest as I-shapes. However, the GED method measures the similarity between shapes objectively without prior knowledge. Therefore, when determining the similarity between a rectangle (assuming an aspect ratio of 2:1) and the O- and I-templates (with aspect ratios of 1:1 and 5:1, respectively, in Dataset 1), the algorithm may calculate it as more similar to the O-template from an edit cost perspective. In contrast, humans would consider it to have an I-shape. For regular I-shapes, a hidden aspect ratio threshold of approximately 2.873:1 exists. When the aspect ratio exceeds this threshold, the GED considers the shape as more similar to the I-template; otherwise, it considers the shape as more similar to the O-template. This clarifies why a substantial number of buildings labeled as I-shapes were misclassified as O-shapes, whereas a few buildings labeled as O-shapes were misclassified as I-shapes.

4.2. Shape retrieval experiment

We constructed a dataset using 9410 building shapes from OpenStreetMap (). The shapes in this dataset were more intricate, and the 10 building-shape templates were retrieved from similar buildings in the dataset.

Figure 12. Experimental area for the retrieval experiment. The left picture illustrates the location of Shanghai City in mainland China and the red line depicts the study area in Shanghai City (it is located in the vicinity of the walled region defined by Zhongshan North, West Yan'an, and North Chengdu Roads). The two right pictures denote a sample block and building shapes in the area.

Figure 12. Experimental area for the retrieval experiment. The left picture illustrates the location of Shanghai City in mainland China and the red line depicts the study area in Shanghai City (it is located in the vicinity of the walled region defined by Zhongshan North, West Yan'an, and North Chengdu Roads). The two right pictures denote a sample block and building shapes in the area.

lists the templates’ top nine similar shapes and the graph edit distance of the templates. The retrieved shapes exhibited high visual similarity to the templates, and the retrieval results aligned with the intuitive notion of humans. Furthermore, the retrieved shapes had different orientations, indicating that the GED method has a high rotation invariance.

Table 3. Top nine retrieved buildings for the 10 standard shapes.

In this dataset, a large proportion of the shapes were simple, such as I-, O-, and L-shapes, resulting in the best performance of the retrieval results, with a small edit distance. However, the retrieval results of some of the more complex shapes, such as E-, F-, H-, and Y-shapes, did not align with human visual cognition. There are two possible explanations for this discrepancy. First, Dataset 2 comprised only a limited number of regularly-shape buildings, particularly for the E- F-, and Y-shapes. Consequently, the retrieved shapes presented in may not represent buildings that are similar to the templates. Second, the GED method tends to measure regional information; therefore, shapes with more similar regions obtain a relatively low score. For example, the 7th and 8th similar buildings in the Y-shape class were partially similar to the templates (in the top half ‘valley’ part). The 5th and 6th similar buildings in the H-shape class partially resembled (at the bottom part) the template (the left or right part), resulting in low edit distances.

Furthermore, some shapes were more similar to the target from human perceive but had a high GED value and rank such that the 7th and 8th H -shapes were more similar to the template than the 5th and 6th H -shapes. The primary cause may be the GED's tendency toward regional information. The bottoms of the 5th and 6th H-shapes were extremely similar to the left/right parts of the template; thus, that the corresponding vertices had low edit costs. Although the 7th and 8th H-shapes were similar to the template, more vertices had high detailed edit costs, and the total edit cost was greater.

In addition, the 7th and the 8th similar buildings in F- and E-shapes were the same building. Although this shape is more similar to the E-shape owing to a lower edit distance, which is consistent with visual judgment, it should not have been retrieved by the F-shape template. This can be explained as follows: First, the GED method overemphasizes the regional characteristics instead of global ones. The graph is a discrete structure, i.e. its shape is described by a set of regional characteristics. When the majority of regional characteristics are very similar, but a fraction is not, then the fractional part will not considerably affect the results; however, this fraction may cause a large difference in human cognition. This limitation should be further investigated in future studies. Second, dense corner points and small contour segments can affect the GED results. This point is further discussed in Section 5.

We also compared the GED method with the GCAE and LineStringNet methods, which have demonstrated good performance in shape matching experiments for building shape retrieval.

presents the top five retrieved building elements and their corresponding quantitative measurement values obtained using the three methods. GED and SD represent the output values of the GED and GCAE methods, respectively, where smaller values indicated a higher similarity. Psim is the output value of the LineStringNet model, where larger values indicated a higher similarity. The results demonstrated that the shapes retrieved using the GED method were more consistent with human cognition than those obtained using the GCAE and LineStringNet methods, particularly in the F-, T-, and Z- shape classes. This comparison indicates that the current GeoAI-based methods have some limitations in their generalization ability and require more diverse and representative samples to learn latent features for better performance in new datasets. However, obtaining sufficient samples remains a considerable challenge. Conversely, the GED method exhibited higher stability and provided more reliable candidates for shape retrieval.

Table 4. Retrieval results of templates using GCAE and LineStringNet methods.

Some shapes, such as the 1st and 3rd E-shape targets, were not retrieved using the GED method but were retrieved using alternative methods, such as LineStringNet method, as shown in . Although these shapes were similar to the human-perceived letter shapes, they differed substantially from the templates used in the experiment. For example, the 1st E-shape retrieved using LineStringNet had a thinner trunk than that of the template, and its upper-left corner was sunken. These factors increased the edit costs required to match the E-shape template when using the GED method.

An additional experiment was conducted to retrieve four additional intricate shapes from the same dataset. lists the top nine retrieved buildings and their corresponding quantitative measurements. Among the three candidate methods, shapes that were almost identical to the targets were retrieved. However, after retrieving extremely similar shapes, the subsequent retrieval sequences of the GED method comprised features that were more similar to the target than those of the LineStringNet and GCAE models. For the first target, all three candidate methods retrieved four shapes that were almost identical to the target. In addition, the GED method retrieved additional shapes with regional characteristics that were more similar to the target, exhibiting a distinct ‘step-like’ feature, which is consistent with human cognition of the retrieval targets. In the second target, the 5–9th buildings that were retrieved using the GED method displayed obvious similarity to the target, whereas the shapes retrieved using GCAE exhibited considerable differences from the target, and the H-shapes retrieved using LineStringNet differed locally from the target. In the third target, the 4–9th results retrieved using GCAE and LineStringNet were markedly different from the target, whereas the GED results showed certain regional similarities with the target. The 6th result was globally similar to the target. In the fourth target, the first seven results retrieved using the GED method visually resembled the target, whereas GCAE and LineStringNet could only retrieve three buildings that were similar to the target. Additionally, shapes that appear visually similar and dissimilar to the target exhibit a small numerical difference in the measurement values.

Table 5. Retrieval results of complicated shapes using GED, GCAE and LineStringNet methods.

4.3. Shape cognition experiment

We designed a cognition experiment to compare the human cognition results for buildings with those of the GED method in the test area (block in ).

In this experiment, the GED method was first used to query buildings based on qualitative shape similarity representation, which was defined as follows: strongly (0 ≤ GED < 0.05), moderately (0.05 ≤ GED < 0.1) similar. Subsequently, 10 graduated students majoring in GIS were asked to judge the degree of similarity. They were required to assign an order number describing the shape similarity of each building in the experimental data to the E-, H-, and Z-shape templates. The order numbers ranged from zero to two, with two, one, and zero indicating strong moderate, and no similarity, respectively.

lists the cognition experiment results for E-, H-, and Z-shape template buildings. In this study, we used a ratio model (Eisler and Ekman Citation1959; Tversky Citation1977) to measure the difference in shape recognition results between the GED method and human cognition. (18) S(a,b)=2|AB||A+B|(18) S(a,b) represents the similarity between the GED method and human cognition result. A and B represent building sets queried using GED and human cognition, respectively.

Figure 13. Qualitative shape similarity representation by GED value (strongly (0 ≤ GED < 0.05) and moderately (0.05 ≤ GED < 0.1) similar). Red rectangular boxes indicate some mismatched cognition results.

Figure 13. Qualitative shape similarity representation by GED value (strongly (0 ≤ GED < 0.05) and moderately (0.05 ≤ GED < 0.1) similar). Red rectangular boxes indicate some mismatched cognition results.

This experiment obtained a semantic similarity using two qualitative representations, which was 60.0% and 38.5% for strong and moderate similarity, respectively. Qualitative representation using the GED method was different to human cognition. There are two possible explanations for this discrepancy. First, the shape cognition task is inherently ambiguous, and there exists no clear boundaries among different qualitative representation levels, such as strongly and moderately similar. Therefore, subtle differences in shape may cause substantial differences in the qualitative representation of the GED values. For example, the two H-shape buildings on the left side had a GED value of 0.473, which was close to a predefined threshold but caused the building to be recognized as ‘moderately similar’.

Second, the thresholds for qualitative representation using the GED method may differ for different shapes. For example, the two E-shape buildings in the middle had a GED value of 0.115, which exceeded the threshold and caused the building shapes to be recognized as dissimilar. Moreover, the GED value of the two Z-shape buildings at the bottom were 0.086, which exceeded the threshold and caused the building shapes to be recognized as moderately similar.

Consequently, the results may vary if we set a more ambiguous definition of ‘similarity.’ For example, considering both ‘strongly similar’ and ‘moderately similar’ as ‘similar’ and the rest as ‘dissimilar,’ the semantic similarity between the two recognitions would be 93.33%, indicating that shape recognition using the GED method was consistent with human cognition.

5. Analysis and discussion

The proposed method can precisely identify the detailed differences among different building contours, thereby providing accurate similarity measurements with high interpretability. As mentioned in the experiments in Section 4, the GED method is superior to conventional similarity measurements, and it is expected to be applied for more downstream applications, such as cartographic generalization. This section further analyzes the high interpretability of the proposed method, addresses its current limitations, and provides potential improvements in its approach.

5.1. Analysis of the GED method

5.1.1. High interpretability of the GED method

The GED method provides an optimal edit path, which presents a local feature-matching solution based on corner points of building contours. This ensures that when two buildings are matched based on their contour corner points, the sum of the geometric differences is minimized. Such a matching solution is visually intuitive and aligns with the Gestalt principle of visual similarity, thereby rendering the proposed method highly interpretable. This section illustrates the calculation of edit costs for graph vertices and the matching solution using an example, emphasizing the high interpretability of the proposed method.

illustrates a relatively complicated test shape, labeled as Z-shape, in the Dataset 1 and its GED value in the 10 standard shape templates. The GED method correctly recognized it as a Z-shape, whereas other conventional methods classified it as either O- or H-shapes. Although it has the visual characteristics of a Z-shape, some nodes may cause the measurement methods into categorizing it as other standard shape types. Furthermore, it is a mirrored Z-shape, which has been rotated. Additionally, small noise segments are present in the shape, which may interfere with the measurement methods.

Figure 14. Contour of the test building shape and its 10 standard shape template cognition results using the proposed GED method. A low GED value indicates that the two shapes are more similar.

Figure 14. Contour of the test building shape and its 10 standard shape template cognition results using the proposed GED method. A low GED value indicates that the two shapes are more similar.

depicts the ultimate matching scenario of contour corner points between the test shape and the Z-shape template, accompanied by the associated edit costs for each editing operation. lists the geometric distortion cost between vertices, and delineates the computational process for a non-optimal edit path.

Figure 15. GED calculation process in the test shape. Red indices and arrows indicate substitution in the optimal edit path, whereas black indices indicate deletion. The two sub-tables display the edit cost of each edit operation. The sum of these costs is the graph edit cost. (To better explain the vertex matching between the two shape pairs, the test shape has been rotated)

Figure 15. GED calculation process in the test shape. Red indices and arrows indicate substitution in the optimal edit path, whereas black indices indicate deletion. The two sub-tables display the edit cost of each edit operation. The sum of these costs is the graph edit cost. (To better explain the vertex matching between the two shape pairs, the test shape has been rotated)

Table 6. Geometric difference between the regional characteristics of the test shape and the Z- shape template.

Table 7. Matrix-like association graph with weight w(u,v) in the current chain sequence. The units with bold numbers represent the optimal edit path (maximum weighted clique) of the current chain combination.

shows the geometric distortion cost between the regional characteristics of this shape and the Z-shape template. The cost of geometric distortion was generally lower for corner points with similar regional characteristics but higher for those that were visually different. For example, when calculating the geometric distortion cost of corner point u0 in the test shape (), the substitution cost of u0v0 and u0v4was higher than that of the other points of the Z-shape template. This is because the rotation angles of the corner points v0 and v4were different from u0 (1.5π and 0.5π, respectively); thus, the algorithm identified them as two different structures. Structure is the primary factor that determines the geometric distortion cost, which also explains why some points with similar structures but different sizes had low costs, such as u3v1 and u7v1.

After breaking both shapes into chain sets and considering the positional difference, represents the association graph with weight w(u,v) in the current chain sequence. The algorithm obtained an edit path using a substitution set (u2v0, u3v7, u4v6, u5v5, u6v4, u7v3, u10v2, u11v1), with an edit cost of 0.1380. Obviously, this was not the optimal edit path between the two shapes. The matches of u6v4, u7v3, and u10v2 were not consistent with human cognition. These unsuitable matches were derived from the inappropriate match sequence. Therefore, it is necessary to switch other chains in the chain set of the test shape or template shape graph.

As shown in , the algorithm obtained the optimal edit path using a substitution set (u0v6, u1v5, u2v4, u3v3, u4v2, u5v1, u9v0, and u11v7), with a minimum edit cost of 0.0823, by traversing the chain set of the template graph. In this edit path, the vertices of the two graphs associated with the corner points of the two shapes performed the substitution, and the vertices associated with the corner points of the test shapes(u6, u7, u8, u10, and u12) performed deletion. This edit operation was consistent with human cognition and reflected the high interpretability of the proposed GED method.

5.1.2. Major limitations of the GED method

One major limitation of the proposed GED method is its over-sensitivity to regional characteristics, which is common in structure-based methods. However, this sensitivity may not have a substantial impact on shape cognition unless numerous dense corner points and small contour segments of contours, are present, affecting the ability of the algorithm to accurately recognize regional characteristics. Second, the GED method can be time consuming when the number of vertices in the graphs is high. It exhibits a substantial disparity in efficiency when compared to GeoAI-based methods, which restrict utilizing it alone to intricate shape recognition tasks in practical scenarios.

5.2. Cluster technique for improving the method

To address this issue above mentioned limitations, we aimed to improve the accuracy of the algorithm by aggregating dense adjacent corner points into clusters when analyzing with complex shapes. In the algorithm, a cluster of points was treated as a vertex comprising the regional characteristics. Unlike simplification, this technique preserves the regional characteristics of individual points within a cluster. Every point is considered during the edit cost computation, and the cluster is considered as a single vertex during the optimal edit path determination.

Two cluster approaches were used in this study (). One approach merged the clustering of small convex and concave structures into a single vertex. The alternative approach aggregated continuous small segments together or with large segment. a cluster operation was performed depended on the left and right contour boundary lengths of the candidates. We set a cluster threshold k (0.02–0.04; the proportion of the shape perimeter) to determine points that needed clustering.

Figure 16. Two cluster approaches: (a) aggregates continuous small segments together or with large segments, such as v1 and v5, and (b) clusters small convex and concave structures into one vertex, such as v7.

Figure 16. Two cluster approaches: (a) aggregates continuous small segments together or with large segments, such as v1 and v5, and (b) clusters small convex and concave structures into one vertex, such as v7.

Cluster operation cans affect cognition results. A larger cluster threshold can reduce the elapsed time, but it may neglect crucial details. However, an appropriate threshold can help construct a more rational graph structure. presents the shape matching experiment results for the two cluster approaches with different thresholds. In general, the cluster technique with an appropriate threshold also can increase the accuracy of the algorithm when analyzing complex shapes. Due to the method's ability to reduce computational complexity, it holds the potential to enhance algorithm scalability, making it applicable for measuring the shape similarity of other natural features.

Table 8. Performance of shape similarity measurements using the GED method with different clusters and thresholds.

6. Conclusion

This study proposes a graph-based similarity measurement for building shapes based on GED, which constructs cyclic graphs by extracting the local and regional characteristics of shapes, calculates edit costs using geometry distortion, and transforms the optimal edit path into a series of maximum-weight clique to accurately compute the graph edit distance. Experimental results demonstrated that the proposed method was rotation invariant, discriminative, and outperformed conventional similarity measurements (FD, TD, and MF & GCD methods).

The proposed method also has some limitations: it’s sensitivity to regional characteristics and may overlook global information; it’s unable to measure holed polygons and multi-polygons; its efficiency is not as good as GeoAI-based methods, especially for measuring intricate shapes. Future studies should develop more rational graph structures to better extract the global characteristics when analyzing complex shapes.

Moreover, the proposed GED method showed great potential for integration with current mainstream shape cognition studies based on the GNN. The GED method provides detailed similarity measurement information that can be used to construct samples for supporting supervised learning. Furthermore, combining high-precision GED methods with high-efficiency GeoAI-based methods will allow precise fine-grained measurements of shape features, thus catering to various practical application scenarios.

Acknowledgement

The authors would like to sincerely thank the anonymous reviewers and editors for their constructive comments and valuable suggestions to improve the quality of this article.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [Grant No. 41871305; 42371454; 42001340; 42122025] and the Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education [Grant No. CUG2022ZR06].

References

  • Ai, T., X. Cheng, P. Liu, and M. Yang. 2013. “A Shape Analysis and Template Matching of Building Features by the Fourier Transform Method.” Computers, Environment and Urban Systems 41: 219–233. https://doi.org/10.1016/j.compenvurbsys.2013.07.002.
  • Arkin, E. M., L. P. Chew, D. P. Huttenlocher, K. Kedem, and J. S. B. Mitchell. 1991. “Anefficiently Computable Metric for Comparing Polygonal Shapes.” IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (3): 209–216. https://doi.org/10.1109/34.75509.
  • Ballatore, A., D. C. Wilson, and M. Bertolotto. 2013. “Computing the Semantic Similarity of Geographic Terms Using Volunteered Lexical Definitions.” International Journal of Geographical Information Science 27 (10): 2099–2118. https://doi.org/10.1080/13658816.2013.790548.
  • Biljecki, F., L. Z. X. Chew, N. Milojevic-Dupont, and F. Creutzig. 2021. “Open Government Geospatial Data on Buildings for Planning Sustainable and Resilient Cities.” ArXiv 28), https://doi.org/10.48550/arXiv.2107.04023.
  • Bunke, H., and G. Allermann. 1983. “Inexact Graph Matching for Structural Pattern Recognition.” Pattern Recognition Letters 1 (4): 245–253. https://doi.org/10.1016/0167-8655(83)90033-8.
  • Cetinkaya, S., M. Basaraner, and D. Burghardt. 2015. “Proximity-Based Grouping of Buildings in Urban Blocks: A Comparison of Four Algorithms.” Geocarto International 30 (6): 618–632. https://doi.org/10.1080/10106049.2014.925002.
  • Chen, Z. L., Q. Q. Feng, and X. C. Wu. 2015. “Representation Model of Topological Relations Between Complex Planar Objects.” Acta Geodaetica et Cartographica Sinica 44 (4): 438–444. https://doi.org/10.11947/j.AGCS.2015.20130708.
  • Chen, Z. L., X. C. Ma, L. Wu, and Z. Xie. 2019. “An Intuitionistic Fuzzy Similarity Approach for Clustering Analysis of Polygons.” ISPRS International Journal of Geo-Information 8 (2): 18. https://doi.org/10.3390/ijgi8020098.
  • Chen, Z. L., X. C. Ma, W. H. Yu, L. Wu, and Z. Xie. 2021. “Measuring the Similarity of Building Patterns Using Graph Fourier Transform.” Earth Science Informatics 14 (4): 1953–1971. https://doi.org/10.1007/s12145-021-00659-6.
  • Chen, Z., Y. Xu, and L. Wu. 2015. “Vector Planar Elements: Geometrical Similarity Measurement Based on Fourier Descriptors.” Geomatica 69 (4): 385–394. https://doi.org/10.5623/cig2015-401.
  • Courtial, A., A. El Ayedi, G. Touya, and X. Zhang. 2020. “Exploring the Potential of Deep Learning Segmentation for Mountain Roads Generalisation.” ISPRS International Journal of Geo-Information 9 (5): 21. https://doi.org/10.3390/ijgi9050338.
  • Danielsson, P.-E. 1980. “Euclidean Distance Mapping.” Computer Graphics and Image Processing 14 (3): 227–248. https://doi.org/10.1016/0146-664X(80)90054-4.
  • Du, S. H., M. Shu, and C. C. Feng. 2016. “Representation and Discovery of Building Patterns: A Three-Level Relational Approach.” International Journal of Geographical Information Science 30 (6): 1161–1186. https://doi.org/10.1080/13658816.2015.1108421.
  • Du, J. W., F. Wu, R. X. Xing, X. Y. Gong, and L. Y. Yu. 2022a. “Segmentation and Sampling Method for Complex Polyline Generalization Based on a Generative Adversarial Network.” Geocarto International 37 (14): 4158–4180. https://doi.org/10.1080/10106049.2021.1878288.
  • Du, J. W., F. Wu, J. C. Yin, C. Y. Liu, and X. Y. Gong. 2022b. “Polyline Simplification Based on the Artificial Neural Network with Constraints of Generalization Knowledge.” Cartography and Geographic Information Science 49 (4): 313–337. https://doi.org/10.1080/15230406.2021.2013944.
  • Eisler, H., and G. Ekman. 1959. “A Mechanism of Subjective Similarity.” Acta Psychologica 16: 1–10. https://doi.org/10.1016/0001-6918(59)90080-0.
  • Fan, H. C., Z. Y. Zhao, and W. W. Li. 2021. “Towards Measuring Shape Similarity of Polygons Based on Multiscale Features and Grid Context Descriptors.” ISPRS International Journal of Geo-Information 10 (5): 21. https://doi.org/10.3390/ijgi10050279.
  • Fan, H. C., A. Zipf, Q. Fu, and P. Neis. 2014. “Quality Assessment for Building Footprints Data on Openstreetmap.” International Journal of Geographical Information Science 28 (4): 700–719. https://doi.org/10.1080/13658816.2013.867495.
  • Fischer, A., C. Y. Suen, V. Frinken, K. Riesen, and H. Bunke. 2013. “A Fast Matching Algorithm for Graph-Based Handwriting Recognition.” Paper presented at the graph-based representations in Pattern Recognition, Berlin, Heidelberg, May 2013.
  • Fu, Z. L., L. Fan, Z. Q. Yu, and K. C. Zhou. 2018. “A Moment-Based Shape Similarity Measurement for Areal Entities in Geographical Vector Data.” ISPRS International Journal of Geo-Information 7 (6): 19. https://doi.org/10.3390/ijgi7060208.
  • Gao, X. B., B. Xiao, D. C. Tao, and X. L. Li. 2010. “A Survey of Graph Edit Distance.” Pattern Analysis and Applications 13 (1): 113–129. https://doi.org/10.1007/s10044-008-0141-y.
  • García Balboa, J. L., and F. J. Ariza López. 2008. “Generalization-Oriented Road Line Classification by Means of an Artificial Neural Network.” Geoinformatica 12 (3): 289–312. https://doi.org/10.1007/s10707-007-0026-z.
  • Goshtasby, A. 1985. “Description and Discrimination of Planar Shapes Using Shape Matrices.” IEEE Transactions on Pattern Analysis and Machine Intelligence 7 (6): 738–743. https://doi.org/10.1109/tpami.1985.4767734.
  • Hartmann, G. W. 1935. Gestalt Psychology: A Survey of Facts and Principles, Gestalt Psychology: A Survey of Facts and Principles. New York, NY, US: Ronald Press Company.
  • He, Z. J., M. Deng, J. N. Cai, Z. Xie, Q. F. Guan, and C. Yang. 2020. “Mining Spatiotemporal Association Patterns from Complex Geographic Phenomena.” International Journal of Geographical Information Science 34 (6): 1162–1187. https://doi.org/10.1080/13658816.2019.1566549.
  • Hu, Ming-Kuei. 1962. “Visual Pattern Recognition by Moment Invariants.” IRE Transactions on Information Theory 8 (2): 179–187. https://doi.org/10.1109/TIT.1962.1057692.
  • Hu, Y. H., C. Liu, Z. Li, J. K. Xu, Z. G. Han, and J. Z. Guo. 2022. “Few-Shot Building Footprint Shape Classification with Relation Network.” ISPRS International Journal of Geo-Information 11 (5): 16. https://doi.org/10.3390/ijgi11050311.
  • Huang, M., J. J. Lin, N. Chen, W. An, and W. J. Zhu. 2018. “Reversed Sketch: A Scalable and Comparable Shape Representation.” Pattern Recognition 80: 168–182. https://doi.org/10.1016/j.patcog.2018.03.001.
  • Huttenlocher, D. P., G. A. Klanderman, and W. J. Rucklidge. 1993. “Comparing Images Using the Hausdorff Distance.” IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (9): 850–863. https://doi.org/10.1109/34.232073.
  • Iivarinen, J., and A. Visa. 1996. “Shape Recognition of Irregular Objects.” Conference on Intelligent Robots and Computer Vision XV: Algorithms, Techniques, Active Vision, and Materials Handling 2904: 25–32. https://doi.org/10.1117/12.256280.
  • Jiang, H., Y. Chu, H. Yan, and L. Guo. 2010. “Measurement of Shape Similarity for Linear Objects in Multi-Scale Geographical Space.” Science of Surveying and Mapping 35 (05): 35–38. https://doi.org/10.16251/j.cnki.1009-2307.2010.05.063.
  • Koide, Satoshi, Chuan Xiao, and Yoshiharu Ishikawa. 2020. “Fast Subtrajectory Similarity Search in Road Networks Under Weighted Edit Distance Constraints.” arXiv preprint arXiv:2006.05564.
  • Lajevardi, S. M., A. Arakala, S. A. Davis, and K. J. Horadam. 2013. “Retina Verification System Based on Biometric Graph Matching.” IEEE Transactions on Image Processing 22 (9): 3625–3635. https://doi.org/10.1109/tip.2013.2266257.
  • Li, D. 2010. “Remotely Sensed Images and Gis Data Fusion for Automatic Change Detection.” International Journal of Image and Data Fusion 1 (1): 99–108. https://doi.org/10.1080/19479830903562074.
  • Li, P., H. Yan, and X. Lu. 2023a. “Multilinestringnet: A Deep Neural Network for Linear Feature Set Recognition.” Cartography and Geographic Information Science, 1–16. https://doi.org/10.1080/15230406.2023.2264756.
  • Li, P. B., H. W. Yan, and X. M. Lu. 2023b. “A Siamese Neural Network for Learning the Similarity Metrics of Linear Features.” International Journal of Geographical Information Science 37 (3): 684–711. https://doi.org/10.1080/13658816.2022.2143505.
  • Liu, C., Y. H. Hu, Z. Li, J. K. Xu, Z. G. Han, and J. Z. Guo. 2021. “Triangleconv: A Deep Point Convolutional Network for Recognizing Building Shapes in Map Space.” ISPRS International Journal of Geo-Information 10 (10): 14. https://doi.org/10.3390/ijgi10100687.
  • Liu, Changyong, Lian Xiong, Xiangyun Hu, and Jie Shan. 2015. “A Progressive Buffering Method for Road map Update Using OpenStreetMap Data.” ISPRS International Journal of Geo-Information 4 (3): 1246–1264.
  • Ma, D., I. Omer, T. Osaragi, M. Sandberg, and B. Jiang. 2019. “Why Topology Matters in Predicting Human Activities.” Environment and Planning B-Urban Analytics and City Science 46 (7): 1297–1313. https://doi.org/10.1177/2399808318792268.
  • Ma, C., Q. Sun, H. Chen, and B. Wen. 2016. “Recognition of Road Junctions Based on Road Classification Method.” Geomatics and Information Science of Wuhan University 41 (9): 1232–1237. https://doi.org/10.13203/j.whugis20160073.
  • Medin, D. L., R. L. Goldstone, and D. Gentner. 1993. “Respects for Similarity.” Psychological Review 100 (2): 254–278. https://doi.org/10.1037/0033-295X.100.2.254.
  • Mehrotra, R., and J. E. Gary. 1995. “Similar-Shape Retrieval in Shape Data Management.” Computer 28 (9): 57–62. https://doi.org/10.1109/2.410154.
  • Mokhtarian, F., and A. Mackworth. 1986. “Scale-Based Description and Recognition of Planar Curves and Two-Dimensional Shapes.” IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1): 34–43. https://doi.org/10.1109/tpami.1986.4767750.
  • Ozdemir, E., and C. Gunduz-Demir. 2013. “A Hybrid Classification Model for Digital Pathology Using Structural and Statistical Pattern Recognition.” IEEE Transactions on Medical Imaging 32 (2): 474–483. https://doi.org/10.1109/tmi.2012.2230186.
  • Pavlidis, T. 1980. “Algorithms for Shape Analysis of Contours and Waveforms.” IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-2 4: 301–312. https://doi.org/10.1109/TPAMI.1980.4767029.
  • Pavlidis, T. 2012. Algorithms for Graphics and Image Processing. Berlin, Heidelberg: Springer Berlin Heidelberg.
  • Rehan, R. M. 2014. “Urban Branding as an Effective Sustainability Tool in Urban Development.” HBRC Journal 10 (2): 222–230. https://doi.org/10.1016/j.hbrcj.2013.11.007.
  • Sanfeliu, A., and K. S. Fu. 1983. “A Distance Measure Between Attributed Relational Graphs for Pattern Recognition.” IEEE Transactions on Systems, man, and Cybernetics 13 (3): 353–362. https://doi.org/10.1109/tsmc.1983.6313167.
  • Schwering, A. 2008. “Approaches to Semantic Similarity Measurement for Geo-Spatial Data: A Survey.” Transactions in GIS 12 (1): 5–29. https://doi.org/10.1111/j.1467-9671.2008.01084.x.
  • Sester, M., Y. Feng, and F. Thiemann. 2018. “Building Generalization Using Deep Learning.” ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4: 565–572. https://doi.org/10.5194/isprs-archives-XLII-4-565-2018.
  • Shilane, P., P. Min, M. Kazhdan, and T. Funkhouser. 2004. “The Princeton Shape Benchmark.” Proceedings Shape Modeling Applications, 167–178. https://doi.org/10.1109/SMI.2004.1314504.
  • Shuman, D. I., B. Ricaud, and P. Vandergheynst. 2016. “Vertex-Frequency Analysis on Graphs.” Applied and Computational Harmonic Analysis 40 (2): 260–291. https://doi.org/10.1016/j.acha.2015.02.005.
  • Sonka, M., V. Hlavac, and R. Boyle. 1993. Image Processing, Analysis and Machine Vision. Boston, MA: Springer US.
  • Steiniger, S., T. Lange, D. Burghardt, and R. Weibel. 2008. “An Approach for the Classification of Urban Building Structures Based on Discriminant Analysis Techniques.” Transactions in GIS 12 (1): 31–59. https://doi.org/10.1111/j.1467-9671.2008.01085.x.
  • Teague, M. R. 1980. “Image-Analysis Via the General-Theory of Moments.” Journal of the Optical Society of America 69 (8): 1468.
  • Tian, Z., C. Men, Y. Liu, Q. Jiang, and Y. Tang. 2017. “A Spatial Object Shape Matching Method Based on Triangular Division.” Geomatics and Information Science of Wuhan University 42 (06): 749–755. https://doi.org/10.13203/j.whugis20150786.
  • Torsello, A., and E. R. Hancock. 2004. “A Skeletal Measure of 2d Shape Similarity.” Computer Vision and Image Understanding 95 (1): 1–29. https://doi.org/10.1016/j.cviu.2004.03.006.
  • Torsello, A., A. Robles-Kelly, and E. R. Hancock. 2007. “Discovering Shape Classes Using Tree Edit-Distance and Pairwise Clustering.” International Journal of Computer Vision 72 (3): 259–285. https://doi.org/10.1007/s11263-006-8929-y.
  • Touya, G., and M. Dumont. 2017. “Progressive Block Graying and Landmarks Enhancing as Intermediate Representations Between Buildings and Urban Areas.” Paper presented at the Proceedings of 20th ICA workshop on Generalisation and multiple representation, Washington, DC, USA, July 2017.
  • Tversky, A. 1977. “Features of Similarity.” Psychological Review 84 (4): 327–352. https://doi.org/10.1037/0033-295X.84.4.327.
  • Wagner, R. A., and M. J. Fischer. 1974. “The String-to-String Correction Problem.” Journal of the ACM 21 (1): 168–173. https://doi.org/10.1145/321796.321811.
  • Wang, Y., S. Cai, and M. Yin. 2016. “Two Efficient Local Search Algorithms for Maximum Weight Clique Problem.” Paper presented at the Proceedings of the thirtieth AAAI Conference on Artificial intelligence, phoenix, Arizona, February 2016.
  • Wessel, R., S. Ochmann, R. Vock, I. Blümel, and R. Klein. 2011. “Efficient Retrieval of 3d Building Models Using Embeddings of Attributed Subgraphs.” Paper presented at the Proceedings of the 20th ACM international conference on Information and knowledge management, Glasgow, Scotland, October 2011.
  • Wu, B., B. L. Yu, Q. S. Wu, Z. Q. Chen, S. J. Yao, Y. Huang, and J. P. Wu. 2018. “An Extended Minimum Spanning Tree Method for Characterizing Local Urban Patterns.” International Journal of Geographical Information Science 32 (3): 450–475. https://doi.org/10.1080/13658816.2017.1384830.
  • Xu, Y. Y., Z. L. Chen, Z. Xie, and L. Wu. 2017a. “Quality Assessment of Building Footprint Data Using a Deep Autoencoder Network.” International Journal of Geographical Information Science 31 (10): 1929–1951. https://doi.org/10.1080/13658816.2017.1341632.
  • Xu, Y. Y., Z. Xie, Z. L. Chen, and L. Wu. 2017b. “Shape Similarity Measurement Model for Holed Polygons Based on Position Graphs and Fourier Descriptors.” International Journal of Geographical Information Science 31 (2): 253–279. https://doi.org/10.1080/13658816.2016.1192637.
  • Xu, Y. Y., Z. Xie, Z. L. Chen, and M. Y. Xie. 2021. “Measuring the Similarity Between Multipolygons Using Convex Hulls and Position Graphs.” International Journal of Geographical Information Science 35 (5): 847–868. https://doi.org/10.1080/13658816.2020.1800016.
  • Yan, H. W. 2015. “Quantitative Relations Between Spatial Similarity Degree and Map Scale Change of Individual Linear Objects in Multi-Scale Map Spaces.” Geocarto International 30 (4): 472–482. https://doi.org/10.1080/10106049.2014.902115.
  • Yan, X. F., T. H. Ai, M. Yang, and X. H. Tong. 2021. “Graph Convolutional Autoencoder Model for the Shape Coding and Cognition of Buildings in Maps.” International Journal of Geographical Information Science 35 (3): 490–512. https://doi.org/10.1080/13658816.2020.1768260.
  • Yan, X. F., T. H. Ai, M. Yang, and H. M. Yin. 2019. “A Graph Convolutional Neural Network for Classification of Building Patterns Using Spatial Vector Data.” ISPRS Journal of Photogrammetry and Remote Sensing 150: 259–273. https://doi.org/10.1016/j.isprsjprs.2019.02.010.
  • Yan, X. F., T. H. Ai, and X. Zhang. 2017. “Template Matching and Simplification Method for Building Features Based on Shape Cognition.” ISPRS International Journal of Geo-Information 6 (8): 17. https://doi.org/10.3390/ijgi6080250.
  • Yan, X. F., and M. Yang. 2022. “A Comparative Study of Various Deep Learning Approaches to Shape Encoding of Planar Geospatial Objects.” ISPRS International Journal of Geo-Information 11 (10): 14. https://doi.org/10.3390/ijgi11100527.
  • Yang, M., T. H. Ai, X. F. Yan, Y. Y. Chen, and X. Zhang. 2018. “A Map-Algebra-Based Method for Automatic Change Detection and Spatial Data Updating Across Multiple Scales.” Transactions in GIS 22 (2): 435–454. https://doi.org/10.1111/tgis.12320.
  • Yang, C. Z., H. Wei, and Q. Yu. 2018. “A Novel Method for 2d Nonrigid Partial Shape Matching.” Neurocomputing 275: 1160–1176. https://doi.org/10.1016/j.neucom.2017.09.067.
  • Yu, H. F., T. H. Ai, M. Yang, W. M. Huang, and L. Harrie. 2023. “A Graph Autoencoder Network to Measure the Geometric Similarity of Drainage Networks in Scaling Transformation.” International Journal of Digital Earth 16 (1): 1828–1852. https://doi.org/10.1080/17538947.2023.2212920.
  • Yu, B. L., S. Shu, H. X. Liu, W. Song, J. P. Wu, L. Wang, and Z. Q. Chen. 2014. “Object-Based Spatial Cluster Analysis of Urban Landscape Pattern Using Nighttime Light Satellite Images: A Case Study of China.” International Journal of Geographical Information Science 28 (11): 2328–2355. https://doi.org/10.1080/13658816.2014.922186.
  • Zhang, L. Q., H. Deng, D. Chen, and Z. Wang. 2013. “A Spatial Cognition-Based Urban Building Clustering Approach and Its Applications.” International Journal of Geographical Information Science 27 (4): 721–740. https://doi.org/10.1080/13658816.2012.700518.
  • Zhang, D., and G. Lu. 2002. “A Comparative Study of Fourier Descriptors for Shape Representation and Retrieval.” Paper presented at the Asian Conference on Computer Vision 2002, fitzroy Vic, Australia, January 2002.
  • Zhang, D. S., and G. J. Lu. 2004. “Review of Shape Representation and Description Techniques.” Pattern Recognition 37 (1): 1–19. https://doi.org/10.1016/j.patcog.2003.07.008.
  • Zheng, J., Z. R. Gao, J. S. Ma, J. Shen, and K. Zhang. 2021. “Deep Graph Convolutional Networks for Accurate Automatic Road Network Selection.” ISPRS International Journal of Geo-Information 10 (11): 22. https://doi.org/10.3390/ijgi10110768.
  • Zhou, Z., C. Fu, and R. Weibel. 2023. “Move and Remove: Multi-Task Learning for Building Simplification in Vector Maps with a Graph Convolutional Neural Network.” ISPRS Journal of Photogrammetry and Remote Sensing 202: 205–218. https://doi.org/10.1016/j.isprsjprs.2023.06.004.