Search in:

International Journal of Digital Earth Volume 17, 2024 - Issue 1

Submit an article Journal homepage

Open access

1,589

Views

CrossRef citations to date

Altmetric

Listen

Research Article

Sponet: solve spatial optimization problem using deep reinforcement learning for urban spatial decision analysis

Haojian Lianga School of Artificial Intelligence, Jilin University, Changchun, People’s Republic of ChinaView further author information

Shaohua Wangb Key Laboratory of Remote Sensing and Digital Earth Chinese Academy of Sciences, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, People’s Republic of China;c International Research Center of Big Data for Sustainable Development Goals, Beijing, People’s Republic of China;d Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, People’s Republic of ChinaCorrespondence[email protected]
View further author information

Huilai Lia School of Artificial Intelligence, Jilin University, Changchun, People’s Republic of China;e School of Mathematics, Jilin University, Changchun, People’s Republic of ChinaView further author information

Liang Zhouf Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou, People’s Republic of ChinaView further author information

Hechang Chena School of Artificial Intelligence, Jilin University, Changchun, People’s Republic of ChinaView further author information

Xueyan Zhangg Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USAView further author information

Xu Chena School of Artificial Intelligence, Jilin University, Changchun, People’s Republic of ChinaView further author information

show all

Article: 2299211 | Received 15 Jun 2023, Accepted 20 Dec 2023, Published online: 28 Dec 2023

Cite this article
https://doi.org/10.1080/17538947.2023.2299211
CrossMark

In this article

ABSTRACT
1. Introduction
2. Related work
3. Preliminaries
4. Methodology
5. Experiments and results
6. Conclusion
Disclosure statement
Additional information
References

Full Article
Figures & data
References
Citations
Metrics
Licensing
Reprints & Permissions
View PDF PDF View EPUB EPUB

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

ABSTRACT

Urban spatial decision analysis is a critical component of spatial optimization and has profound implications in various fields, such as urban planning, logistics distribution, and emergency management. Existing studies on urban facility location problems are based on heuristic methods. However, few studies have used deep learning to solve this problem. In this study, we introduce a unified framework, SpoNet. It combines the characteristics of location problems with a deep learning model SpoNet can solve spatial optimization problems: p-Median, p-Center, and maximum covering location problem (MCLP). It involves modeling each problem as a Markov Decision Process and using deep reinforcement learning to train the model. To improve the training efficiency and performance, we integrated knowledge SpoNet. The results demonstrated that the proposed method has several advantages. First, it can provide a feasible solution without the need for complex calculations. Second, integrating the knowledge model improved the overall performance of the model. Finally, SpoNet is more accurate than heuristic methods and significantly faster than modern solvers, with a solution time improvement of more than 20 times. Our method has a promising application in urban spatial decision analysis, and further has a positive impact on sustainable cities and communities.

KEYWORDS:

Urban spatial decision analysis
spatial optimization problems
p-Median
MCLP
attention model
deep reinforcement learning

This article is part of the following collections:

Big Earth Data in Support of SDG 11: Sustainable Cities and Communities

1. Introduction

Urban spatial decision analysis plays a critical role in the development of sustainable cities and communities. With the acceleration of urbanization, cities have encountered lots of challenges, including traffic congestion, outdated infrastructure, and disaster risks. These challenges were caused by insufficient and uneven utilization of urban space, inadequate allocation of public facilities, impractical transportation planning, and immature emergency warning mechanisms. As an important strategy of sustainable urban development, spatial optimization maximizes the utilization and optimization of available spatial resources through rational planning, design, and management. The urban Facility Location Problem (FLP) is a significant spatial optimization problem that plays an essential role in logistics, communication (Gao et al. Citation2022), and emergency management (Bleha and Ďurček Citation2023). The city-oriented facility location problem is a key component of urban spatial analysis, providing a feasible decision basis for urban planning (Zhao, Sailor, and Wentz Citation2018), urban management and design (Beairsto et al. Citation2022) by analyzing the layout and spatial distribution of existing facilities in the city (Guo et al. Citation2022). During this process, the influence of factors on facility construction should be considered, which can greatly satisfy urban residents’ requirements and promote sustainable development (Huang et al. Citation2021), including traffic (Jiang et al. Citation2022; Xia et al. Citation2022), environment (Zhang et al. Citation2020), socioeconomic factors (Rabiei-Dastjerdi et al. Citation2021), policy and regulation.

In urban spatial decision analysis, p-Median (Hakimi Citation1965), coverage (Church,1972), and p-Center (Hakimi Citation1965) are three location problems that offer valuable insights and methodologies to aid in the process of identifying optimal locations. Typical methods for solving location problems can be categorized into three classes: exact, approximate, and heuristic algorithms. Exact algorithms are essentially an exhaustive enumeration of the solution space, which can provide the optimal solution to the problem. The typical algorithms include Branch and Bound, Lagrange Relaxation, and Column Generation. However, for a large-scale problem, it is infeasible to solve in polynomial time as the number of nodes increases and the solution space increases exponentially. Hence, scholars have suggested providing a practical solution to satisfy the need for practical problems without determining the best possible outcome (Williamson and Shmoys Citation2011). The approximate algorithm theoretically provides the existing gap between the approximate and optimal solutions. Heuristic and metaheuristic algorithms cannot theoretically address the gap between the achieved solution and the ideal optimal solution (Beheshti and Shamsuddin Citation2013), while they always can provide a feasible solution to large-scale practical problems and have diverse applications across various fields. The representative algorithms are genetic algorithms, simulated annealing, and variable neighborhood search.

The algorithms mentioned above have been widely applied to urban spatial decision analyses. Recently, some deep learning methods have been utilized to address the challenges of urban sustainable development and optimization problems. Although deep learning methods have been extensively employed to solve routing problems (Mazyavkina et al. Citation2021), such as the Traveling Salesman Problem (TSP) and Vehicle Routing Problem (VRP), few studies apply these techniques to location problems. Previous studies mainly focused on designing different heuristic methods to solve problems (Kaveh and Khayatazad Citation2012; Song and Chen Citation2018; Li, Li, and Ma Citation2022). These well-designed heuristic approaches can undoubtedly enhance performance, while they need to be redesigned for different problems and the performance of algorithms, which largely depend on the intuition and experience of experts. Deep learning can replace the handcrafted design of artificial rules to some extent (Mecheter et al. Citation2022; Xiao, Xu, and Wan Citation2016).

The neural-spatial optimization method (NeurSPO) utilizes deep learning to address spatial optimization problems. Deep construction and deep improvement are two main directions for utilizing the deep learning method to address spatial optimization problems. Deep construction refers to the initial generation or creation of solutions using deep learning techniques. Deep improvement pertains to the refinement or enhancement of these solutions through further deep learning processes. This study mainly focuses on the use of deep construction to realize an end-to-end solution to solve spatial optimization problems.

The primary contributions of our work are as follows.

We focus on three classic FLPs, p-Median, p-Center and maximum covering location problem (MCLP) which are crucial for urban spatial decision analysis. It is challenging to obtain the optimal solution due to the NP hardness. Therefore, we propose a novel algorithm and design a unified structure-SpoNet for solving FLPs.
We formulate the p-Median, p-Center and MCLP as Markov Decision Process (MDP) and adopt the Deep Reinforcement Learning (DRL) to train the SpoNet. Based on allocation relationship between nodes, we combined the characteristics of the FLPs to incorporate local features, order and distribution relations, and coverage information into the network models.
Extensive benchmark experiments demonstrate that SpoNet outperforms heuristic methods, linear programming solvers, and a deep learning baseline in terms of solution quality and efficiency. The ablation study further emphasizes the effectiveness of dynamic coverage information and GRU structure in SpoNet.

To provide a clear and organized presentation, we structured the paper as follows. Section 2 provides an overview of related work in this field, while Section 3 presents the preliminary knowledge and problem description. In Section 4, we describe our methodology, which focuses on adopting reinforcement learning to train the model, and introduces SpoNet. We discuss the comparison analysis, ablation study, and transfer study in section 5. Finally, Section 6 concludes the paper.

2. Related work

In this section, we provide an overview of the relevant research in two main aspects: urban FLPs and deep learning for spatial optimization problems.

2.1. Urban facility location problems

Real-world urban spatial decision problems can be effectively tackled using optimization concepts from operations research, often transferred into p-Median, p-Center, and MCLP.

Hakimi (Citation1964) proposed the single-facility p-Median and p-Center problem, and then this work was extended to the multi-facility situation (Hakimi Citation1965), which is the p-Median and p-Center problem. Subsequently, ReVelle and Swain (Citation1970) proposed a discrete p-Median problem and the first MILP model. Teitz and Bart (Citation1968) first proposed the interchange heuristic (or vertex substitution), which starts with a feasible p-Median candidate set and iteratively checks possible exchanges with other candidates until the optimal is reached. A heuristic method introduced by Alcaraz, Landete, and Monge (Citation2012) for large-scale instances consists of three key components: subgradient column generation, one core heuristic, and one clustering process. Kazakovtsev and Stupina (Citation2014) proposed a multistage method comprising clustering, VNS, and exact methods. MCLP was first introduced by Church and Velle (Citation1974), who demonstrated a distinct definition, Integer Linear Programming (ILP) of MCLP, and greedy addition and greedy addition with substitution. Later, diverse variations in the MCLP and corresponding heuristic methods were proposed. Pirkul and Schilling (Citation1989) used Lagrange Relaxation (LR) to solve a capacitated MCLP, Berman and Krass (Citation2002) proposed a Lagrangian relaxation algorithm, and ReVelle, Scholssberg, and Williams (Citation2008) utilized a metaheuristic method and heuristic concentration to solve the MCLP. In addition, there are many other algorithms for solving the MCLP, such as Benders decomposition, nonlinear programming, Tabu algorithm, and local search algorithm.

2.2. Deep learning for spatial optimization problems

The classical methods mentioned above for solving FLPs have been widely applied in practice. However, due to the NP-hard nature of these problems, these methods often provide only a feasible solution, particularly in large-scale scenarios. Hence, there is an urgent need to explore novel approaches to solving FLPs. Deep reinforcement learning is effective in handling spatial decision-making and providing a robust foundation for the development of new methods.

In recent years, for the routing problem, there has been a significant amount of research exploring spatial optimization problems with deep learning. The pioneering work on utilizing the deep learning technique to solve COPs was introduced by Vinyals, Fortunato, and Jaitly (Citation2015). Inspired by the seq2seq structure of machine translation, Pointer Network (Ptr-Net) was proposed to solve the Convex Hull problem and Traveling Salesman Problem by adding a pointer to seq2seq. Afterward, as the transformer model (Vaswani et al. Citation2017) became popular in NLP, Deudon et al. (Citation2018) improved the Ptr-Net based on the Multi-head Attention (MHA) mechanism in the transformer. They used the REINFORCE method to train the model, and utilized the 2-opt method for further search solution space after the solution was generated, which effectively enhanced the quality of the solution. Kool, van Hoof, and Welling (Citation2018) proposed the Attention Model (AM) to tackle different types of routing problems, such as TSP, Capacitated VRP (CVRP), Orienteering Problem (OP), and Prize Collecting TSP (PCTSP). Inspired by AM, Peng, Wang, and Zhang (Citation2020) proposed an AM with dynamic characteristics that can significantly enhance the performance of the model. Xin et al. (Citation2021) proposed the use of Multi-Decoder AM (MDAM) to solve VRP. MDAM comprises a single encoder and multiple decoders, containing the same structure but different parameters, and it obtains promising results on six routing problems. Kwon et al. (Citation2020) revised the RL algorithm in the AM, raised Policy Optimization with Multiple Optima (POMP), and obtained SOTA results.

The above-mentioned research mainly focuses on TSP, VRP, PCTSP, OP, and other routing problems, but little research has been conducted on urban facility location problems. Using neural networks to address p-Median problems can be traced back to the Hopfield network (Domínguez and Muñoz Citation2008). However, this neural network can only learn and solve a single small-scale p-Median problem each time. For the new problem, it still needs to be retrained and compared with traditional methods. Liang et al. (Citation2022) proposed a new method for solving the p-Center problem by combining Graph Convolutional Networks (GCN) with a clustering algorithm. Wang et al. (Citation2023) proposed the DeepMCLP to solve MCLP with the DRL. Our study is situated at the intersection of big data-driven spatial decision support systems and connectivity-based graph neural networks. Recent studies (Andronie et al. Citation2021; Nica et al. Citation2023) emphasized data-driven approaches in sustainable urban governance and smart urban economies. Lăzăroiu et al. (Citation2020) explored sustainability in urban corporate economies. These sources inform our research, highlighting the significance of data-driven solutions in complex urban environments.

3. Preliminaries

3.1. Problem definition

For all the urban facility location problems in this study, let $N = {1, 2, \dots, n}$ be the set of demand points and $M = {1, 2, \dots, m}$ be the set of facilities. Subsequently, we introduced three spatial optimization problems. The decision variables are defined as follows. $x_{i j} = {\begin{matrix} 1, & if demand point i is assigned to the facility j, \\ 0, & otherwise . \end{matrix}$ $y_{j} = {\begin{matrix} 1, & if and only if facility j is open, \\ 0, & otherwise . \end{matrix}$

3.1.1. p-Median problem

The p-Median problem can be modeled using integer linear programming (ILP). The formulation is as follows (ReVelle and Swain Citation1970). (1) $\begin{matrix} \begin{matrix} Minimize & \sum_{i = 1}^{n} \sum_{j = 1}^{m} d_{i j} x_{i j} \end{matrix} \end{matrix}$ (1)

(2) $\begin{matrix} \begin{matrix} Subject to & \sum_{j = 1}^{m} x_{i j} = 1, i = 1, \dots, n, \end{matrix} \end{matrix}$ (2) (3) $\begin{matrix} \sum_{j = 1}^{m} y_{j} = p, \end{matrix}$ (3) (4) $\begin{matrix} x_{i j} \leq y_{j}, i = 1, \dots, n, j = 1, \dots, m . \end{matrix}$ (4) (5) $\begin{matrix} x_{i j}, y_{j} \in {0, 1} i = 1, \dots, n, j = 1, \dots, m . \end{matrix}$ (5)

Constraint (2) guarantees that each point is assigned a unique median value. Condition (3) ensures that only $p$ median points exist. Inequality (4) ensures the coherence of the solutions; that is, demand point $i$ cannot be assigned to facility $j$ if facility $j$ is not determined as a median.

3.1.2. p-Center problem

Hakimi (Citation1964) first proposed the p-Center problem. The essence of this problem is to find $p$ facilities to open and allocate $n$ customers to these $p$ facilities. In the p-Center problem, the objective function minimizes the maximum distance from each demand point to the nearest facility. The Integer Linear Programming format of the p-Center problem is as follows (Daskin Citation1997): (6) $\begin{matrix} \begin{matrix} Minimize & z \end{matrix} \end{matrix}$ (6) (7) $\begin{matrix} \begin{matrix} Subject to & \sum_{j = 1}^{m} y_{j} = p, \end{matrix} \end{matrix}$ (7) (8) $\begin{matrix} \sum_{j = 1}^{m} x_{i j} = 1, i = 1, \dots, n \end{matrix}$ (8) (9) $\begin{matrix} x_{i j} \leq y_{j}, i = 1, \dots, n, j = 1, \dots, m \end{matrix}$ (9) (10) $\begin{matrix} \sum_{j = 1}^{m} d_{i j} x_{i j} \leq z, i = 1, \dots, n \end{matrix}$ (10) (11) $\begin{matrix} x_{i j}, y_{j} \in {0, 1} i = 1, \dots, n, j = 1, \dots, m . \end{matrix}$ (11)

Equation (7) ensures that the number of open facilities is $p$ . Constraint (8) assigns each demand point to only one facility: Inequality (9) guarantees that each demand point is assigned to an open facility: Constraint (10) requires that $z$ to be the upper bound of the distance between any demand point to the assigned facility. It is evident that $z = m a x_{i \in N} \sum_{j = 1}^{m} d_{i j} x_{i j}$ is the optimal cover radius.

3.1.3. MCLP

The essential goal for the MCLP is to cover the most demand points (or serve more customers) under the condition that the number and radius of facilities are known. The ILP formulation of the MCLP is as follows (Church and Velle Citation1974): (12) $\begin{matrix} \begin{matrix} Maximize & z \end{matrix} = \sum_{i = 1}^{n} a_{i} y_{i} \end{matrix}$ (12) (13) $\begin{matrix} \begin{matrix} Subject to & \sum_{j \in N_{i}} x_{j} \geq y_{i} \end{matrix}, i = 1, \dots, n \end{matrix}$ (13) (14) $\begin{matrix} \sum_{j = 1}^{m} x_{j} = p, \end{matrix}$ (14) (15) $\begin{matrix} x_{j}, y_{i} \in {0, 1} i = 1, \dots, n, j = 1, \dots, m . \end{matrix}$ (15) where $S$ represents the distance beyond which a customer is considered ‘uncovered’. $d_{i j}$ is the shortest distance from point $i$ to point $j$ . $a_{i}$ is the population to be served at demand point $i$ . $N_{i} = {j \in M | d_{i j} \leq S}$ . $x_{j}$ and $y_{i}$ are two decision variables. $x_{j} = {\begin{matrix} 1, & if a facility i is allocated to site j, \\ 0, & otherwise . \end{matrix}$ $y_{i} = {\begin{matrix} 1, & if and only if facility i is open, \\ 0, & otherwise . \end{matrix}$

Constraint (13) ensures that $y_{i}$ equals 1 only when one or more facilities are established at sites in the set $N_{i}$ . Condition (14) restricts the number of facilities allocated to be $p$ .

3.2. Deep learning model

3.2.1. Attention model

The Attention Model (AM), a deep learning model, was proposed by Kool, van Hoof, and Welling (Citation2018) to solve various routing problems and consists of an Encoder and Decoder. For the initial input node $X$ , the original vector is first embedded into the $d_{x}$ high-dimensional space through a linear mapping layer and then through $N$ -th attention layers, which contain both multi-headed attention and feed-forward neural networks. The formula used is as follows: (16) $\begin{matrix} {\hat{h}}_{i} = B N^{l} (h_{i}^{l - 1} + M H A_{i}^{l} (h_{1}^{l - 1}, \dots, h_{n}^{l - 1})) \end{matrix}$ (16) (17) $\begin{matrix} h_{i}^{l} = B N^{l} ({\hat{h}}_{i} + F F^{l} ({\hat{h}}_{i})) \end{matrix}$ (17) where $h_{i}$ is the node embedding, $M H A$ represents the multi-head attention layer, $B N$ is the batch normalization layer, and $F F$ represents the feed-forward layer. During the decoding process, one node is selected per step. In step $t$ , the decoder is context embedding based on the embedding characteristic of the hidden layer from the encoder and decoder output at the first and $t - 1$ steps. The context embedding is calculated as follows: (18) $\begin{matrix} h_{c}^{N} = {\begin{matrix} [{\bar{h}}^{N}, h_{π_{t - 1}}^{N}, h_{π_{1}}^{N}] & t \geq 1 \\ [{\bar{h}}^{N}, v_{1}, v_{f}] & t = 1. \end{matrix} \end{matrix}$ (18)

3.2.2. Gate recurrent unit

GRU, a neural network, is a variant of LSTM (Hochreiter and Schmidhuber Citation1997), which can efficiently handle sequence data and can be used to solve the gradient disappearance problem in long-term memory and backpropagation problems during training. The GRU contains a reset gate and an update gate. The reset gate acts on the previous hidden state to determine how much of the previous information should be forgotten, and the update gate acts on the current and previous hidden units to determine which valid information needs to be retained from the previous and current moments. The updated formula is as follows (Chung et al. Citation2014): $z_{t} = σ (W_{z} \cdot [h_{t - 1}, x_{t}])$ $r_{t} = σ (W_{r} \cdot [h_{t - 1}, x_{t}])$ ${\bar{h}}_{t} = \tanh (W \cdot [r_{t} * h_{t - 1}, x_{t}])$ $h_{t} = (1 - z_{t}) * h_{t - 1} + z_{t} * {\bar{h}}_{t}$ where $r_{t}$ represents the reset gate, and $h_{t - 1}$ contains the previous information. Multiplying the two by elements is used to measure whether the previous information is useful for future predictions. $z_{t}$ denotes the update gate and determines how to combine the new input information with previous memory. The problem of gradient decay can be effectively addressed by GRU, whereas GRU can more efficiently capture dependencies with larger intervals in sequential data.

3.3. Markov decision process and deep reinforcement learning

3.3.1. Markov decision process

The MDP is a mathematical model that imitates the stochastic strategy and reward of an agent in an environment, and the state of the environment has a Markov Property. It contains a finite state set $S$ , action set $A$ , state transition probability $P$ , discount factor $γ$ , and reward function $R$ , which is denoted as quintuple $(S, A, P, R, γ)$ . A deep learning model can be viewed as an agent, and the problem can be regarded as the environment. By sensing the state of the external environment, the agent makes decisions, acts on the environment, and adjusts its strategy through rewards. The state of the environment varies with changes in agent action, and the variation in the environment can be sensed by the agent. The environment can provide feedback and relative reward for the decision made by the agent.

Taking p-Median as an example, we considered AM as the agent, the p-Median problem as the environment, and the state of the environment as relevant variables of the problem, such as the node coordinates of users and facilities, the distance between each user and facilities, whether facilities are selected, whether users are served, etc. A facility is chosen by AM as the median (center) from the facility set, according to the present state information $s_{t}$ . Afterward, AM calculates the current total weighted distance (i.e. p-Median problem) based on the selected median and differentiates the total weighted distance between steps $t$ and step $t - 1$ , This difference is used as the reward of that step or is the feedback made by the environment to the action. We then iterate until $p$ actions are selected.

3.3.2. Deep reinforcement learning

The spatial optimization problems are all $N P$ -hardness problems. For small-scale problems, a large set of labeled data can be generated by a solver, and supervised learning can be utilized to train the model. However, creating enough labeled data is time-consuming, labor-consuming, and even challenging. We use DRL to train the model and enable it to learn how to optimize through continuous interaction between the agent and environment.

The task of DRL is always described by the MDP, and this study mainly applies the REINFORCE algorithm (Williams Citation1992) which is a renewed policy gradient method based on the Monte Carlo method. The policy gradient applies a random gradient ascent measure to update the parameters under the given objective function, and the random gradient ascent formula is as follows: (19) $\begin{matrix} θ_{t + 1} = θ_{t} + α \nabla L (π) \end{matrix}$ (19) In our problem, because the goal is to minimize the objective function, the negative value of the optimization objective for each problem is used to replace the value of $q$ in the above equation to denote the sum of the rewards after $t$ . When $L (π)$ increases, it suggests that the reward of the current state achieves the maximum and the parameter transfers in the direction of increasing the probability of it being in the current state. The larger the reward, the greater the gradient update.

4. Methodology

We adopt reinforcement learning to train the model to solve unseen instances. First, we convert the location problem into an MDP and introduce a quintuple $M = (S, A, P, r, γ)$ to express the problem. We then propose a unified deep model, SpoNet, for such spatially optimal problems. The dynamic covering information and gated recurrent unit (GRU) affect the Encoder and Decoder, respectively. Finally, we provided the REINFORCE algorithm with a rollout baseline to train our models. The SpoNet framework is shown in .

Figure 1. Total Workflow of Our SpoNet. (a) Encoder. Fusing static location information and dynamic coverage information of nodes to generate key. (b) Decoder. The Encoder is based on a GRU structure to generate the query. (C) Attention. At each step, the probability of selecting each node is computed.

4.1. Reformulate as MDP form

The key issue of p-Median, p-Center, and MCLP is how to select $p$ facilities from $N$ candidate facilities to meet the objective function. We used a constructive method to model the three spatial optimization problems with a unified framework. In the decision-making process, we select a facility point at each step until we choose p facility points. We essentially modeled this selection process as an MDP.

The MDP is usually represented by a five-tuple, including state, action, state transition matrix, reward, and discount factor. The following terms are defined.

State: State $s_{T} = (M_{t})$ represents the current solution sets in the construction process in step $t$ , where $M_{T}$ includes all selected facility points up to time step $t$ , $M_{0}$ represents an empty solution set.

Action: In the process of constructing the solution, each step must select a node as the facility point. Let $a_{T}$ be denoted as $z_{j}$ . In other words, node $z_{j}$ at time step $t$ as the facility point.

Transition: State transition indicates that in state $s_{t}$ , the model adopts action $a_{t}$ to enter the next state $s_{t + 1} = M_{t + 1} = (M_{t} z_{j}),$ that is, the solution of the next state $s_{t + 1}$ is spliced together by the current state $s_{t}$ and point $z_{j}$ corresponding to the action.

Discount factor: The discount factor was set to 1. When calculating the reward function, we take the objective function as the final reward, and the cumulative reward is not calculated.

Reward: The strategic network's optimization direction is determined by the reward function. The three spatial optimization problems have different optimization goals, so different reward functions need to be set.

Denoting our policy by $p_{θ}$ , the policy network selects a node at each time step until $p$ points are acquired as the facilities. The other demand points are then assigned to the nearest facilities, and the final objective function is calculated. It provides a feasible solution to the problem, denoted by $π = {π_{1}, \dots, π_{p}, p \leq n}$ , and only one node is selected at each step, so our SpoNet strategy $p (π | s)$ can be expressed as (20) $\begin{matrix} p_{θ} (π | s) = \prod_{t = 1}^{p} p_{θ} (π_{t} | s, π_{1 \sim t - 1}), p \leq n \end{matrix}$ (20) To train our model, we defined the loss $L (θ | s) = E_{p θ (π | s)} [L (π)] :$ the expectation of cost $L (π) .$ For p-Median, the cost function $L (π)$ is (21) $\begin{matrix} L (π) = \sum_{i = 1}^{N} \sum_{j = 1}^{M} d_{i j} x_{i j} \end{matrix}$ (21) For p-Center, the cost function $L (π)$ is: (22) $\begin{matrix} L (π) = max_{i \in I} \sum_{j = 1}^{M} d_{i j} x_{i j} \end{matrix}$ (22) For MCLP, the cost function $L (π)$ is: (23) $\begin{matrix} L (π) = - \sum_{i = 1}^{N} a_{i} y_{i} \end{matrix}$ (23) Unlike the routing problem, there is no sequence relationship between the points selected in location problems. However, a point must be selected at each step during the modeling process, and the previous point selection influences the current point selection. Therefore, the DRL algorithm solves the coverage relationship of points with changing environmental conditions. The Encoder uses the original features $x$ and generates the final embedding for each point. The Decoder takes the encoder's final embedding as the input, combines the previously selected point information $π_{{1 \sim t - 1}},$ and finally outputs $π_{t}$ . Apparently, only one point is output at each step, and the decoder observes the mask to indicate coverage and the point to choose to deploy the facility.

4.2. The SpoNet

We first described the encoder–decoder structure model to introduce dynamic coverage information. The encoder encodes the original features to generate the hidden embedding for each node, while the decoder takes hidden embedding as input. Finally, the selection probability for each node was the output. In the location problem, a covering relationship exists between nodes. If a node has been covered, it will no longer be selected, or the chances of selection will be reduced. Therefore, we propose a dynamic coverage model that encodes the static coordinates of cities and the coverage states between cities to improve model performance. The SpoNet model is illustrated in .

Figure 2. The Spatial Optimal Neural Network (SpoNet).

4.2.1. Encoder

The encoder can transform realistic problems into mathematical problems, and the original features are partitioned into two components: static and dynamic embedding. Static embedding represents the two-dimensional coordinate information of each point, which does not affect the features during the model training. In contrast, dynamic embedding indicates that the node state constantly changes as it is covered. Therefore, the dynamic feature represents coverage information.

There are two parts for processing static embedding: node embedding and multi-headed attention (MHA) layer. The original features are fed into the network and obtain the node embedding after a layer of linear mapping, then run through $N$ layers of the attention layer, and finally generate the static embedding of each point. Each Attention layer is composed of one MHA layer and a feed-forward neural network. Each sublayer was connected to a skip-connection structure and normalized layer $B N$ . In other words, for each sublayer neural network, given an input $x$ , there is $B N (x + s u b l a y e r (x))$ . The details of the encoder are as follows:

First, we mapped the $d_{x} (d_{x} = 2)$ dimensional location information of each point to the $d_{h} (d_{h} = 128)$ dimensional node embedding. (24) $\begin{matrix} f_{i}^{(0)} = W^{(0)} x_{i} + b^{(0)} \end{matrix}$ (24) where $W^{(0)}$ and $b^{(0)}$ are learnable parameters. The final static embedding is generated by passing $f_{i}^{(0)}$ through $N$ layers of the attention layers. The output of each layer depends on the input of the previous layer, and the calculation for the $k_{t h}$ layer is as follows: (25) $\begin{matrix} \hat{f} = B N (f^{(k - 1)} + M H A (f^{(k - 1)})) \end{matrix}$ (25) (26) $\begin{matrix} f^{(k)} = B N (\hat{f} + F F (\hat{f})) \end{matrix}$ (26) After $N$ attention layers, the final static embedding is $f^{(N)} .$

We then introduce dynamic coverage information as dynamic features. We propose two types of handling of coverage information. The first one is that if a node is selected or covered, it will not be selected in subsequent operations, and this idea is easy to understand and implement. If a node is covered but not selected, it still has the chance to be selected. For the second approach, we introduced a dynamic coverage vector $I_{i}$ for each point $i$ to represent its coverage status, and dynamically updated $I_{i}$ in each step.

We used $π_{t}$ to denote the node selected in step $t$ , $C_{r} (π_{t})$ to denote the set of nodes covered by the node $π_{t}$ under coverage radius $r$ , $N_{C_{r (π_{t})}}$ to represent the number of nodes in the set $C_{r} (π_{t}),$ sorted each point in $C_{r} (π_{t})$ according to its distance to $π_{t}$ from near to far, and $c_{i}$ to express the sort index of the $i_{t h}$ point. The nearest sort to $π_{t}$ is 1 and the farthest point is sorted as $N_{C_{r (π_{t})}}$ .

First, for each point, $I_{i}$ initialized as 1: (27) $\begin{matrix} I_{i} = 1, i = 1, \dots, n \end{matrix}$ (27) In each step of the decoding process, $I_{i}$ is dynamically updated according to Equation (28) or Equation (29): (28) $\begin{matrix} I_{i} = I_{i} \times \frac{c_{i}}{N_{C_{r (π_{t})}}}, i = 1, \dots, n . \end{matrix}$ (28) (29) $\begin{matrix} I_{i} = {\begin{matrix} 0, if i \in C_{r} (π_{t}) \\ I_{i}, otherwise \end{matrix} \end{matrix}$ (29) According to the above equation, if node $i$ is covered several times, $I_{i}$ is reduced, which indicates that the probability of this selected node is reduced. Furthermore, the designed dynamic coverage vector contains coverage information for each node. A node's coverage and distance from the central node can serve as a useful guide for selecting the next node. Then we let the model learn how to use $I_{i}$ to adjust probabilities. $I_{i}$ is updated according to the following equation: (30) $\begin{matrix} H_{i} = I_{i} W^{I} \end{matrix}$ (30) where $W^{I}$ is the learnable parameter, and $H_{i}$ is the final dynamic embedding.

Hence, according to static and dynamic embedding, we constructed the key of each node as the input to the decoder, and updated the formula as follows: (31) $\begin{matrix} k_{i} = W^{k} f_{i}^{(N)} H_{i}, i = 1, \dots, n . \end{matrix}$ (31) where $k_{i}$ denotes the key of node $i$ , and $W^{k}$ is the learnable parameter. The model can capture environmental information and learn better strategies by adding dynamic coverage information.

4.2.2. Decoder

We designed a learnable decoder to select the facilities and construct feasible solutions based on the attention mechanism. The Decoder consists of two parts: the RNN for outputting the query, and the attention structure for calculating the probability of each city being selected.

The decoding process operates sequentially, and the decoder outputs a facility at each step $t .$ This decision was based on the static and dynamic embeddings of the nodes, which were acquired from the encoder, as well as the current state of the partial solution. For each node $i = 1, \dots, n$ , as in the study by Li et al. (Citation2021), the query $q_{t}$ is generated by using the Decoder through the RNN and self-attention, and the vector $u_{i}$ of each node is computed by combining the $k_{i}$ obtained from the encoder: (32) $\begin{matrix} u_{i}^{t} = \frac{q_{t}^{T} k_{i}}{\sqrt{d_{k}}} \end{matrix}$ (32) where $\sqrt{d_{k}}$ is called the scaling factor, $d_{k} = \frac{d_{h}}{M},$ and M denotes the number of self-attentions in the multi-headed attention. Then, $u_{i}$ was modified according to the coverage information to obtain ${\bar{u}}_{i} :$ (33) $\begin{matrix} {\bar{u}}_{i}^{t} = {\begin{matrix} - \infty, if i \in π_{1 \sim t - 1} \\ \frac{q_{t}^{T} k_{i}}{\sqrt{d_{k}}}, otherwise \end{matrix} \end{matrix}$ (33)

Finally, we used softmax to calculate the probability of each node being selected. (34) $\begin{matrix} p_{i} = \frac{e^{{\bar{u}}_{i}^{t}}}{\sum_{j = 1}^{n} e^{{\bar{u}}_{i}^{t}}}, i = 1, \dots, n . \end{matrix}$ (34) Here, we applied a greedy strategy to select nodes, that is, we chose the node with the highest probability as the location of the currently deployed facility.

4.3. Training of the model

Given instance $s$ , and the model defined in Equation (20), the solution $π$ for the problem was obtained from the samples. Our goal is to choose a point at each step to deploy the facility and try to maximize or minimize the objective function. Thus, a reinforcement learning approach is utilized to train the model. Commonly, given the triple $(s, a, r),$ the parameters of the model can be updated by gradient descent using a reinforcement learning algorithm with baseline $b (s)$ (Williams Citation1992): (35) $\begin{matrix} \nabla L (θ | s) = E_{p θ (π | s)} [(L (π) - b (s)) \nabla \log p θ (π | s)] \end{matrix}$ (35) The algorithm for solving problems using reinforcement learning is as follows.

Table

Display Table

5. Experiments and results

We present the benchmark experimental results of SpoNet for solving the p-Median, p-Center, and MCLP. First, instances with 20, 50, and 100 points were generated to train the models. The two-dimensional coordinates $(x, y)$ of each point are independently sampled in a uniform distribution of $[0, 1],$ and the weight of every pair of points is calculated using the Euclidean distance. All the instances are complete graphs, that is, each point in the graph is directly connected to other points by edges. Therefore, the covering radius was added to mask the covered points to realize dynamic updates and improve the performance of the model. The choice of the covering radius is based on experience. In the MCLP, the covering radius is set to 0.3, 0.2, and 0.15, for problems of 20, 50, and 100 points, respectively. The covering radius of facilities in the original p-Median and p-Center problems is infinite by default, and the optimization objective of the p-Center problem is to find the minimum covering radius. Therefore, we chose the approximate upper bound of the p-Center target value as the reference. For the problems with 20, 50, and 100 points, the covering radius was set to 0.32, 0.24, and 0.16, respectively. All experiments were implemented using Python and run on two RTX3090 GPUs and an Intel(R) Xeon(R) Gold 6230 CPU @2.10 GHz. The code has been released https://github.com/HIGISX/SpoNet.

Hyperparameters: For all weighted parameters in a neural network, a uniform initialization $U \sim (- \frac{1}{d}, \frac{1}{d})$ is used, where $d$ denotes the dimension of the input. During the training of each model, 250 epochs were trained, and each epoch processed 250 batches with a batch size of 512. For each instance, the two-dimensional position coordinates of the nodes are linearly projected onto 128-dimensional vectors and then sent to the encoder. After three multi-head attention layers, a 128-dimensional hidden embedding was obtained. The models were trained using the ADAM optimizer (Kingma and Ba Citation2014) and the learning rate was fixed at $10^{- 4}$ . lists the hyperparameter configurations used during the training process.

Table 1. Hyperparameter configurations.

Download CSV Display Table

5.1. Main results

5.1.1. Comparison analysis

We compared our SpoNet with three types of methods:(1) three state-of-the-art solvers for spatial optimization problems, including Mathematical Programming Solver, Gurobi (Citation2021), Cplex (Manual Citation1987) and COPT; (2) the deep model baseline, Attention Model (AM), a constructive method based on DRL for solving TSP and CVRP, which we transferred to solve p-Median, p-Center, and MCLP; (3) heuristic and meta-heuristic algorithms, including GA, VNS, SA, TS, and Teitz-Bart. We implemented SA, VNS, and Teitz-Bart for the p-Median problem; SA, TS, and GA for the p-Center problem; and SA, VNS, and GA for the MCLP. To perform the test, 10,000 instances with the same distribution as that of the training data were generated for each group of experiments. We used the objective value, optimality gap, and runtime for the evaluation.

We compare the performance of our SpoNet and all the methods mentioned above for the three problems with different problem sizes in . Each evaluation indicator was recorded, with an average value of 10,000 instances. indicates that SpoNet with sample 1280 is superior to heuristic algorithms in terms of the performance of the objective value and running time. Although the AM with sample 1280 consumes less time than our model, it is acceptable to sacrifice a little time to improve the accuracy. Because heuristic algorithms need to adjust the parameters according to expert experience, the solution accuracy and running time are affected by the parameters. Within a certain range, the longer the solution time, the smaller the gap between the target value and the optimal solution. Here, we present the value of the objective function for the problem within an acceptable solution time.

Table 2. The solution result of p-Median, p-Center, and MCLP with different scale.

Download CSV Display Table

5.1.2. Efficiency comparison for MCLP

Among all the methods, mathematical programming solvers can provide the optimal solution for the p-Median, p-Center, and MCLP. However, these methods require longer resolution times. With the increase in the scale of the problem size, it is difficult to solve the problem in a short time. We conducted a simple experiment to explain the advantages of the deep model in terms of running time. For MCLP, we expanded the number of demand points to 1000, 2000, and 5000 and displayed the results of the trained SpoNet (greedy) and Gurobi. From , as the problem scale increases, our model can obtain a flexible solution quickly, but Gurobi cannot report a solution with 5000 demand points in an hour.

Table 3. The results on N = 1000, 2000 and 5000 of MCLP.

Download CSV Display Table

5.1.3. Analysis of decoding strategy

In the evaluation experiment, two decoding strategies were adopted. The first is a greedy decoding strategy; that is, the facility point is chosen based on which node has the highest probability. The second is the sampling decoding strategy. We sampled 1280 solutions and selected the best solution reports. Our model can use GPU operation, and sampling 1280 solutions can be completed in parallel. Therefore, it only takes a short time to obtain more accurate solutions. The experiment showed that the sampling decoding strategy usually obtains a more feasible solution than the greedy decoding strategy.

5.2. Ablation study

In this section, we illustrate the ablation study to prove the effectiveness of the dynamic covering information and the GRU structure.

5.2.1. Effects of dynamic covering information

In our study, we consider two updated formulations based on coverage information. The first is to remove all covered nodes from the set of candidate facility points according to the coverage radius after selecting facility points. The update formula is as follows Equation (28). Even after a node has been marked as covered, the node is still eligible to be chosen as a facility point. However, the likelihood of it being selected is modified based on the distance between the covered node and the potential facility point. The updated formula is shown in Equation (29): shows the Obj, Gaps, and Time of the AM before and after adding dynamic coverage information to solve the three types of problems. As shown, a structure with dynamic coverage information has a smaller gap than the original attention model.

Table 4. Effect of dynamic covering information.

Download CSV Display Table

5.2.2. Effect of GRU structure

The process of constructing solutions is a long time series of the decision-making process, and the GRU structure can effectively deal with this type of problem with a long dependence. shows the results of the three problems on different scales before and after the GRU structure was added.

Table 5. Effect of GRU structure.

Download CSV Display Table

5.3. Transfer study

The approach that utilizes a deep learning model to construct solutions has the advantage of being able to handle large-scale problems adaptively. We used the trained model with 100 nodes to evaluate the instances of 200, 300, 400, 500, 800, and 1000 nodes, and15 nodes were finally chosen to deploy facilities. We utilized the Gurobi solver (the solution time was limited to 600 s for each instance) to report the optimal solution, used SpoNet, and simulated annealing algorithm (better than GA and VNS) to provide a feasible solution for the problem. presents the results: (1) SpoNet performs better than SA on all issues; and (2) for large-scale problems, Gurobi and SA are time-consuming, but SpoNet can still provide a feasible solution to the problem quickly, and the output is closer to the best possible solution.

Table 6. Transfer study.

Download CSV Display Table

5.4. Visualization analysis

To intuitively display the solution results, we randomly generated an example for each of the three problems and used the Gurobi solver, simulated annealing algorithm, and SpoNet (sample1280) to provide a feasible solution to the problem and visualize the final solution results. illustrate the differences in performance among the three algorithms for p-Median, p-Center, and MCLP, respectively. SpoNet showed better performance than SA. Although there is a certain gap between it and the optimal solution provided by the Gurobi solver, we achieved significant time savings, which is more obvious in more complex problems.

Figure 3. Visualizing solution produced by Gurobi, SpoNet, and SA algorithms for p-Median on n = 100, p = 15.

Figure 4. Visualizing solution produced by Gurobi, SpoNet, and SA algorithms for p-Center on n = 100, p = 15.

Figure 5. Visualizing solution produced by Gurobi, SpoNet, and SA algorithms for MCLP on n = 100, p = 15, r = 0.15.

5.5. Experiments on real scenarios

In this part, we applied SpoNet to a real-world case study to address a critical urban planning challenge. It optimizes the location of emergency facilities in Beijing’s Chaoyang District. The aim is to ensure efficient emergency response and coverage for the district’s residents while selecting a limited number of central hubs.

The dataset comprises a total of 132 emergency facilities. We aim to determine the locations for 20 central hub points out of the 132 available options. We define the maximum service distance for each selected central hub facility as 2000 m. Before conducting any analysis, the dataset underwent essential preprocessing, including coordinate projection and normalization. These steps are crucial to ensure the compatibility of the data with the SpoNet framework and its application to urban planning and emergency response scenarios. presents a visual representation of the input data and a comparative analysis.

Figure 6. Visualizing solution produced by Gurobi, SA, and SpoNet for the location of emergency facilities.

This simple yet illustrative example highlights the adaptability of our SpoNet model to real-world problem-solving scenarios, underscoring its vast potential for future research and practical applications. In contrast to mature solvers like Gurobi, which require intricate mathematical formulations, our approach is data-driven and learns how to make decisions through interactions with the environment. We present a novel approach to spatial decision analysis, offering an insightful perspective on problem-solving in complex spatial domains.

6. Conclusion

In conclusion, our work introduced a novel approach to tackle FLPs. We proposed a unified structure, SpoNet. It contains the dynamic covering information to reduce the solution space. It also adopts the GRU structure to manage the dependencies in the long-term sequence by modeling the three facility location problems as MDP and using the RL to train models. Ablation studies and experiments indicated that dynamic coverage information and the GRU structure can effectively enhance the performance of the model. Compared with other modern solvers, there is still a certain gap in optimality. However, SpoNet is suitable for large-scale problems and situations that require fast decision-making.

In future work, we will incorporate new algorithms based on a deep reinforcement learning paradigm to solve additional classical problems. We believe that this approach has great potential and can lead to significant progress in urban spatial decision analysis. We will continue exploring the application of deep learning and reinforcement learning in practical decision support systems, to provide innovative solutions for urban planning and decision-making. Many more challenges and opportunities await as we continue to push the boundaries in advancing this field. Through our ongoing research, we aim to enhance urban sustainability, improve decision accuracy, and create more intelligent urban lifestyles.

Data and codes availability statement

The data and codes supporting the findings of this study are available at https://github.com/HIGISX/SpoNet.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The research were financially supported by the innovation group project of the Key Laboratory of Remote Sensing and Digital Earth Chinese Academy of Sciences (E33D0201-5), CBAS project 2023, and the Beijing Chaoyang District Collaborative Innovation Project (E2DZ050100).

References

Alcaraz, J., M. Landete, and J. F. Monge. 2012. “Design and Analysis of Hybrid Metaheuristics for the Reliability p-Median Problem.” European Journal of Operational Research 222 (1): 54–64. https://doi.org/10.1016/j.ejor.2012.04.016.
Web of Science ®Google Scholar
Andronie, M., G. Lăzăroiu, M. Iatagan, I. Hurloiu, and I. Dijmărescu. 2021. “Sustainable Cyber-physical Production Systems in Big Data-driven Smart Urban Economy: A Systematic Literature Review.” Sustainability 13 (2): 751. https://doi.org/10.3390/su13020751.
Web of Science ®Google Scholar
Beairsto, J., Y. Tian, L. Zheng, Q. Zhao, and J. Hong. 2022. “Identifying Locations for New Bike-sharing Stations in Glasgow: An Analysis of Spatial Equity and Demand Factors.” Annals of GIS 28 (2): 111–126. https://doi.org/10.1080/19475683.2021.1936172.
Web of Science ®Google Scholar
Beheshti, Z., and S. M. H. Shamsuddin. 2013. “A Review of Population-based Meta-heuristic Algorithms.” International Journal of Advances in Soft Computing and its Applications 5 (1): 1–35.
Google Scholar
Berman, O., and D. Krass. 2002. “The Generalized Maximal Covering Location Problem.” Computers & Operations Research 29 (6): 563–581. https://doi.org/10.1016/S0305-0548(01)00079-X.
Web of Science ®Google Scholar
Bleha, B., and P. Ďurček. 2023. “Unambiguous Linkage Between the Vaccination Coverage and the Spread of COVID-19: Geostatistical Evidence from the Slovak LAU 1 Regions.” Journal of Geovisualization and Spatial Analysis 7 (1): 14. https://doi.org/10.1007/s41651-023-00144-2.
Google Scholar
Chung, J., C. Gulcehre, K. Cho, and Y. Bengio. 2014. “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.” arXiv preprint arXiv:1412.3555.
Google Scholar
Church, R., and C. R. Velle. 1974. “The Maximal Covering Location Problem.” Papers in Regional Science 32 (1): 101–118. https://doi.org/10.1111/j.1435-5597.1974.tb00902.x.
Google Scholar
Daskin, M. 1997. “Network and Discrete Location: Models, Algorithms and Applications.” Journal of the Operational Research Society 48 (7): 763–764. https://doi.org/10.1057/palgrave.jors.2600828.
Web of Science ®Google Scholar
Deudon, M., P. Cournut, A. Lacoste, Y. Adulyasak, and L. M. Rousseau. 2018. “Learning Heuristics for the tsp by Policy Gradient.” In Integration of Constraint Programming, Artificial Intelligence, and Operations Research: 15th International Conference, CPAIOR 2018, Delft, The Netherlands, June 26–29, 2018, Proceedings 15, 170–181. Springer International Publishing.
Google Scholar
Domínguez, E., and J. Muñoz. 2008. “A Neural Model for the p-Median Problem.” Computers & Operations Research 35 (2): 404–416. https://doi.org/10.1016/j.cor.2006.03.005.
Web of Science ®Google Scholar
Gao, F., S. Li, Z. Tan, and S. Liao. 2022. “Visualizing the Spatiotemporal Characteristics of Dockless Bike Sharing Usage in Shenzhen, China.” Journal of Geovisualization and Spatial Analysis 6 (1): 12. https://doi.org/10.1007/s41651-022-00107-z.
Google Scholar
Guo, H., D. Liang, Z. Sun, F. Chen, X. Wang, J. Li, J. Bian, et al. 2022. “Measuring and Evaluating SDG Indicators with Big Earth Data.” Science Bulletin 67 (17): 1792–1801. https://doi.org/10.1016/j.scib.2022.07.015.
PubMed Web of Science ®Google Scholar
Gurobi Optimization, LLC. 2021. “Gurobi Optimizer Reference Manual.”
Google Scholar
Hakimi, S. L. 1964. “Optimum Locations of Switching Centers and the Absolute Centers and Medians of a Graph.” Operations Research 12 (3): 450–459. https://doi.org/10.1287/opre.12.3.450.
Web of Science ®Google Scholar
Hakimi, S. L. 1965. “Optimum Distribution of Switching Centers in a Communication Network and Some Related Graph Theoretic Problems.” Operations Research 13 (3): 462–475. https://doi.org/10.1287/opre.13.3.462.
Web of Science ®Google Scholar
Hochreiter, S., and J. Schmidhuber. 1997. “Long Short-term Memory.” Neural Computation 9 (8): 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
PubMed Web of Science ®Google Scholar
Huang, X., Z. Li, Y. Jiang, X. Ye, C. Deng, J. Zhang, and X. Li. 2021. “The Characteristics of Multi-source Mobility Datasets and How they Reveal the Luxury Nature of Social Distancing in the US during the COVID-19 Pandemic.” International Journal of Digital Earth 14 (4): 424–442. https://doi.org/10.1080/17538947.2021.1886358.
Web of Science ®Google Scholar
Jiang, H., H. Guo, Z. Sun, Q. Xing, H. Zhang, Y. Ma, and S. Li. 2022. “Projections of Urban Built-up Area Expansion and Urbanization Sustainability in China's Cities through 2030.” Journal of Cleaner Production 367:133086. https://doi.org/10.1016/j.jclepro.2022.133086.
Web of Science ®Google Scholar
Kaveh, A., and M. Khayatazad. 2012. “A New Meta-heuristic Method: Ray Optimization.” Computers & Structures 112–113:283–294. https://doi.org/10.1016/j.compstruc.2012.09.003.
Web of Science ®Google Scholar
Kazakovtsev, L. A., and A. A. Stupina. 2014. “Fast Genetic Algorithm with Greedy Heuristic for p-Median and k-Means Problems.” In 2014 6th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), 602–606. IEEE.
Google Scholar
Kingma, D. P., and J. Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv preprint arXiv:1412.6980.
Google Scholar
Kool, W., H. van Hoof, and M. Welling. 2018. “Attention, Learn to Solve Routing Problems!” International Conference on Learning Representations.
Google Scholar
Kwon, Y. D., J. Choo, B. Kim, I. Yoon, Y. Gwon, and S. Min. 2020. “Pomo: Policy Optimization with Multiple Optima for Reinforcement Learning.” Advances in Neural Information Processing Systems 33:21188–21198.
Google Scholar
Lăzăroiu, G., L. Ionescu, M. Andronie, and I. Dijmărescu. 2020. “Sustainability Management and Performance in the Urban Corporate Economy: A Systematic Literature Review.” Sustainability 12 (18): 7705. https://doi.org/10.3390/su12187705.
Web of Science ®Google Scholar
Li, X., X. Li, and X. Ma. 2022. “Spatial Optimization for Urban Green Space (UGS) Planning Support Using a Heuristic Approach.” Applied Geography 138:102622. https://doi.org/10.1016/j.apgeog.2021.102622.
Web of Science ®Google Scholar
Li, K., T. Zhang, R. Wang, Y. Wang, Y. Han, and L. Wang. 2021. “Deep Reinforcement Learning for Combinatorial Optimization: Covering Salesman Problems.” IEEE Transactions on Cybernetics 52 (12): 13142–13155. https://doi.org/10.1109/TCYB.2021.3103811.
Web of Science ®Google Scholar
Liang, H., S. Wang, H. Li, H. Ye, and Y. Zhong. 2022. “A Trade-Off Algorithm for Solving p-Center Problems with a Graph Convolutional Network.” ISPRS International Journal of Geo-Information 11 (5): 270. https://doi.org/10.3390/ijgi11050270.
Web of Science ®Google Scholar
Manual, C. U. S. 1987. “Ibm Ilog Cplex Optimization Studio.” Version 12 (1987–2018): 1.
Google Scholar
Mazyavkina, N., S. Sviridov, S. Ivanov, and E. Burnaev. 2021. “Reinforcement Learning for Combinatorial Optimization: A Survey.” Computers & Operations Research 134:105400. https://doi.org/10.1016/j.cor.2021.105400.
Web of Science ®Google Scholar
Mecheter, I., M. Abbod, A. Amira, and H. Zaidi. 2022. “Deep Learning with Multiresolution Handcrafted Features for Brain MRI Segmentation.” Artificial Intelligence in Medicine 131:102365. https://doi.org/10.1016/j.artmed.2022.102365.
PubMed Web of Science ®Google Scholar
Nica, E., G. H. Popescu, M. Poliak, T. Kliestik, and O.-M. Sabie. 2023. “Digital Twin Simulation Tools, Spatial Cognition Algorithms, and Multi-sensor Fusion Technology in Sustainable Urban Governance Networks.” Mathematics 11:1981. https://doi.org/10.3390/math11091981.
Web of Science ®Google Scholar
Peng, B., J. Wang, and Z. Zhang. 2020. “A Deep Reinforcement Learning Algorithm Using Dynamic Attention Model for Vehicle Routing Problems.” In Artificial Intelligence Algorithms and Applications: 11th International Symposium, ISICA 2019, Guangzhou, People’s Republic of China, November 16–17, 2019, Revised Selected Papers 11, 636–650. Springer Singapore.
Google Scholar
Pirkul, H., and D. Schilling. 1989. “The Capacitated Maximal Covering Location Problem with Backup Service.” Annals of Operations Research 18 (1): 141–154. https://doi.org/10.1007/BF02097800.
Google Scholar
Rabiei-Dastjerdi, H., G. McArdle, S. A. Matthews, and P. Keenan. 2021. “Gap Analysis in Decision Support Systems for Real-estate in the era of the Digital Earth.” International Journal of Digital Earth 14 (1): 121–138. https://doi.org/10.1080/17538947.2020.1808719.
Web of Science ®Google Scholar
ReVelle, C., M. Scholssberg, and J. Williams. 2008. “Solving the Maximal Covering Location Problem with Heuristic Concentration.” Computers & Operations Research 35 (2): 427–435. https://doi.org/10.1016/j.cor.2006.03.007.
Web of Science ®Google Scholar
ReVelle, C. S., and R. W. Swain. 1970. “Central Facilities Location.” Geographical Analysis 2 (1): 30–42. https://doi.org/10.1111/j.1538-4632.1970.tb00142.x.
Web of Science ®Google Scholar
Song, M., and D. Chen. 2018. “A Comparison of Three Heuristic Optimization Algorithms for Solving the Multi-objective Land Allocation (MOLA) Problem.” Annals of GIS 24 (1): 19–31. https://doi.org/10.1080/19475683.2018.1424736.
Web of Science ®Google Scholar
Teitz, M. B., and P. Bart. 1968. “Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph.” Operations Research 16 (5): 955–961. https://doi.org/10.1287/opre.16.5.955.
Web of Science ®Google Scholar
Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Łukasz Kaiser, and I. Polosukhin. 2017. “Attention is All You Need.” Advances in Neural Information Processing Systems, 30.
Google Scholar
Vinyals, O., M. Fortunato, and N. Jaitly. 2015. “Pointer Networks.” In Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 2 (pp. 2692–2700).
Google Scholar
Wang, S., H. Liang, Y. Zhong, X. Zhang, and C. Su. 2023. “DeepMCLP: Solving the MCLP with Deep Reinforcement Learning for Urban Facility Location Analytics.” https://doi.org/10.25436/E2KK5V.
Google Scholar
Williams, R. J. 1992. “Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning.” Reinforcement Learning, 5–32. https://doi.org/10.1007/978-1-4615-3618-5_2.
Google Scholar
Williamson, D. P., and D. B. Shmoys. 2011. The Design of Approximation Algorithms. Cambridge university press.
Google Scholar
Xia, Y., H. Chen, C. Zuo, and N. Zhang. 2022. “The Impact of Traffic on Equality of Urban Healthcare Service Accessibility: A Case Study in Wuhan, China.” Sustainable Cities and Society 86:104130. https://doi.org/10.1016/j.scs.2022.104130.
Web of Science ®Google Scholar
Xiao, X., D. Xu, and W. Wan. 2016. “Overview: Video Recognition from Handcrafted Method to Deep Learning Method.” In 2016 International Conference on Audio, Language and Image Processing (ICALIP), 646–651. IEEE.
Google Scholar
Xin, L., W. Song, Z. Cao, and J. Zhang. 2021. “Multi-decoder Attention Model with Embedding Glimpse for Solving Vehicle Routing Problems.” Proceedings of the AAAI Conference on Artificial Intelligence 35 (13): 12042–12049. https://doi.org/10.1609/aaai.v35i13.17430.
Google Scholar
Zhang, Z., H. Hu, D. Yin, S. Kashem, R. Li, H. Cai, D. Perkins, and S. Wang. 2020. “A cyberGIS-Enabled Multi-Criteria Spatial Decision Support System: A Case Study on Flood Emergency Management.” International Journal of Digital Earth, 167–184. https://doi.org/10.1080/17538947.2018.1543363.
Web of Science ®Google Scholar
Zhao, Q., D. J. Sailor, and E. A. Wentz. 2018. “Impact of Tree Locations and Arrangements on Outdoor Microclimates and Human Thermal Comfort in an Urban Residential Environment.” Urban Forestry & Urban Greening 32: 81–91. https://doi.org/10.1016/j.ufug.2018.03.022.
Web of Science ®Google Scholar

Download PDF

Share icon
Back to Top

Related research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.

People also read
Recommended articles
Cited by

To cite this article:

Reference style: APA Chicago Harvard

Citation copied to clipboard

Reference styles above use APA (6th edition), Chicago (16th edition) & Harvard (10th edition)

Download citation

Download a citation file in RIS format that can be imported by citation management software including EndNote, ProCite, RefWorks and Reference Manager.

Choose format: RIS BibTex RefWorks Direct Export

Choose options: Citation Citation & abstract Citation & references

Your download is now in progress and you may close this window

Did you know that with a free Taylor & Francis Online account you can gain access to the following benefits?

Choose new content alerts to be informed about new research of interest to you
Easy remote access to your institution's subscriptions on any device, from any location
Save your searches and schedule alerts to send you new results
Export your search results into a .csv file to support your research

Have an account?
Login now Don't have an account?
Register for free

Login or register to access this feature

Have an account?
Login now Don't have an account?
Register for free

Choose new content alerts to be informed about new research of interest to you
Easy remote access to your institution's subscriptions on any device, from any location
Save your searches and schedule alerts to send you new results
Export your search results into a .csv file to support your research

Sponet: solve spatial optimization problem using deep reinforcement learning for urban spatial decision analysis

ABSTRACT

1. Introduction

2. Related work

2.1. Urban facility location problems

2.2. Deep learning for spatial optimization problems

3. Preliminaries

3.1. Problem definition

3.1.1. p-Median problem

3.1.2. p-Center problem

3.1.3. MCLP

3.2. Deep learning model

3.2.1. Attention model

3.2.2. Gate recurrent unit

3.3. Markov decision process and deep reinforcement learning

3.3.1. Markov decision process

3.3.2. Deep reinforcement learning

4. Methodology

4.1. Reformulate as MDP form

4.2. The SpoNet

4.2.1. Encoder

4.2.2. Decoder

4.3. Training of the model

5. Experiments and results

Table 1. Hyperparameter configurations.

5.1. Main results

5.1.1. Comparison analysis

Table 2. The solution result of p-Median, p-Center, and MCLP with different scale.

5.1.2. Efficiency comparison for MCLP

Table 3. The results on N = 1000, 2000 and 5000 of MCLP.

5.1.3. Analysis of decoding strategy

5.2. Ablation study

5.2.1. Effects of dynamic covering information

Table 4. Effect of dynamic covering information.

5.2.2. Effect of GRU structure

Table 5. Effect of GRU structure.

5.3. Transfer study

Table 6. Transfer study.

5.4. Visualization analysis

5.5. Experiments on real scenarios

6. Conclusion

Data and codes availability statement

Disclosure statement

Additional information

Funding

References

Related research

To cite this article:

Download citation

Your download is now in progress and you may close this window

Login or register to access this feature

Information for

Open access

Opportunities

Help and information

Keep up to date