Views

CrossRef citations to date

Altmetric

Research Article

Sequential learning based re-optimization approaches for less model-based dynamic pick-up routing problem

Yu WUa School of Transportation and Logistics, Southwest Jiaotong University, Chengdu, Sichuan, People's Republic of China;b National United Engineering Laboratory of Integrated and Intelligent Transportation, Chengdu, People's Republic of China;c National Engineering Laboratory of Big Data Application in Integrated Transportation, Chengdu, People's Republic of ChinaCorrespondence[email protected]

https://orcid.org/0000-0001-5113-6563 View further author information

Abstract

We address a less model-based dynamic routing problem arising from home parcel pick-up service, where less model-based means existing customers who dynamically request services independently following Poisson process with a stochastic rate. Overall, through an extended application of re-optimization (RO) strategy, a Markov decision process formulation and approximation dynamic programming- and Bayes' theorem-based solution approaches are proposed. Specifically, first a pool of basic policies corresponding to all possible values of the rate are developed offline via approximate value iteration. Then, Bayes' theorem-based sequential learning is designed that can sequentially update the belief about the rate's probability distribution over its possible values. Third, coupled with the updated belief, basic policies are collectively implemented in two different ways, resulting in RO approaches which involve constructions of two different online policies, i.e. a belief-weighted deterministic policy and a belief-based random policy, and their re-optimizations at decision epochs. In the numerical study, through comparison with model-based (i.e. using full knowledge of the rate) and model-free heuristics, our approaches are examined and valuable insights are obtained. Important insights include that (i) the belief-weighted deterministic policy outperforms the belief-based random policy, and further, (ii) the former is better than the latter at preserving the improvement resulting from the improved model-based policy.

Keywords:

Disclosure statement

The authors confirm that the data supporting the findings of this study are available within the article.

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Notes

1 For a collection of customers $S$ , $ρ (S) = \sum_{i \in S} ρ (i)$ .

2 For the definition of ‘heavy traffic (load)’, please see Bertsimas and Van Ryzin (Citation1991); Bullo et al. (Citation2011).

3 There $v^{π_{0}}$ is a constant initial value, α is a constant step size, and Ω is the set of all possible realizations of stochastic events.

4 Note that M is a big number, the bigger the better, and it mainly depends on available computation power.

5 Note that here ‘k’ represents knowledge.

6 Please see (Equation4a(4a) $\begin{aligned} X (s_{k}) = {(a \in {0, 1}^{| S_{k} |}, l) | i f c_{k} \geq ρ (W_{k} \cup S_{k}), \\ a = {1}^{| S_{k} |}, e l s e a \neq {1}^{| S_{k} |}; \end{aligned}$ (4a) )–(Equation4b(4b) $\begin{aligned} l \in W_{k} \cup S_{k}^{'}, ρ (W_{k} \cup S_{k}^{'}) \leq c_{k}, S_{k}^{'} = {i \in S_{k} : a_{i} = 1}} \end{aligned}$ (4b) ) for detailed description about $a$ and $S_{k}^{'}$ .

7 In this paper, ε is set to be 0.5.

8 In Ulmer et al. (Citation2018), customers are distinguished by early request customers (ERCs) and late request customers (LRCs), and the degree of dynamism is the ratio of the number of LRCs to the total number of ERCs and LRCs. There the expected total number of ERCs and LRCs is set to be 100 and the degree of dynamism is set to be 0.5 and 0.75, respectively. Clearly, ERCs and LRSs correspond to P-customers and M-customers in this paper, respectively.

Additional information

Funding

This work is funded by Fundamental Research Funds for the Central Universities of China [No. A0920502052001-214].

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Order Reprints Request Corporate Permissions

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

Request Academic Permissions

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.

Sequential learning based re-optimization approaches for less model-based dynamic pick-up routing problem

Information for

Open access

Opportunities

Help and information

Sequential learning based re-optimization approaches for less model-based dynamic pick-up routing problem

Abstract

Disclosure statement

Correction Statement

Notes

Additional information

Funding

Reprints and Corporate Permissions

Academic Permissions

Related research

To cite this article:

Download citation

Information for

Open access

Opportunities

Help and information

Keep up to date

Your download is now in progress and you may close this window

Login or register to access this feature