185
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Sequential learning based re-optimization approaches for less model-based dynamic pick-up routing problem

ORCID Icon
Article: 2291201 | Received 15 Nov 2022, Accepted 28 Nov 2023, Published online: 20 Dec 2023
 

Abstract

We address a less model-based dynamic routing problem arising from home parcel pick-up service, where less model-based means existing customers who dynamically request services independently following Poisson process with a stochastic rate. Overall, through an extended application of re-optimization (RO) strategy, a Markov decision process formulation and approximation dynamic programming- and Bayes' theorem-based solution approaches are proposed. Specifically, first a pool of basic policies corresponding to all possible values of the rate are developed offline via approximate value iteration. Then, Bayes' theorem-based sequential learning is designed that can sequentially update the belief about the rate's probability distribution over its possible values. Third, coupled with the updated belief, basic policies are collectively implemented in two different ways, resulting in RO approaches which involve constructions of two different online policies, i.e. a belief-weighted deterministic policy and a belief-based random policy, and their re-optimizations at decision epochs. In the numerical study, through comparison with model-based (i.e. using full knowledge of the rate) and model-free heuristics, our approaches are examined and valuable insights are obtained. Important insights include that (i) the belief-weighted deterministic policy outperforms the belief-based random policy, and further, (ii) the former is better than the latter at preserving the improvement resulting from the improved model-based policy.

Disclosure statement

The authors confirm that the data supporting the findings of this study are available within the article.

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Notes

1 For a collection of customers S, ρ(S)=iSρ(i).

2 For the definition of ‘heavy traffic (load)’, please see Bertsimas and Van Ryzin (Citation1991); Bullo et al. (Citation2011).

3 There vπ0 is a constant initial value, α is a constant step size, and Ω is the set of all possible realizations of stochastic events.

4 Note that M is a big number, the bigger the better, and it mainly depends on available computation power.

5 Note that here ‘k’ represents knowledge.

6 Please see (Equation4a)–(Equation4b) for detailed description about a and Sk.

7 In this paper, ε is set to be 0.5.

8 In Ulmer et al. (Citation2018), customers are distinguished by early request customers (ERCs) and late request customers (LRCs), and the degree of dynamism is the ratio of the number of LRCs to the total number of ERCs and LRCs. There the expected total number of ERCs and LRCs is set to be 100 and the degree of dynamism is set to be 0.5 and 0.75, respectively. Clearly, ERCs and LRSs correspond to P-customers and M-customers in this paper, respectively.

Additional information

Funding

This work is funded by Fundamental Research Funds for the Central Universities of China [No. A0920502052001-214].

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.