185
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Sequential learning based re-optimization approaches for less model-based dynamic pick-up routing problem

ORCID Icon
Article: 2291201 | Received 15 Nov 2022, Accepted 28 Nov 2023, Published online: 20 Dec 2023
 

Abstract

We address a less model-based dynamic routing problem arising from home parcel pick-up service, where less model-based means existing customers who dynamically request services independently following Poisson process with a stochastic rate. Overall, through an extended application of re-optimization (RO) strategy, a Markov decision process formulation and approximation dynamic programming- and Bayes' theorem-based solution approaches are proposed. Specifically, first a pool of basic policies corresponding to all possible values of the rate are developed offline via approximate value iteration. Then, Bayes' theorem-based sequential learning is designed that can sequentially update the belief about the rate's probability distribution over its possible values. Third, coupled with the updated belief, basic policies are collectively implemented in two different ways, resulting in RO approaches which involve constructions of two different online policies, i.e. a belief-weighted deterministic policy and a belief-based random policy, and their re-optimizations at decision epochs. In the numerical study, through comparison with model-based (i.e. using full knowledge of the rate) and model-free heuristics, our approaches are examined and valuable insights are obtained. Important insights include that (i) the belief-weighted deterministic policy outperforms the belief-based random policy, and further, (ii) the former is better than the latter at preserving the improvement resulting from the improved model-based policy.

Disclosure statement

The authors confirm that the data supporting the findings of this study are available within the article.

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Notes

1 For a collection of customers S, ρ(S)=iSρ(i).

2 For the definition of ‘heavy traffic (load)’, please see Bertsimas and Van Ryzin (Citation1991); Bullo et al. (Citation2011).

3 There vπ0 is a constant initial value, α is a constant step size, and Ω is the set of all possible realizations of stochastic events.

4 Note that M is a big number, the bigger the better, and it mainly depends on available computation power.

5 Note that here ‘k’ represents knowledge.

6 Please see (Equation4a)–(Equation4b) for detailed description about a and Sk.

7 In this paper, ε is set to be 0.5.

8 In Ulmer et al. (Citation2018), customers are distinguished by early request customers (ERCs) and late request customers (LRCs), and the degree of dynamism is the ratio of the number of LRCs to the total number of ERCs and LRCs. There the expected total number of ERCs and LRCs is set to be 100 and the degree of dynamism is set to be 0.5 and 0.75, respectively. Clearly, ERCs and LRSs correspond to P-customers and M-customers in this paper, respectively.

Additional information

Funding

This work is funded by Fundamental Research Funds for the Central Universities of China [No. A0920502052001-214].

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 1,413.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.