ABSTRACT
This paper proposes a method that accelerates learning and helps avoid local minima in the policy gradient algorithm's learning process. Reinforcement learning has the advantage of not requiring a model; consequently, it can improve control performance in situations where a model is unavailable, such as after a fault occurs. The proposed method explores the action space efficiently and expeditiously. First, it quantifies the similarity between the agent's actions and those of a traditional controller. Then, the primary reward function is modified to reflect this similarity. This reward-shaping mechanism guides the agent to maximize its return via an attractive force during gradient ascent. To validate the concept, we build a satellite attitude control environment with a similarity subsystem. The results demonstrate the effectiveness and robustness of the method.
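The reward-shaping idea described above can be illustrated with a minimal sketch. The similarity metric, the PD reference controller, and the shaping weight below are all illustrative assumptions, not the paper's actual formulation: a cosine similarity between the agent's action and a traditional controller's action is scaled and added to the environment's primary reward.

```python
import numpy as np

def pd_controller_action(error, error_rate, kp=2.0, kd=0.5):
    """Hypothetical traditional (PD) controller serving as the reference policy."""
    return -kp * np.asarray(error) - kd * np.asarray(error_rate)

def similarity(agent_action, reference_action, eps=1e-8):
    """Cosine similarity between the agent's action and the controller's action."""
    a = np.atleast_1d(agent_action).astype(float)
    b = np.atleast_1d(reference_action).astype(float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def shaped_reward(base_reward, agent_action, reference_action, weight=0.1):
    """Add a similarity bonus to the primary reward, attracting the policy
    toward controller-like actions during gradient ascent."""
    return base_reward + weight * similarity(agent_action, reference_action)
```

In a training loop, `shaped_reward` would replace the raw environment reward at each step; the `weight` parameter (an assumed hyperparameter) trades off imitation of the traditional controller against the original objective.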
Acknowledgements
This work was supported by: Direction Générale de la Recherche Scientifique et du Développement Technologique, DGRSDT, Algeria.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data and code availability
Because the case study in this paper is related to a real space mission, the data and code will be made available upon reasonable request to the corresponding author.