Abstract
Reinforcement Learning (RL) has seen significant advances over the last decade in simulated and controlled environments. RL has shown impressive results in difficult decision-making problems such as playing video games or controlling robot arms. However, most methods require many interactions with the system to achieve good performance, which can be costly and time-consuming, especially in industrial applications. Model-Based Reinforcement Learning (MBRL) promises to close this gap by learning a model of the environment and using it for data generation and/or planning, while aiming to be sample efficient. Learning with sparse rewards nevertheless remains a significant challenge in RL, and reward sparsity must be addressed to enable efficient learning. This thesis studies the individual components of MBRL algorithms in sparse-reward settings and investigates how different design choices affect learning efficiency. Suitable Integral Probability Metrics (IPMs) are introduced to understand the model’s reward and observation space distributions during training. The resulting design combinations are evaluated on continuous control tasks from established benchmarks.
- Author: Ravi Akash
- Main Examiner: Prof. Dr. Mathias Niepert
- Supervisors: M.Sc. Carlos E. Luis and Dr.-Ing. Felix Berkenkamp (Bosch Center for Artificial Intelligence)
- Submission date: 14.08.2023