Potential topics for Master Theses

The following topics are currently of interest to our lab. Please contact Wendelin to discuss possible master theses. Of course, we can always discuss potential theses on your favorite topic in RL, even if it is not listed here.

Out-of-distribution generalization: While RL agents show very little sign of overfitting in the environments they are trained on, any deviation thereof, for example in multi-task RL or caused by external influences, require the agent to generalize to inputs it has never experienced during training. Normal neural networks are terrible at that, but we have found that some specific neural architectures generalize suprisingly well (Kurin et al., 2021). As the real world constantly changes, out-of-distribution generalization is one of the largest challenges in modern RL (Kirk at al., 2023). Current examples from our lab:
- Weltevrede et al. (2025): How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning.
- Weltevrede et al. (2025): Exploration Implies Data Augmentation: Reachability and Generalisation in Contextual MDPs.
- Weltevrede et al. (2023): The Role of Diverse Replay for Generalisation in Reinforcement Learning.
- Serra-Gómez et al. (2023): Active Classification of Moving Targets with Learned Control Policies.
- Eric Vester (thesis, 2021): Reinforcement learning with domain-specific relational inductive biases.
Generalization to unseen topologies in RL: In many domains every episode will have a different structure. For example, in neuroscience every time we get another dish of lab-grown neurons, they are placed and connected differently (Engwegen et al., 2024). Learning another RL policy for each dish would take too much time, so we want to learn one policy for all episodes. Currently we look for a student who wants to apply our methods from neurobiology to robotics (Kurin et al., 2021):
- Engwegen et al. (2025): Modular Recurrence in Contextual MDPs for Universal Morphology Control.
- Engwegen et al. (2024): Generalisation to unseen topologies: Towards control of biological neural network activity.
- Kurin et al. (2021): My Body is a Cage: the Role of Morphology in Graph-based Incompatible Control.
Adaptation of RL policies to unforeseen changes: Generalization only gets you so far, at some point one needs to change the policy. In this case one needs to be able to transfer as many prior experiences as possible to the new situation (e.g. by distillation, Igl et al., 2021) and might also consider to use meta-learning (Finn et al., 2017) to adapt faster with fewer data (Zintgraf, 2019). Current examples from our research lab:
- Ordonez, Tromp, Julbe and Böhmer (2023): Lights out: training RL agents robust to temporary blindness.
Epistemic uncertainty in RL: Generalization and adaptation to new situations requires the agent to know-what-it-knows. We formalize this as epistemic uncertainty (Huellermeier et al., 2020) of the agents and exploit this to detect situations in which the agent does not know what to do, which allows for better exploration (Osband et al., 2018; O'Donoghue et al., 2018; Rashid et al., 2020) and model-based planning (Oren et al., 2023). Current examples from our lab:
- Zanger et al. (2025): Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model.
- Zanger et al. (2025): Universal Value-Function Uncertainties.
- Zanger et al. (2024): Diverse Projection Ensembles for Distributional Reinforcement Learning.
Model-based RL: While in many ways not more powerful than model-free RL, there are certain situations in which model-based RL can solve problems model-free cannot. However, planning with an uncertain model, or an uncertain value function for that matter, poses additional problems as well. Our lab works on multiple ways to improve planning, exploration, execution and generalization of model-based RL algorithms, in particular based on MCTS (Swiechowski et al., 2023), like AlphaZero (Silver et al., 2018.) and MuZero (Schrittwieser et al., 2020). Current examples from our lab:
- Oren et al. (2025): Epistemic Monte Carlo Tree Search.
- Albin Jaldevik (thesis, 2024): General Tree Evaluation for AlphaZero.
Multi-agent learning: We are always interested in building on our past work of coordination in cooperative multi-agent RL (Schroeder de Witt et al., 2019; Boehmer et al., 2020; Iqbal et al., 2021), or to discuss exiting research fields like zero-shot coordination (Bard et al., 2020). Current examples from our lab:
- Erwin Dam (thesis, 2022) Deep maximum Q-learning: combatting relative overgeneralisation in deep independent learners using optimism and similarity.

Finished master theses

2024 - Albin Jaldevik: General Tree Evaluation for AlphaZero
2024 - Felix Kaubek: Investigation into the Effect of Replay Buffer Diversity on Generalizability
2023 - Sander van Leeuwen: Language assistance in reinforcement learning in dynamic environments
2022 - Yaniv Oren: Deep exploration by planning with uncertainty in deep model based reinforcement learning.
2022 - Erwin Dam: Deep maximum Q-learning: combatting relative overgeneralisation in deep independent learners using optimism and similarity.
2021 - Eric Vester: Reinforcement learning with domain-specific relational inductive biases