Potential topics for Master Theses
The following topics are currently of interest to our lab.
Please contact Wendelin
to discuss possible master theses.
Of course, we can always discuss potential theses
on your favorite topic in RL,
even if it is not listed here.
- Out-of-distribution generalization:
While RL agents show very little sign of overfitting in
the environments they are trained on,
any deviation from them, for example in multi-task RL
or caused by external influences,
requires the agent to generalize to
inputs it has never experienced during training.
Standard neural networks are terrible at this,
but we have found that some specific neural
architectures generalize surprisingly well
(Kurin
et al., 2021).
As the real world constantly changes,
out-of-distribution generalization
is one of the largest challenges in modern RL
(Kirk et al., 2023).
Current examples from our lab:
- Generalization to unseen topologies in RL:
In many domains every episode will have a different structure.
For example, in neuroscience every time we get another dish
of lab-grown neurons, they are placed and connected differently
(Engwegen et al., 2024).
Learning a separate RL policy for each dish would take too much
time, so we want to learn one policy for all episodes.
We are currently looking for a student who wants to apply
our methods from neurobiology to robotics
(Kurin et al., 2021).
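To give a flavour of what "one policy for all topologies" can look like, here is a minimal, hypothetical sketch of a policy network whose weights are shared across episodes with different numbers of nodes and different connectivity. It is a generic message-passing network written purely for illustration; it is not the architecture from the cited papers, and all layer sizes and names are our own assumptions.
```python
import torch
import torch.nn as nn

class TopologyPolicy(nn.Module):
    """Toy policy that works on any graph: the same weights are applied to every node."""
    def __init__(self, obs_dim, hidden=64, rounds=2):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)
        self.message = nn.Linear(2 * hidden, hidden)
        self.action_head = nn.Linear(hidden, 1)      # one (continuous) action per node
        self.rounds = rounds

    def forward(self, node_obs, adjacency):
        # node_obs: (n_nodes, obs_dim); adjacency: (n_nodes, n_nodes) with 0/1 entries
        h = torch.relu(self.encode(node_obs))
        for _ in range(self.rounds):
            degree = adjacency.sum(-1, keepdim=True).clamp(min=1)
            neighbours = adjacency @ h / degree      # mean over neighbour embeddings
            h = torch.relu(self.message(torch.cat([h, neighbours], dim=-1)))
        return self.action_head(h).squeeze(-1)       # (n_nodes,) actions

policy = TopologyPolicy(obs_dim=4)
for n_nodes in (5, 8):                               # same weights, different topologies
    obs = torch.randn(n_nodes, 4)
    adj = (torch.rand(n_nodes, n_nodes) < 0.3).float()
    print(policy(obs, adj).shape)                    # torch.Size([5]), then torch.Size([8])
```
Because the parameters only ever see local node observations and neighbour messages, such a network can in principle be trained on many topologies and applied to unseen ones.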
- Adaptation of RL policies to unforeseen changes:
Generalization only gets you so far;
at some point one needs to change the policy.
In this case one needs to be able to transfer
as many prior experiences as possible
to the new situation
(e.g. by distillation,
Igl
et al., 2021)
and might also consider using meta-learning
(Finn et al., 2017)
to adapt faster with less data
(Zintgraf, 2019).
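As a rough illustration of the meta-learning idea, the sketch below runs a first-order, MAML-style training loop on a toy sine-wave regression problem. The simplification and all hyperparameters are our own assumptions for this sketch, not the method of the cited papers: the outer loop searches for an initialization from which a few gradient steps on a new task already give a good fit.
```python
import copy
import torch
import torch.nn as nn

def sample_task():
    """A toy 'task': regression onto y = a * sin(x + p) with random a and p."""
    a, p = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * 3.14
    def data(n=10):
        x = torch.rand(n, 1) * 10 - 5
        return x, a * torch.sin(x + p)
    return data

model = nn.Sequential(nn.Linear(1, 40), nn.ReLU(),
                      nn.Linear(40, 40), nn.ReLU(), nn.Linear(40, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr, inner_steps = 0.01, 5                      # illustrative values only

for meta_iter in range(1000):
    meta_opt.zero_grad()
    for _ in range(4):                               # meta-batch of tasks
        task = sample_task()
        fast = copy.deepcopy(model)                  # task-specific copy of the initialization
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        x_tr, y_tr = task()
        for _ in range(inner_steps):                 # inner loop: adapt with little data
            inner_opt.zero_grad()
            nn.functional.mse_loss(fast(x_tr), y_tr).backward()
            inner_opt.step()
        x_te, y_te = task()                          # fresh data from the same task
        nn.functional.mse_loss(fast(x_te), y_te).backward()
        for p_meta, p_fast in zip(model.parameters(), fast.parameters()):
            # first-order approximation: reuse the adapted model's gradients
            p_meta.grad = p_fast.grad if p_meta.grad is None else p_meta.grad + p_fast.grad
    meta_opt.step()                                  # outer loop: improve the initialization
```
The full MAML objective differentiates through the inner loop; the first-order variant shown here avoids second-order gradients at the cost of a cruder meta-gradient.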
Current examples from our research lab:
- Epistemic uncertainty in RL:
Generalization and adaptation to new situations
require the agent to know what it knows.
We formalize this as the epistemic uncertainty
(Huellermeier
et al., 2020)
of the agent and exploit it to detect
situations in which the agent does not
know what to do, which allows for better exploration
(Osband et al., 2018;
O'Donoghue et al., 2018;
Rashid et al., 2020)
and model-based planning
(Oren et al., 2023).
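To make "knowing what you know" concrete, here is a small, self-contained sketch (our own toy example, not taken from the cited papers) that uses the disagreement of an ensemble of networks as an estimate of epistemic uncertainty: the members agree where training data exists and disagree far away from it.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x_train = torch.rand(64, 1) * 2 - 1                  # training data only covers [-1, 1]
y_train = torch.sin(3 * x_train) + 0.05 * torch.randn_like(x_train)

def make_net():
    return nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

ensemble = [make_net() for _ in range(5)]            # different random initialisations
for net in ensemble:
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(500):
        opt.zero_grad()
        nn.functional.mse_loss(net(x_train), y_train).backward()
        opt.step()

# Evaluate inside and outside the training distribution.
x_eval = torch.linspace(-3, 3, 7).unsqueeze(1)
with torch.no_grad():
    preds = torch.stack([net(x_eval) for net in ensemble])   # (5, 7, 1)
mean, std = preds.mean(0).squeeze(), preds.std(0).squeeze()
for xi, mi, si in zip(x_eval.squeeze().tolist(), mean.tolist(), std.tolist()):
    print(f"x={xi:+.1f}  prediction={mi:+.2f}  epistemic std={si:.2f}")
# The std stays small where training data exists (|x| <= 1) and grows outside it.
```
An agent can turn such a disagreement signal into an exploration bonus, or use it to decide when a learned model should not be trusted during planning.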
Current examples from our lab:
- Planning-through-backpropagation:
Recently, the optimization of action sequences
by gradient descent through a differentiable model
(Wu et al., 2017;
Wu et al., 2020)
has attracted the attention of both science and
industry. For example, NVIDIA developed a differentiable robotic simulator
that runs on the GPU in order to optimize action sequences
of robots in real time with backpropagation
(Xu
et al., 2022).
However, they also show that long planning horizons lead
to many local minima, which prevent optimization in practice.
We are interested in methods that allow us to smooth
the optimization landscape without losing too much
performance of the chosen actions.
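The core mechanism is easy to state in code. The sketch below (a toy example under our own assumptions, not NVIDIA's simulator or the cited methods) unrolls a hand-written differentiable point-mass model over a short horizon and optimizes the action sequence directly with gradient descent.
```python
import torch

horizon, dt = 20, 0.1
goal = torch.tensor([1.0, 0.0])                      # target position and velocity

def step(state, action):
    """Hand-written differentiable dynamics of a 1-D point mass."""
    pos, vel = state[0], state[1]
    vel = vel + dt * action
    pos = pos + dt * vel
    return torch.stack([pos, vel])

actions = torch.zeros(horizon, requires_grad=True)   # the plan we optimize
opt = torch.optim.Adam([actions], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    state = torch.zeros(2)                           # start at rest at the origin
    cost = torch.zeros(())
    for t in range(horizon):                         # unroll the model over the horizon
        state = step(state, actions[t])
        cost = cost + 1e-3 * actions[t] ** 2         # small control penalty
    cost = cost + ((state - goal) ** 2).sum()        # terminal cost: reach the goal
    cost.backward()                                  # gradients flow through the whole rollout
    opt.step()

print("final cost:", cost.item())
print("planned actions:", [round(a, 2) for a in actions.detach().tolist()])
```
With much longer horizons the same loop becomes brittle, which is exactly where smoothing the optimization landscape comes in.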
Current examples from our lab:
- Multi-agent learning:
We are always interested in building on our past work on
coordination in cooperative multi-agent RL
(Schroeder
de Witt et al., 2019;
Boehmer et al., 2020;
Iqbal et al., 2021),
or in discussing exciting research fields like
zero-shot coordination (Bard
et al., 2020).
Current examples from our lab:
Finished master theses