Dr. Wendelin Böhmer (Assistant Professor, Algorithmics Group, TU Delft)

Wendelin studied computer science and received his PhD at the Technical University of Berlin, after which he worked as a postdoc at the University of Oxford. In September 2020 he took a position as assistant professor in the Sequential Decision Making group within the Department of Intelligent Systems (Faculty of Electrical Engineering, Mathematics and Computer Science) at the Delft University of Technology. In addition, Wendelin is a co-director of the BIOlab, which investigates the application of AI to biotech and neuroscience, a scientific advisor to the startup GoodAI, and an editor for the Artificial Intelligence Journal (AIJ).

PhD students in Theoretical Reinforcement Learning

Moritz Zanger (co-supervised by Matthijs Spaan)

Uncertainty estimation for exploration. Moritz investigates how different ways of modelling the return distribution in distributional RL (e.g., Bellemare et al., 2017; Dabney et al., 2018) affect the epistemic uncertainty predicted by an ensemble, and how their different generalization properties can be exploited for better out-of-distribution detection and exploration.
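To make the general idea concrete, here is a minimal sketch of ensemble-based uncertainty estimation with quantile critics in the spirit of Dabney et al. (2018); the architecture, layer sizes, and the variance-based decomposition are illustrative assumptions, not the method used in Moritz's work:

    import torch
    import torch.nn as nn

    class QuantileCritic(nn.Module):
        """One ensemble member: predicts n_quantiles of the return per action."""
        def __init__(self, obs_dim, n_actions, n_quantiles=32):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_actions * n_quantiles))
            self.n_actions, self.n_quantiles = n_actions, n_quantiles

        def forward(self, obs):
            return self.net(obs).view(-1, self.n_actions, self.n_quantiles)

    def uncertainty(ensemble, obs):
        """Split uncertainty: member disagreement (epistemic) vs. return spread (aleatoric)."""
        q = torch.stack([member(obs) for member in ensemble])  # (K, batch, actions, quantiles)
        epistemic = q.mean(-1).var(0)   # variance of mean Q-values across members
        aleatoric = q.var(-1).mean(0)   # average spread of the return distribution
        return epistemic, aleatoric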

Max Weltevrede (co-supervised by Matthijs Spaan)

Adaptation to unforeseen environmental changes. Max is currently looking into fast adaptation of previously optimal policies in response to unforeseen changes in the environment, e.g., caused by the introduction or removal of obstacles in a robot's path. We want to develop methods that can not only reuse as much prior knowledge as possible, but also acquire such knowledge with the goal of adaptation in mind.

Laurens Engwegen (co-supervised by Daan Brinks)

Generalization to combinatorial visual domains. In a collaboration with neuroscientists of the BIOlab, Laurens investigates how neurons in a dish can be driven by an RL controller. The major challenge is that each dish, while governed by the same underlying neural dynamics, contains different neurons in a new spatial configuration and with new dendritic connections. We want to develop techniques that can generalize learned control policies between dishes by using novel combinations of attention and LSTM layers.
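As a rough illustration of how attention and recurrence could be combined here, the sketch below encodes a variable-sized set of neurons with self-attention (so the encoder does not depend on neuron ordering) and tracks temporal context with an LSTM; the input featurization and all layer sizes are hypothetical:

    import torch
    import torch.nn as nn

    class DishEncoder(nn.Module):
        """Permutation-equivariant attention over neurons, then an LSTM over time."""
        def __init__(self, feat_dim=8, d_model=64):
            super().__init__()
            self.embed = nn.Linear(feat_dim, d_model)
            self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
            self.lstm = nn.LSTM(d_model, d_model, batch_first=True)

        def forward(self, neurons, hidden=None):
            # neurons: (batch, n_neurons, feat_dim); n_neurons may differ per dish
            tokens = self.embed(neurons)
            mixed, _ = self.attn(tokens, tokens, tokens)  # neurons attend to each other
            pooled = mixed.mean(dim=1, keepdim=True)      # order-invariant dish summary
            out, hidden = self.lstm(pooled, hidden)       # carry state across time steps
            return out.squeeze(1), hidden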

Yaniv Oren (co-supervised by Matthijs Spaan)

Uncertainty propagation in model-based RL. Yaniv looks into how epistemic uncertainty about the predictions of learned models propagates through the search trees used in model-based RL. In Oren et al. (submitted) we extend DeepMind's MuZero algorithm (Schrittwieser et al., 2020) and show improved exploration, but we want to use the same methodology to enable reliable planning and to free MuZero from the shackles of on-policy planning.
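As a toy illustration of the general principle (not the propagation rule of Oren et al.), the recursion below backs up both a value estimate and its epistemic variance from the leaves of a search tree to the root, with the variance discounted quadratically:

    class Node:
        def __init__(self, value, value_var, children=()):
            self.value = value          # mean value predicted by the learned model
            self.value_var = value_var  # epistemic variance, e.g. ensemble disagreement
            self.children = children    # list of (reward, child_node) pairs

    def backup(node, gamma=0.99):
        """Propagate value and epistemic variance from the leaves to the root."""
        if not node.children:
            return node.value, node.value_var
        candidates = []
        for reward, child in node.children:
            v, var = backup(child, gamma)
            candidates.append((reward + gamma * v, gamma**2 * var))
        # back up the best child; its variance shrinks with the squared discount
        node.value, node.value_var = max(candidates, key=lambda c: c[0])
        return node.value, node.value_var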

Caroline Horsch (co-supervised by Matthijs Spaan)

Reliable out-of-distribution generalization in deep reinforcement learning. Caroline works on an NWO M1 grant in which we want to use uncertainty estimation in graph and attention neural networks to adapt the graph/attention topology in out-of-distribution situations of multi-task RL, answering the question: "what do you do when you do not know what to do?"

PhD students in Applied Reinforcement Learning

Álvaro Serra-Gómez (co-supervised by Javier Alonso-Mora)

Coordinated drone surveillance of moving targets. In the autonomous multi-robots lab, Álvaro investigates how to learn RL policies that use classical robotics controllers (model-predictive control, MPC) as action primitives (Serra-Gomez et al., submitted). We aim for neural architectures that generalize to unseen behavior and scale both in the number of coordinated drones and in the number of considered objects/people.
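Schematically, such a hierarchy could look like the sketch below, where the RL policy picks a short-horizon reference and an MPC tracker turns it into low-level commands; policy, mpc_track, and dynamics are hypothetical stand-ins for the actual components:

    def hierarchical_step(policy, mpc_track, dynamics, state, inner_steps=10):
        """One high-level step: RL chooses a reference, MPC tracks it at a higher rate."""
        reference = policy(state)                  # action primitive, e.g. a waypoint
        for _ in range(inner_steps):
            command = mpc_track(state, reference)  # constrained tracking controller
            state = dynamics(state, command)       # advance the (simulated) drone
        return state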

Grigorii Veviurko (co-supervised by Mathijs de Weerdt)

Differentiation through linear and quadratic programs. Grigorii investigates the use of RL in DC power-control (Veviurko et al., 2022) together with our partners at DC Opportunities. While standard deep RL cannot exploit the knowledge we have about power systems, we found a formulation of the problem that uses linear programs (LP) and quadratic programs (QP) as layers of a neural network policy (differentiable optimization, Agrawal et al., 2019), which is trained with RL (predict and optimize, Elmachtoub and Grigas, 2020). Our goal is to develop sequential planning algorithms where the objective is learned from real-world data by passing gradients through the planner itself. Grigorii is funded by the Flexible Meshed DC Grid project.
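A minimal sketch of such an optimization layer, using the cvxpylayers package from Agrawal et al. (2019); the toy QP below (projecting a network output onto the probability simplex) only illustrates the mechanism, not the actual power-control formulation:

    import cvxpy as cp
    import torch
    from cvxpylayers.torch import CvxpyLayer

    n = 4
    x = cp.Variable(n)
    q = cp.Parameter(n)                  # produced by the upstream network
    # toy QP layer: project q onto the probability simplex
    problem = cp.Problem(cp.Minimize(cp.sum_squares(x - q)),
                         [x >= 0, cp.sum(x) == 1])
    layer = CvxpyLayer(problem, parameters=[q], variables=[x])

    q_pred = torch.randn(n, requires_grad=True)  # stand-in for a policy output
    (x_star,) = layer(q_pred)                    # solve the QP in the forward pass
    x_star.sum().backward()                      # gradients flow through the solver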

Ksenija Stepanovic (co-supervised by Mathijs de Weerdt)

Ksenija investigates how to address the gaps of expert-derived controllers for industrial systems by developing data-driven controllers, using district heating systems as an example (Stepanovic et al., 2022). Together with our project partners Flex Technologies, methetnet B.V. and Vattenfall, we train neural networks to model the highly non-linear relationships in the control of heat networks. The advantage is that the learned models are differentiable, so one can use the planning-through-backpropagation framework (Wu et al., 2017; Wu et al., 2020; Xu et al., 2022) to find an optimal action sequence. We aim for models that have unique solutions and for gradient-based optimization that can still adhere to constraints. Ksenija is funded by the Flex met Warmte project.
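The core of planning through backpropagation can be sketched as follows, assuming a differentiable learned model model(state, action) and a differentiable reward(state, action) (both hypothetical here); box constraints are handled by simple clamping, i.e. projected gradient ascent:

    import torch

    def plan(model, reward, s0, act_dim, horizon=24, steps=200, lr=0.05):
        """Optimize an action sequence by backpropagating through the learned model."""
        actions = torch.zeros(horizon, act_dim, requires_grad=True)
        opt = torch.optim.Adam([actions], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            state, total = s0, 0.0
            for a in actions:                    # unroll the model over the horizon
                total = total + reward(state, a)
                state = model(state, a)
            (-total).backward()                  # ascend on the planned return
            opt.step()
            with torch.no_grad():
                actions.clamp_(-1.0, 1.0)        # project onto box constraints
        return actions.detach()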

Saray Bakker (co-supervised by Javier Alonso-Mora)

Intuition for multi-robot motion planning in the presence of humans. In the autonomous multi-robots lab, Saray investigates how to learn "intuition" for robots that have to interact with independent agents like humans or other robots. We are looking into various techniques, ranging from gradient propagation through MPC objectives (e.g., methods developed by Grigorii), via geometric optimization (Ratliff et al., 2020), to sample-based control (Abraham et al., 2020). We aim for a robotic motion-control algorithm that avoids getting in the way of other agents by recognizing their intentions and learning how to react to them.
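For the sample-based branch, a rough MPPI-style sketch (in the spirit of, e.g., Abraham et al., 2020) is shown below; dynamics and cost are assumed callables and all hyper-parameters are illustrative:

    import numpy as np

    def sample_based_control(dynamics, cost, state, act_dim,
                             horizon=20, samples=256, sigma=0.5, temp=1.0):
        """Perturb a nominal plan, roll out the candidates, and average them by cost."""
        nominal = np.zeros((horizon, act_dim))
        noise = sigma * np.random.randn(samples, horizon, act_dim)
        costs = np.zeros(samples)
        for k in range(samples):
            s = state
            for t in range(horizon):
                a = nominal[t] + noise[k, t]
                costs[k] += cost(s, a)
                s = dynamics(s, a)
        weights = np.exp(-(costs - costs.min()) / temp)  # soft-min over rollouts
        weights /= weights.sum()
        return nominal + np.einsum('k,kta->ta', weights, noise)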