PhD Defence: “Large-scale Automatic Learning of Autonomous Agent Behavior with Structured Deep Reinforcement Learning”, Edward Beeching, amphitheatre, Claude Chappe Building, 3rd of May 2022 at 10:00 AM


The defence will take place on Tuesday, May 3 at 10:00 AM in the amphitheatre of the Telecommunications Department (Claude Chappe building), INSA Lyon, Villeurbanne.

The presentation will be available on YouTube at the following link:



Large-scale Automatic Learning of Autonomous Agent Behavior with Structured Deep Reinforcement Learning




Autonomous robotic agents have begun to impact many aspects of our society, with applications in automated logistics, autonomous hospital porters, manufacturing and household assistants. The objective of this thesis is to explore Deep Reinforcement Learning approaches to planning and navigation in large and unknown 3D environments. In particular, we focus on tasks that require exploration and memory in simulated environments. An additional requirement is that learned policies should generalize to unseen map instances. Our long-term objective is the transfer of a learned policy to a real-world robotic system. Reinforcement learning algorithms learn by interaction: by acting to accumulate a task-based reward, an Embodied AI agent must learn to discover relevant semantic cues and skills, such as object recognition and obstacle avoidance, when these are pertinent to the task at hand. This thesis introduces the field of Structured Deep Reinforcement Learning and then describes five contributions published during the PhD.

We start by creating a set of challenging memory-based tasks, on which we benchmark an unstructured memory-based agent. We then demonstrate how incorporating structure, in the form of a learned metric map, differentiable inverse projective geometry and self-attention mechanisms, augments the unstructured agent, improving its performance and allowing us to interpret the agent’s reasoning process.

We then move from complex tasks in visually simple environments to more challenging environments with photo-realistic observations extracted from scans of real-world buildings. In this work, we demonstrate that augmenting such an agent with a topological map can improve its navigation performance. We achieve this by learning a neural approximation of a classical path-planning algorithm, which can be used on graphs with uncertain connectivity.

Drawing on work undertaken during a 4-month internship in the research and development department of Ubisoft, we demonstrate that structured methods can also be used for navigation and planning in challenging video game environments. We couple a lower-level neural policy with a classical planning algorithm to improve long-distance planning and navigation performance in vast environments of 1 km × 1 km. We release an open-source version of the environment as a benchmark for navigation in large-scale environments.

Finally, we develop an open-source Deep Reinforcement Learning interface for the Godot Game Engine, which allows complex virtual worlds to be built and agent behaviors to be learned with a suite of state-of-the-art algorithms. We release the tool under a permissive open-source (MIT) license to aid researchers in their pursuit of complex embodied AI agents.




Jury:
    • Ms. Elisa Fromont – Université de Rennes 1 – Reviewer (rapporteur)
    • Mr. David Filliat – ENSTA Paris – Reviewer (rapporteur)
    • Mr. Cédric Démonceaux – Université de Bourgogne – Examiner
    • Mr. Karteek Alahari – INRIA Grenoble – Examiner
    • Ms. Christine Solnon – INSA-Lyon – Examiner
    • Mr. Olivier Simonin – INSA-Lyon – Thesis supervisor
    • Mr. Jilles Dibangoye – INSA-Lyon – Thesis co-advisor
    • Mr. Christian Wolf – Naver Labs Europe – Thesis co-supervisor