Distributed Computing
ETH Zurich

Seminar in Deep Reinforcement Learning (FS 2019)


When & Where: Tuesdays 10:15 @ ETZ G 91
First seminar: 19.02.2019
Last seminar: 28.05.2019
Coordinators: Roger Wattenhofer & Oliver Richter

As a seminar participant, you are invited to attend all the talks and give a presentation. Your presentation should be in English and last 35 minutes, followed by about 10 minutes of discussion.

Disclaimer: This is a seminar, so we will focus on research and skip most of the basics. If you feel that you cannot follow the discussions, we invite you to check out this lecture or at least this talk.

Presentation & Discussion

The seminar will be exciting if the presentations are exciting. Here is a two-page guideline on how to give a great scientific presentation. Here are some additional guidelines: 1, 2 and 3.

We further expect the presentation to motivate a lively discussion. We encourage discussions during and after the presentations as a main objective of this seminar. It may help discussions if you also try to be critical about the presented work. These are all scientific papers, but if you have been in science long enough...


Your grade will mostly depend on your presentation. In addition, we also grade how actively you participate in the discussions throughout the whole semester. Further, there will be a programming challenge alongside the seminar, in which you can take part to improve your grade.

Coding Challenge

To let you try your hand at coding deep reinforcement learning algorithms, we implemented a one-player version of the game Battleship. The goal of the game is to bomb all of the opponent's initially hidden battleships as fast as possible. Feel free to check out and play around with the implementation, which you can find here. Your task is to hand in an agent that is capable of learning the game. Send a link to your documented implementation by email to Oliver Richter by 11:59 pm on 21.05.
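As a rough illustration of the game described above, a one-player Battleship environment could be sketched as follows. This is not the seminar's implementation: the class name `BattleshipEnv`, the 5x5 board, the ship lengths, the observation encoding, and the reward of -1 per shot are all assumptions made for this sketch.

```python
import random


class BattleshipEnv:
    """Hypothetical one-player Battleship: hidden ships, one shot per step."""

    def __init__(self, size=5, ship_lengths=(3, 2), seed=None):
        self.size = size                  # assumed board size
        self.ship_lengths = ship_lengths  # assumed fleet
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Place ships at random and return the all-unknown observation."""
        self.ships = set()   # hidden ship cells
        for length in self.ship_lengths:
            self._place(length)
        self.shots = {}      # (row, col) -> True for hit, False for miss
        return self.observe()

    def _place(self, length):
        # Rejection-sample a horizontal or vertical placement without overlap.
        while True:
            horizontal = self.rng.random() < 0.5
            r = self.rng.randrange(self.size - (0 if horizontal else length - 1))
            c = self.rng.randrange(self.size - (length - 1 if horizontal else 0))
            cells = {(r, c + i) if horizontal else (r + i, c) for i in range(length)}
            if not cells & self.ships:
                self.ships |= cells
                return

    def observe(self):
        """Grid of -1 (unknown), 0 (miss), 1 (hit); ships stay invisible."""
        return [[-1 if (r, c) not in self.shots else int(self.shots[(r, c)])
                 for c in range(self.size)] for r in range(self.size)]

    def step(self, action):
        """Bomb cell `action` = (row, col); return (observation, reward, done)."""
        if action in self.shots:
            raise ValueError("cell already bombed")
        self.shots[action] = action in self.ships
        done = all(cell in self.shots for cell in self.ships)
        return self.observe(), -1, done   # -1 per shot rewards finishing fast


def random_episode(env):
    """Baseline: bomb untried cells uniformly at random; return shots used."""
    env.reset()
    cells = [(r, c) for r in range(env.size) for c in range(env.size)]
    env.rng.shuffle(cells)
    for n, cell in enumerate(cells, start=1):
        _, _, done = env.step(cell)
        if done:
            return n
```

A learning agent would replace the random choice in `random_episode` with a policy that maps the observation grid to the next cell to bomb; the random baseline only gives a number of shots to beat.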

How To Sign Up

There will be two presentations per week, so there is a limited number of slots (topics), which will be assigned on a first-come, first-served basis. We therefore encourage you to contact Oliver Richter and the mentor corresponding to your favorite topic as early as possible to claim your presentation slot. Below you can find the schedule and the corresponding suggested papers.

After You Signed Up

We established the following rules to ensure a high quality of the talks and hope that these will result in a good grade for you:


Date | Presenter 1 | Presenter 2 | Title | Slides 1 | Slides 2
19.02.2019 | Oliver Richter | | Introduction | [pdf] |
26.02.2019 | Adrian Meier | Pantelis Vlachas | Distributional Deep Reinforcement Learning | [pdf] | [pdf]
05.03.2019 | Mark Arnold | Hamza Javed | Continuous Control | [pdf] | [pptx]
12.03.2019 | Elena Labzina | Yoel Zweig | Variance Reduction | [pdf] | [odp]
19.03.2019 | Yang Liu | Lucia Liu | Overestimation in Q-Learning / Distributed Deep Reinforcement Learning | [pdf] | [pdf]
26.03.2019 | Lioba Heimbach | Amray Schwabe | Learning from Artificial Demonstrations | [pptx] | [pptx]
02.04.2019 | Georges Pantalos | Yilun Wu | Exploration-Exploitation Trade-off in Deep Reinforcement Learning | [pdf] | [pdf]
09.04.2019 | HaoChih Lin | Yugdeep Bangar | Off-Policy Learning | [pdf] | [pdf]
16.04.2019 | Francesco Saverio Varini | Mayank Mittal | Hierarchical Deep Reinforcement Learning | [pdf] | [pdf]
30.04.2019 | Valentin Anklin | Michael Seeber | Multi-Agent Deep Reinforcement Learning | [pptx] | [pptx]
07.05.2019 | Nicolas Küchler | | Multitask/Transfer Deep Reinforcement Learning | [pdf] |
14.05.2019 | Levin Moser | Ioannis Mandralis | Model Based Deep Reinforcement Learning | [pptm] | [pptx]
21.05.2019 | Chen Jinfan | Jan-Nico Zäch | Meta Learning / Human Influence | [pdf] | [pdf]
28.05.2019 | Oliver Richter | | Discussion and Review | [pdf] |


Title Presenter Mentor
Distributional Deep Reinforcement Learning (Part 1):
A Distributional Perspective on Reinforcement Learning
Marc G. Bellemare, Will Dabney, Rémi Munos. ICML 2017.
Adrian Meier Aryaz Eghbali
Distributional Deep Reinforcement Learning (Part 2):
Distributional Reinforcement Learning with Quantile Regression
Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos. AAAI 2018.
Implicit Quantile Networks for Distributional Reinforcement Learning
Will Dabney, Georg Ostrovski, David Silver, Rémi Munos. ICML 2018.
Pantelis Vlachas Aryaz Eghbali
Continuous Control (Part 1):
Continuous control with deep reinforcement learning
Timothy P. Lillicrap et al. ICLR 2016.
Trust Region Policy Optimization
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel. ICML 2015.
Mark Arnold Oliver Richter
Continuous Control (Part 2):
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov.
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba.
Hamza Javed Oliver Richter
Variance Reduction (Part 1):
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine. ICLR 2017.
Action-dependent Control Variates for Policy Optimization via Stein's Identity
Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu. ICLR 2018.
Elena Labzina Roland Schmid
Variance Reduction (Part 2):
Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning
Oron Anschel, Nir Baram, Nahum Shimkin
Reward Estimation for Variance Reduction in Deep Reinforcement Learning
Joshua Romoff, Peter Henderson, Alexandre Piché, Vincent Francois-Lavet, Joelle Pineau. CoRL 2018
Yoel Zweig Roland Schmid
Overestimation in Q-Learning:
Deep Reinforcement Learning with Double Q-learning
Hado van Hasselt, Arthur Guez, David Silver. AAAI 2016
Non-delusional Q-learning and value-iteration
Tyler Lu, Dale Schuurmans, Craig Boutilier. NeurIPS 2018
Yang Liu Darya Melnyk
Distributed Deep Reinforcement Learning:
Distributed Prioritized Experience Replay
Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver. ICLR 2018
Distributed Distributional Deterministic Policy Gradients
Gabriel Barth-Maron et al. ICLR 2018
Lucia Liu Aryaz Eghbali
Learning from Artificial Demonstrations (Part 1):
Playing hard exploration games by watching YouTube
Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas. NeurIPS 2018
Kickstarting Deep Reinforcement Learning
Simon Schmitt et al.
Lioba Heimbach Gino Brunner
Learning from Artificial Demonstrations (Part 2):
Hindsight Experience Replay
Marcin Andrychowicz et al. NeurIPS 2017
Learning Self-Imitating Diverse Policies
Tanmay Gangwani, Qiang Liu, Jian Peng. ICLR 2019
Amray Schwabe Gino Brunner
Exploration-Exploitation Trade-off in Deep Reinforcement Learning (Part 1):
Noisy Networks for Exploration
Meire Fortunato et al. ICLR 2018
Curiosity-driven Exploration by Self-supervised Prediction
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell. ICML 2017
Georges Pantalos Pál András Papp
Exploration-Exploitation Trade-off in Deep Reinforcement Learning (Part 2):
UCB Exploration via Q-Ensembles
Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman
Unifying Count-Based Exploration and Intrinsic Motivation
Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Rémi Munos. NeurIPS 2016
Yilun Wu Pál András Papp
Off-Policy Learning (Part 1):
Safe and Efficient Off-Policy Reinforcement Learning
Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare. NeurIPS 2016
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Rémi Munos. ICLR 2018
HaoChih Lin Yuyi Wang
Off-Policy Learning (Part 2):
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Lasse Espeholt et al.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. ICML 2018
Yugdeep Bangar Yuyi Wang
Hierarchical Deep Reinforcement Learning (Part 1):
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Tejas D. Kulkarni, Karthik R. Narasimhan, Ardavan Saeedi, Joshua B. Tenenbaum
The Option-Critic Architecture
Pierre-Luc Bacon, Jean Harb, Doina Precup. AAAI 2017
Francesco Saverio Varini Manuel Eichelberger
Hierarchical Deep Reinforcement Learning (Part 2):
FeUdal Networks for Hierarchical Reinforcement Learning
Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu
Meta Learning Shared Hierarchies
Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman
Data-Efficient Hierarchical Reinforcement Learning
Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine. NeurIPS 2018
Mayank Mittal Manuel Eichelberger
Multi-Agent Deep Reinforcement Learning (Part 1):
Counterfactual Multi-Agent Policy Gradients
Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson. AAAI 2018
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson. ICML 2018
Valentin Anklin Pankaj Khanchandani
Multi-Agent Deep Reinforcement Learning (Part 2):
Human-level performance in first-person multiplayer games with population-based deep reinforcement learning
Max Jaderberg et al.
Evolving intrinsic motivations for altruistic behavior
Jane X. Wang, Edward Hughes, Chrisantha Fernando, Wojciech M. Czarnecki, Edgar A. Duenez-Guzman, Joel Z. Leibo
Michael Seeber Pankaj Khanchandani
Multitask/Transfer Deep Reinforcement Learning (Part 1):
Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play
Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus. ICLR 2018
Reverse Curriculum Generation for Reinforcement Learning
Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, Pieter Abbeel. CoRL 2017
Nicolas Küchler Damian Pascual
Multitask/Transfer Deep Reinforcement Learning (Part 2):
Progressive Neural Networks
Andrei A. Rusu et al.
Distral: Robust Multitask Reinforcement Learning
Yee Whye Teh et al.
DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
Irina Higgins et al. ICML 2017
Nicola Storni Damian Pascual
Model Based Deep Reinforcement Learning (Part 1):
World Models
David Ha, Jürgen Schmidhuber. NeurIPS 2018
Temporal Difference Variational Auto-Encoder
Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Théophane Weber. ICLR 2019
Levin Moser Simon Tanner
Model Based Deep Reinforcement Learning (Part 2):
Imagination-Augmented Agents for Deep Reinforcement Learning
Théophane Weber et al. NeurIPS 2017
Successor Features for Transfer in Reinforcement Learning
André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver. NeurIPS 2017
Ioannis Mandralis Simon Tanner
Meta Learning:
RL²: Fast Reinforcement Learning via Slow Reinforcement Learning
Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel. ICLR 2017
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn, Pieter Abbeel, Sergey Levine. ICML 2017
Chen Jinfan Zeta Avarikioti
Human Influence:
Deep Q-learning from Demonstrations
Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys. AAAI 2018
Deep reinforcement learning from human preferences
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
Jan-Nico Zäch Tejaswi Nadahalli