Seminar in Deep Reinforcement Learning (FS 2019)

Organization

When & Where: Tuesdays 10:15 @ ETZ G 91
First seminar: 19.02.2019
Last seminar: 28.05.2019
Coordinators: Roger Wattenhofer & Oliver Richter

As a seminar participant, you are invited to attend all the talks and make a presentation. Your presentation should be in English. The presentation should last 35 minutes plus about 10 minutes of discussion.

Disclaimer: This is a seminar, so we will focus on research and skip most of the basics. If you feel that you cannot follow the discussions, we invite you to check out this lecture or at least this talk.

Presentation & Discussion

The seminar will be exciting if the presentations are exciting. Here is a two-page guideline on how to give a great scientific presentation. Here are some additional guidelines: 1, 2 and 3.

We further expect the presentation to motivate a lively discussion. We encourage discussions during and after the presentations as a main objective of this seminar. It may help discussions if you also try to be critical about the presented work. These are all scientific papers, but if you have been in science long enough...

Grade

Your grade will mostly depend on your presentation. In addition, we also grade how actively you participate in the discussions throughout the whole semester. Further, there will be a programming challenge alongside the seminar, in which you can take part to improve your grade.

Coding Challenge

To let you try your hand at coding deep reinforcement learning algorithms, we implemented a one-player version of the game Battleship. The goal of the game is to bomb all of the opponent's initially hidden battleships as fast as possible. Feel free to check out and play around with the implementation, which you can find here. Your task is to hand in an agent that is capable of learning the game. Please email a link to your documented implementation to Oliver Richter by 11:59 pm on 21.05.
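To give a feeling for the kind of interface such an agent interacts with, here is a minimal sketch of a one-player Battleship environment with a random baseline. It is not the seminar's implementation: the class name `BattleshipEnv`, the function `run_random_episode`, the grid size, the ship lengths and the reward of -1 per shot are all assumptions made for illustration.

```python
import random


class BattleshipEnv:
    """Hypothetical one-player Battleship (illustrative only)."""

    def __init__(self, size=5, ship_lengths=(3, 2), seed=None):
        self.size = size                  # assumed square grid
        self.ship_lengths = ship_lengths  # assumed fleet
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Place the ships at random and clear all shots."""
        self.ships = set()
        for length in self.ship_lengths:
            self._place_ship(length)
        self.shots = set()
        return self._observation()

    def _place_ship(self, length):
        # Retry until the ship fits without overlapping another ship.
        while True:
            horizontal = self.rng.random() < 0.5
            max_row = self.size - (0 if horizontal else length - 1)
            max_col = self.size - (length - 1 if horizontal else 0)
            row = self.rng.randrange(max_row)
            col = self.rng.randrange(max_col)
            cells = {
                (row, col + i) if horizontal else (row + i, col)
                for i in range(length)
            }
            if not cells & self.ships:
                self.ships |= cells
                return

    def _observation(self):
        # The agent only sees its own shots: 0 = unknown, 1 = miss, 2 = hit.
        def cell_state(cell):
            if cell not in self.shots:
                return 0
            return 2 if cell in self.ships else 1

        return [
            [cell_state((r, c)) for c in range(self.size)]
            for r in range(self.size)
        ]

    def step(self, action):
        """Fire at action = (row, col); return (observation, reward, done)."""
        if action in self.shots:
            raise ValueError("cell was already targeted")
        self.shots.add(action)
        done = self.ships <= self.shots  # every ship cell has been hit
        reward = -1                      # assumed: each shot costs one point
        return self._observation(), reward, done


def run_random_episode(env, rng):
    """Baseline: fire at cells in random order; return the number of shots."""
    env.reset()
    cells = [(r, c) for r in range(env.size) for c in range(env.size)]
    rng.shuffle(cells)
    for shots, cell in enumerate(cells, start=1):
        _, _, done = env.step(cell)
        if done:
            return shots


if __name__ == "__main__":
    env = BattleshipEnv(seed=0)
    print("shots needed:", run_random_episode(env, random.Random(1)))
```

A learning agent would replace the random cell order with a policy that maps the observation grid to the next target, and it should need fewer shots on average than this baseline.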

How To Sign Up

There will be two presentations per week, so there is a limited number of slots (topics), which will be assigned on a first-come, first-served basis. Therefore, we encourage you to contact Oliver Richter and the mentor corresponding to your favorite topic as early as possible to claim your presentation slot. Below you can find the schedule and corresponding suggested papers.

After You Have Signed Up

We established the following rules to ensure a high quality of the talks and hope that these will result in a good grade for you:

At least 5 weeks before your talk: first meeting with your mentor (you need to read the assigned literature before this meeting).
At least 3 weeks before your talk: meet your mentor to discuss the structure of your talk.
At least 1 week before your talk: give the talk in front of your mentor who will provide feedback.
At the presentation date we expect an electronic copy of your slides.

Schedule

Date	Presenter 1	Presenter 2	Title	Slides 1	Slides 2
19.02.2019	Oliver Richter		Introduction	[pdf]
26.02.2019	Adrian Meier	Pantelis Vlachas	Distributional Deep Reinforcement Learning	[pdf]	[pdf]
05.03.2019	Mark Arnold	Hamza Javed	Continuous Control	[pdf]	[pptx]
12.03.2019	Elena Labzina	Yoel Zweig	Variance Reduction	[pdf]	[odp]
19.03.2019	Yang Liu	Lucia Liu	Overestimation in Q-Learning / Distributed Deep Reinforcement Learning	[pdf]	[pdf]
26.03.2019	Lioba Heimbach	Amray Schwabe	Learning from Artificial Demonstrations	[pptx]	[pptx]
02.04.2019	Georges Pantalos	Yilun Wu	Exploration-Exploitation Trade-off in Deep Reinforcement Learning	[pdf]	[pdf]
09.04.2019	HaoChih Lin	Yugdeep Bangar	Off-Policy Learning	[pdf]	[pdf]
16.04.2019	Francesco Saverio Varini	Mayank Mittal	Hierarchical Deep Reinforcement Learning	[pdf]	[pdf]
30.04.2019	Valentin Anklin	Michael Seeber	Multi-Agent Deep Reinforcement Learning	[pptx]	[pptx]
07.05.2019	Nicolas Küchler		Multitask/Transfer Deep Reinforcement Learning	[pdf]
14.05.2019	Levin Moser	Ioannis Mandralis	Model Based Deep Reinforcement Learning	[pptm]	[pptx]
21.05.2019	Chen Jinfan	Jan-Nico Zäch	Meta Learning / Human Influence	[pdf]	[pdf]
28.05.2019	Oliver Richter		Discussion and Review	[pdf]

Papers

Title	Presenter	Mentor
Distributional Deep Reinforcement Learning (Part 1): A Distributional Perspective on Reinforcement Learning Marc G. Bellemare, Will Dabney, Rémi Munos. ICML 2017.	Adrian Meier	Aryaz Eghbali
Distributional Deep Reinforcement Learning (Part 2): Distributional Reinforcement Learning with Quantile Regression Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos. AAAI 2018. Implicit Quantile Networks for Distributional Reinforcement Learning Will Dabney, Georg Ostrovski, David Silver, Rémi Munos. ICML 2018.	Pantelis Vlachas	Aryaz Eghbali
Continuous Control (Part 1): Continuous control with deep reinforcement learning Timothy P. Lillicrap et al. ICLR 2016. Trust Region Policy Optimization John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel. ICML 2015.	Mark Arnold	Oliver Richter
Continuous Control (Part 2): Proximal Policy Optimization Algorithms John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba.	Hamza Javed	Oliver Richter
Variance Reduction (Part 1): Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine ICLR 2017. Action-depedent Control Variates for Policy Optimization via Stein's Identity Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu. ICLR 2018.	Elena Labzina	Roland Schmid
Variance Reduction (Part 2): Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning Oron Anschel, Nir Baram, Nahum Shimkin Reward Estimation for Variance Reduction in Deep Reinforcement Learning Joshua Romoff, Peter Henderson, Alexandre Piché, Vincent Francois-Lavet, Joelle Pineau. CoRL 2018	Yoel Zweig	Roland Schmid
Overestimation in Q-Learning: Deep Reinforcement Learning with Double Q-learning Hado van Hasselt, Arthur Guez, David Silver. AAAI 2016 Non-delusional Q-learning and value-iteration Tyler Lu, Dale Schuurmans, Craig Boutilier. NeurIPS 2018	Yang Liu	Darya Melnyk
Distributed Deep Reinforcement Learning: Distributed Prioritized Experience Replay Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver. ICLR 2018 Distributed Distributional Deterministic Policy Gradients Gabriel Barth-Maron et al. ICLR 2018	Lucia Liu	Aryaz Eghbali
Learning from Artificial Demonstrations (Part 1): Playing hard exploration games by watching YouTube Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas. NeurIPS 2018 Kickstarting Deep Reinforcement Learning Simon Schmitt et al.	Lioba Heimbach	Gino Brunner
Learning from Artificial Demonstrations (Part 2): Hindsight Experience Replay Marcin Andrychowicz et al. NeurIPS 2017 Learning Self-Imitating Diverse Policies Tanmay Gangwani, Qiang Liu, Jian Peng. ICLR 2019	Amray Schwabe	Gino Brunner
Exploration-Exploitation Trade-off in Deep Reinforcement Learning (Part 1): Noisy Networks for Exploration Meire Fortunato et al. ICLR 2018 Curiosity-driven Exploration by Self-supervised Prediction Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell. ICML 2017	Georges Pantalos	Pál András Papp
Exploration-Exploitation Trade-off in Deep Reinforcement Learning (Part 2): UCB Exploration via Q-Ensembles Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman Unifying Count-Based Exploration and Intrinsic Motivation Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Rémi Munos. NeurIPS 2016	Yilun Wu	Pál András Papp
Off-Policy Learning (Part 1): Safe and Efficient Off-Policy Reinforcement Learning Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare. NeurIPS 2016 The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Rémi Munos. ICLR 2018	HaoChih Lin	Yuyi Wang
Off-Policy Learning (Part 2): IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures Lasse Espeholt et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. ICML 2018	Yugdeep Bangar	Yuyi Wang
Hierarchical Deep Reinforcement Learning (Part 1): Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation Tejas D. Kulkarni, Karthik R. Narasimhan, Ardavan Saeedi, Joshua B. Tenenbaum The Option-Critic Architecture Pierre-Luc Bacon, Jean Harb, Doina Precup. AAAI 2017	Francesco Saverio Varini	Manuel Eichelberger
Hierarchical Deep Reinforcement Learning (Part 2): FeUdal Networks for Hierarchical Reinforcement Learning Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu Meta Learning Shared Hierarchies Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman Data-Efficient Hierarchical Reinforcement Learning Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine. NeurIPS 2018	Mayank Mittal	Manuel Eichelberger
Multi-Agent Deep Reinforcement Learning (Part 1): Counterfactual Multi-Agent Policy Gradients Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson. AAAI 2018 QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson. ICML 2018	Valentin Anklin	Pankaj Khanchandani
Multi-Agent Deep Reinforcement Learning (Part 2): Human-level performance in first-person multiplayer games with population-based deep reinforcement learning Max Jaderberg et al. Evolving intrinsic motivations for altruistic behavior Jane X. Wang, Edward Hughes, Chrisantha Fernando, Wojciech M. Czarnecki, Edgar A. Duenez-Guzman, Joel Z. Leibo	Michael Seeber	Pankaj Khanchandani
Multitask/Transfer Deep Reinforcement Learning (Part 1): Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus. ICLR 2018 Reverse Curriculum Generation for Reinforcement Learning Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, Pieter Abbeel. CoRL 2017	Nicolas Küchler	Damian Pascual
Multitask/Transfer Deep Reinforcement Learning (Part 2): Progressive Neural Networks Andrei A. Rusu et al. Distral: Robust Multitask Reinforcement Learning Yee Whye Teh et al. DARLA: Improving Zero-Shot Transfer in Reinforcement Learning Irina Higgins et al. ICML 2017	Nicola Storni	Damian Pascual
Model Based Deep Reinforcement Learning (Part 1): World Models David Ha, Jürgen Schmidhuber. NeurIPS 2018 Temporal Difference Variational Auto-Encoder Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber. ICLR 2019	Levin Moser	Simon Tanner
Model Based Deep Reinforcement Learning (Part 2): Imagination-Augmented Agents for Deep Reinforcement Learning Théophane Weber et al. NeurIPS 2017 Successor Features for Transfer in Reinforcement Learning André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver. NeurIPS 2017	Ioannis Mandralis	Simon Tanner
Meta Learning: RL²: Fast Reinforcement Learning via Slow Reinforcement Learning Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel. ICLR 2017 Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks Chelsea Finn, Pieter Abbeel, Sergey Levine. ICML 2017	Chen Jinfan	Zeta Avarikioti
Human Influence: Deep Q-learning from Demonstrations Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys. AAAI 2018 Deep reinforcement learning from human preferences Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei	Jan-Nico Zäch	Tejaswi Nadahalli