Seminar in Deep Reinforcement Learning (FS 2019)
Organization
When & Where: Tuesdays 10:15 @ ETZ G 91
First seminar: 19.02.2019
Last seminar: 28.05.2019
Coordinators: Roger Wattenhofer & Oliver Richter
As a seminar participant, you are invited to attend all the talks and give a presentation. Your presentation should be in English and last 35 minutes, followed by about 10 minutes of discussion.
Disclaimer: This is a seminar; we will focus on research and skip most of the basics. If you feel you cannot follow the discussions, we invite you to check out this lecture or at least this talk.
Presentation & Discussion
The seminar will be exciting if the presentations are exciting. Here is a 2-page guideline on how to give a great scientific presentation. Here are some additional guidelines: 1, 2 and 3.
We further expect each presentation to motivate a lively discussion; we encourage discussions during and after the presentations as a main objective of this seminar. It may help the discussion if you are also critical of the presented work. These are all scientific papers, but if you have been in science long enough...
Grade
Your grade will mostly depend on your presentation. In addition, we also grade how actively you participate in the discussions throughout the whole semester. Further, there will be a programming challenge alongside the seminar, in which you can take part to improve your grade.
Coding Challenge
To try your hand at coding deep reinforcement learning algorithms, we implemented a one-player version of the game Battleship. The goal of the game is to bomb all of the a-priori invisible battleships of the opponent as fast as possible. Feel free to check out and play around with the implementation, which you can find here. Your task is to hand in an agent that is capable of learning the game. Hand in a link to your documented implementation by email to Oliver Richter by 11.59pm on 21.05.
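To give an idea of what an entry could look like, here is a minimal sketch of an epsilon-greedy Q-learning agent with a linear function approximator in plain NumPy. The environment interface used below (`BattleshipEnv` with `reset()`, `step(action)`, `obs_size`, `n_actions`) is a hypothetical placeholder, not the interface of the provided implementation; adapt the names to the actual code, and replace the linear weights with a deep network for a competitive agent.

```python
import numpy as np

class LinearQAgent:
    """Epsilon-greedy Q-learning with a linear function approximator.

    The agent sees the board as a flat observation vector and picks one
    of n_actions cells to bomb. This is only a starting point for the
    challenge, not a reference solution.
    """

    def __init__(self, obs_size, n_actions, lr=0.01, gamma=0.99, epsilon=0.1):
        self.w = np.zeros((n_actions, obs_size))  # one weight row per action
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.n_actions = n_actions

    def q_values(self, obs):
        return self.w @ obs  # Q(s, a) for all actions at once

    def act(self, obs):
        if np.random.rand() < self.epsilon:        # explore
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q_values(obs)))  # exploit

    def update(self, obs, action, reward, next_obs, done):
        # Semi-gradient TD(0) update toward the one-step bootstrap target.
        target = reward
        if not done:
            target += self.gamma * np.max(self.q_values(next_obs))
        td_error = target - self.q_values(obs)[action]
        self.w[action] += self.lr * td_error * obs


def train(env, episodes=1000):
    # `env` stands in for the provided Battleship implementation; the
    # reset/step signature assumed here is hypothetical.
    agent = LinearQAgent(obs_size=env.obs_size, n_actions=env.n_actions)
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = agent.act(obs)
            next_obs, reward, done = env.step(action)
            agent.update(obs, action, reward, next_obs, done)
            obs = next_obs
    return agent
```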
How To Sign Up
There will be two presentations per week, so there is a limited number of slots (topics), which will be assigned on a first-come-first-served basis. Therefore, we encourage you to contact Oliver Richter and the mentor corresponding to your favorite topic as early as possible to claim your presentation slot. Below you can find the schedule and the corresponding suggested papers.
After You Signed Up
We established the following rules to ensure a high quality of the talks and hope that they will result in a good grade for you:
- At least 5 weeks before your talk: first meeting with your mentor (you need to read the assigned literature before this meeting).
- At least 3 weeks before your talk: meet your mentor to discuss the structure of your talk.
- At least 1 week before your talk: give the talk in front of your mentor, who will provide feedback.
- At the presentation date, we expect an electronic copy of your slides.
Schedule
Date | Presenter 1 | Presenter 2 | Title | Slides 1 | Slides 2 |
---|---|---|---|---|---|
19.02.2019 | Oliver Richter | | Introduction | [pdf] | |
26.02.2019 | Adrian Meier | Pantelis Vlachas | Distributional Deep Reinforcement Learning | [pdf] | [pdf] |
05.03.2019 | Mark Arnold | Hamza Javed | Continuous Control | [pdf] | [pptx] |
12.03.2019 | Elena Labzina | Yoel Zweig | Variance Reduction | [pdf] | [odp] |
19.03.2019 | Yang Liu | Lucia Liu | Overestimation in Q-Learning / Distributed Deep Reinforcement Learning | [pdf] | [pdf] |
26.03.2019 | Lioba Heimbach | Amray Schwabe | Learning from Artificial Demonstrations | [pptx] | [pptx] |
02.04.2019 | Georges Pantalos | Yilun Wu | Exploration-Exploitation Trade-off in Deep Reinforcement Learning | [pdf] | [pdf] |
09.04.2019 | HaoChih Lin | Yugdeep Bangar | Off-Policy Learning | [pdf] | [pdf] |
16.04.2019 | Francesco Saverio Varini | Mayank Mittal | Hierarchical Deep Reinforcement Learning | [pdf] | [pdf] |
30.04.2019 | Valentin Anklin | Michael Seeber | Multi-Agent Deep Reinforcement Learning | [pptx] | [pptx] |
07.05.2019 | Nicolas Küchler | | Multitask/Transfer Deep Reinforcement Learning | [pdf] | |
14.05.2019 | Levin Moser | Ioannis Mandralis | Model Based Deep Reinforcement Learning | [pptm] | [pptx] |
21.05.2019 | Chen Jinfan | Jan-Nico Zäch | Meta Learning / Human Influence | [pdf] | [pdf] |
28.05.2019 | Oliver Richter | | Discussion and Review | [pdf] | |
Papers
Title | Presenter | Mentor |
---|---|---|
Distributional Deep Reinforcement Learning (Part 1): A Distributional Perspective on Reinforcement Learning. Marc G. Bellemare, Will Dabney, Rémi Munos. ICML 2017. | Adrian Meier | Aryaz Eghbali |
Distributional Deep Reinforcement Learning (Part 2): Distributional Reinforcement Learning with Quantile Regression. Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos. AAAI 2018. Implicit Quantile Networks for Distributional Reinforcement Learning. Will Dabney, Georg Ostrovski, David Silver, Rémi Munos. ICML 2018. | Pantelis Vlachas | Aryaz Eghbali |
Continuous Control (Part 1): Continuous control with deep reinforcement learning. Timothy P. Lillicrap et al. ICLR 2016. Trust Region Policy Optimization. John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel. ICML 2015. | Mark Arnold | Oliver Richter |
Continuous Control (Part 2): Proximal Policy Optimization Algorithms. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba. | Hamza Javed | Oliver Richter |
Variance Reduction (Part 1): Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic. Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine. ICLR 2017. Action-dependent Control Variates for Policy Optimization via Stein's Identity. Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu. ICLR 2018. | Elena Labzina | Roland Schmid |
Variance Reduction (Part 2): Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning. Oron Anschel, Nir Baram, Nahum Shimkin. Reward Estimation for Variance Reduction in Deep Reinforcement Learning. Joshua Romoff, Peter Henderson, Alexandre Piché, Vincent Francois-Lavet, Joelle Pineau. CoRL 2018. | Yoel Zweig | Roland Schmid |
Overestimation in Q-Learning: Deep Reinforcement Learning with Double Q-learning. Hado van Hasselt, Arthur Guez, David Silver. AAAI 2016. Non-delusional Q-learning and value-iteration. Tyler Lu, Dale Schuurmans, Craig Boutilier. NeurIPS 2018. | Yang Liu | Darya Melnyk |
Distributed Deep Reinforcement Learning: Distributed Prioritized Experience Replay. Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver. ICLR 2018. Distributed Distributional Deterministic Policy Gradients. Gabriel Barth-Maron et al. ICLR 2018. | Lucia Liu | Aryaz Eghbali |
Learning from Artificial Demonstrations (Part 1): Playing hard exploration games by watching YouTube. Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas. NeurIPS 2018. Kickstarting Deep Reinforcement Learning. Simon Schmitt et al. | Lioba Heimbach | Gino Brunner |
Learning from Artificial Demonstrations (Part 2): Hindsight Experience Replay. Marcin Andrychowicz et al. NeurIPS 2017. Learning Self-Imitating Diverse Policies. Tanmay Gangwani, Qiang Liu, Jian Peng. ICLR 2019. | Amray Schwabe | Gino Brunner |
Exploration-Exploitation Trade-off in Deep Reinforcement Learning (Part 1): Noisy Networks for Exploration. Meire Fortunato et al. ICLR 2018. Curiosity-driven Exploration by Self-supervised Prediction. Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell. ICML 2017. | Georges Pantalos | Pál András Papp |
Exploration-Exploitation Trade-off in Deep Reinforcement Learning (Part 2): UCB Exploration via Q-Ensembles. Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman. Unifying Count-Based Exploration and Intrinsic Motivation. Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos. NeurIPS 2016. | Yilun Wu | Pál András Papp |
Off-Policy Learning (Part 1): Safe and Efficient Off-Policy Reinforcement Learning. Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare. NeurIPS 2016. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning. Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Remi Munos. ICLR 2018. | HaoChih Lin | Yuyi Wang |
Off-Policy Learning (Part 2): IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. Lasse Espeholt et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. ICML 2018. | Yugdeep Bangar | Yuyi Wang |
Hierarchical Deep Reinforcement Learning (Part 1): Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. Tejas D. Kulkarni, Karthik R. Narasimhan, Ardavan Saeedi, Joshua B. Tenenbaum. The Option-Critic Architecture. Pierre-Luc Bacon, Jean Harb, Doina Precup. AAAI 2017. | Francesco Saverio Varini | Manuel Eichelberger |
Hierarchical Deep Reinforcement Learning (Part 2): FeUdal Networks for Hierarchical Reinforcement Learning. Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu. Meta Learning Shared Hierarchies. Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman. Data-Efficient Hierarchical Reinforcement Learning. Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine. NeurIPS 2018. | Mayank Mittal | Manuel Eichelberger |
Multi-Agent Deep Reinforcement Learning (Part 1): Counterfactual Multi-Agent Policy Gradients. Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson. AAAI 2018. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson. ICML 2018. | Valentin Anklin | Pankaj Khanchandani |
Multi-Agent Deep Reinforcement Learning (Part 2): Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. Max Jaderberg et al. Evolving intrinsic motivations for altruistic behavior. Jane X. Wang, Edward Hughes, Chrisantha Fernando, Wojciech M. Czarnecki, Edgar A. Duenez-Guzman, Joel Z. Leibo. | Michael Seeber | Pankaj Khanchandani |
Multitask/Transfer Deep Reinforcement Learning (Part 1): Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play. Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus. ICLR 2018. Reverse Curriculum Generation for Reinforcement Learning. Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, Pieter Abbeel. CoRL 2017. | Nicolas Küchler | Damian Pascual |
Multitask/Transfer Deep Reinforcement Learning (Part 2): Progressive Neural Networks. Andrei A. Rusu et al. Distral: Robust Multitask Reinforcement Learning. Yee Whye Teh et al. DARLA: Improving Zero-Shot Transfer in Reinforcement Learning. Irina Higgins et al. ICML 2017. | Nicola Storni | Damian Pascual |
Model Based Deep Reinforcement Learning (Part 1): World Models. David Ha, Jürgen Schmidhuber. NeurIPS 2018. Temporal Difference Variational Auto-Encoder. Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber. ICLR 2019. | Levin Moser | Simon Tanner |
Model Based Deep Reinforcement Learning (Part 2): Imagination-Augmented Agents for Deep Reinforcement Learning. Théophane Weber et al. NeurIPS 2017. Successor Features for Transfer in Reinforcement Learning. André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver. NeurIPS 2017. | Ioannis Mandralis | Simon Tanner |
Meta Learning: RL2: Fast Reinforcement Learning via Slow Reinforcement Learning. Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel. ICLR 2017. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Chelsea Finn, Pieter Abbeel, Sergey Levine. ICML 2017. | Chen Jinfan | Zeta Avarikioti |
Human Influence: Deep Q-learning from Demonstrations. Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys. AAAI 2018. Deep reinforcement learning from human preferences. Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei. | Jan-Nico Zäch | Tejaswi Nadahalli |