Course on Reinforcement Learning
Abstract
Introduction to the models and mathematical tools used to formalize the problem of learning and decision-making under uncertainty. In particular, we will focus on the frameworks of reinforcement learning and multi-armed bandits. The main topics covered in the course are:
- Historical multidisciplinary foundations of reinforcement learning
- Markov decision processes and dynamic programming
- Stochastic approximation and Monte Carlo methods
- Function approximation and statistical learning theory
- Approximate dynamic programming
- Introduction to stochastic and adversarial multi-armed bandits
- Learning rates and finite-sample analysis
Schedule
• 03/10 -- Markov Decision Processes [Salle Condorcet, d’Alembert]
• 10/10 -- Dynamic Programming [Salle Condorcet, d’Alembert]
• 20/10 -- Reinforcement Learning [Salle Condorcet, d’Alembert]
• 24/10 -- Practical session on Dynamic Programming and Reinforcement Learning [Salle Condorcet, d’Alembert]
• 31/10 -- Exploration-exploitation: Multi-armed Bandit [Salle Condorcet, d’Alembert]
• 09/11 -- Exploration-exploitation: beyond Multi-armed Bandit [Salle Condorcet, d’Alembert]
• 16/11 -- Practical session on Multi-armed Bandit [Salle Condorcet, d’Alembert]
• 21/11 -- Approximate Dynamic Programming [Salle Condorcet, d’Alembert]
• 30/11 -- Policy Search Algorithms and Deep RL [Salle Condorcet, d’Alembert]
• 19/12 -- Practical session on ADP [Salle Condorcet, d’Alembert]
• Around 10/01/2017 -- Deadline for project proposal submissions
• Around 17/01/2017 -- Presentations
Evaluation
The course will be evaluated on the points collected in the practical sessions and on a final project. Project proposals, internships, and PhD positions will be announced towards the end of October.
News
• Projects page: goo.gl/oVJ9QP
• Report submission deadline: JANUARY 19th at MIDNIGHT
• Presentation days: JANUARY 24th and 25th
• Homework page: http://teopir.github.io/#teaching
• New material on exploration-exploitation and on bandit algorithms for learning Nash equilibria in two-player zero-sum games (not covered in the course!)
Links
• Reproducibility challenge: http://www.cs.mcgill.ca/~jpineau/ICLR2018-ReproducibilityChallenge.html
• RL sim: https://www.cs.cmu.edu/~awm/rlsim/
• RL in the real world? https://twitter.com/jackclarkSF/status/919584404472602624