Course on Reinforcement Learning

Abstract

This course introduces the models and mathematical tools used to formalize the problem of learning and decision-making under uncertainty. In particular, we will focus on the frameworks of reinforcement learning and multi-armed bandits. The main topics studied during the course are:

- Historical multi-disciplinary basis of reinforcement learning

- Markov decision processes and dynamic programming

- Stochastic approximation and Monte-Carlo methods

- Introduction to stochastic and adversarial multi-armed bandits

- Approximate dynamic programming

Where and When

The course on “Reinforcement Learning” will be held at the Ecole Centrale de Lille. Lectures take place in room B7-14 and practical sessions in room C016.

Schedule

Jan 8, 10h15-12h15 - Lecture - Room: B7-14 - Intro to RL

Jan 8, 13h30-15h30 - Lecture - Room: B7-14 - MDP

Jan 15, 8h-10h - Lecture - Room: B7-14 - Dynamic programming

Jan 15, 10h15-12h15 - Lecture - Room: B7-14 - Reinforcement Learning

Jan 15, 15h45-17h45 - TNE - Formalize control problems & review some existing approaches

Jan 28, 8h-10h - TP - Room: C016 - Value iteration and policy iteration

Jan 28, 10h15-12h15 - TP - Room: C016 - SARSA and Q-learning

Jan 28, 13h30-15h30 - TNE - Room: B7-14 - Numerical comparison between MC and SARSA

Feb 5, 8h-10h - Lecture - Room: B7-14 - Stochastic bandit problem and UCB, linear bandit

Feb 5, 10h15-12h15 - Lecture - Room: B7-14 - Adversarial bandit and games

Feb 11, 8h-10h - TP - Room: C016 - UCB and other bandit algorithms

Feb 11, 10h15-12h15 - TP - Room: C016 - Nash equilibria

Feb 12, 8h-10h - Lecture - Room: C016 - Extensions of bandit

Feb 12, 13h30-15h30 - TNE - Run additional experiments on bandit

Feb 12, 15h45-17h45 - TNE - Rest

Feb 18, 10h15-12h15 - Lecture - Room: C016 - Approximate dynamic programming

Feb 19, 8h-10h - TP - Room: C016 - Approximate dynamic programming

Feb 19, 10h15-12h15 - TP - Room: C016 - Mountain car and inverted pendulum

Feb 19, 13h30-15h30 - TP - Room: C016 - Approximate dynamic programming

Feb 19, 15h45-17h45 - TNE - Review of the applications of ADP

Mar 3, 13h30-15h30 - TNE - Review of the course

Lectures

News

• Link for TP: http://perso.telecom-paristech.fr/~kaufmann/enseignement.html

Proposed papers to review

RULES: Students should work in pairs and prepare a 15-minute presentation on two papers chosen from the following list (a single paper is also acceptable if it is particularly long).