Course on Reinforcement Learning

 

Abstract


Introduction to the models and mathematical tools used to formalize the problem of learning and decision-making under uncertainty. In particular, we will focus on the frameworks of reinforcement learning and multi-armed bandits. The main topics studied during the course are:


- Historical multi-disciplinary basis of reinforcement learning

- Markov decision processes and dynamic programming

- Stochastic approximation and Monte-Carlo methods

- Introduction to stochastic and adversarial multi-armed bandits

- Approximate dynamic programming


Where and When


The course on “Reinforcement Learning” will be held at the Ecole Centrale de Lille. Lectures are in room B7-14 and practical sessions are in room C016, unless the schedule states otherwise.


Schedule


Jan 8, 10h15-12h15 - Cours - Room: B7-14 - Intro to RL

Jan 8, 13h30-15h30 - Cours - Room: B7-14 - MDP


Jan 15, 8h-10h - Cours - Room: B7-14 - Dynamic programming

Jan 15, 10h15-12h15 - Cours - Room: B7-14 - Reinforcement Learning

Jan 15, 15h45-17h45 - TNE - Formalize control problems & review some existing approaches


Jan 28, 8h-10h - TP - Room: C016 - Value iteration and policy iteration

Jan 28, 10h15-12h15 - TP - Room: C016 - SARSA and Q-learning

Jan 28, 13h30-15h30 - TNE - Room: B7-14 - Numerical comparison between MC and SARSA


Feb 5, 8h-10h - Cours - Room: B7-14 - Stochastic bandit problem and UCB, linear bandit

Feb 5, 10h15-12h15 - Cours - Room: B7-14 - Adversarial bandit and games


Feb 11, 8h-10h - TP - Room: C016 - UCB and other bandit algorithms

Feb 11, 10h15-12h15 - TP - Room: C016 - Nash equilibria


Feb 12, 8h-10h - Cours - Room: C016 - Extensions of bandit

Feb 12, 13h30-15h30 - TNE - Run additional experiments on bandit

Feb 12, 15h45-17h45 - TNE - Rest


Feb 18, 10h15-12h15 - Cours - Room: C016 - Approximate dynamic programming


Feb 19, 8h-10h - TP - Room: C016 - Approximate dynamic programming

Feb 19, 10h15-12h15 - TP - Room: C016 - Mountain car and inverted pendulum

Feb 19, 13h30-15h30 - TP - Room: C016 - Approximate dynamic programming

Feb 19, 15h45-17h45 - TNE - Review of the applications of ADP


Mar 3, 13h30-15h30 - TNE - Review of the course

Lectures

Proposed papers to review


RULES: Students should work in pairs and prepare a 15-minute presentation on two papers chosen from the following list (a single paper is acceptable if it is particularly long).


  1. Google study on multi-armed bandits for Google Analytics

  2. “Adaptive Stochastic Control for Smart Grids”

  3. “An Intelligent Battery Controller Using Bias-Corrected Q-learning”

  4. “An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application”

  5. “A Contextual-Bandit Approach to Personalized News Article Recommendation”

  6. “Reinforcement Learning in Robotics: A Survey”

  7. “Autonomous inverted helicopter flight via reinforcement learning”

  8. “Reinforcement Learning for Optimized Trade Execution”

  9. “Reinforcement Learning-based Control of Traffic Lights in Non-stationary Environments”

  10. “Optimizing Dialogue Management with Reinforcement Learning”

  11. “Coadaptive Brain–Machine Interface via Reinforcement Learning”

  12. “RL-MAC: a reinforcement learning based MAC protocol for wireless sensor networks”

  13. “Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems”

  14. “Interactive Selection of Visual Features through Reinforcement Learning”

  15. “Least-Squares Policy Iteration”

  16. “Regret Minimization in Games with Incomplete Information”

  17. “Approximate Dynamic Programming Finally Performs Well in the Game of Tetris”

  18. “Playing Atari with Deep Reinforcement Learning”

  19. “Reinforcement Learning for Elevator Control”