Course on Reinforcement Learning
Rules for the homework
• You can work on it in pairs.
• The deadline is strict. A delay of up to 6 hours incurs a penalty of 1 point, a delay of up to 24 hours a penalty of 2 points, and any longer delay a penalty of 5 points. Penalties are expressed on a scale of 20.
• Each homework is worth 3.5 points.
• The submission should be done by email with “[EC]” in the subject.
• The submission should consist of the code and a detailed report (which can be generated automatically from Matlab comments).
• Both the correctness of the code and the quality of the report will be taken into consideration in the evaluation.
• Text of the first Homework: homework1-tree.pdf
Proposed papers to review
=== Application to computer games ===
Extensive-form games (poker)
Other games
• Mastering the game of Go with deep neural networks and tree search
• Human-level control through deep reinforcement learning (and the debate here)
• Approximate Dynamic Programming Finally Performs Well in the Game of Tetris
=== Advertising and recommendation ===
• A Contextual-Bandit Approach to Personalized News Article Recommendation
• Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees
• Efficient Thompson Sampling for Online Matrix-Factorization Recommendation
• A Multiple-Play Bandit Algorithm Applied to Recommender Systems
=== Education ===
• Offline Policy Evaluation Across Representations with Applications to Educational Games
• Trading Off Scientific Knowledge and User Learning with Multi-Armed Bandits
• Multi-Armed Bandit Problem and Its Applications in Intelligent Tutoring Systems [This is just a master's thesis]
=== Finance ===
• John Moody and Matthew Saffell. Learning to trade via direct reinforcement, 2001
• Censored Exploration and the Dark Pool Problem
• Beomsoo Park and Benjamin Van Roy. Adaptive execution: Exploration and learning of price impact
=== Other applications (some of them are quite old) ===
• Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving
• Reinforcement Learning for Electric Power System Decision and Control
• An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application
• Reinforcement Learning-based Control of Traffic Lights in Non-stationary Environments
• RL-MAC: a reinforcement learning based MAC protocol for wireless sensor networks
• Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems
• Autonomous inverted helicopter flight via reinforcement learning
• An Intelligent Battery Controller Using Bias-Corrected Q-learning
• Ying Tan, Wei Liu, and Qinru Qiu. Adaptive power management using reinforcement learning
=== Other topics in RL ===
• Deep Reinforcement Learning: an Overview (it also has a lot of pointers to applications)
• Inverse reinforcement learning [1]
Rules for the presentations
• Choose two papers (unless you select one very long paper)
• The review can be done in pairs
• Presentations with slides last 25 minutes (maximum) and should be balanced between the two presenters
• Register your papers and presentation slot on https://goo.gl/yk5nFv
Approximate dynamic programming with addendum