Master Internship

Reinforcement Learning with Random Projections. 

Where: SEQUEL team – INRIA Lille Nord Europe

Supervisor: Rémi Munos, Email: remi.munos@inria.fr

Keywords: 
Reinforcement Learning, Random Projections, Compressed Sensing, Least Squares Temporal Difference, Bellman residual minimization.

Subject:
We will consider the problem of reinforcement learning in high-dimensional function approximation spaces when the number of features exceeds the number of samples. In particular, we will study the Least-Squares Temporal Difference (LSTD) and Bellman Residual Minimization (BRM) learning algorithms combined with random projections of the data onto a low-dimensional random feature space. This low-dimensional space is defined as the span of a small number of linear combinations of all the initial features, weighted by i.i.d. random weights. We will exploit the (somewhat surprising) property that inner products between vectors, as well as their norms, are almost preserved under random projections (a concentration-of-measure phenomenon known as the Johnson-Lindenstrauss Lemma [1]): performing least-squares regression in the "compressed domain" (the low-dimensional space) is thus almost as good as in the initial high-dimensional space, while reducing both the estimation error and the numerical complexity; see [2,3].
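As a toy illustration of this property, the following Python sketch (the dimensions, data, and variable names are illustrative assumptions, not part of the subject) checks that inner products survive a Gaussian random projection and then performs least-squares regression in the compressed domain, in the spirit of [2]:

import numpy as np

rng = np.random.default_rng(0)
D, d, n = 10_000, 50, 200    # D initial features, d << D compressed dims, n samples

# Random projection with i.i.d. Gaussian weights, scaled so that inner
# products are preserved in expectation: E[<Ax, Ay>] = <x, y>.
A = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, D))

x, y = rng.normal(size=D), rng.normal(size=D)
print(x @ y, (A @ x) @ (A @ y))    # close, up to O(1/sqrt(d)) distortion

# Compressed least-squares regression: regress the targets on the d random
# combinations of the D initial features instead of on all D features.
Phi = rng.normal(size=(n, D))                          # high-dim features (D > n)
targets = Phi @ rng.normal(size=D) + 0.1 * rng.normal(size=n)
Phi_c = Phi @ A.T                                      # n x d compressed features
w_c, *_ = np.linalg.lstsq(Phi_c, targets, rcond=None)  # d-dimensional regression
print(np.linalg.norm(Phi_c @ w_c - targets))           # residual, compressed domain

Increasing d tightens the distortion of inner products but raises the cost and the estimation error of the subsequent regression; the analyses in [2,3] quantify this trade-off.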

Our goal is to extend the work of [4]: to design LSTD and BRM learning algorithms that operate in the low-dimensional (random) space, to analyze the quality of the resulting value-function and policy approximations in terms of the number of available samples (sample complexity) and the capacity of the considered approximation space, and to analyze the resulting numerical complexity.
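To make this concrete, here is a minimal Python sketch of LSTD solved in the compressed domain, in the spirit of [4]; the synthetic transition data and all names below are assumptions for illustration, not the algorithm as specified in the paper:

import numpy as np

rng = np.random.default_rng(1)
D, d, n, gamma = 5_000, 50, 200, 0.95

# Stand-ins for trajectory data from a real control problem: high-dimensional
# features of the visited states and their successors, and observed rewards.
Phi = rng.normal(size=(n, D))        # features of current states
Phi_next = rng.normal(size=(n, D))   # features of next states
rewards = rng.normal(size=n)

# Project all features with one shared random matrix before solving LSTD.
A = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, D))
Psi, Psi_next = Phi @ A.T, Phi_next @ A.T    # n x d compressed features

# LSTD fixed point in the compressed space: solve M w = b with
# M = Psi^T (Psi - gamma * Psi_next) and b = Psi^T rewards.
M = Psi.T @ (Psi - gamma * Psi_next)
b = Psi.T @ rewards
w = np.linalg.solve(M, b)            # a d x d system instead of D x D

values = Psi @ w                     # estimated values of the sampled states

The point of the projection shows in the final linear solve: the system is d x d rather than D x D, so both the numerical cost and the number of estimated parameters are controlled by d.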

Work to do:
After an initial bibliographical study, the work will comprise both a theoretical part and a practical implementation of the algorithms on challenging simulated control problems.

References

[1] S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. Technical Report 99-006, U.C. Berkeley, March 1999. http://www-cse.ucsd.edu/~dasgupta/papers/jl-tr.ps

[2] O. A. Maillard and R. Munos. Compressed least squares regression. In Advances in Neural Information Processing Systems, 2009. http://hal.archives-ouvertes.fr/inria-00419210/en

[3] O. A. Maillard and R. Munos. Scrambled objects for least-squares regression. In Advances in Neural Information Processing Systems, 2010. http://hal.archives-ouvertes.fr/inria-00483014/fr/  

[4] M. Ghavamzadeh, A. Lazaric, O. A. Maillard, and R. Munos. LSTD with random projections. In Advances in Neural Information Processing Systems, 2010. http://chercheurs.lille.inria.fr/~ghavamza/PUBLICATIONS/nips10.pdf