Where: SEQUEL team – INRIA Lille Nord Europe
Supervisor: Rémi Munos, Email: firstname.lastname@example.org
Keywords: Reinforcement Learning, Random Projections, Compressed Sensing, Least-Squares Temporal Difference, Bellman Residual Minimization.
We will consider the problem of reinforcement learning in high-dimensional function approximation spaces, where the number of features is larger than the number of samples. In particular, we will study the Least-Squares Temporal Difference (LSTD) and Bellman Residual Minimization (BRM) learning algorithms combined with random projections of the data onto a low-dimensional random feature space. This low-dimensional space is defined as the span of a small number of linear combinations of all the initial features, weighted by random i.i.d. weights. We will use the (somewhat surprising) property that inner products between vectors (as well as their norms) are almost preserved through random projections (a concentration-of-measure phenomenon known as the Johnson-Lindenstrauss Lemma). As a result, performing least-squares regression in the "compressed domain" (the low-dimensional space) is almost as good as in the initial high-dimensional space, while reducing the estimation error (and the numerical complexity); see [2,3].
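The compressed-domain idea described above can be illustrated with a minimal sketch. It assumes a Gaussian projection matrix with i.i.d. N(0, 1/d) entries (a common scaling, chosen here for illustration) and synthetic random features and rewards in place of a real Markov decision process. The names `random_projection` and `compressed_lstd`, the small `ridge` term, and all dimensions are hypothetical choices made for this example, not taken from the cited papers.

```python
import numpy as np


def random_projection(n_features, n_projected, rng):
    """Return an (n_projected, n_features) matrix with i.i.d. N(0, 1/n_projected) entries."""
    scale = 1.0 / np.sqrt(n_projected)
    return rng.normal(0.0, scale, size=(n_projected, n_features))


def compressed_lstd(phi, phi_next, rewards, gamma, projection, ridge=1e-6):
    """Solve the LSTD linear system on the projected features psi = phi @ projection.T.

    phi, phi_next: (n_samples, n_features) features of states and successor states.
    The ridge term is an assumption added only for numerical stability.
    """
    psi = phi @ projection.T            # (n_samples, n_projected)
    psi_next = phi_next @ projection.T
    a = psi.T @ (psi - gamma * psi_next)
    b = psi.T @ rewards
    return np.linalg.solve(a + ridge * np.eye(a.shape[0]), b)


rng = np.random.default_rng(0)
# More features than samples, and a projected dimension smaller than both.
n_samples, n_features, n_projected = 500, 2000, 100
projection = random_projection(n_features, n_projected, rng)

# Johnson-Lindenstrauss check: norms and inner products are roughly preserved.
x = rng.normal(size=n_features)
y = rng.normal(size=n_features)
norm_ratio = np.linalg.norm(projection @ x) / np.linalg.norm(x)
inner_error = abs((projection @ x) @ (projection @ y) - x @ y) / (
    np.linalg.norm(x) * np.linalg.norm(y)
)

# Synthetic trajectory: consecutive rows play the role of successive states.
features = rng.normal(size=(n_samples + 1, n_features))
phi, phi_next = features[:-1], features[1:]
rewards = rng.normal(size=n_samples)
theta = compressed_lstd(phi, phi_next, rewards, gamma=0.9, projection=projection)
print(norm_ratio, inner_error, theta.shape)
```

Under these assumptions, the value estimate at a state with feature vector `phi_s` would be `(projection @ phi_s) @ theta`. The linear system being solved has size `n_projected` rather than `n_features`, which is where the reduction in numerical complexity comes from.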
This Master's internship will be funded by the SEQUEL team.
Possibility to submit the work to an international conference in the field of Statistics / Machine Learning.
Possibility to attend French or international events (e.g. ICML 2012, COLT 2012).
[1] S. Dasgupta and A. Gupta. An elementary proof of the Johnson–Lindenstrauss lemma. Technical Report 99–006, U. C. Berkeley, March 1999. http://www-cse.ucsd.edu/~dasgupta/papers/jl-tr.ps
[2] O. A. Maillard and R. Munos. Compressed least squares regression. In Advances in Neural Information Processing Systems, 2009. http://hal.archives-ouvertes.fr/inria-00419210/en
[3] O. A. Maillard and R. Munos. Scrambled objects for least-squares regression. In Advances in Neural Information Processing Systems, 2010. http://hal.archives-ouvertes.fr/inria-00483014/fr/
[4] M. Ghavamzadeh, A. Lazaric, O. A. Maillard, and R. Munos. LSTD with random projections. In Advances in Neural Information Processing Systems, 2010. http://chercheurs.lille.inria.fr/~ghavamza/PUBLICATIONS/nips10.pdf