Where: SEQUEL team – INRIA Lille Nord Europe

Superviser: Rémi Munos**Keywords: **

Reinforcement Learning, Random Projections, Compressed Sensing, Least Squares Temporal Difference, Bellman residual minimization.

**Subject:**

We will consider the
problem of reinforcement learning in high-dimensional function
approximation spaces when the number of features is bigger than the
number of samples. In particular, we will study the Least-Squares
Temporal Difference (LSTD) and Bellman Residual Minimization (BRM)
learning algorithms combined with random projections of the data
onto a low-dimensional random feature space. The low dimensional
space is defined as the span of a small number of linear combinations
of all the initial features weighted by random i.i.d. weights. We will
use the (somehow surprising) property that inner-products between
vectors (as well as their norm) are almost preserved through random
projections (this is a concentration-of-measure phenomenon called
the Johnson-Lindenstrauss Lemma [1]), thus performing least-squares
regression in the "compressed domain" (low-dimensional space) is almost
as good as in the initial high dimensional space, but enables to reduce
the estimation error (and the numerical complexity), see [2,3].

Work to so:

After an initial bibliographical work, the work will contain both a theoretical part and a practical implementation of the algorithm for challenging simulated control problems.

Other information:

This Master internship will be funded by SEQUEL Team.

- Possible funding for pursuing with a PhD.
Possibility to submit the work to an international conference in the field of Statistics / Machine Learning

Possibility to attend french or international events (ex: ICML 2012, COLT 2012).

- Possibility to collaborate with several European labs: John Shawe Taylor (UCL), Peter Auer (University Leoben), Jan Peters (Max Planck Institute), Bert Kappen (University of Nijmegen), Chris Watkins (Royal Holloway), Nello Cristianini (University of Bristol), Manfred Opper (Technical University Berlin) through the European STREP project COMPLACS (Composing Learning for Artificial Cognitive Systems).

**References**

[1] S. Dasgupta and A. Gupta, An elementary proof of the Johnson–Lindenstrauss lemma, Technical report 99–006, U. C. Berkeley, March 1999. See http://www-cse.ucsd.edu/~dasgupta/papers/jl-tr.ps

[2] O. Maillard and R. Munos.
Compressed least squares regression.
In *Proceedings of Advances in Neural Information Processing
Systems*, 2009. http://hal.archives-ouvertes.fr/inria-00419210/en

[3] O. A. Maillard and R. Munos.
Scrambled objects for least-squares regression.
In *Advances in Neural Information Processing Systems*, 2010. http://hal.archives-ouvertes.fr/inria-00483014/fr/

[4]
M. Ghavamzadeh, A. Lazaric, O. A. Maillard, and R. Munos.
LSTD with random projections.
In *Advances in Neural Information Processing Systems*, 2010. http://chercheurs.lille.inria.fr/~ghavamza/PUBLICATIONS/nips10.pdf