Publications of Rémi Munos

[1] R. Munos. From bandits to Monte-Carlo Tree Search: The optimistic principle applied to optimization and planning. To appear in Foundations and Trends in Machine Learning, pages 1-130, 2013. [ bib | http | Abstract ]
[2] Ronald Ortner, Daniil Ryabko, Peter Auer, and Rémi Munos. Regret bounds for restless Markov bandits. To appear in Theoretical Computer Science, 2013. [ bib | .pdf | Abstract ]
[3] A. Carpentier, R. Munos, and A. Antos. Minimax strategy for stratified sampling for Monte Carlo. To appear in Journal of Machine Learning Research, 2013. [ bib | .pdf | Abstract ]
[4] G. Kedenburg, R. Fonteneau, and R. Munos. Aggregating optimistic planning trees for solving Markov decision processes. In Advances in Neural Information Processing Systems, 2013. [ bib ]
[5] N. Korda, E. Kaufmann, and R. Munos. Thompson sampling for one-dimensional exponential family bandits. In Advances in Neural Information Processing Systems, 2013. [ bib | http | .pdf | Abstract ]
[6] Alexandra Carpentier and Rémi Munos. Toward optimal stratification for stratified Monte Carlo integration. In International Conference on Machine Learning, 2013. [ bib | .pdf | Abstract ]
[7] Michal Valko, Alexandra Carpentier, and Rémi Munos. Stochastic simultaneous optimistic optimization. In International Conference on Machine Learning, 2013. [ bib | demo | poster | slides | code | .pdf | Abstract ]
[8] Michal Valko, Nathaniel Korda, Rémi Munos, Ilias Flaounas, and Nello Cristianini. Finite-time analysis of kernelised contextual bandits. In Conference on Uncertainty in Artificial Intelligence, 2013. [ bib | poster | code | .pdf | Abstract ]
[9] R. Fonteneau, L. Busoniu, and R. Munos. Optimistic planning for belief-augmented Markov decision processes. In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2013. [ bib | .pdf | Abstract ]
[10] Lucian Busoniu, Alexander Daniel, Remi Munos, and Robert Babuska. Optimistic planning for continuous-action deterministic systems. In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2013. [ bib | .pdf | Abstract ]
[11] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, and G. Stoltz. Kullback-Leibler upper confidence bounds for optimal sequential allocation. Annals of Statistics, 41(3):1516-1541, 2013. [ bib | .pdf | .pdf | Abstract ]
[12] M. G. Azar, R. Munos, and H.J. Kappen. Minimax PAC-bounds on the sample complexity of reinforcement learning with a generative model. Machine Learning Journal, 91(3):325-349, 2013. [ bib | .pdf | Abstract ]
[13] J. Fruitet, A. Carpentier, R. Munos, and M. Clerc. Automatic motor task selection via a bandit algorithm for a brain-controlled button. Journal of Neural Engineering, 10(1), 2013. [ bib | http | .pdf | Abstract ]
[14] O. Maillard and R. Munos. Linear regression with random projections. Journal of Machine Learning Research, 13:2735-2772, 2012. [ bib | .pdf | Abstract ]
[15] J. Fruitet, A. Carpentier, R. Munos, and M. Clerc. Bandit algorithms boost motor-task selection for brain computer interfaces. In Advances in Neural Information Processing Systems, 2012. [ bib | Abstract ]
[16] A. Carpentier and R. Munos. Adaptive stratified sampling for Monte Carlo integration of differentiable functions. In Advances in Neural Information Processing Systems, 2012. [ bib | Abstract ]
[17] A. Sani, A. Lazaric, and R. Munos. Risk-aversion in multi-armed bandits. In Advances in Neural Information Processing Systems, 2012. [ bib | .pdf | Abstract ]
[18] L. Busoniu and R. Munos. Optimistic planning for Markov decision processes. In International conference on Artificial Intelligence and Statistics, 2012. [ bib | .pdf | Abstract ]
[19] A. Carpentier and R. Munos. Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. In International conference on Artificial Intelligence and Statistics, 2012. [ bib | .pdf | Abstract ]
[20] E. Kaufmann, N. Korda, and R. Munos. Thompson sampling: an asymptotically optimal finite time analysis. In International Conference on Algorithmic Learning Theory, 2012. [ bib | .pdf | Abstract ]
[21] R. Ortner, D. Ryabko, P. Auer, and R. Munos. Regret bounds for restless Markov bandits. In International Conference on Algorithmic Learning Theory, 2012. [ bib | .pdf | Abstract ]
[22] A. Carpentier and R. Munos. Minimax number of strata for online stratified sampling given noisy samples. In International Conference on Algorithmic Learning Theory, 2012. [ bib | .pdf | Abstract ]
[23] M. G. Azar, R. Munos, and H.J. Kappen. On the sample complexity of reinforcement learning with a generative model. In International Conference on Machine Learning, 2012. [ bib | .pdf | Abstract ]
[24] L. Busoniu, R. Munos, and R. Babuska. A review of optimistic planning in Markov decision processes. In Frank Lewis and Derong Liu, editors, Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, pages 494-516. Wiley, 2012. [ bib | .pdf | Abstract ]
[25] A. Carpentier and R. Munos. Finite time analysis of stratified sampling for Monte Carlo. In Advances in Neural Information Processing Systems, 2011. [ bib | .pdf | Abstract ]
[26] M. G. Azar, R. Munos, M. Ghavamzadeh, and H.J. Kappen. Speedy Q-learning. In Advances in Neural Information Processing Systems, 2011. [ bib | .pdf | Abstract ]
[27] A. Carpentier, O. A. Maillard, and R. Munos. Sparse recovery with Brownian sensing. In Advances in Neural Information Processing Systems, 2011. [ bib | .pdf | Abstract ]
[28] R. Munos. Optimistic optimization of a deterministic function without the knowledge of its smoothness. In Advances in Neural Information Processing Systems, 2011. [ bib | .pdf | Abstract ]
[29] O. A. Maillard, R. Munos, and D. Ryabko. Selecting the state-representation in reinforcement learning. In Advances in Neural Information Processing Systems, 2011. [ bib | .pdf | Abstract ]
[30] A. Lazaric, M. Ghavamzadeh, and R. Munos. Finite-sample analysis of least-squares policy iteration. Journal of Machine Learning Research, 13:3041-3074, 2011. [ bib | .pdf | Abstract ]
[31] Odalric-Ambrym Maillard, Rémi Munos, and Gilles Stoltz. Finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences. In Conference On Learning Theory, 2011. [ bib | http | .pdf | Abstract ]
[32] M. Ghavamzadeh, A. Lazaric, R. Munos, and M. Hoffman. Finite-sample analysis of Lasso-TD. In International Conference on Machine Learning, 2011. [ bib | .pdf | Abstract ]
[33] Alexandra Carpentier, Mohammad Ghavamzadeh, Alessandro Lazaric, Rémi Munos, and Peter Auer. Upper confidence bounds algorithms for active learning in multi-armed bandits. In Algorithmic Learning Theory, 2011. [ bib | http | .pdf | Abstract ]
[34] Odalric-Ambrym Maillard and Rémi Munos. Adaptive bandits: Towards the best history-dependent strategy. In International conference on Artificial Intelligence and Statistics, 2011. [ bib | .pdf | Abstract ]
[35] L. Busoniu, R. Munos, B. De Schutter, and R. Babuska. Optimistic planning for sparsely stochastic systems. In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2011. [ bib | .pdf | Abstract ]
[36] S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Computer Science, 412:1832-1852, 2011. [ bib | http | .pdf | Abstract ]
[37] S. Bubeck, R. Munos, G. Stoltz, and Cs. Szepesvári. X-armed bandits. Journal of Machine Learning Research, 12:1655-1695, 2011. [ bib | http | .pdf | Abstract ]
[38] A. Lazaric and R. Munos. Learning with stochastic inputs and adversarial outputs. To appear in Journal of Computer and System Sciences, 2011. [ bib | .pdf | Abstract ]
[39] L. Busoniu, A. Lazaric, M. Ghavamzadeh, R. Munos, R. Babuska, and B. De Schutter. Least-squares methods for policy iteration. In M. Wiering and M. van Otterlo, editors, Reinforcement Learning: State of the Art, number 12, pages 75 - 109. Springer, 2011. [ bib | .pdf | Abstract ]
[40] J.Y. Audibert, S. Bubeck, and R. Munos. Bandit view on noisy optimization. In Optimization for Machine Learning. MIT Press, 2010. [ bib | .pdf | Abstract ]
[41] O. A. Maillard and R. Munos. Scrambled objects for least-squares regression. In Advances in Neural Information Processing Systems, 2010. [ bib | .pdf | Abstract ]
[42] M. Ghavamzadeh, A. Lazaric, O. A. Maillard, and R. Munos. LSTD with random projections. In Advances in Neural Information Processing Systems, 2010. [ bib | .pdf | Abstract ]
[43] A. M. Farahmand, R. Munos, and Cs. Szepesvári. Error propagation for approximate policy and value iteration. In Advances in Neural Information Processing Systems, 2010. [ bib | .pdf | Abstract ]
[44] O. A. Maillard, R. Munos, A. Lazaric, and M. Ghavamzadeh. Finite sample analysis of Bellman residual minimization. In Masashi Sugiyama and Qiang Yang, editors, Asian Conference on Machine Learning. JMLR: Workshop and Conference Proceedings, volume 13, pages 309-324, 2010. [ bib | .pdf | Abstract ]
[45] A. Lazaric, M. Ghavamzadeh, and R. Munos. Finite-sample analysis of LSTD. In International Conference on Machine Learning, pages 615-622, 2010. [ bib | .pdf | Abstract ]
[46] A. Lazaric, M. Ghavamzadeh, and R. Munos. Finite-sample analysis of LSTD. Technical report, INRIA-00482189, 2010. [ bib | http ]
[47] A. Lazaric, M. Ghavamzadeh, and R. Munos. Analysis of a classification-based policy iteration algorithm. In International Conference on Machine Learning, pages 607-614, 2010. [ bib | .pdf | Abstract ]
[48] A. Lazaric, M. Ghavamzadeh, and R. Munos. Analysis of a classification-based policy iteration algorithm. Technical report, INRIA-00482065, 2010. [ bib | http ]
[49] O. A. Maillard and R. Munos. Online learning in adversarial Lipschitz environments. In European Conference on Machine Learning, 2010. [ bib | http | Abstract ]
[50] S. Bubeck and R. Munos. Open loop optimistic planning. In Conference on Learning Theory, 2010. [ bib | .pdf | Abstract ]
[51] J.-Y. Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. In Conference on Learning Theory, 2010. [ bib | .pdf | Abstract ]
[52] R. Munos. Approximate dynamic programming. In Olivier Sigaud and Olivier Buffet, editors, Markov Decision Processes in Artificial Intelligence, chapter 3, pages 67-98. ISTE Ltd and John Wiley & Sons Inc, 2010. [ bib | .html | .pdf | Abstract ]
[53] J.-Y. Audibert, S. Bubeck, G. Chaslot, V. Danjean, S. Gelly, T. Hérault, J.-B. Hoock, C.-S. Lee, R. Munos, J. Pérez, A. Rimmel, M. Schoenauer, M. Sebag, O. Teytaud, M.-H. Wang, and Y. Wang. Gothique. Plein Sud, 72:110-115, 2009. [ bib | .pdf | Abstract ]
[54] S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. In Proc. of the 20th International Conference on Algorithmic Learning Theory, pages 23-37, 2009. [ bib | http | .pdf | Abstract ]
[55] J-Y. Audibert, R. Munos, and Cs. Szepesvári. Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, 410:1876-1902, 2009. [ bib | .pdf | Abstract ]
[56] O. Maillard and R. Munos. Compressed least squares regression. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1213-1221, 2009. [ bib | http | .pdf | Abstract ]
[57] P.A. Coquelin, R. Deguest, and R. Munos. Sensitivity analysis in HMMs with application to likelihood maximization. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 387-395, 2009. [ bib | .pdf | Abstract ]
[58] A. Lazaric and R. Munos. Hybrid stochastic-adversarial on-line learning. In Conference on Learning Theory, 2009. [ bib | http | .pdf | Abstract ]
[59] Sertan Girgin, Manuel Loth, Rémi Munos, Philippe Preux, and Daniil Ryabko, editors. Recent Advances in Reinforcement Learning. Springer. Lecture Notes in Artificial Intelligence 5323, 2009. [ bib | http | Abstract ]
[60] P.A. Coquelin, R. Deguest, and R. Munos. Particle filter-based policy gradient for POMDPs. In Proceedings of Advances in Neural Information Processing Systems, volume 22. MIT Press, 2008. [ bib | .pdf | Abstract ]
[61] S. Bubeck, R. Munos, G. Stoltz, and Cs. Szepesvári. Online optimization of X-armed bandits. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems, volume 22, pages 201-208. MIT Press, 2008. [ bib | http | .pdf | Abstract ]
[62] Y. Wang, J-Y. Audibert, and R. Munos. Infinitely many-armed bandits. In Proceedings of Advances in Neural Information Processing Systems, volume 22. MIT Press, 2008. [ bib | .pdf | Abstract ]
[63] J-F. Hren and R. Munos. Optimistic planning of deterministic systems. In Recent Advances in Reinforcement Learning (European Workshop on Reinforcement Learning), Springer LNAI 5323, pages 151-164, 2008. [ bib | .pdf | Abstract ]
[64] R. Maitrepierre, J. Mary, and R. Munos. Adaptive play in Texas Hold'em poker. In European Conference on Artificial Intelligence, 2008. [ bib | .pdf | Abstract ]
[65] R. Munos. Programmation dynamique avec approximation de la fonction valeur. In O. Sigaud and O. Buffet, editors, Processus décisionnels de Markov et intelligence artificielle, volume 2, chapter 11, pages 19-50. Hermes, 2008. [ bib | .html | .pdf | Abstract ]
[66] A. Antos, Cs. Szepesvari, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning Journal, 71:89-129, 2008. [ bib | .pdf | Abstract ]
[67] R. Munos and Cs. Szepesvári. Finite time bounds for sampling based fitted value iteration. Journal of Machine Learning Research, 9:815-857, 2008. [ bib | http | .pdf | Abstract ]
[68] Rémi Munos. Performance bounds in Lp norm for approximate value iteration. SIAM J. Control and Optimization, 2007. [ bib | .ps ]
[69] R. Munos. Analyse en norme Lp de l'algorithme d'itérations sur les valeurs avec approximations. Revue d'Intelligence Artificielle, 21:55-76, 2007. [ bib | .pdf ]
[70] J-Y. Audibert, R. Munos, and Cs. Szepesvári. Tuning bandit algorithms in stochastic environments. In International Conference on Algorithmic Learning Theory, 2007. [ bib | .pdf ]
[71] A. Antos, Cs. Szepesvari, and R. Munos. Value-iteration based fitted policy iteration: learning with a single trajectory. In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007. [ bib | .pdf ]
[72] A. Antos, Cs. Szepesvari, and R. Munos. Fitted Q-iteration in continuous action-space MDPs. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 9-16, Cambridge, MA, 2007. MIT Press. [ bib | http | .pdf ]
[73] J.-Y. Audibert, R. Munos, and Cs. Szepesvári. Variance estimates and exploration function in multi-armed bandit. Technical report, Certis - Ecole des Ponts, 2007. [ bib | .pdf ]
[74] P.-A. Coquelin, R. Deguest, and R. Munos. Numerical methods for sensitivity analysis of Feynman-Kac models. Technical report, INRIA, 2007. [ bib | http | .pdf ]
[75] P-A. Coquelin, S. Martin, and R. Munos. A dynamic programming approach to viability problems. In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007. [ bib | http | .pdf ]
[76] P.-A. Coquelin and R. Munos. Bandit algorithms for tree search. Technical report, INRIA RR-6141, 2007. [ bib | http | .pdf ]
[77] P.-A. Coquelin and R. Munos. Bandit algorithms for tree search. In Uncertainty in Artificial Intelligence, 2007. [ bib | .pdf ]
[78] S. Gelly and R. Munos. L'ordinateur, champion de go ? Pour la Science, 354:28-35, 2007. [ bib | http | .pdf ]
[79] R. Munos. Policy gradient in continuous time. Journal of Machine Learning Research, 7:771-791, 2006. [ bib | .pdf ]
[80] R. Munos. Geometric variance reduction in Markov chains. Application to value function and gradient estimation. Journal of Machine Learning Research, 7:413-427, 2006. [ bib | .pdf ]
[81] A. Antos, Cs. Szepesvari, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. In G. Lugosi and H.U. Simon, editors, Conference on Learning Theory, volume 4005 of LNCS/LNAI, pages 574-588, Berlin, Heidelberg, 2006. Springer-Verlag. [ bib | .pdf ]
[82] J-Y. Audibert, R. Munos, and Cs. Szepesvári. Use of variance estimation in the multi-armed bandit problem. In NIPS workshop On-line Trading of Exploration and Exploitation, 2006. [ bib | .pdf ]
[83] C. Barrera-Esteve, F. Bergeret, E. Gobet, A. Meziou, R. Munos, and D. Reboul-Salze. Numerical methods for the pricing of swing options: a stochastic control approach. Methodology and Computing in Applied Probability, 8:517-540, 2006. [ bib | .pdf ]
[84] O. Bokanowski, S. Martin, R. Munos, and H. Zidani. An anti-diffusive scheme for viability problems. Applied Numerical Mathematics, special issue on Numerical methods for viscosity solutions and applications, 45(9):1147-1162, 2006. [ bib | .ps ]
[85] S. Gelly, Y. Wang, R. Munos, and O. Teytaud. Modification of UCT with patterns in Monte Carlo Go. Technical report, INRIA RR-6062, 2006. [ bib | http | .pdf ]
[86] R. Munos and Cs. Szepesvári. Finite time bounds for sampling based fitted value iteration. Technical report, Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, Budapest 1111, Hungary, 2006. [ bib | http ]
[87] Emmanuel Gobet and Rémi Munos. Sensitivity analysis using Itô Malliavin calculus and martingales. Application to stochastic optimal control. SIAM journal on Control and Optimization, 43(5):1676-1713, 2005. [ bib | .pdf ]
[88] Rémi Munos. Error bounds for approximate value iteration. Technical report, Ecole Polytechnique RR527, 2005. [ bib | .pdf ]
[89] R. Munos. Error bounds for approximate value iteration. In National Conference on Artificial Intelligence (AAAI), 2005. [ bib | .pdf ]
[90] R. Munos. Geometric variance reduction in Markov chains. Application to value function and gradient estimation. In National Conference on Artificial Intelligence (AAAI), 2005. [ bib | .pdf ]
[91] Rémi Munos and Hasna Zidani. Consistency of a simple multidimensional scheme for Hamilton-Jacobi-Bellman equations. C. R. Acad. Sci. Paris, Ser. I Math, 2005. [ bib | .ps ]
[92] Cs. Szepesvári and R. Munos. Finite time bounds for sampling based fitted value iteration. In International Conference on Machine Learning, pages 881-886, 2005. [ bib | .ps ]
[93] R. Munos. Policy gradient in continuous time. In Conférence francophone sur l'apprentissage automatique, 2005. [ bib | .ps ]
[94] O. Bokanowski, S. Martin, R. Munos, and H. Zidani. An anti-diffusive scheme for viability problems. Technical report, INRIA, Research Report 5431, 2004. [ bib | http ]
[95] Emmanuel Gobet and Rémi Munos. Sensitivity analysis using Itô Malliavin calculus and martingales. Numerical implementation. Technical report, Ecole Polytechnique. Research Report 520, 2004. [ bib | .pdf ]
[96] Rémi Munos. Algorithme d'itération sur les politiques avec approximation linéaire. Journal Electronique d'Intelligence Artificielle, 4-37, 2004. [ bib | .ps ]
[97] C. Barrera-Esteve, F. Bergeret, C. Dossal, E. Gobet, A. Meziou, R. Munos, and D. Reboul-Salze. Numerical methods for the pricing of swing options: a stochastic control approach. Technical report, Ecole Polytechnique. Research Report 544, 2004. [ bib | .pdf ]
[98] Rémi Munos. Contributions à l'apprentissage par renforcement et au contrôle optimal avec approximation. Habilitation thesis (HDR), Université Pierre et Marie Curie, speciality Applied Mathematics, 2004. [ bib | .pdf ]
[99] R. Munos. Error bounds for approximate policy iteration. In International Conference on Machine Learning, pages 560-567, 2003. [ bib | .pdf ]
[100] Rémi Munos and Andrew Moore. Variable resolution discretization in optimal control. Machine Learning Journal, 49:291-323, 2002. [ bib | .ps ]
[101] Emmanuel Gobet and Rémi Munos. Sensitivity analysis using Itô Malliavin calculus and martingales. application to stochastic optimal control. Technical report, Ecole Polytechnique RR-498, 2002. [ bib | .ps ]
[102] Rémi Munos. Decision-making under uncertainty: Efficiently estimating where extra resources are needed. Technical report, Ecole Polytechnique RR-550, 2002. [ bib | .ps ]
[103] Rémi Munos. Efficient resources allocation for Markov decision processes. In Advances in Neural Information Processing Systems, 2001. [ bib | .ps ]
[104] Rémi Munos and Andrew W. Moore. Rates of convergence for variable resolution schemes in optimal control. In International Conference on Machine Learning, 2000. [ bib | .ps ]
[105] Rémi Munos. A study of reinforcement learning in the continuous case by the means of viscosity solutions. Machine Learning, 40:265-299, 2000. [ bib | .ps ]
[106] Rémi Munos, Leemon Baird, and Andrew Moore. Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation. In International Joint Conference on Neural Networks, 1999. [ bib | .pdf ]
[107] Rémi Munos and Andrew Moore. Influence and variance of a Markov chain: Application to adaptive discretizations in optimal control. In Proceedings of the 38th IEEE Conference on Decision and Control, 1999. [ bib | .ps ]
[108] Rémi Munos and Andrew Moore. Variable resolution discretization for high-accuracy solutions of optimal control problems. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1348-1355, 1999. [ bib | .ps ]
[109] Rémi Munos. A general convergence theorem for reinforcement learning in the continuous case. In European Conference on Machine Learning, 1998. [ bib | .ps ]
[110] Rémi Munos and Andrew Moore. Barycentric interpolators for continuous space and time reinforcement learning. In Advances in Neural Information Processing Systems, volume 11, 1998. [ bib | .ps ]
[111] Rémi Munos. Finite-element methods with local triangulation refinement for continuous reinforcement learning problems. In European Conference on Machine Learning, pages 170-183, 1997. [ bib | .ps ]
[112] Rémi Munos. A convergent reinforcement learning algorithm in the continuous case based on a finite difference method. In International Joint Conference on Artificial Intelligence, 1997. [ bib | .ps ]
[113] Rémi Munos. Catégorisation adaptative de données sensori-motrices pour un système d'apprentissage par renforcement. In Journées de Rochebrune, rencontres interdisciplinaires sur les systèmes complexes naturels et artificiels, 1997. [ bib | .ps ]
[114] Rémi Munos. Apprentissage par Renforcement, étude du cas continu. PhD thesis, Ecole des Hautes Etudes en Sciences Sociales, 1997. [ bib | .ps ]
[115] Rémi Munos and Paul Bourgine. Reinforcement learning for continuous stochastic control problems. In Neural Information Processing Systems, 1997. [ bib | .ps ]
[116] Rémi Munos. A convergent reinforcement learning algorithm in the continuous case: the finite-element reinforcement learning. In International Conference on Machine Learning. Morgan Kaufmann, 1996. [ bib | .ps ]
[117] Rémi Munos. Using finite differences methods for approximating the value function of continuous reinforcement learning problems. In International Symposium on Multi-Technology Information Processing, 1996. [ bib | .pdf ]
[118] R. Munos and J. Patinel. Reinforcement learning with dynamic covering of the state-action space: Partitioning Q-learning. In Simulation of Adaptive Behavior. The MIT Press/Bradford Books, 1994. [ bib ]

This file was generated by bibtex2html 1.97.