Michal Valko: Call for PhD students, 2018
We are seeking candidates for the following position at SequeL/Inria Lille. We are already interviewing candidates, so please get in touch with us as early as possible.

Sequential Learning in Dynamic Environments, SequeL team, Inria, Lille, France (PhD position)

Keywords: multi-armed bandits, stochastic optimization, reinforcement learning, Monte Carlo tree search, planning, changing and non-stationary environments

This Ph.D. program is focused on sequential learning in structured and dynamic environments. The key aspect of this problem is that relatively little knowledge of the environment is available beforehand, and the learner (a virtual or physical agent) has to interact sequentially with the environment to learn its structure and then act optimally. This problem encompasses a wide range of applications, depending on the definition of the structure of the environment, the sequential nature of the interaction between learner and environment, and the type of dynamics and evolution. In particular, this Ph.D. program will be driven by two application domains of scientific and societal importance: planning in games and in random environments. In both setups, the paradigm of reinforcement learning, in which the learner collects rewards to guide his/her actions, will be useful. In many cases the dynamics of the sequential interaction are particularly complex, since rewards keep changing over time and the set of actions (e.g., items that can be recommended in a recommender system) continuously evolves (products may be added, removed, or change characteristics). Finally, both applications usually involve large spaces, which calls for efficient learning algorithms with minimal time, space, and sample complexity. In general, a successful sequential learning strategy should efficiently allocate the limited resources between exploitation (making the best decision based on our current, but possibly imperfect, knowledge) and exploration (making decisions that may appear sub-optimal but that may reduce the uncertainty and, as a result, improve the relevance of future decisions).


The main mathematical models of reference to deal with the problem of sequential learning are multi-armed bandits and reinforcement learning. The multi-armed bandit problem captures the essence of the exploration-exploitation trade-off in an unknown environment where only noisy observations of the performance of a set of actions, called arms (e.g., the preference of a user for a product) are available. The objective is to identify the arm with the best performance. On the other hand, reinforcement learning formalizes the more general problem of decision-making under uncertainty when the state of the environment evolves in response to the action taken by the learner.
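
As an illustration of the exploration-exploitation trade-off described above, here is a minimal sketch of UCB1, one classical bandit strategy, run on simulated Bernoulli arms. The function name, the arm means, and the horizon are illustrative assumptions and are not part of this call or of the project's methods.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms (hypothetical setup).

    Returns the number of pulls and the empirical mean reward of each arm.
    Assumes horizon >= number of arms.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of times each arm was pulled
    sums = [0.0] * n_arms   # total reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # pull every arm once to initialise the estimates
        else:
            # Index = empirical mean (exploitation) + confidence bonus (exploration).
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Noisy observation of the chosen arm's performance.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    means = [s / c for s, c in zip(sums, counts)]
    return counts, means


# Three arms with unknown (to the learner) success probabilities.
counts, means = ucb1(arm_means=[0.2, 0.5, 0.8], horizon=5000)
best_arm = max(range(len(counts)), key=lambda a: counts[a])
```

The confidence bonus shrinks as an arm is pulled more often, so the learner gradually shifts from exploring all arms to exploiting the one that appears best; `best_arm` is simply the most frequently pulled arm at the end of the run.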


Many complex autonomous systems (e.g., electrical distribution networks, or smart grids) repeatedly select actions with the aim of achieving a given objective. Reinforcement learning (RL) offers a powerful framework for acquiring adaptive behavior in this setting, associating a scalar reward with each action and learning from experience which action to select to maximize long-term reward. Although RL has produced impressive results recently (e.g., achieving human-level play in Atari games and beating the human world champion in the board game Go), most existing solutions only work under strong assumptions: the environment model is stationary, the objective is fixed, and trials end once the objective is met. The aim of this project is to advance the state of the art of fundamental research in lifelong RL by developing several novel RL algorithms that relax the above assumptions. The new algorithms should be robust to environmental changes, both in terms of the observations that the system can make and the actions that the system can perform. Moreover, the algorithms should be able to operate over long periods of time while achieving different objectives. The proposed algorithms will address three key problems related to lifelong RL: planning, exploration, and task decomposition. Planning is the problem of computing an action selection strategy given a (possibly partial) model of the task at hand. Exploration is the problem of selecting actions with the aim of mapping out the environment rather than achieving a particular objective. Task decomposition is the problem of defining different objectives and assigning a separate action selection strategy to each. The algorithms will be evaluated in two realistic scenarios: active network management for electrical distribution networks, and microgrid management. A test protocol will be developed to evaluate each individual algorithm, as well as their combinations.

Job Description:

The PhD candidate will focus on one or more issues related to sequential learning in structured and evolving problems. The PhD candidate will first acquire expertise in different topics of machine learning such as online learning, multi-armed bandits, statistical learning theory, reinforcement learning, approximate dynamic programming, and algorithmic game theory. Then, the PhD candidate is expected to contribute to the advancement of the literature on this problem along several lines: methodological (e.g., definition of general abstract models for a wide range of decision-making problems), theoretical (e.g., near-optimality performance guarantees), and algorithmic (e.g., development of novel algorithms for specific decision-making problems). The research activity of the PhD candidate will be closely related to the EU Chistera Delta project (http://www.chistera.eu/projects/delta). This will allow the PhD candidate to develop collaborations with other researchers participating in this research project, and it may also allow him/her to spend part of his/her research activity at partner laboratories such as Montanuniversität Leoben (Austria), Universitat Pompeu Fabra (Spain), or Université de Liège (Belgium). Internships in industry research labs, such as Adobe Research in California or DeepMind in the UK, are also possible. The starting date of the PhD program is flexible.


The applicant must have a Master of Science in Computer Science, Statistics, Math, or a related field, preferably with a background in reinforcement learning, bandits, or optimization. The working language in the lab is English; good written and oral communication skills are required.


We encourage applicants to contact us immediately, at least with a CV. The full application should include a brief description of research interests and past experience, a CV, degrees and grades, a copy of the Master's thesis (or a draft thereof), a motivation letter (short but pertinent to this call), relevant publications, and other relevant documents. Candidates are encouraged to provide letter(s) of recommendation and contact information for references. Please send your application as a single PDF to emilie.kaufmann-at-univ-lille1.fr and michal.valko-at-inria.fr.

  • Applications accepted until the position is filled
  • Duration: 3 years (a full-time position)
  • Starting date: October 15, 2018 (negotiable)
  • Supervisors: Emilie Kaufmann and Michal Valko
  • Place: SequeL, Inria Lille - Nord Europe

Working Environment:

The PhD candidate will work at the SequeL lab (https://sequel.lille.inria.fr/) at Inria Lille - Nord Europe, located in Lille. Inria (http://www.inria.fr/) is France's leading institution in Computer Science, with over 2800 scientists employed, of whom around 250 are in Lille. Lille is the capital of the north of France, a metropolis with 1 million inhabitants and excellent train connections to Brussels (30 min), Paris (1h), and London (1h30). SequeL is a dynamic lab at Inria with over 20 researchers (including PhD students), covering several aspects of machine learning from theory to applications, including statistical learning, reinforcement learning, and sequential learning.


  • Duration: 36 months; starting date of the contract: October 15, 2018 (negotiable)
  • Salary: €1982 per month for the first two years and €2085 per month in the third year (before taxes; taxes are around €380)
  • Possibility of French courses
  • Help for housing
  • Contribution to public transport costs
  • Scientific resident card and help with a spouse visa
This call is posted at http://researchers.lille.inria.fr/~valko/hp/call-phd-2018 with the most up-to-date information.