Variable resolution discretizations in optimal control

Variable Resolution Discretization in Optimal Control (page 3)

Remi Munos
Andrew Moore

4- A global splitting heuristic:

We define the notion of influence, which gives a measure of how much a state "contributes" to the value function of another state.

The influence is easy to compute: it is equivalent to computing the value function for a Markov chain.
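As a rough illustration of this computation, the sketch below finds the influence of every state on a chosen subset of states for a small Markov chain, using fixed-point iteration. The transition matrix `P`, the constant per-step discount `gamma`, and the function name `influence_on_subset` are assumptions made for the example and do not come from this page; in the actual method the chain would be the one induced by the discretized control problem.

```python
def influence_on_subset(P, gamma, subset, n_iter=1000, tol=1e-12):
    """Influence of each state on the states in `subset` (illustrative sketch).

    P[j][i] is the (assumed) probability of moving from state j to state i.
    Rows that sum to zero are treated as terminal states.
    The influence solves  I[i] = 1_subset(i) + gamma * sum_j P[j][i] * I[j],
    which has the same form as a value-function equation for a Markov chain.
    """
    n = len(P)
    indicator = [1.0 if i in subset else 0.0 for i in range(n)]
    infl = indicator[:]
    for _ in range(n_iter):
        new = [
            indicator[i] + gamma * sum(P[j][i] * infl[j] for j in range(n))
            for i in range(n)
        ]
        diff = max(abs(a - b) for a, b in zip(new, infl))
        infl = new
        if diff < tol:
            break
    return infl


# Hypothetical 3-state chain: 0 -> 1 -> 2, state 2 terminal.
P_chain = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
# Influence of each state on state 0, with discount 0.5.
print(influence_on_subset(P_chain, 0.5, {0}))  # [1.0, 0.5, 0.25]
```

In this toy chain the value of state 0 is R(0) + 0.5 R(1) + 0.25 R(2), so the printed influences are the weights with which each state's reward enters that value.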

For example, for the "Car on the Hill" problem, here is the influence of the whole state space on 3 points (the crosses), indicated by grey levels:

Influence on 3 points (the crosses)

Actually, we are interested in finding the states that influence the areas of change in the optimal control.

Since we know the cells of policy disagreement:

we can compute the influence on this subset. We obtain:

Thus the states of highest influence on the areas of change in the optimal control are the states that need to be accurately approximated in order to define a good controller.

To estimate the quality of approximation of the value function for a given discretization, we define the notion of variance.

The variance is easily computed since it satisfies a Bellman equation.

The variance gives an estimate of the bias, or approximation error, generated by the grid approximation of the value function.
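The sketch below illustrates one way such a variance could be computed for a Markov chain. It assumes a particular form of the Bellman equation that is not spelled out on this page: a local term e(i) equal to the expected squared one-step deviation r(i) + gamma V(j) - V(i), propagated with discount gamma squared. The names `P`, `r`, `V`, `gamma` and `chain_variance` are hypothetical, and a fixed per-step discount is assumed.

```python
def chain_variance(P, r, V, gamma, n_iter=1000, tol=1e-12):
    """Variance of the discounted return of a Markov chain (illustrative sketch).

    Assumed Bellman equation (not stated on this page):
        var[i] = e[i] + gamma**2 * sum_j P[i][j] * var[j]
        e[i]   = sum_j P[i][j] * (r[i] + gamma * V[j] - V[i])**2
    P[i][j] is the probability of moving from state i to state j;
    rows that sum to zero are terminal states with zero variance.
    """
    n = len(P)
    e = [
        sum(P[i][j] * (r[i] + gamma * V[j] - V[i]) ** 2 for j in range(n))
        for i in range(n)
    ]
    var = [0.0] * n
    for _ in range(n_iter):
        new = [
            e[i] + gamma ** 2 * sum(P[i][j] * var[j] for j in range(n))
            for i in range(n)
        ]
        diff = max(abs(a - b) for a, b in zip(new, var))
        var = new
        if diff < tol:
            break
    return var


# Hypothetical chain: state 0 jumps to terminal state 1 (value +1)
# or terminal state 2 (value -1) with equal probability.
P_fork = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
var_fork = chain_variance(P_fork, r=[0.0, 0.0, 0.0], V=[0.0, 1.0, -1.0], gamma=0.9)
print(var_fork)  # state 0 has variance close to 0.81, terminal states have 0
```

The standard deviation is the square root of the returned variance. In this toy example the variance is concentrated at the state whose successors have very different values, which mirrors the situation near a discontinuity of V.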

For the "Car on the Hill" we obtain the following standard deviation:

We observe that the standard deviation is high around the discontinuities of V and its gradient.

We have seen that:

The states of highest influence on the cells of policy disagreement are the states whose value function most affects the optimal controller.
The states of highest standard deviation are the states of highest uncertainty about the quality of approximation of the value function, and thus the states whose approximation accuracy could improve the most when split.

We derive the following global splitting heuristic:

Split the states of highest standard deviation that have an influence on the cells of policy disagreement.
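A minimal sketch of this selection step, assuming the standard deviation and the influence on the cells of policy disagreement have already been computed for each cell. The function name `cells_to_split` and the `fraction` parameter (the share of cells refined at each round) are hypothetical choices for the example, not values given on this page.

```python
def cells_to_split(std_dev, influence, fraction=0.5):
    """Pick the cells to refine (illustrative sketch).

    Each cell is scored by std_dev * influence, where `influence` is the
    influence on the cells of policy disagreement. The top `fraction`
    of cells by score is returned, skipping cells with a zero score.
    """
    scores = [s * i for s, i in zip(std_dev, influence)]
    n_split = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return [k for k in ranked[:n_split] if scores[k] > 0]


# Hypothetical values for 4 cells.
std_dev = [0.1, 0.9, 0.5, 0.0]
influence = [1.0, 0.0, 0.8, 2.0]
print(cells_to_split(std_dev, influence))  # [2, 0]
```

Cell 1 has the highest standard deviation but no influence on the policy, and cell 3 has the highest influence but is already well approximated, so neither is selected.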

The product of the standard deviation and the influence on the cells of policy disagreement is:

When we split the cells according to this measure, we obtain the following discretization:

A global splitting heuristic

We notice that the discretization is only refined at places that are important for designing an accurate controller.

This global splitting heuristic gives better results than the local splitting methods described earlier, and much better results than uniform grids.

In higher-dimensional state spaces, this approach is crucial since it becomes completely intractable to approximate discontinuities of the value function.

Examples of higher-dimensional control problems:

The cart-pole (or pole balancing) problem
The Acrobot
The airplane at constant altitude and velocity
The space-shuttle