Variable Resolution Discretization
in Optimal Control (page 3)
Remi Munos
Andrew Moore
4- A global splitting heuristic:
-
We define the notion of influence, which gives a measure of how
much a state "contributes" to the value function of another state.
-
The influence is easy to compute: it is equivalent to computing the value
function for a Markov chain.
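As a rough illustration of why this reduces to a Markov chain computation, here is a minimal sketch. It assumes the discretized problem under a fixed policy is available as a matrix P of discounted transition probabilities, and it reads the influence of a state on a target set as the discounted expected number of visits to that state when starting from the targets. The function name, the matrix layout and the toy chain are hypothetical choices for illustration, not taken from the original work.

```python
import numpy as np

def influence_on(P, targets):
    """Influence of every state on the states listed in `targets`.

    P[i, j] is the discounted probability of moving from state i to j
    (assumption: discounting makes I - P invertible).
    """
    n = P.shape[0]
    indicator = np.zeros(n)
    indicator[list(targets)] = 1.0
    # Solve x = P^T x + indicator: the same kind of linear system as a
    # Markov chain value function, but on the transposed chain.
    return np.linalg.solve(np.eye(n) - P.T, indicator)

# Toy chain 0 -> 1 -> 2 with discount 0.5; state 2 is terminal.
P = 0.5 * np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.0, 0.0, 0.0]])
influence = influence_on(P, targets=[0])
```

In this toy chain the value of state 0 depends on states 1 and 2, so both have non-zero influence on it, decaying with the discount. Passing several indices in `targets` gives the influence on a subset of states.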
-
For example, for the "Car on the Hill" problem, here is the influence of the whole
state space on 3 points (the crosses), indicated by grey levels:
Influence on 3 points (the crosses)
-
Actually, we are interested in finding the states that influence the
areas of change in the optimal control.
-
Since we know the cells of policy disagreement:
-
we can compute the influence on this subset. We obtain:
-
Thus the states of highest influence on the areas of change in the optimal
control are the states that need to be accurately approximated in order
to define a good controller.
-
To estimate the quality of approximation of the value
function for a given discretization, we define the notion of variance.
-
The variance is easily computed since it satisfies a Bellman equation.
-
The variance gives an estimate of the bias, or approximation error,
introduced by the grid approximation of the value function.
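To make the Bellman-style computation concrete, here is a minimal sketch for a plain Markov reward process. It assumes a constant discount factor `gamma`, a reward `r` that depends only on the current state, and a transition matrix `P`; the original work may use state-dependent discounting, so this should be read as an illustration of the idea rather than the exact definition. All names are hypothetical.

```python
import numpy as np

def value_and_variance(P, r, gamma):
    """Value and variance of the discounted return for each state.

    Assumed model: V = r + gamma * P V, and the variance solves
    var = e + gamma**2 * P var, where e is the expected squared
    one-step deviation of the value.
    """
    n = P.shape[0]
    V = np.linalg.solve(np.eye(n) - gamma * P, r)
    # td[i, j] = r[i] + gamma * V[j] - V[i] for a transition i -> j
    td = r[:, None] + gamma * V[None, :] - V[:, None]
    e = np.sum(P * td**2, axis=1)
    var = np.linalg.solve(np.eye(n) - gamma**2 * P, e)
    return V, var

# Toy example: state 0 jumps to state 1 or 2 with equal probability;
# states 1 and 2 are terminal with rewards +1 and -1.
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
r = np.array([0.0, 1.0, -1.0])
V, var = value_and_variance(P, r, gamma=0.9)
std = np.sqrt(np.maximum(var, 0.0))  # clip tiny negative round-off
```

The variance is zero at the terminal states, whose return is deterministic, and positive at state 0, whose successors disagree about the value.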
-
For the "Car on the Hill" we obtain the following standard deviation:
-
We observe that the standard deviation is high around the discontinuities
of V and its gradient.
-
We have seen that:
-
The states of highest influence on the cells of policy disagreement are
the states whose value function most affects the optimal controller,
-
The states of highest standard deviation are the states of highest uncertainty
about the quality of approximation of the value function, and thus the states
whose approximation accuracy could improve the most when split.
-
We derive the following global splitting heuristic:
-
Split the states of highest standard deviation that have an influence on
the cells of policy disagreement.
-
The product of the standard deviation and the influence on the cells of
policy disagreement is:
-
When we split the cells according to this measure, we obtain the following
discretization:
A global splitting heuristic
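The selection step can be sketched as follows. This is only an illustration: it assumes a per-cell standard deviation and a per-cell influence on the cells of policy disagreement have already been computed, and the `fraction` of cells to split is a hypothetical parameter, not a value given in the original text.

```python
import numpy as np

def cells_to_split(std, influence, fraction=0.3):
    """Indices of the cells with the largest std * influence score."""
    score = np.asarray(std, dtype=float) * np.asarray(influence, dtype=float)
    # Always split at least one cell (illustrative choice).
    n_split = max(1, int(round(fraction * score.size)))
    # Stable sort so that ties keep their original cell order.
    order = np.argsort(-score, kind="stable")
    return np.sort(order[:n_split])

std = [0.1, 2.0, 1.0, 0.0]
influence = [5.0, 0.0, 1.0, 3.0]
chosen = cells_to_split(std, influence, fraction=0.5)
```

Cell 1 has the highest standard deviation but no influence on the policy-disagreement cells, so it is not selected. This is the behaviour the heuristic is meant to capture.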
-
We notice that the discretization is only refined at places that are important
for designing an accurate controller.
-
This global splitting heuristic gives better results than the local splitting
methods described earlier, and much better results than uniform grids.
-
In higher-dimensional state spaces, this approach is crucial since it becomes
completely intractable to approximate discontinuities of the value function.
-
Examples of higher-dimensional control problems:
-
The cart-pole (or pole balancing) problem
-
The Acrobot
-
The airplane at constant altitude and velocity
-
The space-shuttle