Car on the Hill

Description of the task : "Car on the Hill"

The state space :

The car is defined by the 2-dimensional state x=(y,v)
where y is its position and v its velocity.

The control space :

The control u is a 1-dimensional variable that takes a
finite number of possible values between u_min and u_max.
It represents the thrust applied to the car.

The state dynamics :

The state dynamics is defined by the differential equation :
dx
-- = f(x,u)
dt
in the deterministic case, and by the stochastic diff. eq. :
dx = f(x,u) dt + s(x,u) dw
in the stochastic case. Here dw is the increment of a Wiener process.
(The formulation of the stochasticity is explained in the
function hillcar_noise).

The reinforcement functions :

The current and terminal reinforcement functions are described
in the functions :

hillcar_current_reinf

hillcar_terminal_reinf

Code in C :

/* Definition of some constants and basic functions */
#include <math.h>   /* sqrt */

#define A 1.0
#define B 5.0
#define C 0.0
#define MASS 1.0
#define GRAVITY 9.81

/* Shape of the hill : f1 to the left of C, f2 to the right */
#define f1(x) ((x) * ((x) + 1.0))
#define f1_dashed2(x) (2.0 * (x) + 1.0)
#define f2(x) (A * (x) / sqrt(1.0 + B * (x) * (x)))
#define f(x) (((x) < C) ? f1(x) : f2(x))
/* Slope of the hill (first derivative of f) */
#define f_dashed(x) (((x) < C) ? f1_dashed2(x) : f2_dashed2(x))

/* Derivative of f2 : A / (1 + B x^2)^(3/2) */
double f2_dashed2(double x)
{
    double alpha = sqrt(1.0 + B * x * x);
    return( A / (alpha * alpha * alpha) );
}

/* This is the state dynamics.
Inputs :

tsk is the task (here HillCar)
state is the current (2d here) state : state[0] is the position, state[1] the velocity
action is the control (+4 or -4)

Output :

f is the state dynamics vector (2d here) : f[0] is the velocity, f[1] the acceleration

*/
void hillcar_f(task *tsk, double *state, double action, double *f)
{
    double x, q, p, acc;

    x = state[0];
    q = f_dashed(x);       /* slope of the hill at position x */
    p = 1.0 + (q * q);
    acc = (action / (MASS * sqrt(p))) - (GRAVITY * q / p);

    f[0] = state[1];
    f[1] = acc;
}

/* This is the stochastic part.
The stochastic differential equation :
dx = f(x,u) dt + s(x,u) dw
includes 2 parts :

the local drift : f(x,u) dt, which is the deterministic part
the noise : s(x,u) dw.

Let the matrix a=s.s' (where ' means the transpose).
Then a is a 2x2 symmetric positive semi-definite matrix, thus there exists a set
of 2 orthogonal eigenvectors e1 and e2 with eigenvalues l1 and l2.
This function takes as input :

the (2d) state and the action

and returns the output :

the corresponding eigenvalues and eigenvectors.

For example,
- if there is no noise (deterministic process) just return l1 = l2 = 0.
*/
void hillcar_noise(task *tsk, double *state, double action,
                   double *eig_val, double **eig_vect)
{
    /* In this example, the noise is constant (does not depend on
       the state or the action) and is as follows :
       the eigenvalues and the eigenvectors are :
       l1= 0.03 for e1=( 1 0 )
       l2= 0.3 for e2=( 0 1 )
       (eig_vect[k][j] is component k of eigenvector j)
    */
    eig_val[0]=0.03;
    eig_val[1]=0.3;

    eig_vect[0][0]=1;
    eig_vect[1][0]=0;

    eig_vect[0][1]=0;
    eig_vect[1][1]=1;
}

/* The current reinforcement (here 0 everywhere) */
double hillcar_current_reinf(task *tsk, double *state, double action, double seconds)
{
    return 0;
}

/* The terminal reinforcement : this function is called only when the system exits from the state space.

-1 if the car exits from the left side of the state space,
from +1 to -1 (linearly) when the car exits from the right side, depending on its velocity

(+1 for a null velocity, -1 for the max. velocity)

*/
double hillcar_terminal_reinf(task *tsk, double *state)
{
    double y = state[0];   /* position */
    double v = state[1];   /* velocity */
    if (v < 0) v = -v;
    if (y < 0) return -1;
    return 1.0 - v / 2.0;
}

Some numerical values :

State space :

the position is bounded by -1, +1
the velocity is bounded by -4, +4

Action space :

there are 2 actions : -4 or +4

The discount factor : gamma = 0.6

Description of the boundaries :

the boundary related to the first dimension (the position of the car) is normal (the car exits from the state space when it crosses it)
the boundary related to the second dimension (the velocity) is a bounding boundary (the velocity is bounded to stay in the range [-4, +4]).
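
The two kinds of boundaries can be sketched as follows. This is only one possible reading of the description: the velocity is clamped to its range, and the trajectory is considered finished as soon as the position is strictly outside [-1, +1]. The name apply_boundaries and the constants POS_MIN, POS_MAX, VEL_MIN and VEL_MAX are hypothetical.

```c
#include <assert.h>

#define POS_MIN (-1.0)
#define POS_MAX 1.0
#define VEL_MIN (-4.0)
#define VEL_MAX 4.0

/* Sketch : apply the boundary rules to state = (position, velocity).
   The velocity is clamped in place (bounding boundary).
   Returns 1 if the car has left the state space through the position
   boundary (normal boundary), 0 otherwise. */
int apply_boundaries(double state[2])
{
    assert(state != 0);
    if (state[1] < VEL_MIN) state[1] = VEL_MIN;
    if (state[1] > VEL_MAX) state[1] = VEL_MAX;
    return (state[0] < POS_MIN || state[0] > POS_MAX) ? 1 : 0;
}
```

When the function returns 1, the terminal reinforcement would be computed from the (clamped) state.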

Timestep for the integration (Runge Kutta) of the trajectories : 0.002