**The state space:**

The car is defined by the 2-dimensional state x=(y,v)

where y is its position and v its velocity.

The control u is a 1-dimensional variable that takes a

finite number of possible values between u_min and u_max.

It represents the thrust applied to the car.

The state dynamics is defined by the differential equation:

dx/dt = f(x,u)

in the deterministic case, and by the stochastic differential equation:

dx = f(x,u) dt + s(x,u) dw

in the stochastic case. Here dw is a Wiener process.

(The formulation of the stochasticity is explained in the

function hillcar_noise).

The current and terminal reinforcement functions are described

in the functions:

- hillcar_current_reinf
- hillcar_terminal_reinf

/* Definition of some constants and basic functions
*/

#define A 1.0

#define B 5.0

#define C 0.0

#define MASS 1.0

#define GRAVITY 9.81

#define f1(x) ((x) * ((x) + 1.0))          /* hill profile for x < C */

#define f1_dashed2(x) (2.0 * (x) + 1.0)    /* its derivative */

#define f2(x) (A * (x) / sqrt(1.0 + B * (x) * (x)))   /* hill profile for x >= C */

#define f(x) (((x) < C) ? f1(x) : f2(x))

double f2_dashed2(double x);   /* forward declaration (defined below, used in f_dashed) */

#define f_dashed(x) (((x) < C) ? f1_dashed2(x) : f2_dashed2(x))

double f2_dashed2(double x)
{
    double alpha = sqrt(1.0 + B * x * x);
    return A / (alpha * alpha * alpha);
}

/* This is the state dynamics.

**Inputs:**

- tsk is the task (here HillCar)
- state is the current (2d here) state
- action is the control (+4 or -4)

**Output:**

- f is the state dynamics vector (2d here)
*/

void hillcar_f(task *tsk, double *state, double action, double *f)

{

double u, x, q, p, acc;

x = state[0];

q = f_dashed(x);   /* slope of the hill at x */

p = 1.0 + (q * q);

acc = (action / (MASS * sqrt(p))) - (GRAVITY * q / p);   /* tangential acceleration */

f[0] = state[1];   /* dy/dt = v */

f[1] = acc;        /* dv/dt */

}

/* This is the **stochastic part.**

The stochastic differential equation:

dx = f(x,u) dt + s(x,u) dw

includes 2 parts:

- the local drift: f(x,u) dt, which is the deterministic part;
- the noise: s(x,u) dw.

The noise term is then described by 2 orthogonal eigenvectors e1 and e2 with eigenvalues l1 and l2.

This function takes as inputs:

- the (2d) state and the action,

and returns:

- the corresponding eigenvalues and eigenvectors.

If there is no noise (deterministic process), just return l1 = l2 = 0.

*/

void hillcar_noise(task *tsk, double *state, double action,

double *eig_val, double **eig_vect)

{

/* In this example, the noise is constant (it does not depend on

the state or the action). The eigenvalues and eigenvectors are:

l1 = 0.03 for e1 = ( 1 0 )

l2 = 0.3  for e2 = ( 0 1 )

*/

eig_val[0] = 0.03;

eig_val[1] = 0.3;

eig_vect[0][0] = 1;   /* e1 = ( 1 0 ) */

eig_vect[0][1] = 0;

eig_vect[1][0] = 0;   /* e2 = ( 0 1 ) */

eig_vect[1][1] = 1;

}

/* The **current reinforcement** (here 0 everywhere)
*/

double hillcar_current_reinf(task *tsk,
double *state, double action, double seconds)

{

return 0;

}

/* The **terminal reinforcement**: this function
is called only when the system exits from the state space.

- -1 if the car exits from the left side of the state space,
- from +1 to -1 (linearly) when the car exits from the right side, depending on the absolute value of its velocity

(+1 for a null velocity, -1 for the max. velocity).
*/
double hillcar_terminal_reinf(task *tsk, double *state)

{

double x = state[0];   /* exit position */

double y = state[1];   /* exit velocity (note: 'y' here is not the position) */

if (y < 0) y = -y;     /* absolute value of the velocity */

if (x < 0) return -1;  /* exit from the left side */

return 1.0 - y/2.0;    /* exit from the right side: +1 at v = 0, -1 at |v| = 4 */

}

Some numerical values:

- State space:
  - the position is bounded by -1, +1
  - the velocity is bounded by -4, +4
- Action space:
  - there are 2 actions: -4 or +4
- The discount factor: gamma = 0.6
- Description of the boundaries:
  - the boundary related to the first dimension (the position of the car) is normal
  - the boundary related to the second dimension (the velocity) is a bounding boundary (the velocity is bounded to stay in the range [-4, +4])
- Timestep for the integration (Runge-Kutta) of the trajectories: 0.002