Description of the task : ``Car on the Hill''

The state space :
The car is defined by the 2-dimensional state x=(y,v),
where y is its position and v its velocity.

The control space :
The control u is a 1-dimensional variable that takes a
finite number of possible values between u_min and u_max.
It represents the thrust applied to the car.

The state dynamics :
The state dynamics is defined by the differential equation
dx
-- = f(x,u)
dt
in the deterministic case, and by the stochastic diff. eq. :
dx = f(x,u) dt + s(x,u) dw
in the stochastic case. Here dw is a Wiener process.
(The formulation of the stochasticity is explained in the
function hillcar_noise below.)

The reinforcement functions :
The current and terminal reinforcement functions are described
in the functions hillcar_current_reinf and hillcar_terminal_reinf below.

Code in C :
/* Definition of some constants and basic functions */
#define A 1.0
#define B 5.0
#define C 0.0
#define MASS 1.0
#define GRAVITY 9.81

/* Shape of the hill : f1 for positions below C, f2 above,
   with their derivatives f1_dashed2 and f2_dashed2 */
#define f1(x) ((x) * ((x) + 1.0))
#define f1_dashed2(x) (2.0 * (x) + 1.0)
#define f2(x) (A * (x) / sqrt(1.0 + B * (x) * (x)))
#define f(x) (((x) < C) ? f1(x) : f2(x))
#define f_dashed(x) (((x) < C) ? f1_dashed2(x) : f2_dashed2(x))

double f2_dashed2(double x)
{
  double alpha = sqrt(1.0 + B * x * x);
  return( A / (alpha * alpha * alpha) );
}
/* This is the state dynamics.
   tsk is the task (here HillCar)
   state is the current (2d here) state
   action is the control (+4 or -4)
   f is the state dynamics vector (2d here) */
void hillcar_f(task *tsk, double *state,
               double action, double *f)
{
  double x, q, p, acc;
  x = state[0];
  q = f_dashed(x);
  p = 1.0 + (q * q);
  acc = (action / (MASS * sqrt(p))) - (GRAVITY * q / p);
  f[0] = state[1];  /* dy/dt = v */
  f[1] = acc;       /* dv/dt = acceleration along the hill */
}
/* This is the stochastic part.
   The stochastic differential equation :
     dx = f(x,u) dt + s(x,u) dw
   includes 2 parts :
   - the local drift : f(x,u) dt, which is the deterministic part,
   - the noise : s(x,u) dw.
   Let the matrix a = s.s' (where ' means the transpose).
   Then a is a 2x2 symmetric positive matrix,
   thus there exists a set of 2 orthogonal eigenvectors e1 and e2
   with eigenvalues l1 and l2.
   This function takes as input the (2d) state and action, and
   returns as output the corresponding eigenvalues and eigenvectors.
   - if there is no noise (deterministic process),
     just return l1 = l2 = 0. */
void hillcar_noise(task *tsk, double *state, double action,
                   double *eig_val, double **eig_vect)
{
  /* In this example, the noise is constant (does not depend on
     the state or the action) and is as follows :
     the eigenvalues and the eigenvectors are :
       l1 = 0.03 for e1 = ( 1 0 )
       l2 = 0.3  for e2 = ( 0 1 )
     (eig_vect[i] is taken here to be the i-th eigenvector) */
  eig_val[0] = 0.03;
  eig_val[1] = 0.3;
  eig_vect[0][0] = 1.0;  eig_vect[0][1] = 0.0;
  eig_vect[1][0] = 0.0;  eig_vect[1][1] = 1.0;
}
/* The current reinforcement (here 0 everywhere) */
double hillcar_current_reinf(task *tsk,
                             double *state, double action, double seconds)
{
  return 0.0;
}
/* The terminal reinforcement : this function
   is called only when the system exits from the state space.
   -1 if the car exits from the left side of the state space,
   from +1 to -1 (linearly) when the car exits from
   the right side, depending on its velocity
   (+1 for a null velocity, -1 for the max. velocity) */
double hillcar_terminal_reinf(task *tsk, double *state)
{
  double x = state[0];  /* position */
  double y = state[1];  /* velocity */
  if (y < 0) y = -y;
  if (x < 0) return -1.0;
  return 1.0 - y / 2.0;
}
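To make the linear mapping at the right exit concrete, here is a small self-contained sketch of the same rule; terminal_reinf is a hypothetical stand-in for hillcar_terminal_reinf with the task argument dropped.

```c
#include <math.h>

/* Hypothetical stand-in for hillcar_terminal_reinf :
   state[0] = position, state[1] = velocity. */
double terminal_reinf(const double *state)
{
  double v = fabs(state[1]);
  if (state[0] < 0.0) return -1.0;  /* exit on the left side */
  return 1.0 - v / 2.0;             /* +1 at v = 0, -1 at |v| = 4 */
}
```

So an exit on the right at null velocity earns +1, while an exit at the maximal velocity |v| = 4 earns 1 - 4/2 = -1.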
Some numerical values :
the position is bounded by -1, +1
the velocity is bounded by -4, +4
Action space :
there are 2 actions : -4 or +4
The discount factor : gamma = 0.6
Description of the boundaries :
the boundary related to the first dimension (the position of the car) is
an absorbing boundary (the car exits the state space when its position
leaves the range [-1, +1]); the boundary related to the second dimension
(the velocity) is a bounding boundary (the velocity is bounded to stay
in the range [-4, +4]).
Timestep for the integration (Runge Kutta) of the trajectories : 0.002