Description of the task : ``Cart-Pole''


 

The state space :

The Cartpole is defined by the 4-dimensional state x=(theta, y, theta_dot, y_dot)
where theta is the the angles of the pole, y the position of the cart (see figure) and (theta_dot, y_dot) are their respective velocities.
The control space :
The control u is a 1-dimensional variable that takes a
finite number of possible values between -max_u and +max_u.
It represents the thrust applied to the cart.
The state dynamics :
The state dynamics is defined by the differential equation :
dx
-- = f(x,u)
dt
in the deterministic case, and by the stochastic diff. eq. :

dx = f(x,u) dt + s(x,u) dw

in the stochastic case. Here dw is a Wiener process.
(The formulation of the stochasticity is explained is the
function Cartpole_noise).

The reinforcement functions :
The current and terminal reinforcement functions are described
in the functions :
Code in C :
 

/* Definition of some constants and basic functions */
#define CART_MASS 1.0
#define POLE_MASS 0.1
#define POLE_LENGTH 1.0

/* This is the state dynamics.
Inputs :

Output : */
void Cartpole_f(task *tsk, double *state, double action, double *f)
{
  double x_double_dot, theta_double_dot;
  double theta, theta_dot, u;
 

  theta = state[0];
  theta_dot = state[2];
  u = action;

  theta_double_dot = (((CART_MASS + POLE_MASS) * 9.8 * sin(theta)  -
           (u + POLE_MASS * POLE_LENGTH * theta_dot * sin(theta))*cos(theta)) /
         ((4.0 / 3.0) * (CART_MASS + POLE_MASS) * POLE_LENGTH -
          (POLE_MASS * POLE_LENGTH * pow(cos(theta), 2))));

  x_double_dot = ((pow(theta_dot, 2) * sin(theta) - theta_double_dot *sin(theta))
         * POLE_MASS * POLE_LENGTH + u) / (POLE_MASS + CART_MASS);

f[0]=theta_dot;
f[1]=state[3];
f[2]=theta_double_dot;
f[3]=x_double_dot;
}

/* This is the stochastic part.
The stochastic differential equation :
dx = f(x,u) dt + s(x,u) dw
includes 2 part :

Let the matrix a=s.s' (where ' means the transpose)
Then a is a 4x4 symetric positive matrix, thus there exists a set
of 4 orthogonal eigenvectors e1, e2, e3 and e4 with eigenvalues l1 l2, l3 and l4.
This function takes as input : and returns the output : For example,
- if there is no noise (deterministic process) just return l1 = l2 = l3 = l4 = 0.
*/
void Cartpole_noise(task *tsk, double *state, double action, double *eig_val, double **eig_vect)
{
/* In this example, the noise is zero (does not depend on
the state or the action) and is as follow :
the eigenvalues and the eigenvectors are :
l1= 0.0 for e1=( 1 0 0 0)
l2= 0.0 for e2=( 0 1 0 0)
l3= 0.0 for e3=( 0 0 1 0 )
l4= 0.0 for e4=( 0 0 0 1 )
*/
eig_val[0]=0.0;
eig_val[1]=0.0;
eig_val[2]=0.0;
eig_val[3]=0.0;

eig_vect[0][0]=1;
eig_vect[1][0]=0;
eig_vect[2][0]=0;
eig_vect[3][0]=0;

eig_vect[0][1]=0;
eig_vect[1][1]=1;
eig_vect[2][1]=0;
eig_vect[3][1]=0;

eig_vect[0][2]=0;
eig_vect[1][2]=0;
eig_vect[2][2]=1;
eig_vect[3][2]=0;

eig_vect[0][3]=0;
eig_vect[1][3]=0;
eig_vect[2][3]=0;
eig_vect[3][3]=1;
}
 

/* The current reinforcement (here 0 everywhere) */
double Cartpole_current_reinf(task *tsk, double *state, double action, double seconds)
{
return 0;
}

/* The terminal reinforcement : this function is called only when the systems exits from the state space.
    If the systems has reached the goal then R=+1, otherwise R=-1.
*/
double Cartpole_terminal_reinf(task *tsk, double *state)
{
  if (in_goal(tsk, state) return 1.0;
  else return -1.0;
}
 
 

Some numerical values :