Space shuttle

Description of the task : "Space-shuttle"

The state space :

The space-shuttle problem is defined by the 4-dimensional state x = (x1, x2, x1_dot, x2_dot),
where (x1, x2) are the coordinates of the space shuttle and (x1_dot, x2_dot) are the corresponding velocity components.

The control space :

The control u takes 5 possible values corresponding to :

u = 0 means no thrust

u = 1 means thrust to the right (+x1)

u = 2 means thrust to the left (-x1)

u = 3 means thrust upward (+x2)

u = 4 means thrust downward (-x2)
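The five controls map to unit thrust directions exactly as the a_to_ax / a_to_ay tables in space_f encode them. The sketch below gives those controls readable names; the enum and the thrust_dir table are hypothetical illustrations, not part of the original code.

```c
/* Hypothetical names for the five discrete controls u = 0..4. */
enum shuttle_action {
    NO_THRUST    = 0,  /* u = 0 */
    THRUST_RIGHT = 1,  /* u = 1: +x1 */
    THRUST_LEFT  = 2,  /* u = 2: -x1 */
    THRUST_UP    = 3,  /* u = 3: +x2 */
    THRUST_DOWN  = 4   /* u = 4: -x2 */
};

/* Unit thrust direction (ax, ay) indexed by u; the same values as the
   a_to_ax / a_to_ay tables used in space_f. */
static const double thrust_dir[5][2] = {
    {  0.0,  0.0 },   /* NO_THRUST */
    {  1.0,  0.0 },   /* THRUST_RIGHT */
    { -1.0,  0.0 },   /* THRUST_LEFT */
    {  0.0,  1.0 },   /* THRUST_UP */
    {  0.0, -1.0 }    /* THRUST_DOWN */
};
```

Indexing by the enum (for example thrust_dir[THRUST_LEFT][0], which is -1.0) keeps the action-to-direction mapping in one place.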

The state dynamics :

The state dynamics are defined by the differential equation :
dx/dt = f(x,u)
in the deterministic case, and by the stochastic differential equation :
dx = f(x,u) dt + s(x,u) dw
in the stochastic case, where dw is the increment of a Wiener process.
(The formulation of the stochasticity is explained in the
function space_noise.)

The reinforcement functions :

The current and terminal reinforcement functions are described
in the functions :

space_current_reinf

space_terminal_reinf

Code in C :

#include <math.h>   /* sqrt */

/* Definition of some constants */
#define GRAVITY 9.81
#define m 0.1   /* mass of the space shuttle */
#define M1 2   /* mass of planet 1 */
#define R1 0.6   /* radius of planet 1 */
#define X1 4.0 /* coordinates of planet 1 */
#define Y1 2.0
#define M2 1.5   /* mass of planet 2 */
#define R2 2   /* radius of planet 2 */
#define X2 -1.0 /* coordinates of planet 2 */
#define Y2 -4.0
#define Gx -5 /* coordinates of the Goal */
#define Gy 5

/* This is the state dynamics.
Inputs :

tsk is the task (here Space-shuttle)
state is the current (4d here) state
action is the control u, one of the 5 discrete values 0..4 (passed as a double)

Output :

f is the state dynamics vector (4d here)

*/
void space_f(task *tsk, double *state, double action, double *f)
{

double k=0.08;
double x=state[0];
double y=state[1];
double vx=state[2];
double vy=state[3];

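double a_to_ax[5] = {0, 1, -1, 0, 0};   /* x1-thrust direction for u = 0..4 */
double a_to_ay[5] = {0, 0, 0, 1, -1};   /* x2-thrust direction for u = 0..4 */

/* action holds a discrete value 0..4 stored in a double; C requires an
   integer array index, so convert it (out-of-range values mean no thrust) */
int a = (action >= 0 && action <= 4) ? (int) action : 0;

double ax=a_to_ax[a];
double ay=a_to_ay[a];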

double fx, fy;

double dx1=(X1-x), dy1=(Y1-y);
double dx2=(X2-x), dy2=(Y2-y);
double d1,d2;

d1=sqrt(dx1*dx1+dy1*dy1);
d1=d1*d1*d1;

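/* Inside planet 2 (d2 <= R2) the cubed distance is held at R2^3, so the
   attraction decreases linearly toward its center instead of diverging.
   Planet 1 needs no such guard: touching it ends the trajectory. */
d2=sqrt(dx2*dx2+dy2*dy2);
if (d2 > R2) d2=d2*d2*d2;
else d2=R2*R2*R2;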

fx = GRAVITY * m * (M1 * dx1 / d1 + M2 * dx2 / d2);
fy = GRAVITY * m * (M1 * dy1 / d1 + M2 * dy2 / d2);

f[0]=vx;
f[1]=vy;
f[2]=fx + k*ax;
f[3]=fy + k*ay;
}

/* This is the stochastic part.
The stochastic differential equation :
dx = f(x,u) dt + s(x,u) dw
has 2 parts :

the local drift : f(x,u) dt, which is the deterministic part,
the noise : s(x,u) dw.

Let the matrix a = s.s' (where ' denotes the transpose).
Then a is a 4x4 symmetric positive semi-definite matrix, so there exists a set
of 4 orthogonal eigenvectors e1, e2, e3 and e4 with eigenvalues l1, l2, l3 and l4.
This function takes as input :

the (4d) state and the action,

and returns as output :

the corresponding eigenvalues and eigenvectors.

For example,
- if there is no noise (deterministic process), just return l1 = l2 = l3 = l4 = 0.
*/
void space_noise(task *tsk, double *state, double action, double *eig_val, double **eig_vect)
{
/* In this example, there is no noise :
the eigenvalues and the eigenvectors are :
l1= 0.0 for e1=( 1 0 0 0)
l2= 0.0 for e2=( 0 1 0 0)
l3= 0.0 for e3=( 0 0 1 0 )
l4= 0.0 for e4=( 0 0 0 1 )
*/
eig_val[0]=0.0;
eig_val[1]=0.0;
eig_val[2]=0.0;
eig_val[3]=0.0;

eig_vect[0][0]=1;
eig_vect[1][0]=0;
eig_vect[2][0]=0;
eig_vect[3][0]=0;

eig_vect[0][1]=0;
eig_vect[1][1]=1;
eig_vect[2][1]=0;
eig_vect[3][1]=0;

eig_vect[0][2]=0;
eig_vect[1][2]=0;
eig_vect[2][2]=1;
eig_vect[3][2]=0;

eig_vect[0][3]=0;
eig_vect[1][3]=0;
eig_vect[2][3]=0;
eig_vect[3][3]=1;
}

/* The current reinforcement penalizes the consumption of fuel (action != 0) */
double space_current_reinf(task *tsk, double *state, double action, double seconds)
{
double cost = 0.0;   /* must be initialized before accumulating */
if (action > 0) cost += 0.01 * seconds;   /* any thrust (u = 1..4) consumes fuel */
return -cost;
}

/* The terminal reinforcement :
If the system has reached the goal, R = +10; otherwise (the space shuttle has touched planet 1 or exited the state space), R = -10.
*/
double space_terminal_reinf(task *tsk, double *state)
{
if (in_goal(tsk, state)) return 10.0;
else return -10.0;
}

Some numerical values :

State space :

x1 and x2 lie in [-10, +10]
x1_dot and x2_dot lie in [-1, +1]

Action space :

there are 5 actions (u = 0, 1, 2, 3, 4).

The Goal is defined by :

x1 = -5 + or - 0.5
x2 = 5 + or - 0.5
x1_dot = 0 + or - 1.0
x2_dot = 0 + or - 1.0
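The terminal reinforcement calls in_goal, whose body is not given here. A minimal stand-in that tests the goal region listed above might look like this; the name in_goal_box is hypothetical, and using inclusive tolerances (<=) is an assumption.

```c
#include <math.h>

/* Hypothetical stand-in for in_goal(): nonzero when the 4-d state lies in
   the goal region (tolerances assumed inclusive). */
static int in_goal_box(const double *state)
{
    return fabs(state[0] - (-5.0)) <= 0.5   /* x1 = -5 +/- 0.5 */
        && fabs(state[1] - 5.0) <= 0.5      /* x2 =  5 +/- 0.5 */
        && fabs(state[2]) <= 1.0            /* x1_dot = 0 +/- 1.0 */
        && fabs(state[3]) <= 1.0;           /* x2_dot = 0 +/- 1.0 */
}
```

Because x1_dot and x2_dot are already bounded by [-1, +1] in the state space, the velocity conditions never exclude a reachable state, so in practice the goal is the 1 x 1 square centered at (-5, 5).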

The discount factor : gamma = 0.97

Description of the boundaries :

the boundary related to the first dimension (x1) is a normal boundary (leaving [-10, +10] means exiting the state space, which ends the trajectory with R = -10).
the boundary related to the second dimension (x2) is a normal boundary (same behavior as for x1).
the boundary related to the third dimension (x1_dot) is a bounding boundary (the velocity is clipped to stay in the range [-1, +1]).
the boundary related to the fourth dimension (x2_dot) is a bounding boundary (the velocity is clipped to stay in the range [-1, +1]).
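The two kinds of boundary can be applied after each integration step as follows. This is a sketch under the reading above (clipping for bounding boundaries, termination for normal ones); apply_boundaries is a hypothetical name.

```c
/* Hypothetical helper applying the boundary rules to a 4-d state.
   Bounding boundaries: x1_dot and x2_dot are clipped to [-1, +1].
   Normal boundaries: returns 1 when x1 or x2 has left [-10, +10]
   (the trajectory ends with the terminal reinforcement), else 0. */
static int apply_boundaries(double *state)
{
    int i;
    for (i = 2; i < 4; i++) {          /* x1_dot, x2_dot */
        if (state[i] > 1.0)
            state[i] = 1.0;
        else if (state[i] < -1.0)
            state[i] = -1.0;
    }
    return state[0] < -10.0 || state[0] > 10.0   /* x1 */
        || state[1] < -10.0 || state[1] > 10.0;  /* x2 */
}
```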

Time step for the (Runge-Kutta) integration of the trajectories : 0.08