*****
Tasks
*****

For the TriFinger RL datasets we considered two dexterous manipulation tasks:

1. `Push` a cube to a target location on the ground.  The goal is given as the
   Cartesian coordinates of the desired position of the cube's center.

   .. image:: images/push_sequence.png

2. `Lift` a cube to match a target pose (position and orientation) in the air.
   The desired object pose is given in the form of keypoints.  We provide a
   function :func:`~trifinger_rl_datasets.utils.get_pose_from_keypoints` for
   converting keypoints to a position and a quaternion representing the
   orientation.

   .. image:: images/lift_sequence.png

The Lift task is considerably harder than the Push task, as it requires first
flipping the cube to an approximately correct orientation before acquiring a
stable grasp and lifting it to match the goal pose.

Both tasks are available in a :doc:`simulated ` environment and on the
:doc:`real robot `.


The Cube
========

The object that is manipulated is a cube:

.. image:: images/cube_v2.jpg

It has an edge length of 6.5 cm and weighs about 94 g.  The surface of the
cube is structured to make grasping it easier, and each side has a different
color to aid the camera-based object tracking.


.. _observation_and_action_space:

Observation and Action Space
============================

The **observation space** can be chosen to be either a flat Box space or a
nested Dict space.
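The relationship between the two layouts can be sketched as flattening a
nested dictionary observation into a single vector.  The snippet below is a
minimal illustration, not the library's implementation; the field names and
sizes are hypothetical (the real dictionaries contain more entries), and the
actual flattening order used by the package may differ.

```python
import numpy as np

# Hypothetical, simplified nested observation (real observations
# contain more fields; see the dictionary structure in this section).
obs = {
    "robot_observation": {
        "position": np.zeros(9),  # 3 fingers x 3 joints
        "velocity": np.zeros(9),
    },
    "camera_observation": {
        "object_position": np.zeros(3),
    },
}


def flatten_obs(d):
    """Concatenate all leaf arrays of a nested dict in sorted key order."""
    parts = []
    for key in sorted(d):
        value = d[key]
        if isinstance(value, dict):
            parts.append(flatten_obs(value))
        else:
            parts.append(np.asarray(value, dtype=float).ravel())
    return np.concatenate(parts)


flat = flatten_obs(obs)
print(flat.shape)  # (21,)
```

A fixed traversal order (here: sorted keys) is what makes the flat Box
representation well-defined: the same dictionary always maps to the same
vector layout.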
An observation in the form of a dictionary has the following structure:

- robot_observation

  - position
  - velocity
  - torque
  - fingertip_force
  - fingertip_position
  - fingertip_velocity
  - robot_id

- camera_observation

  - object_position
  - object_orientation
  - object_keypoints
  - delay
  - confidence
  - images (only if camera images are present in the dataset, see
    :doc:`/datasets/index`)

- **action** -- *This contains the action from the previous step*
- desired_goal

  - object_position :superscript:`1`
  - object_keypoints :superscript:`2`

- achieved_goal

  - object_position :superscript:`1`
  - object_keypoints :superscript:`2`

| :superscript:`1` Only for the Push task
| :superscript:`2` Only for the Lift task

``robot_observation`` contains proprioceptive information about the robot,
i.e., joint angles, angular velocities, torques at the joints, forces measured
at the fingertips, the Cartesian coordinates of the fingertips, their
velocities, and the robot ID.

``camera_observation`` contains the camera images (if they are present in the
environment/dataset) and the estimate of the object pose obtained from the
tracking system.  ``object_orientation`` is a quaternion encoding the
orientation of the object.  ``object_keypoints`` refers to the Cartesian
coordinates of the 8 corners of the cube; this item therefore encodes both
position and orientation.  ``delay`` refers to how much time passed between
the recording of the camera images and the moment the images and the
corresponding pose estimate were provided in the observation (in seconds).
``confidence`` contains the confidence of the pose estimate as a value
between 0.0 and 1.0 (bigger is better).

In the real environment, ``robot_id`` contains the ID of the robot on which
the policy was executed.  In the simulated environment, ``robot_id`` is
always set to 0.  We furthermore simulate the delay of the object tracking
present on the real system, with values between 0.09 s and 0.18 s.
However, the confidence has a fixed value of 1.0 in simulation.

The ``desired_goal`` is sampled at the beginning of each episode and remains
fixed for the duration of the episode.  The ``achieved_goal`` contains the
goal that is currently achieved.

The **action space** is 9-dimensional.  The actions are the torques sent to
the actuators of the 3 joints of each of the 3 fingers.  The torque range is
[-0.397, 0.397] Nm.


Rewards and Success
===================

The **reward** is computed by applying a logistic kernel to an input
:math:`x`:

.. math::

    k(x) = \frac{b + 2}{\exp(a \lVert x \rVert) + b + \exp(-a \lVert x \rVert)}

where :math:`\lVert \cdot \rVert` denotes the Euclidean norm.  For different
values of :math:`x`, the function has the following form; the shaded grey
area corresponds to a cube centered at the goal, and the green area to the
goal-achievement threshold of 2 cm:

.. image:: images/reward_function.png
   :width: 100%

In the pushing task, :math:`x` is the difference between the achieved and the
desired goal position.  In the lifting task, :math:`x` is the difference
between the achieved and the desired keypoints, and the reward is obtained by
averaging over the keypoints.

The tolerance for goal achievement (i.e., success) is 2 cm for the position
and 22 degrees for the orientation.
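The reward computation above can be sketched as follows.  This is an
illustrative reimplementation, not the package's code; in particular, the
kernel parameters :math:`a` and :math:`b` are not stated in this section, so
the values below are placeholders.

```python
import numpy as np


def logistic_kernel(x, a=30.0, b=10.0):
    """Logistic kernel k(x) = (b + 2) / (exp(a*||x||) + b + exp(-a*||x||)).

    The parameters a and b are hypothetical values chosen for
    illustration only.
    """
    norm = np.linalg.norm(x)
    return (b + 2.0) / (np.exp(a * norm) + b + np.exp(-a * norm))


# Push task: x is the difference between achieved and desired position.
# At the goal (x = 0) the kernel attains its maximum of 1.0.
push_reward = logistic_kernel(np.zeros(3))

# Lift task: apply the kernel per corner keypoint and average over the
# 8 keypoints.
achieved = np.zeros((8, 3))
desired = np.full((8, 3), 0.01)  # each corner 1 cm off along every axis
lift_reward = np.mean(
    [logistic_kernel(ach - des) for ach, des in zip(achieved, desired)]
)
```

Averaging the kernel over all eight keypoints is what makes the lifting
reward sensitive to orientation errors as well as position errors: a cube at
the right position but wrongly rotated still has displaced corners.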