*****
Tasks
*****

For the TriFinger RL datasets we considered two dexterous manipulation tasks:

1. `Push` a cube to a target location on the ground.  The goal is given as the
   Cartesian coordinates of the desired position of the cube's center.

   .. image:: images/push_sequence.png

2. `Lift` a cube to match a target pose (position and orientation) in the air.
   The desired object pose is given in the form of keypoints.  We provide a
   function :func:`~trifinger_rl_datasets.utils.get_pose_from_keypoints` for
   converting keypoints to a position and a quaternion representing the
   orientation.

   .. image:: images/lift_sequence.png

The Lift task is considerably harder than the Push task, as it requires first
flipping the cube to an approximately correct orientation before acquiring a
stable grasp and lifting it to match the goal pose.

Both tasks are available in a :doc:`simulated ` environment and on the
:doc:`real robot `.


The Cube
========

The object that is manipulated is a cube:

.. image:: images/cube_v2.jpg

It has an edge length of 6.5 cm and weighs about 94 g.  The surface of the
cube is structured to make grasping it easier, and each side has a different
color to aid the camera-based object tracking.


.. _observation_and_action_space:

Observation and Action Space
============================

The **observation space** can be chosen to be either a flat Box space or a
nested Dict space.
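The relationship between the two layouts can be sketched as flattening a
nested dictionary observation into a single vector.  The snippet below is a
minimal illustration, not the library's implementation; the field names and
sizes are hypothetical (the real dictionaries contain more entries), and the
actual flattening order used by the package may differ.

```python
import numpy as np

# Hypothetical, simplified nested observation (real observations
# contain more fields; see the dictionary structure in this section).
obs = {
    "robot_observation": {
        "position": np.zeros(9),  # 3 fingers x 3 joints
        "velocity": np.zeros(9),
    },
    "camera_observation": {
        "object_position": np.zeros(3),
    },
}


def flatten_obs(d):
    """Concatenate all leaf arrays of a nested dict in sorted key order."""
    parts = []
    for key in sorted(d):
        value = d[key]
        if isinstance(value, dict):
            parts.append(flatten_obs(value))
        else:
            parts.append(np.asarray(value, dtype=float).ravel())
    return np.concatenate(parts)


flat = flatten_obs(obs)
print(flat.shape)  # (21,)
```

A fixed traversal order (here: sorted keys) is what makes the flat Box
representation well-defined: the same dictionary always maps to the same
vector layout.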
An observation in the form of a dictionary has the following structure:

- robot_observation

  - position
  - velocity
  - torque
  - fingertip_force
  - fingertip_position
  - fingertip_velocity
  - robot_id

- camera_observation

  - object_position
  - object_orientation
  - object_keypoints
  - delay
  - confidence
  - images (only if camera images are present in the dataset, see
    :doc:`/datasets/index`)

- **action** -- *This contains the action from the previous step*
- desired_goal

  - object_position :superscript:`1`
  - object_keypoints :superscript:`2`

- achieved_goal

  - object_position :superscript:`1`
  - object_keypoints :superscript:`2`

| :superscript:`1` Only for the Push task
| :superscript:`2` Only for the Lift task

``robot_observation`` contains proprioceptive information about the robot,
i.e., joint angles, angular velocities, torques at the joints, forces measured
at the fingertips, the Cartesian coordinates of the fingertips, their
velocities, and the robot ID.

``camera_observation`` contains the camera images (if they are present in the
environment/dataset) and the estimate of the object pose obtained from the
tracking system.  ``object_orientation`` is a quaternion encoding the
orientation of the object.  ``object_keypoints`` refers to the Cartesian
coordinates of the 8 corners of the cube; this item therefore encodes both
position and orientation.  ``delay`` refers to how much time passed between
the recording of the camera images and the moment the images and the
corresponding pose estimate were provided in the observation (in seconds).
``confidence`` contains the confidence of the pose estimate as a value
between 0.0 and 1.0 (bigger is better).

In the real environment, ``robot_id`` contains the ID of the robot on which
the policy was executed.  In the simulated environment, ``robot_id`` is
always set to 0.  We furthermore simulate the delay of the object tracking
present on the real system, with values between 0.09 s and 0.18 s.
However, the confidence has a fixed value of 1.0 in simulation.

The ``desired_goal`` is sampled at the beginning of each episode and remains
fixed for the duration of the episode.  The ``achieved_goal`` contains the
goal that is currently achieved.

The **action space** is 9-dimensional.  The actions are the torques sent to
the actuators of the 3 joints of each of the 3 fingers.  The torque range is
[-0.397, 0.397] Nm.


Rewards and Success
===================

The **reward** is computed by applying a logistic kernel to an input
:math:`x`:

.. math::

    k(x) = \frac{b + 2}{\exp(a \lVert x \rVert) + b + \exp(-a \lVert x \rVert)}

where :math:`\lVert \cdot \rVert` denotes the Euclidean norm.  For different
values of :math:`x`, the function has the following form; the shaded grey
area corresponds to a cube centered at the goal, and the green area to the
goal-achievement threshold of 2 cm:

.. image:: images/reward_function.png
   :width: 100%

In the pushing task, :math:`x` is the difference between the achieved and the
desired goal position.  In the lifting task, :math:`x` is the difference
between the achieved and the desired keypoints, and the reward is obtained by
averaging over the keypoints.

The tolerance for goal achievement (i.e., success) is 2 cm for the position
and 22 degrees for the orientation.
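The reward computation above can be sketched as follows.  This is an
illustrative reimplementation, not the package's code; in particular, the
kernel parameters :math:`a` and :math:`b` are not stated in this section, so
the values below are placeholders.

```python
import numpy as np


def logistic_kernel(x, a=30.0, b=10.0):
    """Logistic kernel k(x) = (b + 2) / (exp(a*||x||) + b + exp(-a*||x||)).

    The parameters a and b are hypothetical values chosen for
    illustration only.
    """
    norm = np.linalg.norm(x)
    return (b + 2.0) / (np.exp(a * norm) + b + np.exp(-a * norm))


# Push task: x is the difference between achieved and desired position.
# At the goal (x = 0) the kernel attains its maximum of 1.0.
push_reward = logistic_kernel(np.zeros(3))

# Lift task: apply the kernel per corner keypoint and average over the
# 8 keypoints.
achieved = np.zeros((8, 3))
desired = np.full((8, 3), 0.01)  # each corner 1 cm off along every axis
lift_reward = np.mean(
    [logistic_kernel(ach - des) for ach, des in zip(achieved, desired)]
)
```

Averaging the kernel over all eight keypoints is what makes the lifting
reward sensitive to orientation errors as well as position errors: a cube at
the right position but wrongly rotated still has displaced corners.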