Tasks

For the TriFinger RL datasets we considered two dexterous manipulation tasks:

  1. Push a cube to a target location on the ground. The goal is given as the Cartesian coordinates of the desired position of the cube’s center.

[Figure: push_sequence.png — image sequence of the Push task]
  2. Lift a cube to match a target pose (position and orientation) in the air. The desired object pose is given in the form of keypoints. We provide a function get_pose_from_keypoints() for converting keypoints to a position and a quaternion representing the orientation.

[Figure: lift_sequence.png — image sequence of the Lift task]

The Lift task is considerably harder than the Push task as it requires first flipping the cube to an approximately correct orientation before acquiring a stable grasp and lifting it to match the goal pose. Both tasks are available in a simulated environment and on the real robot.
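In practice, the provided get_pose_from_keypoints() should be used for this conversion. Purely as an illustration of the underlying idea, the following sketch recovers a cube pose from its 8 corner keypoints via the Kabsch algorithm; the function name pose_from_keypoints and the assumed corner ordering are our own choices, not the package's actual implementation:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_from_keypoints(keypoints, side=0.065):
    """Recover the cube pose from its 8 corner keypoints.

    `keypoints` is an (8, 3) array of corner coordinates, assumed to be
    ordered consistently with the canonical corner layout used below
    (this ordering is an assumption for illustration).
    Returns (position, quaternion) with the quaternion in (x, y, z, w) order.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    h = side / 2.0
    # Corners of a canonical, axis-aligned cube centered at the origin.
    canonical = np.array(
        [[sx, sy, sz] for sx in (-h, h) for sy in (-h, h) for sz in (-h, h)]
    )
    # The position is simply the centroid of the corners.
    position = keypoints.mean(axis=0)
    centered = keypoints - position
    # Kabsch algorithm: best-fit rotation mapping canonical to observed corners.
    H = canonical.T @ centered
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return position, Rotation.from_matrix(R).as_quat()
```

Averaging the corners gives the position directly; the Kabsch step then finds the rotation that best aligns the canonical corner layout with the observed one.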

The Cube

The object that is manipulated is a cube:

[Figure: cube_v2.jpg — the cube object]

It has a side length of 6.5 cm and weighs about 94 g. The surface of the cube is structured to make grasping easier. Each side has a different color to support the camera-based object tracking.

Observation and Action Space

The observation space can be chosen to be either a flat Box space or a nested Dict space. An observation in the form of a dictionary has the following structure:

  • robot_observation
    • position

    • velocity

    • torque

    • fingertip_force

    • fingertip_position

    • fingertip_velocity

    • robot_id

  • camera_observation
    • object_position

    • object_orientation

    • object_keypoints

    • delay

    • confidence

    • images (only if camera images are present in the dataset, see Datasets)

  • action (the action applied in the previous step)

  • desired_goal
    • object_position [1]

    • object_keypoints [2]

  • achieved_goal
    • object_position [1]

    • object_keypoints [2]

[1] Only for the Push task.
[2] Only for the Lift task.
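When working with the Dict observation space, it is sometimes convenient to flatten an observation into a single vector. The sketch below does this recursively over sorted keys; note that the key ordering and shapes here are illustrative assumptions, and the actual ordering of the environments' flat Box space may differ:

```python
import numpy as np

def flatten_observation(obs):
    """Recursively flatten a nested dict observation into one 1-D vector.

    Illustrative sketch: iterates over keys in sorted order, which is not
    necessarily the ordering used by the environments' flat Box space.
    """
    if isinstance(obs, dict):
        return np.concatenate(
            [flatten_observation(obs[key]) for key in sorted(obs)]
        )
    return np.asarray(obs, dtype=np.float32).ravel()

# Hypothetical example observation (shapes chosen for illustration only):
obs = {
    "robot_observation": {
        "position": np.zeros(9),
        "velocity": np.zeros(9),
    },
    "desired_goal": {"object_position": np.zeros(3)},
}
flat = flatten_observation(obs)  # 1-D vector of length 9 + 9 + 3 = 21
```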

robot_observation contains proprioceptive information about the robot, i.e., joint angles, angular velocities, torques at the joints, forces measured at the fingertips, the Cartesian coordinates of the fingertips and their velocities, and the robot ID.

camera_observation contains the camera images (if they are present in the environment/dataset) and the estimate of the object pose obtained from the tracking system. object_orientation is a quaternion encoding the orientation of the object. object_keypoints refers to the Cartesian coordinates of the 8 corners of the cube; it therefore encodes both position and orientation. delay is the time in seconds that passed between the recording of the camera images and the moment the images and the corresponding pose estimate were provided in the observation. confidence contains the confidence of the pose estimate as a value between 0.0 and 1.0 (bigger is better).

In the real environment, robot_id contains the ID of the robot on which the policy was executed; in the simulated environment it is always set to 0. We furthermore simulate the object-tracking delay of the real system with values between 0.09 s and 0.18 s. The confidence, however, has a fixed value of 1.0 in simulation.

The desired_goal is sampled at the beginning of each episode and is fixed for the duration of the episode. The achieved_goal contains the goal that is currently achieved.

The action space is nine-dimensional: the actions are the torques sent to the actuators of the 3 joints of each of the 3 fingers. The torque range is [-0.397, 0.397] Nm.
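Before sending an action, it should be clipped to the valid torque range. A minimal sketch (the helper name clip_action is ours, not part of the package):

```python
import numpy as np

MAX_TORQUE = 0.397  # Nm, per-joint torque limit stated above

def clip_action(action):
    """Clip a 9-D torque action to the valid range before sending it."""
    action = np.asarray(action, dtype=np.float32)
    assert action.shape == (9,), "expected one torque per joint (3 fingers x 3 joints)"
    return np.clip(action, -MAX_TORQUE, MAX_TORQUE)
```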

Rewards and Success

The reward is computed by applying a logistic kernel to an input \(x\):

\[k(x)=\frac{b+2}{\exp(a\lVert x\rVert) + b + \exp(-a\lVert x\rVert)}\]

where \(\lVert \cdot \rVert\) denotes the Euclidean norm. For different values of \(x\), the function has the following form (the shaded grey area corresponds to a cube centered at the goal; the green area corresponds to the goal achievement threshold of 2 cm):

[Figure: reward_function.png — reward as a function of \(x\)]

In the pushing task, \(x\) is the difference between the achieved and the desired goal position. In the lifting task, \(x\) is the difference between the achieved and the desired keypoints and the reward is obtained by averaging over keypoints.
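The reward computation can be sketched as follows. Note that the kernel parameters \(a\) and \(b\) are not specified here; the values below are placeholders, not the ones used by the benchmark:

```python
import numpy as np

def logistic_kernel(x, a=30.0, b=10.0):
    """Logistic kernel k(x) = (b + 2) / (exp(a*||x||) + b + exp(-a*||x||)).

    a and b are placeholder values, not the benchmark's actual parameters.
    The kernel attains its maximum of 1 at x = 0 and decays with ||x||.
    """
    norm = np.linalg.norm(x, axis=-1)
    return (b + 2.0) / (np.exp(a * norm) + b + np.exp(-a * norm))

def push_reward(achieved_position, desired_position):
    """Push task: kernel of the position error."""
    return logistic_kernel(np.asarray(achieved_position) - np.asarray(desired_position))

def lift_reward(achieved_keypoints, desired_keypoints):
    """Lift task: kernel of the per-keypoint error, averaged over the 8 corners."""
    diff = np.asarray(achieved_keypoints) - np.asarray(desired_keypoints)
    return float(np.mean(logistic_kernel(diff)))
```

At \(x = 0\) the kernel evaluates to \((b+2)/(2+b) = 1\), so a perfectly achieved goal yields the maximum reward regardless of the parameter choice.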

The tolerance for goal achievement (i.e. success) is 2 cm for the position and 22 degrees for the orientation.
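A success check along these lines might look as follows (a sketch under the stated tolerances; the helper names are ours, and quaternions are assumed to be in (x, y, z, w) order):

```python
import numpy as np

POSITION_TOLERANCE = 0.02                  # 2 cm
ORIENTATION_TOLERANCE = np.deg2rad(22.0)   # 22 degrees, in radians

def quaternion_angle(q1, q2):
    """Smallest rotation angle (rad) between two unit quaternions."""
    dot = abs(float(np.dot(q1, q2)))
    return 2.0 * np.arccos(np.clip(dot, -1.0, 1.0))

def is_success(achieved_pos, desired_pos, achieved_quat=None, desired_quat=None):
    """Check goal achievement; the orientation is only checked if given (Lift task)."""
    pos_err = np.linalg.norm(np.asarray(achieved_pos) - np.asarray(desired_pos))
    if pos_err > POSITION_TOLERANCE:
        return False
    if achieved_quat is not None:
        if quaternion_angle(achieved_quat, desired_quat) > ORIENTATION_TOLERANCE:
            return False
    return True
```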