Tasks
For the TriFinger RL datasets we considered two dexterous manipulation tasks:
- Push a cube to a target location on the ground. The goal is given as the Cartesian coordinates of the desired position of the cube’s center.

- Lift a cube to match a target pose (position and orientation) in the air. The desired object pose is given in the form of keypoints. We provide a function get_pose_from_keypoints() for converting keypoints to a position and a quaternion representing the orientation (see the sketch at the end of this section).

The Lift task is considerably harder than the Push task as it requires first flipping the cube to an approximately correct orientation before acquiring a stable grasp and lifting it to match the goal pose. Both tasks are available in a simulated environment and on the real robot.
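The following sketch shows how such a keypoint goal could be converted back to a pose. The import path of get_pose_from_keypoints, its return convention, and the corner ordering of the keypoints are assumptions and may differ from the actual package.

```python
import numpy as np

# Assumed import location; the text above only states that the helper exists,
# not which module provides it.
from trifinger_rl_datasets.utils import get_pose_from_keypoints

# Build the 8 corner keypoints of a 6.5 cm cube centered at (0, 0, 0.1) m with
# identity orientation (shape (8, 3)); the corner ordering expected by the
# helper is an assumption.
half = 0.065 / 2
corners = np.array(
    [[sx * half, sy * half, sz * half]
     for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
)
keypoints = corners + np.array([0.0, 0.0, 0.1])

position, orientation = get_pose_from_keypoints(keypoints)
print(position)     # Cartesian position of the cube center, shape (3,)
print(orientation)  # quaternion encoding the orientation, shape (4,)
```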
The Cube
The object that is manipulated is a cube. It has an edge length of 6.5 cm and weighs about 94 g. The surface of the cube is structured to make grasping easier, and each side has a different color to aid the camera-based object tracking.
Observation and Action Space
The observation space can be chosen to be either a flat Box space or a nested Dict space. An observation in the form of a dictionary has the following structure:
- robot_observation
  - position
  - velocity
  - torque
  - fingertip_force
  - fingertip_position
  - fingertip_velocity
  - robot_id
- camera_observation
  - object_position
  - object_orientation
  - object_keypoints
  - delay
  - confidence
  - images (only if camera images are present in the dataset, see Datasets)
- action – contains the action from the previous step
- desired_goal
  - object_position (Push task only)
  - object_keypoints (Lift task only)
- achieved_goal
  - object_position (Push task only)
  - object_keypoints (Lift task only)
robot_observation contains proprioceptive information about the robot, i.e., joint angles, angular velocities, torques at the joints, forces measured at the fingertips, the Cartesian coordinates of the fingertips and their velocities, and the robot ID.
camera_observation contains the camera images (if they are present in the environment/dataset) and the estimate of the object pose obtained from the tracking system. object_orientation is a quaternion encoding the orientation of the object. object_keypoints contains the Cartesian coordinates of the 8 corners of the cube and therefore encodes both position and orientation. delay is the time (in seconds) that passed between the recording of the camera images and the moment the images and the corresponding pose estimate were provided in the observation. confidence contains the confidence of the pose estimate as a value between 0.0 and 1.0 (bigger is better).
In the real environment, robot_id contains the ID of the robot on which the policy was executed; in the simulated environment it is always set to 0. We furthermore simulate the delay of the object tracking present in the real system with values between 0.09 s and 0.18 s. The confidence, however, has a fixed value of 1.0 in simulation.
The desired_goal is sampled at the beginning of each episode and is fixed for the duration of the episode. The achieved_goal contains the goal that is currently achieved.
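To make the nested structure concrete, here is a minimal sketch that picks a few fields out of a dictionary observation. The key names follow the list above; the shapes in the comments are assumptions derived from the description (3 fingers with 3 joints each, 8 cube corners), not guaranteed array layouts.

```python
def summarize_observation(obs: dict) -> dict:
    """Extract a few fields from a nested Dict observation (sketch)."""
    robot = obs["robot_observation"]
    camera = obs["camera_observation"]
    return {
        "joint_positions": robot["position"],           # assumed shape (9,): 3 fingers x 3 joints
        "fingertip_forces": robot["fingertip_force"],    # one value per fingertip, assumed (3,)
        "object_position": camera["object_position"],    # Cartesian position of the cube center
        "object_keypoints": camera["object_keypoints"],  # 8 cube corners, assumed shape (8, 3)
        "tracking_delay_s": camera["delay"],             # age of the camera-based pose estimate
        "tracking_confidence": camera["confidence"],     # 0.0 to 1.0, bigger is better
        "previous_action": obs["action"],                # torque action from the previous step
        "desired_goal": obs["desired_goal"],             # fixed for the whole episode
        "achieved_goal": obs["achieved_goal"],           # goal that is currently achieved
    }
```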
The action space is 9-dimensional. The actions are the torques sent to the actuators of the 3 joints of each of the 3 fingers. The torque range is [-0.397, 0.397] Nm.
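A valid action can thus be built as a plain array of 9 joint torques; a minimal sketch, assuming NumPy arrays are accepted as actions:

```python
import numpy as np

TORQUE_LIMIT = 0.397  # Nm, identical for every joint

# Random torque command for the 9 joints (3 fingers with 3 joints each),
# clipped to the allowed range before it is sent to the robot/environment.
action = np.random.uniform(-TORQUE_LIMIT, TORQUE_LIMIT, size=9)
action = np.clip(action, -TORQUE_LIMIT, TORQUE_LIMIT)
```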
Rewards and Success
The reward is computed by applying a logistic kernel to the Euclidean norm \(\lVert x \rVert\) of an input \(x\). For different values of \(x\), the kernel has the following form (the shaded grey area corresponds to a cube centered at the goal and the green area corresponds to the goal-achievement threshold of 2 cm):

In the pushing task, \(x\) is the difference between the achieved and the desired goal position. In the lifting task, \(x\) is the difference between the achieved and the desired keypoints and the reward is obtained by averaging over keypoints.
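The keypoint averaging in the Lift task can be sketched as follows. logistic_kernel is left as a callable parameter (assumed to accept an array of distances) because the exact kernel parameters are not reproduced in this section.

```python
import numpy as np

def lift_reward(achieved_keypoints, desired_keypoints, logistic_kernel):
    """Average a distance-based kernel over the 8 cube-corner keypoints (sketch)."""
    diff = np.asarray(achieved_keypoints) - np.asarray(desired_keypoints)  # shape (8, 3)
    distances = np.linalg.norm(diff, axis=1)  # one Euclidean distance per keypoint
    return float(np.mean(logistic_kernel(distances)))
```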
The tolerance for goal achievement (i.e. success) is 2 cm for the position and 22 degrees for the orientation.
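As an illustration of these thresholds, the sketch below checks a single achieved pose against a desired pose. It is not necessarily the exact criterion used by the benchmark (for the Push task only the position check would apply), and the quaternion order (x, y, z, w) is an assumption.

```python
import numpy as np

def is_success(achieved_pos, desired_pos, achieved_quat, desired_quat,
               pos_tol=0.02, ori_tol_deg=22.0):
    """Check the 2 cm / 22 degree tolerances for a single pose (sketch)."""
    # Position error: Euclidean distance between achieved and desired cube center.
    pos_ok = np.linalg.norm(np.asarray(achieved_pos) - np.asarray(desired_pos)) <= pos_tol

    # Orientation error: rotation angle between the two orientations,
    # computed from the absolute dot product of the unit quaternions.
    q1 = np.asarray(achieved_quat, dtype=float)
    q2 = np.asarray(desired_quat, dtype=float)
    q1 /= np.linalg.norm(q1)
    q2 /= np.linalg.norm(q2)
    angle_deg = np.degrees(2.0 * np.arccos(np.clip(abs(np.dot(q1, q2)), 0.0, 1.0)))
    ori_ok = angle_deg <= ori_tol_deg

    return bool(pos_ok and ori_ok)
```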