SimTriFingerCubeEnv

class trifinger_rl_datasets.SimTriFingerCubeEnv(episode_length=15, difficulty=4, keypoint_obs=True, obs_action_delay=0, reward_type='dense', visualization=False, real_time=True, image_obs=False, camera_config_robot=1)[source]

Gym environment for simulated manipulation of a cube with a TriFingerPro platform.

Parameters:
  • episode_length (int) – Number of steps after which the episode ends (i.e., truncated is set to True).

  • difficulty (int) – Difficulty level of the task (determines how goals are sampled).

  • keypoint_obs (bool) – Whether to include keypoint observations of the object pose in addition to position and quaternion.

  • obs_action_delay (int) – Delay, in milliseconds, between the arrival of an observation and the application of the action computed from it.

  • reward_type (str) – Which reward to use. Can be ‘dense’ or ‘sparse’.

  • visualization (bool) – If True, the PyBullet GUI is started for visualization.

  • real_time (bool) – If True, the environment is stepped in real time instead of as fast as possible (ignored if visualization is disabled).

  • image_obs (bool) – If True, camera images are returned as part of the observation.

  • camera_config_robot (int) – ID of the robot to retrieve camera configs from. Only used if image_obs is True.
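The constructor parameters above configure a standard step/reset control loop. The following stand-in mimics the interface described in this reference (the dummy class, its observation keys, and the reset return value are placeholders, not the real trifinger_rl_datasets API) to illustrate how an agent would interact with such an environment:

```python
# Stand-in with the same step()/reset() control flow as documented for
# SimTriFingerCubeEnv; this is only a sketch, NOT the real implementation.
class DummyCubeEnv:
    def __init__(self, episode_length=15):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        obs = {"robot_position": [0.0] * 9, "object_position": [0.0] * 3}
        return obs, {"time_index": self.t}

    def step(self, action):
        self.t += 1
        obs = {"robot_position": [0.0] * 9, "object_position": [0.0] * 3}
        reward = 0.0
        terminated = False                          # no terminal state here
        truncated = self.t >= self.episode_length   # episode ends by timeout
        return obs, reward, terminated, truncated, {"time_index": self.t}

env = DummyCubeEnv(episode_length=15)
obs, info = env.reset()
steps = 0
terminated = truncated = False
while not (terminated or truncated):
    action = [0.0] * 9  # placeholder policy output
    obs, reward, terminated, truncated, info = env.step(action)
    steps += 1
print(steps)  # 15
```

After episode_length steps the truncated flag becomes True, at which point reset() must be called, matching the step() documentation below.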

compute_reward(achieved_goal, desired_goal, info)[source]

Compute the reward for the given achieved and desired goal.

Parameters:
  • achieved_goal (dict) – Current pose of the object.

  • desired_goal (dict) – Goal pose of the object.

  • info (dict) – An info dictionary with a field “time_index” holding the time index of the achieved_goal.

Returns:

The reward that corresponds to the provided achieved goal with respect to the desired goal.

Return type:

float
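A hypothetical sketch of how a dense versus sparse reward could be derived from the positional error between the two poses (the actual reward computation in trifinger_rl_datasets may differ; the threshold and scaling here are illustrative, and only the position component of the pose is used):

```python
import math

def compute_reward_sketch(achieved_goal, desired_goal, reward_type="dense",
                          position_threshold=0.02):
    # Euclidean distance between achieved and desired cube positions.
    dist = math.dist(achieved_goal["position"], desired_goal["position"])
    if reward_type == "dense":
        # Dense reward: negative distance, so less error means higher reward.
        return -dist
    # Sparse reward: 0 within the threshold, -1 otherwise.
    return 0.0 if dist < position_threshold else -1.0

achieved = {"position": [0.0, 0.0, 0.035]}
goal = {"position": [0.0, 0.0, 0.1]}
print(compute_reward_sketch(achieved, goal, "dense"))   # approx -0.065
print(compute_reward_sketch(achieved, goal, "sparse"))  # -1.0
```

The dense variant gives a gradient signal at every step, while the sparse variant only distinguishes success from failure.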

has_achieved(achieved_goal, desired_goal)[source]

Determine whether the goal pose has been achieved.

Parameters:
  • achieved_goal (dict) – Current pose of the object.

  • desired_goal (dict) – Goal pose of the object.

Return type:

bool
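A check like this typically compares both the position and the orientation of the achieved pose against tolerances. The sketch below is hypothetical (the tolerance values, the dict keys, and the w-x-y-z quaternion order are assumptions, not taken from the package):

```python
import math

def has_achieved_sketch(achieved_goal, desired_goal,
                        position_tolerance=0.02, angle_tolerance=0.4):
    # Positional error: Euclidean distance between cube centers.
    pos_ok = math.dist(achieved_goal["position"],
                       desired_goal["position"]) <= position_tolerance
    # Orientation error: rotation angle between the two quaternions,
    # computed from the absolute value of their dot product.
    dot = abs(sum(a * d for a, d in zip(achieved_goal["orientation"],
                                        desired_goal["orientation"])))
    angle = 2.0 * math.acos(min(1.0, dot))
    return pos_ok and angle <= angle_tolerance

pose = {"position": [0.0, 0.0, 0.05], "orientation": [1.0, 0.0, 0.0, 0.0]}
print(has_achieved_sketch(pose, pose))  # True
```

Identical poses trivially satisfy both tolerances; a pose rotated far from the goal orientation fails the angle check even when the positions match.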

render(mode='human')[source]

Does nothing. See SimTriFingerCubeEnv for how to enable visualization.

Parameters:

mode (str) – Rendering mode (ignored, as this method does nothing).

reset(preappend_actions=True)[source]

Reset the environment.

Parameters:

preappend_actions (bool) – Whether to append the actions that will be executed during the obs-action delay to the action queue.

reset_cube()[source]

Replay a recorded trajectory to move cube to center of arena.

reset_fingers(reset_wait_time=3000)[source]

Reset fingers to initial position.

This resets neither the frontend nor the cube. This method is supposed to be used for ‘soft resets’ between episodes in one job.

Parameters:

reset_wait_time (int) – Time to wait for the fingers to reach the initial position.

sample_new_goal(goal=None)[source]

Sample a new desired goal.

step(action, preappend_actions=True)[source]

Run one timestep of the environment’s dynamics.

When the end of the episode is reached, you are responsible for calling reset() to reset the environment’s state.

Parameters:
  • action (ndarray) – An action provided by the agent.

  • preappend_actions (bool) – Whether to append the actions that will be executed during the obs-action delay to the action queue.

Returns:

  • observation (dict): Agent’s observation of the current environment.

  • reward (float): Amount of reward returned after the previous action.

  • terminated (bool): Whether the MDP has reached a terminal state. If True, the user needs to call reset().

  • truncated (bool): Whether a truncation condition outside the scope of the MDP is satisfied. For this environment, this corresponds to a timeout. If True, the user needs to call reset().

  • info (dict): Info dictionary containing the current time index.

Return type:

tuple
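The obs_action_delay parameter and the preappend_actions flag suggest a queueing scheme: an action computed from an observation only takes effect after the delay has elapsed, with neutral actions filling the gap. A simplified sketch (delay counted in steps here; the class name and the pre-filling behavior are assumptions, not the package’s internal implementation):

```python
from collections import deque

class DelayedActionQueue:
    """Applies each pushed action only after `delay` steps have passed."""

    def __init__(self, delay, neutral_action=0.0):
        # Pre-fill the queue so the first `delay` steps apply a neutral
        # action, mimicking actions preappended for the obs-action delay.
        self.queue = deque([neutral_action] * delay)

    def push(self, action):
        # Enqueue the action just computed from the latest observation.
        self.queue.append(action)

    def pop(self):
        # Dequeue the action that is actually applied this step.
        return self.queue.popleft()

q = DelayedActionQueue(delay=2)
applied = []
for t in range(4):
    q.push(t)                # action computed at step t
    applied.append(q.pop())  # action applied at step t
print(applied)  # [0.0, 0.0, 0, 1]
```

With a delay of two steps, the action computed at step t is applied at step t + 2; the first two applied actions are the neutral placeholders.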