SimTriFingerCubeEnv
- class trifinger_rl_datasets.SimTriFingerCubeEnv(episode_length=15, difficulty=4, keypoint_obs=True, obs_action_delay=0, reward_type='dense', visualization=False, real_time=True, image_obs=False, camera_config_robot=1)
Gym environment for simulated manipulation of a cube with a TriFingerPro platform.
- Parameters:
episode_length (int) – Number of times step is called before the episode ends (i.e. before done is True).
keypoint_obs (bool) – Whether to give keypoint observations for pose in addition to position and quaternion.
obs_action_delay (int) – Delay in milliseconds between the arrival of an observation and the application of the action computed from that observation.
reward_type (str) – Which reward to use. Can be ‘dense’ or ‘sparse’.
visualization (bool) – If true, the PyBullet GUI is run for visualization.
real_time (bool) – If true, the environment is stepped in real time instead of as fast as possible (ignored if visualization is disabled).
image_obs (bool) – If true, the camera images are returned as part of the observation.
camera_config_robot (int) – ID of the robot to retrieve camera configs from. Only used if image_obs is True.
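A construction call might look like the following minimal sketch. The keyword names and default values are copied from the signature above; the helper names `env_kwargs` and `make_env` are hypothetical and not part of the package. The import is deferred into `make_env` so that the validation helper can be used without the package installed.

```python
# Defaults copied from the class signature documented above.
DEFAULT_ENV_KWARGS = {
    "episode_length": 15,
    "difficulty": 4,
    "keypoint_obs": True,
    "obs_action_delay": 0,
    "reward_type": "dense",
    "visualization": False,
    "real_time": True,
    "image_obs": False,
    "camera_config_robot": 1,
}


def env_kwargs(**overrides):
    """Return the documented defaults with selected values overridden.

    Hypothetical helper: rejects unknown names and reward types other
    than the two documented ones.
    """
    unknown = set(overrides) - set(DEFAULT_ENV_KWARGS)
    if unknown:
        raise TypeError(f"unknown parameters: {sorted(unknown)}")
    kwargs = {**DEFAULT_ENV_KWARGS, **overrides}
    if kwargs["reward_type"] not in ("dense", "sparse"):
        raise ValueError("reward_type must be 'dense' or 'sparse'")
    return kwargs


def make_env(**overrides):
    """Construct the environment (hypothetical helper, lazy import)."""
    from trifinger_rl_datasets import SimTriFingerCubeEnv

    return SimTriFingerCubeEnv(**env_kwargs(**overrides))
```

For example, `make_env(reward_type="sparse", visualization=True)` would request the sparse reward with the PyBullet GUI enabled.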
- compute_reward(achieved_goal, desired_goal, info)
Compute the reward for the given achieved and desired goal.
- Parameters:
achieved_goal (dict) – Current pose of the object.
desired_goal (dict) – Goal pose of the object.
info (dict) – An info dictionary containing a field “time_index” which contains the time index of the achieved_goal.
- Returns:
The reward that corresponds to the provided achieved goal w.r.t. the desired goal.
- Return type:
float
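Because compute_reward receives both goals and the info dictionary explicitly, it can in principle be used to re-evaluate stored transitions, for example against a different goal. The sketch below relies only on the documented call signature. `relabel_rewards` and `_StubEnv` are hypothetical names; the stub's reward (negative Euclidean distance) and the `"position"` key are placeholders for illustration, not the environment's actual reward or goal layout.

```python
def relabel_rewards(env, transitions, new_goal):
    """Recompute rewards for (time_index, achieved_goal) pairs against new_goal."""
    rewards = []
    for time_index, achieved_goal in transitions:
        # The documented info dictionary carries a "time_index" field.
        info = {"time_index": time_index}
        rewards.append(env.compute_reward(achieved_goal, new_goal, info))
    return rewards


class _StubEnv:
    """Stand-in with the documented compute_reward signature.

    The reward here is a placeholder (negative distance between positions).
    """

    def compute_reward(self, achieved_goal, desired_goal, info):
        a, d = achieved_goal["position"], desired_goal["position"]
        return -(sum((x - y) ** 2 for x, y in zip(a, d)) ** 0.5)


transitions = [
    (0, {"position": [0.0, 0.0, 0.0]}),
    (1, {"position": [0.0, 0.0, 0.03]}),
]
goal = {"position": [0.0, 0.0, 0.04]}
rewards = relabel_rewards(_StubEnv(), transitions, goal)
```

With the real environment, the stub would be replaced by a SimTriFingerCubeEnv instance and the goal dictionaries by those found in its observations.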
- has_achieved(achieved_goal, desired_goal)
Determine whether goal pose is achieved.
- Parameters:
achieved_goal (dict) – Current pose of the object.
desired_goal (dict) – Goal pose of the object.
- Return type:
bool
- render(mode='human')
Does nothing. See SimTriFingerCubeEnv for how to enable visualization.
- Parameters:
mode (str) –
- reset(preappend_actions=True)
Reset the environment.
- Parameters:
preappend_actions (bool) –
- reset_fingers(reset_wait_time=3000)
Reset fingers to initial position.
This resets neither the frontend nor the cube. This method is supposed to be used for ‘soft resets’ between episodes in one job.
- Parameters:
reset_wait_time (int) –
- step(action, preappend_actions=True)
Run one timestep of the environment’s dynamics.
When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
- Parameters:
action (ndarray) – An action provided by the agent.
preappend_actions (bool) – Whether to already append to the action queue the actions that will be executed during the obs-action delay.
- Returns:
observation (dict): agent’s observation of the current environment.
reward (float): amount of reward returned after previous action.
terminated (bool): whether the MDP has reached a terminal state. If true, the user needs to call reset().
truncated (bool): whether the truncation condition outside the scope of the MDP is satisfied. For this environment, this corresponds to a timeout. If true, the user needs to call reset().
info (dict): info dictionary containing the current time index.
- Return type:
tuple