The CubeTrajectoryEnv Gym Environment

The example package contains an example Gym environment, SimCubeTrajectoryEnv, for the trajectory task in the simulation pre-stage. You may use it as is in your code, modify it in any way to fit your needs, or ignore it completely and use the TriFingerPlatform class directly.

class rrc_example_package.cube_trajectory_env.SimCubeTrajectoryEnv(goal_trajectory=None, action_type=<ActionType.POSITION: 2>, step_size=1, visualization=False)[source]

Gym environment for moving cubes with simulated TriFingerPro.

__init__(goal_trajectory=None, action_type=<ActionType.POSITION: 2>, step_size=1, visualization=False)[source]

Initialize.

Parameters
  • goal_trajectory (Optional[Sequence[Tuple[int, Sequence[float]]]]) – Goal trajectory for the cube. If None a new random trajectory is sampled upon reset.

  • action_type (ActionType) – Specify which type of actions to use. See ActionType for details.

  • step_size (int) – Number of actual control steps to be performed in one call of step().

  • visualization (bool) – If True, the PyBullet GUI is run for visualization.

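A minimal construction sketch (the import path follows the class path shown above; the step_size value is only an example):

from rrc_example_package.cube_trajectory_env import (
    SimCubeTrajectoryEnv,
    ActionType,
)

# goal_trajectory=None: a new random trajectory is sampled on every reset.
# step_size=100: each call to step() performs 100 control steps.
env = SimCubeTrajectoryEnv(
    goal_trajectory=None,
    action_type=ActionType.POSITION,
    step_size=100,
    visualization=False,
)
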
compute_reward(achieved_goal, desired_goal, info)

Compute the reward for the given achieved and desired goal.

Parameters
  • achieved_goal (Sequence[float]) – Current position of the object.

  • desired_goal (Sequence[float]) – Goal position of the current trajectory step.

  • info (dict) – An info dictionary containing a field “time_index” with the time index of the achieved_goal.

Returns

The reward that corresponds to the provided achieved goal w.r.t. the desired goal. Note that the following should always hold true:

ob, reward, done, info = env.step(action)
assert reward == env.compute_reward(
    ob['achieved_goal'],
    ob['desired_goal'],
    info,
)

Return type

float
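
Because the reward is a function only of the achieved goal, the desired goal, and the time index in info, it can also be recomputed for a goal other than the one stored in the observation (for example, for hindsight-style relabelling). A small sketch, assuming ob and info come from a previous env.step(action) call; the choice of relabelled goal is purely illustrative:

# Relabel with the position that was actually reached (a common choice);
# info must still carry the "time_index" of the achieved goal.
relabelled_goal = ob['achieved_goal']
relabelled_reward = env.compute_reward(
    ob['achieved_goal'],
    relabelled_goal,
    info,
)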

reset()[source]

Reset the environment.

seed(seed=None)

Sets the seed for this env’s random number generator.

Note

Spaces need to be seeded separately. E.g. if you want to sample actions directly from the action space using env.action_space.sample() you can set a seed there using env.action_space.seed().

Returns

List of seeds used by this environment. This environment only uses a single seed, so the list contains only one element.
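
A short sketch of seeding the environment and its action space together, as suggested in the note above (the seed value is arbitrary):

seed_list = env.seed(42)        # seed the environment's RNG
env.action_space.seed(42)       # spaces must be seeded separately
random_action = env.action_space.sample()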

step(action)[source]

Run one timestep of the environment’s dynamics.

When the end of the episode is reached, you are responsible for calling reset() to reset this environment’s state.

Parameters

action – An action provided by the agent (depends on the selected ActionType).

Returns

  • observation (dict): agent’s observation of the current environment.

  • reward (float): amount of reward returned after previous action.

  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict): info dictionary containing the current time index.

Return type

tuple
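
A minimal interaction loop, assuming reset() returns the initial observation as in the standard Gym API (the random policy is only a placeholder):

observation = env.reset()
done = False
while not done:
    # Replace the random sample with an actual policy.
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)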

class rrc_example_package.cube_trajectory_env.ActionType(value)[source]

Different action types that can be used to control the robot.

TORQUE

Use pure torque commands. The action is a list of torques (one per joint) in this case.

POSITION

Use joint position commands. The action is a list of angular joint positions (one per joint) in this case. Internally a PD controller is executed for each action to determine the torques that are applied to the robot.

TORQUE_AND_POSITION

Use both torque and position commands. In this case the action is a dictionary with keys “torque” and “position” which contain the corresponding lists of values (see above). The torques resulting from the position controller are added to the torques in the action before applying them to the robot.
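
A sketch of what actions of each type might look like, assuming the nine-joint TriFingerPro (the zero values are placeholders, not meaningful commands):

import numpy as np

# TORQUE: one torque per joint.
torque_action = np.zeros(9)

# POSITION: one target joint angle per joint.
position_action = np.zeros(9)

# TORQUE_AND_POSITION: both commands in one dictionary.
combined_action = {
    "torque": np.zeros(9),
    "position": np.zeros(9),
}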