Attention

This challenge has ended!

This documentation is only for the Real Robot Challenge 2020, which has ended. Subsequent challenges have their own documentation; see the challenge website for more information.

The CubeEnv Gym Environment

The CubeEnv environment is registered as

rrc_simulation.gym_wrapper:real_robot_challenge_phase_1-v1

and can be created by passing that name to gym.make(), as shown below.
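A minimal sketch (it assumes that the keyword arguments given to gym.make() are forwarded to the CubeEnv constructor):

import gym

from rrc_simulation.gym_wrapper.envs import cube_env

# The environment needs an initializer; here random initial poses and
# goals of difficulty level 1 are sampled.
initializer = cube_env.RandomInitializer(difficulty=1)

env = gym.make(
    "rrc_simulation.gym_wrapper:real_robot_challenge_phase_1-v1",
    initializer=initializer,
    action_type=cube_env.ActionType.POSITION,
    visualization=False,
)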

class rrc_simulation.gym_wrapper.envs.cube_env.CubeEnv(initializer, action_type=<ActionType.POSITION: 2>, frameskip=1, visualization=False)[source]

Gym environment for moving cubes with simulated TriFingerPro.

__init__(initializer, action_type=<ActionType.POSITION: 2>, frameskip=1, visualization=False)[source]

Initialize.

Parameters
  • initializer – Initializer class for providing initial cube pose and goal pose. See RandomInitializer and FixedInitializer.

  • action_type (ActionType) – Specify which type of actions to use. See ActionType for details.

  • frameskip (int) – Number of actual control steps to be performed in one call of step().

  • visualization (bool) – If true, the pyBullet GUI is run for visualization.
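As an illustration of how the constructor arguments fit together, the environment can also be instantiated directly; the concrete difficulty, frameskip and visualization values below are only examples:

from rrc_simulation.gym_wrapper.envs.cube_env import (
    ActionType,
    CubeEnv,
    RandomInitializer,
)

env = CubeEnv(
    initializer=RandomInitializer(difficulty=2),
    action_type=ActionType.TORQUE,
    frameskip=100,       # each action is applied for 100 control steps
    visualization=True,  # run the pyBullet GUI
)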

compute_reward(achieved_goal, desired_goal, info)[source]

Compute the reward for the given achieved and desired goal.

Parameters
  • achieved_goal (dict) – Current pose of the object.

  • desired_goal (dict) – Goal pose of the object.

  • info (dict) – An info dictionary containing a field “difficulty” which specifies the difficulty level.

Returns

The reward that corresponds to the provided achieved goal w.r.t. the desired goal. Note that the following should always hold true:

ob, reward, done, info = env.step(action)
assert reward == env.compute_reward(
    ob['achieved_goal'],
    ob['desired_goal'],
    info,
)

Return type

float

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

the initial observation.

Return type

observation (object)

seed(seed=None)[source]

Sets the seed for this env’s random number generator.

Note

Spaces need to be seeded separately. For example, if you want to sample actions directly from the action space using env.action_space.sample(), set a seed there using env.action_space.seed().

Returns

List of seeds used by this environment. This environment only uses a single seed, so the list contains only one element.
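A short usage sketch (assuming env was created as shown above):

env_seeds = env.seed(42)            # list containing the single seed that was used
env.action_space.seed(42)           # the action space has to be seeded separately
action = env.action_space.sample()  # sampling is now reproducible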

step(action)[source]

Run one timestep of the environment’s dynamics.

When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Parameters

action – An action provided by the agent (depends on the selected ActionType).

Returns

  • observation (dict): agent’s observation of the current environment.

  • reward (float): amount of reward returned after the previous action.

  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict): info dictionary containing the difficulty level of the goal.

Return type

tuple
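A minimal episode loop, assuming env was created as shown above and using random actions in place of a policy:

observation = env.reset()
done = False
while not done:
    # A real policy would compute the action from the observation;
    # here a random action is sampled instead.
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)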

class rrc_simulation.gym_wrapper.envs.cube_env.ActionType(value)[source]

Different action types that can be used to control the robot.

TORQUE

Use pure torque commands. The action is a list of torques (one per joint) in this case.

POSITION

Use joint position commands. The action is a list of angular joint positions (one per joint) in this case. Internally a PD controller is executed for each action to determine the torques that are applied to the robot.

TORQUE_AND_POSITION

Use both torque and position commands. In this case the action is a dictionary with keys “torque” and “position” which contain the corresponding lists of values (see above). The torques resulting from the position controller are added to the torques in the action before applying them to the robot.
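The sketch below illustrates the expected structure of an action for each type. The zero values are only placeholders for torques/positions computed by a policy (or sampled via env.action_space.sample()):

import numpy as np

num_joints = 9  # the TriFingerPro has three fingers with three joints each

# TORQUE: one torque value per joint
torque_action = np.zeros(num_joints)

# POSITION: one target joint angle per joint
position_action = np.zeros(num_joints)

# TORQUE_AND_POSITION: dictionary containing both parts
combined_action = {
    "torque": np.zeros(num_joints),
    "position": np.zeros(num_joints),
}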

class rrc_simulation.gym_wrapper.envs.cube_env.RandomInitializer(difficulty)[source]

Initializer that samples random initial states and goals.

__init__(difficulty)[source]

Initialize.

Parameters

difficulty (int) – Difficulty level for sampling goals.

get_goal()[source]

Get a random goal depending on the difficulty.

get_initial_state()[source]

Get a random initial object pose (always on the ground).
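Usage sketch:

from rrc_simulation.gym_wrapper.envs.cube_env import RandomInitializer

initializer = RandomInitializer(difficulty=3)
initial_pose = initializer.get_initial_state()  # random pose on the ground
goal_pose = initializer.get_goal()              # random goal for difficulty level 3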

class rrc_simulation.gym_wrapper.envs.cube_env.FixedInitializer(difficulty, initial_state, goal)[source]

Initializer that uses fixed values for initial pose and goal.

__init__(difficulty, initial_state, goal)[source]

Initialize.

Parameters
  • difficulty (int) – Difficulty level of the goal. This is still needed even for a fixed goal, as it is also used for computing the reward (the cost function is different for the different levels).

  • initial_state (move_cube.Pose) – Initial pose of the object.

  • goal (move_cube.Pose) – Goal pose of the object.

Raises
  • Exception – If initial_state or goal are not valid. See move_cube.validate_goal.

get_goal()[source]

Get the goal that was set in the constructor.

get_initial_state()[source]

Get the initial state that was set in the constructor.
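Usage sketch (assuming move_cube can be imported from rrc_simulation.tasks and that Pose accepts position and orientation arguments; the concrete poses are only illustrative):

from rrc_simulation.gym_wrapper.envs.cube_env import FixedInitializer
from rrc_simulation.tasks import move_cube

# Object resting on the ground in the centre of the arena.
initial_state = move_cube.Pose(
    position=[0.0, 0.0, 0.0325], orientation=[0.0, 0.0, 0.0, 1.0]
)
# Goal on the ground (difficulty 1), slightly off-centre.
goal = move_cube.Pose(
    position=[0.1, 0.05, 0.0325], orientation=[0.0, 0.0, 0.0, 1.0]
)

initializer = FixedInitializer(difficulty=1, initial_state=initial_state, goal=goal)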