The CubeTrajectoryEnv Gym Environment¶

The example package contains an example Gym environment SimCubeTrajectoryEnv for the trajectory task in the simulation pre-stage. You may use this in your code but you are also free to modify it in any way to fit your needs. Of course, you can also ignore it completely and use the TriFingerPlatform class directly.
class rrc_example_package.cube_trajectory_env.SimCubeTrajectoryEnv(goal_trajectory=None, action_type=<ActionType.POSITION: 2>, step_size=1, visualization=False)[source]¶

Gym environment for moving cubes with simulated TriFingerPro.
__init__(goal_trajectory=None, action_type=<ActionType.POSITION: 2>, step_size=1, visualization=False)[source]¶

Initialize.
Parameters
- goal_trajectory (Optional[Sequence[Tuple[int, Sequence[float]]]]) – Goal trajectory for the cube. If None, a new random trajectory is sampled upon reset.
- action_type (ActionType) – Specify which type of actions to use. See ActionType for details.
- step_size (int) – Number of actual control steps to be performed in one call of step().
- visualization (bool) – If true, the pyBullet GUI is run for visualization.
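
A minimal construction sketch (the module path follows the signature above; the concrete goal positions are made up purely for illustration, any trajectory matching Sequence[Tuple[int, Sequence[float]]] – a start step paired with a goal position – works):

    from rrc_example_package import cube_trajectory_env

    # Hypothetical fixed goal trajectory: (start step, goal position) pairs.
    goal = [
        (0, [0.0, 0.0, 0.0325]),        # placeholder goal for the first steps
        (10000, [0.05, 0.05, 0.0325]),  # placeholder goal from step 10000 on
    ]

    env = cube_trajectory_env.SimCubeTrajectoryEnv(
        goal_trajectory=goal,  # None would sample a random trajectory on reset
        action_type=cube_trajectory_env.ActionType.POSITION,  # joint position commands
        step_size=1,
        visualization=False,
    )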
compute_reward(achieved_goal, desired_goal, info)¶

Compute the reward for the given achieved and desired goal.
Parameters
- achieved_goal (Sequence[float]) – Current position of the object.
- desired_goal (Sequence[float]) – Goal position of the current trajectory step.
- info (dict) – An info dictionary containing a field “time_index” which contains the time index of the achieved_goal.
Returns
The reward that corresponds to the provided achieved goal w.r.t. the desired goal. Note that the following should always hold true:

    ob, reward, done, info = env.step(action)
    assert reward == env.compute_reward(
        ob['achieved_goal'],
        ob['desired_goal'],
        info,
    )

Return type
float
seed(seed=None)¶

Sets the seed for this env’s random number generator.
Note
Spaces need to be seeded separately. E.g. if you want to sample actions directly from the action space using env.action_space.sample(), you can set a seed there using env.action_space.seed().

Returns
List of seeds used by this environment. This environment only uses a single seed, so the list contains only one element.
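
A short sketch of seeding both the environment and its action space, assuming env is a SimCubeTrajectoryEnv instance as constructed above:

    seed_list = env.seed(42)   # list containing the single seed used by the environment
    env.action_space.seed(42)  # seed the action space separately, as noted above
    action = env.action_space.sample()  # now reproducible across runs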
step(action)[source]¶

Run one timestep of the environment’s dynamics.
When the end of the episode is reached, you are responsible for calling reset() to reset this environment’s state.

Parameters
- action – An action provided by the agent (depends on the selected ActionType).

Returns
- observation (dict): agent’s observation of the current environment.
- reward (float): amount of reward returned after the previous action.
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
- info (dict): info dictionary containing the current time index.

Return type
tuple
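
A sketch of a basic interaction loop built on this step() API, assuming env was constructed as in the example above (the random policy is just a placeholder for your own controller):

    observation = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = env.action_space.sample()  # placeholder: replace with your policy
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print("finished at time index", info["time_index"], "with reward", total_reward)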
class rrc_example_package.cube_trajectory_env.ActionType(value)[source]¶

Different action types that can be used to control the robot.
TORQUE¶
Use pure torque commands. The action is a list of torques (one per joint) in this case.
POSITION¶
Use joint position commands. The action is a list of angular joint positions (one per joint) in this case. Internally, a PD controller is executed for each action to determine the torques that are applied to the robot.
TORQUE_AND_POSITION¶
Use both torque and position commands. In this case, the action is a dictionary with keys “torque” and “position” which contain the corresponding lists of values (see above). The torques resulting from the position controller are added to the torques in the action before applying them to the robot.
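
A rough sketch of what a combined action could look like, assuming an environment created with action_type=ActionType.TORQUE_AND_POSITION (the zero arrays are placeholders; nine entries corresponds to the three joints of each of the three fingers):

    import numpy as np

    action = {
        "torque": np.zeros(9),    # extra torques, added to the position controller output
        "position": np.zeros(9),  # target joint angles for the PD position controller
    }
    observation, reward, done, info = env.step(action)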