Evaluation Procedure

We provide a script to evaluate your model with multiple random trajectories and to compute the average reward over all runs.

Please run this evaluation before submitting your code, as you need to report your reward along with the submission. We will also evaluate submissions on our side with an identical script (only with a different set of goals), so by testing on your side you make sure that your code will also run for us.

evaluate_policy.py

For the evaluation, you have to provide a file evaluate_policy.py at the root of your package, which takes a trajectory as input and should execute your policy in the simulator, writing the resulting actions and observations to a file.

If you are using the rrc_example_package as the base for your own package, you already have a template for evaluate_policy.py. Otherwise, simply copy that file from the example package into your own package.

In evaluate_policy.py, replace the RandomPolicy with your own policy. You may also replace our gym environment if you are using a custom one. Apart from this, do not change anything in that file to ensure it stays compatible with our evaluation pipeline. See the comments in the file for more information.
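To give an idea of how such a file is typically structured, here is a rough sketch. This is not the actual template: the environment class, its constructor arguments, and the call that stores the action log are assumptions based on the example package and may differ, so always start from the file shipped with rrc_example_package.

    #!/usr/bin/env python3
    """Rough sketch of an evaluate_policy.py (illustrative, not the actual template)."""
    import json
    import sys

    # Assumption: the example package provides a gym environment for the task.
    from rrc_example_package import cube_trajectory_env


    class MyPolicy:
        """Your policy, replacing the RandomPolicy of the template."""

        def __init__(self, action_space, goal_trajectory):
            self.action_space = action_space
            self.goal_trajectory = goal_trajectory

        def predict(self, observation):
            # TODO: compute the action from the observation with your model.
            # As a placeholder, a random action is returned here.
            return self.action_space.sample()


    def main():
        # Assumption: the goal trajectory is passed as a JSON string and the
        # path of the action log file as a second argument.
        goal_trajectory = json.loads(sys.argv[1])
        action_log_file = sys.argv[2]

        # Assumption: class name and arguments of the environment.
        env = cube_trajectory_env.SimCubeTrajectoryEnv(
            goal_trajectory=goal_trajectory,
            visualization=False,
        )
        policy = MyPolicy(env.action_space, goal_trajectory)

        observation = env.reset()
        is_done = False
        while not is_done:
            action = policy.predict(observation)
            observation, reward, is_done, info = env.step(action)

        # Assumption: the platform/environment provides a method to store the
        # action/observation log needed by the replay step.
        env.platform.store_action_log(action_log_file)


    if __name__ == "__main__":
        main()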

Run the Evaluation

The rrc_example_package contains a script rrc_evaluate_prestage.py (found in the scripts/ directory) which runs your evaluate_policy.py on a number of random trajectories.

To execute it, open a terminal and execute the following command (adjust the paths accordingly):

python3 scripts/rrc_evaluate_prestage.py \
    --singularity-image rrc2021.sif \
    --package path/to/your/package \
    --output-dir path/to/output_dir

It will first run evaluate_policy.py for a number of trajectories and store the action/observation logs in the specified output directory. Then it replays all these log files to verify the result and to compute the reward.

About the arguments:

--singularity-image specifies the Singularity image in which the evaluation is run. Replace rrc2021.sif with your custom image if you extended the base image.

--package specifies the path to your package. When calling the script from the root directory of your package, you can simply set it to "." (the current directory); see the example below.

--output-dir specifies the directory to which the results as well as some temporary files are written. After execution, the result files needed for the submission can be found in a subdirectory called “output”. Important: Already existing files inside this directory may be deleted or overwritten during execution.
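As a concrete example: assuming the current working directory is the root of your package (based on the example package, so it contains the scripts/ directory) and the base image rrc2021.sif lies in the same directory, a call could look like this (the output path is just an illustration):

python3 scripts/rrc_evaluate_prestage.py \
    --singularity-image rrc2021.sif \
    --package . \
    --output-dir /tmp/rrc_evaluation

The result files needed for the submission would then end up in /tmp/rrc_evaluation/output.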

About the Replay

As mentioned above, the actual reward for comparing different submissions is not computed by evaluate_policy.py but in the replay step.

The simulation is initialized to the same initial state as in evaluate_policy.py. Then the actions from the log generated by evaluate_policy.py are applied on the robot one by one. Since the simulator is deterministic, this results in exactly the same trajectories as in the original run. The actual reward used for comparison with other participants is computed during this replay.
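The following self-contained toy example illustrates the idea. None of the names in it come from the actual evaluation code; it only shows why replaying logged actions on a deterministic simulator reproduces the original trajectory and lets the reward be recomputed independently of your code:

    from typing import List


    class ToySimulator:
        """Stand-in for the deterministic physics simulation."""

        def __init__(self, initial_state: float):
            self.state = initial_state

        def step(self, action: float) -> float:
            # Deterministic transition: same actions always give the same states.
            self.state += action
            return self.state


    def reward(state: float, goal: float) -> float:
        """Stand-in for the official reward function."""
        return -abs(goal - state)


    def run_policy(initial_state: float, goal: float, n_steps: int) -> List[float]:
        """Role of evaluate_policy.py: run the policy and log the actions."""
        sim = ToySimulator(initial_state)
        action_log = []
        for _ in range(n_steps):
            action = 0.1 * (goal - sim.state)  # some policy
            sim.step(action)
            action_log.append(action)
        return action_log


    def replay(initial_state: float, goal: float, action_log: List[float]) -> float:
        """Role of the replay step: apply the logged actions and recompute the reward."""
        sim = ToySimulator(initial_state)
        total_reward = 0.0
        for action in action_log:
            state = sim.step(action)
            total_reward += reward(state, goal)
        return total_reward


    if __name__ == "__main__":
        log = run_policy(initial_state=0.0, goal=1.0, n_steps=10)
        print("reward from replay:", replay(0.0, 1.0, log))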

The replay serves two purposes:

  • Ensure that the correct reward function is used for the evaluation (participants may use a different reward function in their environment for training).

  • Prevent cheating. In their evaluation script, participants could in theory access the simulation and modify the state in their favour (e.g. simply reset the cube state to the goal pose). On our side, we will do the replay in a separate environment without any code from participants.

To ensure that the replay really results in the same trajectories, make sure to only use the provided interface to control the robot and do not modify the state of the simulation in any other way.