Attention

This challenge has ended!

This documentation is only for the Real Robot Challenge 2020, which has ended. Subsequent challenges have their own documentation; see the challenge website for more information.

Evaluation Procedure

We provide a script to evaluate your model with multiple random goals for each difficulty level and to compute the weighted accumulated reward over all runs.

Please run this script before submitting your code, as you need to report the resulting reward along with your submission. We will also evaluate submissions on our side with an identical script (only with a different set of goals), so by testing on your side you make sure that your code will also run for us.
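For illustration only, such a weighted accumulation could look like the sketch below. The per-run reward values and the per-level weights are placeholders, not the values used by the evaluation script; the authoritative computation is done by rrc_evaluate itself.

# Sketch of accumulating a reward weighted by difficulty level.
# The weights and reward values below are placeholder assumptions;
# the evaluation script defines the actual computation.

def weighted_accumulated_reward(rewards_per_level, weights):
    """Sum the per-run rewards of each level, scaled by the level's weight."""
    return sum(
        weights[level] * sum(rewards)
        for level, rewards in rewards_per_level.items()
    )

# Placeholder example: two runs per difficulty level.
rewards_per_level = {
    1: [-300.0, -280.0],
    2: [-520.0, -500.0],
    3: [-760.0, -800.0],
    4: [-1150.0, -1200.0],
}
weights = {1: 1, 2: 2, 3: 3, 4: 4}  # assumed: weight equals the level number
print(weighted_accumulated_reward(rewards_per_level, weights))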

evaluate_policy.py

For the evaluation, you will edit the file evaluate_policy.py, which takes an initial state and a goal as input, executes your policy in the simulator, and writes the resulting actions to a file.

You can find the file in the rrc_simulation repository at:

scripts/evaluate_policy.py

Replace the RandomPolicy with your own policy. You may also replace our gym environment if you are using a custom one. Apart from this, do not change anything else in the file, so that it stays compatible with our evaluation pipeline. See the comments in the file for more information.
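For illustration, a custom policy could follow the same pattern as the provided RandomPolicy: a constructor that receives the action space and a predict() method that maps an observation to an action. This is only a minimal sketch; the action computation below is a placeholder, not a recommended controller.

# Minimal sketch of a custom policy following the same interface pattern as
# the RandomPolicy in evaluate_policy.py.  The action computation is a
# placeholder and should be replaced by your own controller.

class MyPolicy:
    def __init__(self, action_space):
        self.action_space = action_space

    def predict(self, observation):
        # Placeholder: command the midpoint of the action space.
        # A real policy would compute an action from `observation` here.
        return (self.action_space.low + self.action_space.high) / 2.0

In evaluate_policy.py you would then instantiate this class where the RandomPolicy is currently created; everything else in the file stays unchanged.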

Run the Evaluation

Open a terminal in the directory rrc_simulation/scripts (the one containing evaluate_policy.py), make sure the rrc_simulation conda environment is activated, and run the rrc_evaluate command:

cd rrc_simulation/scripts

# if not done already, activate conda
source ../conda_activate_rrc_simulation.sh

# execute evaluation (the specified output directory needs to exist)
rrc_evaluate path/to/output_dir

It will first run evaluate_policy.py for a number of goals at each difficulty level and store the action logs in the specified output directory. Then it replays all these log files to verify the results and compute the reward.

About the Replay

As mentioned above, the actual reward for comparing different submissions is not computed by evaluate_policy.py but in the replay step.

The simulation is initialized to the same object pose that was passed to evaluate_policy.py. Then the actions from the log generated by evaluate_policy.py are applied to the robot one by one. Since the simulator is deterministic, this results in exactly the same trajectories as in the original run. The actual reward used for comparison with other participants is computed during this replay.
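Conceptually, the replay corresponds to a loop like the one sketched below. The helper names make_env and load_action_log are hypothetical, not part of the rrc_simulation API; the actual replay is performed by our evaluation pipeline.

# Conceptual sketch of the replay step: re-initialize the simulation with the
# logged initial object pose and goal, then apply the logged actions one by one.
# `make_env` and `load_action_log` are hypothetical helpers used only for this
# illustration.

def replay(initial_pose, goal_pose, action_log_file, make_env, load_action_log):
    env = make_env(initial_pose, goal_pose)  # deterministic simulation
    observation = env.reset()

    accumulated_reward = 0.0
    for action in load_action_log(action_log_file):
        observation, reward, is_done, info = env.step(action)
        accumulated_reward += reward  # reward from the official reward function
        if is_done:
            break

    return accumulated_reward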

The replay serves two purposes:

  • Ensure that the correct reward function is used for the evaluation (participants may modify the reward in their environment for training).

  • Prevent cheating. In their evaluation script, participants could in theory access the simulation and modify its state in their favour (e.g. simply reset the cube to the goal pose). On our side, we will perform the replay in a separate environment without any code from participants.

To ensure that the replay really results in the same trajectories, make sure to only use the provided interface to control the robot and do not modify the state of the simulation in any other way.
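As a rough illustration, a run that stays replayable sends every command through the environment's step() method and never touches the simulation state directly. In the sketch below, env and policy stand for whatever environment and policy you use in evaluate_policy.py.

# Illustration: control the robot only through the provided interface so that
# every applied action ends up in the action log and can be replayed.

def run_episode(env, policy, num_steps):
    observation = env.reset()
    for _ in range(num_steps):
        action = policy.predict(observation)
        # Logged and therefore reproducible during the replay.
        observation, reward, is_done, info = env.step(action)
        # Do NOT manipulate the simulation directly here (e.g. resetting object
        # poses via raw physics-engine calls); such changes are not logged and
        # will not be reproduced by the replay.
        if is_done:
            break
    return observation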