Attention
This challenge has ended!
This documentation is only for the Real Robot Challenge 2020, which has ended. Later challenges have their own documentation; see the challenge website for more information.
Evaluation Procedure¶
We provide a script to evaluate your model with multiple random goals for each difficulty level and to compute the weighted accumulated reward over all runs.
Please run this script before submitting your code, as you need to report your reward along with the submission. We will also evaluate submissions on our side with an identical script (only with a different set of goals), so by testing on your side you make sure that your code will also run for us.
evaluate_policy.py¶
For the evaluation, you will edit the file evaluate_policy.py, which takes an initial state and goal as input and should execute your policy in the simulator, writing the resulting actions to a file. You can find the file in the rrc_simulation repository at:

scripts/evaluate_policy.py
Replace the RandomPolicy with your own policy. You may also replace our gym environment if you are using a custom one. Apart from this, do not change anything in that file to ensure it stays compatible with our evaluation pipeline. See the comments in the file for more information.
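As a rough illustration, a custom policy class replacing RandomPolicy could look like the sketch below. The constructor argument and the predict(observation) method are assumptions based on the provided RandomPolicy, and the class name and placeholder action are purely illustrative; adapt them to the interface actually used in your evaluate_policy.py.

import numpy as np


class MyPolicy:
    """Example replacement for RandomPolicy (interface assumed)."""

    def __init__(self, action_space):
        self.action_space = action_space
        # Load your trained model / controller parameters here.

    def predict(self, observation):
        # Compute an action from the observation with your trained model.
        # Placeholder: a zero action clipped to the action space bounds.
        action = np.zeros_like(self.action_space.sample())
        return np.clip(action, self.action_space.low, self.action_space.high)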
Run the Evaluation¶
Open a terminal in the directory rrc_simulation/scripts (the one containing evaluate_policy.py), make sure the rrc_simulation conda environment is activated, and run the rrc_evaluate command:
cd rrc_simulation/scripts
# if not done already, activate conda
source ../conda_activate_rrc_simulation.sh
# execute evaluation (the specified output directory needs to exist)
rrc_evaluate path/to/output_dir
It will first run evaluate_policy.py for a number of goals of each difficulty level and store the action logs in the specified output directory. Then it replays all these log files to verify the result and to compute the reward.
About the Replay¶
As mentioned above, the actual reward for comparing different submissions is not computed by evaluate_policy.py but in the replay step.
The simulation is initialized to the same object pose that was passed to evaluate_policy.py. Then the actions from the log generated by evaluate_policy.py are applied to the robot one by one. Since the simulator is deterministic, this results in exactly the same trajectories as in the original run. The actual reward used for comparison with other participants is computed during this replay.
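Conceptually, the replay is a simple loop over the logged actions; the sketch below illustrates the idea. It is not the actual evaluation pipeline code, and the function name and the env.reset()/env.step() interface are assumptions for illustration only.

def replay_log(env, logged_actions):
    """Illustrative replay loop: re-apply logged actions in a fresh
    environment and accumulate the official reward."""
    env.reset()  # assumed to restore the same initial object pose and goal
    total_reward = 0.0
    for action in logged_actions:
        # The simulator is deterministic, so applying the same actions
        # reproduces the original trajectory exactly.
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward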
The replay serves two purposes:

1. Ensure that the correct reward function is used for the evaluation (participants may modify the reward in their environment for training).
2. Prevent cheating. In their evaluation script, participants could in theory access the simulation and modify the state in their favour (e.g. simply reset the cube state to the goal pose). On our side, we will do the replay in a separate environment without any code from participants.
To ensure that the replay really results in the same trajectories, make sure to only use the provided interface to control the robot and do not modify the state of the simulation in any other way.