Evaluation Procedure
We provide a script to evaluate your model with multiple random trajectories and to compute the average reward over all runs.
Please run this script before submitting your code, as you need to report your reward along with the submission. We will also evaluate submissions on our side with an identical script (only with a different set of goals), so testing on your side ensures that your code will also run for us.
evaluate_policy.py
For the evaluation, you have to provide a file evaluate_policy.py at the root of your package, which takes a trajectory as input and should execute your policy in the simulator, writing the resulting actions and observations to a file.
If you are using the rrc_example_package as the base for your own package, you already have a template for evaluate_policy.py. Otherwise, simply copy that file from the example package into your own package.
In evaluate_policy.py, replace the RandomPolicy with your own policy.
You may also replace our gym environment if you are using a custom one. Apart
from this, do not change anything in that file to ensure it stays compatible
with our evaluation pipeline. See the comments in the file for more
information.
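As a rough illustration (not the actual template), a custom policy plugged into evaluate_policy.py might look like the sketch below. The class name MyPolicy, its constructor arguments, and the predict() method are placeholders; the exact interface expected by the template, as well as the argument parsing and logging code you must keep, are documented in the file itself.

```python
# Hypothetical sketch only: names such as MyPolicy and predict() are
# placeholders.  Keep the setup and logging code from the template
# unchanged and only swap out the policy class.


class MyPolicy:
    """Example stand-in for the template's RandomPolicy."""

    def __init__(self, action_space, trajectory):
        self.action_space = action_space
        self.trajectory = trajectory  # goal trajectory given to the script

    def predict(self, observation):
        # Compute the next action from the current observation.
        # A trained policy would evaluate its model here; this
        # placeholder just samples a random action.
        return self.action_space.sample()
```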
Run the Evaluation
The rrc_example_package contains a script rrc_evaluate_prestage.py (found in the scripts/ directory) which runs your evaluate_policy.py on a number of random trajectories.
To run the script, open a terminal and execute the following command (adjust the paths accordingly):
python3 scripts/rrc_evaluate_prestage.py \
--singularity-image rrc2021.sif \
--package path/to/your/package \
--output-dir path/to/output_dir
It will first run evaluate_policy.py for a number of trajectories and store the action/observation logs in the specified output directory. Then it replays all these log files to verify the result and to compute the reward.
About the arguments:
--singularity-image specifies the Singularity image in which the evaluation is run. Replace rrc2021.sif with your custom image in case you extended the base image.

--package specifies the path to your package. When calling from the root directory of your package, you can simply set it to "." (see the example after this list).

--output-dir specifies the directory to which the results as well as some temporary files are written. After execution, the result files needed for the submission can be found in a subdirectory called "output". Important: Already existing files inside this directory may be deleted or overwritten during execution.
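For example, when the script is called from the root directory of your package, the command from above reduces to the following (the output directory /tmp/rrc_output is just a placeholder; any writable directory works):
python3 scripts/rrc_evaluate_prestage.py \
--singularity-image rrc2021.sif \
--package . \
--output-dir /tmp/rrc_output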
About the Replay
As mentioned above, the actual reward for comparing different submissions is not computed by evaluate_policy.py but in the replay step.
The simulation is initialized to the same initial state as in evaluate_policy.py. Then the actions from the log generated by evaluate_policy.py are applied to the robot one by one. Since the simulator is deterministic, this results in exactly the same trajectories as in the original run. The reward used for comparison with other participants is computed during this replay.
The replay serves two purposes:
1. Ensure that the correct reward function is used for the evaluation (participants may use a different reward function in their environment for training).
2. Prevent cheating. In their evaluation script, participants could in theory access the simulation and modify the state in their favour (e.g. simply reset the cube state to the goal pose). On our side, we will do the replay in a separate environment without any code from participants.
To ensure that the replay really results in the same trajectories, make sure to only use the provided interface to control the robot and do not modify the state of the simulation in any other way.
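Conceptually, the replay boils down to re-applying the logged actions in a fresh, identically initialized simulation and accumulating the official reward. The sketch below illustrates this idea only; it is not the actual evaluation code, and the env, logged_actions and compute_reward arguments stand in for the real pipeline components.

```python
# Conceptual sketch of the replay step (not the actual evaluation code).
# `env`, `logged_actions` and `compute_reward` are placeholders for the
# real simulation environment, the recorded action log and the official
# reward function.


def replay(env, logged_actions, compute_reward):
    """Re-apply logged actions to an identically initialized environment
    and accumulate the official reward."""
    observation = env.reset()
    total_reward = 0.0
    for action in logged_actions:
        # Deterministic simulator: replaying the same actions reproduces
        # the same trajectory as the original run.
        observation, _, done, _ = env.step(action)
        # The ranking reward is computed here with the official reward
        # function, independent of any participant code.
        total_reward += compute_reward(observation)
    return total_reward
```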