********************
Evaluation Procedure
********************

We provide a script that evaluates your model on multiple random trajectories
and computes the average reward over all runs.  Please execute it before
submitting your code, as you need to report the resulting reward along with
your submission.  We will also evaluate submissions on our side with an
identical script (only with a different set of goals), so by testing on your
side you make sure that your code will also run for us.


evaluate_policy.py
==================

For the evaluation, you have to provide a file ``evaluate_policy.py`` at the
root of your package.  It takes a trajectory as input and should execute your
policy in the simulator, writing the resulting actions and observations to a
file.

If you are using the rrc_example_package_ as base for your own package, you
already have a template for ``evaluate_policy.py``.  Otherwise, simply copy
that file from the example package into your own one.

In ``evaluate_policy.py``, replace the ``RandomPolicy`` with your own policy
(see the sketch below).  You may also replace our gym environment if you are
using a custom one.  Apart from this, do not change anything in that file, to
ensure it stays compatible with our evaluation pipeline.  See the comments in
the file for more information.
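As a rough illustration, a custom policy might look like the sketch below.
The interface shown here (a constructor receiving the action space and a
``predict`` method mapping an observation to an action) is an assumption
based on the ``RandomPolicy`` placeholder; check the actual
``evaluate_policy.py`` in the example package for the exact signature::

    class MyPolicy:
        """Illustrative policy, assuming a ``RandomPolicy``-like interface."""

        def __init__(self, action_space):
            self.action_space = action_space

        def predict(self, observation):
            # Replace this with your actual controller or learned policy.
            # As a placeholder, return the midpoint of the action space.
            return (self.action_space.low + self.action_space.high) / 2.0
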
Run the Evaluation
==================

The rrc_example_package_ contains a script ``rrc_evaluate_prestage.py``
(found in the ``scripts/`` directory) which runs your ``evaluate_policy.py``
on a number of random trajectories.  To execute it, open a terminal and run
the following command (adjust the paths accordingly)::

    python3 scripts/rrc_evaluate_prestage.py \
        --singularity-image rrc2021.sif \
        --package path/to/your/package \
        --output-dir path/to/output_dir

It first runs ``evaluate_policy.py`` for a number of trajectories and stores
the action/observation logs in the specified output directory.  It then
replays all these log files to verify the result and to compute the reward.

About the arguments:

``--singularity-image`` specifies the Singularity image in which the
evaluation is run.  Replace ``rrc2021.sif`` with your custom image in case
you extended the base image.

``--package`` specifies the path to your package.  When calling from the root
directory of your package, you can simply set it to ``.``.

``--output-dir`` specifies the directory to which the results as well as some
temporary files are written.  After execution, the result files needed for
the submission can be found in a subdirectory called "output".

**Important:** Already existing files inside this directory may be deleted or
overwritten during execution.


About the Replay
----------------

As mentioned above, the actual reward for comparing different submissions is
not computed by ``evaluate_policy.py`` but in the replay step.  The
simulation is initialized to the same initial state as in
``evaluate_policy.py``.  Then the actions from the log generated by
``evaluate_policy.py`` are applied to the robot one by one.  Since the
simulator is deterministic, this results in exactly the same trajectories as
in the original run.  The actual reward used for comparison with other
participants is computed during this replay (a conceptual sketch of the
replay loop is shown at the end of this section).

The replay serves two purposes:

- Ensure that the correct reward function is used for the evaluation
  (participants may use a different reward function in their environment for
  training).
- Prevent cheating.  In their evaluation script, participants could in theory
  access the simulation and modify the state in their favour (e.g. simply
  reset the cube state to the goal pose).  On our side, we will do the replay
  in a separate environment without any code from participants.

To ensure that the replay really results in the same trajectories, make sure
to only use the provided interface to control the robot and do not modify the
state of the simulation in any other way.
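To make this concrete, the following is a minimal sketch of what such a
replay loop looks like conceptually.  It is only an illustration, not the
actual evaluation code; the environment is assumed to be a gym-style
environment using the official reward function, and ``logged_actions`` stands
for the actions read from the log written by ``evaluate_policy.py``::

    def replay(env, logged_actions):
        """Re-apply logged actions and accumulate the official reward."""
        env.reset()  # deterministic initial state, same as in the original run

        total_reward = 0.0
        for action in logged_actions:
            # Because the simulator is deterministic, stepping with exactly
            # the same actions reproduces exactly the same states and
            # observations as during evaluate_policy.py.
            observation, reward, is_done, info = env.step(action)
            total_reward += reward

        return total_reward

Any manipulation of the simulator state outside of the logged actions would
therefore not survive the replay, which is why you should only control the
robot through the provided interface.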