********************
Evaluation Procedure
********************

We provide a script that evaluates your model on multiple random trajectories
and computes the average reward over all runs.  Please execute it before
submitting your code, as you need to report the resulting reward along with
your submission.  We will also evaluate submissions on our side with an
identical script (only with a different set of goals), so by testing on your
side you make sure that your code will also run for us.


evaluate_policy.py
==================

For the evaluation, you have to provide a file ``evaluate_policy.py`` at the
root of your package.  It takes a trajectory as input and should execute your
policy in the simulator, writing the resulting actions and observations to a
file.

If you are using the rrc_example_package_ as base for your own package, you
already have a template for ``evaluate_policy.py``.  Otherwise, simply copy
that file from the example package into your own one.

In ``evaluate_policy.py``, replace the ``RandomPolicy`` with your own policy
(see the sketch below).  You may also replace our gym environment if you are
using a custom one.  Apart from this, do not change anything in that file, to
ensure it stays compatible with our evaluation pipeline.  See the comments in
the file for more information.
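As a rough illustration, a custom policy might look like the sketch below.
The interface shown here (a constructor receiving the action space and a
``predict`` method mapping an observation to an action) is an assumption
based on the ``RandomPolicy`` placeholder; check the actual
``evaluate_policy.py`` in the example package for the exact signature::

    class MyPolicy:
        """Illustrative policy, assuming a ``RandomPolicy``-like interface."""

        def __init__(self, action_space):
            self.action_space = action_space

        def predict(self, observation):
            # Replace this with your actual controller or learned policy.
            # As a placeholder, return the midpoint of the action space.
            return (self.action_space.low + self.action_space.high) / 2.0
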
Run the Evaluation
==================

The rrc_example_package_ contains a script ``rrc_evaluate_prestage.py``
(found in the ``scripts/`` directory) which runs your ``evaluate_policy.py``
on a number of random trajectories.  To execute it, open a terminal and run
the following command (adjust the paths accordingly)::

    python3 scripts/rrc_evaluate_prestage.py \
        --singularity-image rrc2021.sif \
        --package path/to/your/package \
        --output-dir path/to/output_dir

It first runs ``evaluate_policy.py`` for a number of trajectories and stores
the action/observation logs in the specified output directory.  It then
replays all these log files to verify the result and to compute the reward.

About the arguments:

``--singularity-image`` specifies the Singularity image in which the
evaluation is run.  Replace ``rrc2021.sif`` with your custom image in case
you extended the base image.

``--package`` specifies the path to your package.  When calling from the root
directory of your package, you can simply set it to ``.``.

``--output-dir`` specifies the directory to which the results as well as some
temporary files are written.  After execution, the result files needed for
the submission can be found in a subdirectory called "output".

**Important:** Already existing files inside this directory may be deleted or
overwritten during execution.


About the Replay
----------------

As mentioned above, the actual reward for comparing different submissions is
not computed by ``evaluate_policy.py`` but in the replay step.  The
simulation is initialized to the same initial state as in
``evaluate_policy.py``.  Then the actions from the log generated by
``evaluate_policy.py`` are applied to the robot one by one.  Since the
simulator is deterministic, this results in exactly the same trajectories as
in the original run.  The actual reward used for comparison with other
participants is computed during this replay (a conceptual sketch of the
replay loop is shown at the end of this section).

The replay serves two purposes:

- Ensure that the correct reward function is used for the evaluation
  (participants may use a different reward function in their environment for
  training).
- Prevent cheating.  In their evaluation script, participants could in theory
  access the simulation and modify the state in their favour (e.g. simply
  reset the cube state to the goal pose).  On our side, we will do the replay
  in a separate environment without any code from participants.

To ensure that the replay really results in the same trajectories, make sure
to only use the provided interface to control the robot and do not modify the
state of the simulation in any other way.
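To make this concrete, the following is a minimal sketch of what such a
replay loop looks like conceptually.  It is only an illustration, not the
actual evaluation code; the environment is assumed to be a gym-style
environment using the official reward function, and ``logged_actions`` stands
for the actions read from the log written by ``evaluate_policy.py``::

    def replay(env, logged_actions):
        """Re-apply logged actions and accumulate the official reward."""
        env.reset()  # deterministic initial state, same as in the original run

        total_reward = 0.0
        for action in logged_actions:
            # Because the simulator is deterministic, stepping with exactly
            # the same actions reproduces exactly the same states and
            # observations as during evaluate_policy.py.
            observation, reward, is_done, info = env.step(action)
            total_reward += reward

        return total_reward

Any manipulation of the simulator state outside of the logged actions would
therefore not survive the replay, which is why you should only control the
robot through the provided interface.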