Homework 11

In this homework, we will use gradient-free reinforcement learning to improve the agent we trained in homework 10. You will take your homework 10 model and policy and fine-tune the imitation agent with a gradient-free method.

Starter code

This homework is very open-ended. You can do anything you want short of hand-coding a policy. The only requirement is that the policy is learned.

Possible methods:

• Random search
• Hill climbing
• Augmented Random Search (SPSA)
• Cross Entropy Method
• Any other evolutionary algorithm
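As a concrete illustration of the simplest options above, random-search hill climbing perturbs the policy parameters and keeps a candidate only if it improves the reward. This is a minimal sketch: the `evaluate` reward function and the flat parameter vector are placeholders, not part of the starter code.

```python
import numpy as np

def hill_climb(params, evaluate, n_iters=100, noise_std=0.1):
    """Greedy hill climbing on a flat parameter vector.

    `params` is a numpy array of policy weights; `evaluate` maps
    parameters to an episode reward (both placeholders here).
    """
    best_reward = evaluate(params)
    for _ in range(n_iters):
        # Propose a Gaussian perturbation of the current best parameters.
        candidate = params + noise_std * np.random.randn(*params.shape)
        reward = evaluate(candidate)
        if reward > best_reward:  # keep the candidate only if it improves
            params, best_reward = candidate, reward
    return params, best_reward
```

Cross Entropy Method and ARS follow the same pattern but aggregate many perturbations per iteration instead of accepting one greedily.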

Input example

Observation Image

Output example

Logits of the predicted actions:

-5.1 -1 0.6 0.2 -0.1 0.1
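For example, logits like those above can be turned into an action by taking the argmax, or by sampling from the softmax distribution when you want exploration. This is a sketch; the actual action mapping depends on your homework 10 policy.

```python
import numpy as np

logits = np.array([-5.1, -1.0, 0.6, 0.2, -0.1, 0.1])

# Greedy action: index of the largest logit.
action = int(np.argmax(logits))

# Stochastic alternative: sample from the softmax distribution.
probs = np.exp(logits - logits.max())  # subtract max for numerical stability
probs /= probs.sum()
sampled = int(np.random.choice(len(logits), p=probs))
```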


Getting Started

We provide you with starter code that loads the training and validation datasets. We also provide an optional tensorboard interface.

1. Define your model in models.py and modify the training code in train.py.
2. Train your model.
 python3 -m homework.train

3. Test your model by measuring its performance
 python3 -m homework.test

4. To evaluate your code against the grader, execute:
 python3 -m grader homework


Note that the grader can take a long time: it contains two parts, and the first part trains your agent. Make sure your training code works before running the grader.

5. Create the submission file
 python3 -m homework.bundle


Parallel data collection

We provide you with a parallel data collection interface in policy_eval.py. To use the interface to collect data in parallel:

import ray

ray.init()

# One remote evaluator per worker; each plays the given level.
evaluators = [PolicyEvaluator.remote(level, iterations) for _ in range(n_workers)]

# Launch all rollouts in parallel, then block until every reward is returned.
rewards = ray.get([
    evaluator.eval.remote(m, H) for m, evaluator in zip(models, evaluators)
])


Installing Ray

To use the parallel data collection, you need to install the ray library:

pip3 install ray


Hint: Run N/2 evaluators in parallel, where N is the number of CPU cores on your machine.
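Following that hint, the worker count can be derived from the core count with the standard library:

```python
import os

# Use half the available cores for evaluators, but at least one.
# os.cpu_count() can return None, so fall back to 2 cores in that case.
n_workers = max(1, (os.cpu_count() or 2) // 2)
```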

Setting up Supertux

1. This homework requires you to set up Pytux to perform the online evaluation by playing the actual game. Instructions to set up Supertux can be found here.
2. Once you have either downloaded the binary or compiled Supertux from source, create the symlinks for the pytux and data folders using the following commands:
cd path/to/homework_11
ln -s path/to/pytux pytux
ln -s path/to/data data

3. Make sure the folder structure looks like this:
• homework_11
  • homework
  • pytux
  • data

Pro-tip: Fine-tune from Imitation Learning agent

To speed up training for this assignment, you can fine-tune the architecture you trained in homework 10.
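One way to do that is to start the search at the imitation-learned weights and apply small perturbations around them, so early candidates already drive reasonably well. A minimal sketch, assuming the weights are held in a dict of numpy arrays (how you extract them from your homework 10 model, e.g. via its state dict, is up to you):

```python
import numpy as np

def perturb(weights, noise_std=0.01):
    """Return a perturbed copy of a dict of weight arrays.

    A small noise_std keeps candidates close to the imitation policy,
    which is the point of fine-tuning rather than searching from scratch.
    """
    return {name: w + noise_std * np.random.randn(*w.shape)
            for name, w in weights.items()}
```

The perturbed dict can then be loaded back into the model and scored with the parallel evaluators above.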