Homework 10

In this homework we will use imitation learning to train an agent to play SuperTux. You’ll use a new tux dataset that contains human players’ trajectories and design a model to predict what actions to take given the current observation.

starter code data

Note: The data is about 5GB once decompressed. It will likely not fit on lab machines. We’re working on a solution.

Action prediction

You will design a network, similar to the one in the earlier assignment, that predicts which action (keyboard input) to execute in a SuperTux game. The input to your network is a sequence of observations, where each observation is a 64x64 RGB image. Given the observations, you have to predict which action to take, where an action is a 6-dimensional binary vector of key states (0: up, 1: down), as in the previous homework. We recommend reusing ideas from the last homework to exploit temporal dependencies between observations.

In this homework we will measure your model’s performance both against the expert dataset in an offline setting and by how it performs in the actual game.
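The offline measurement is reported elsewhere in this assignment as a log-likelihood score. The sketch below shows one way such a score could be computed, assuming it is the mean negative log-likelihood of the expert's key states under independent per-key Bernoulli predictions (this is an assumption, not a description of test.py). The function names `bernoulli_nll` and `mean_nll` are hypothetical.

```python
import math


def bernoulli_nll(logit, target):
    """Negative log-likelihood of one key state (0 or 1) given its logit.

    Uses the identity -log p(target) = softplus(logit) - target * logit,
    with softplus computed in a numerically stable way.
    """
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return softplus - target * logit


def mean_nll(pred_logits, expert_actions):
    """Average the per-key negative log-likelihood over all frames and keys.

    Assumption: every key in every frame counts equally.
    """
    total, count = 0.0, 0
    for logit_row, action_row in zip(pred_logits, expert_actions):
        for z, y in zip(logit_row, action_row):
            total += bernoulli_nll(z, y)
            count += 1
    return total / count


# Toy data: two frames, six keys each (values made up for illustration).
expert = [[0, 0, 1, 1, 0, 1], [0, 0, 1, 0, 0, 0]]
confident = [[-4, -4, 4, 4, -4, 4], [-4, -4, 4, -4, -4, -4]]
uninformed = [[0.0] * 6, [0.0] * 6]

print(mean_nll(confident, expert))   # small: predictions agree with the expert
print(mean_nll(uninformed, expert))  # log(2): every key predicted at 50%
```

Under this convention a lower score is better, and a model that always outputs a logit of 0 scores log(2), about 0.69.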

Once you have specified the network, train it.

Input example

Observation image

[Figure: input]

Output example

Logits of the predicted actions:

-5.1 -1 0.6 0.2 -0.1 0.1

which is equivalent to the key states: 0 0 1 1 0 1
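The conversion from logits to key states above can be sketched as follows. This is a minimal illustration, assuming a key counts as pressed when its sigmoid probability exceeds 0.5 (equivalently, when the logit is positive). The helper name `logits_to_keys` is hypothetical.

```python
import math


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logits_to_keys(logits, threshold=0.5):
    """Return 1 (key down) where the predicted probability exceeds the threshold, else 0 (key up)."""
    return [1 if sigmoid(z) > threshold else 0 for z in logits]


logits = [-5.1, -1, 0.6, 0.2, -0.1, 0.1]
keys = logits_to_keys(logits)
print(keys)  # [0, 0, 1, 1, 0, 1]
```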

Setting up Supertux and Dataset

The action_img_trainval.tar.gz file contains two folders, train and val. Extract them into the directory that contains the homework and grader folders.
This homework requires you to set up Pytux to perform the online evaluation by playing the actual game. Instructions for setting up Supertux can be found here.
Once you have either downloaded the binary or compiled the Supertux source, create the symlinks for the pytux and data folders using the following commands:
```
cd path/to/homework_10
ln -s path/to/pytux pytux
ln -s path/to/data data
```
Make sure the folder structure looks like this:
- homework_10
  - grader
  - homework
  - train
  - val
  - pytux
  - dataset

Getting Started

We provide you with starter code that loads the dataset from a training and validation set. We also provide an optional tensorboard interface.

Define your model in models.py and modify the training code in train.py.
Train your model.
```
 python3 -m homework.train
```
Optionally, you can use tensorboard to visualize your training loss and accuracy.
```
 python3 -m homework.train -l myRun
```
and, in another terminal, run tensorboard --logdir myRun, where myRun is the log directory. Pro-tip: You can run tensorboard on the parent directory of many logs to visualize them all.
Test your model by measuring the log-likelihood
```
 python3 -m homework.test
```
Test your policy performance in a real Tux game
```
 python3 -m homework.play
```
To evaluate your code against the grader, execute:
```
 python3 -m grader homework
```
Note that the grader can take a long time because it has two parts: offline and online evaluation. Make sure your model performs well before running the grader. You can use test.py to measure the offline performance and play.py to measure the online performance.
Create the submission file
```
 python3 -m homework.bundle
```

Grading

The grading will depend on the log-likelihood scores of your model as well as on how well the trained policy actually plays the Supertux game. The grading scheme is as follows:

Linear grading of Log-likelihood scores between 0.5 and 0.1: 50 points.
Grading based on the position reached by Tux on 5 levels of Supertux:
- For level 01 - Welcome to Antarctica.stl, position range 0.1-0.24 will be graded linearly for 10 points.
- For level 02 - The Journey Begins.stl, position range 0.03-0.18 will be graded linearly for 10 points.
- For level 03 - Via Nostalgica.stl, position range 0.01-0.16 will be graded linearly for 10 points.
- For level 04 - Tobgle Road.stl, position range 0.04-0.14 will be graded linearly for 10 points.
- For level 05 - The Somewhat Smaller Bath.stl, position range 0.05-0.1 will be graded linearly for 10 points.
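To estimate a score from the ranges above, linear grading can be sketched as an interpolation between the value that earns zero points and the value that earns full points. This is only an illustration: the clamping outside the range and the reading of the log-likelihood range (0.5 earns nothing, 0.1 earns everything) are assumptions, and `linear_points` is a hypothetical helper, not part of the grader.

```python
def linear_points(value, zero_at, full_at, max_points):
    """Interpolate linearly between zero_at (0 points) and full_at (max_points).

    Assumption: values outside the range are clamped to 0 or max_points.
    Works for both increasing ranges (positions) and decreasing ones
    (a loss-like score where lower is better).
    """
    fraction = (value - zero_at) / (full_at - zero_at)
    fraction = min(1.0, max(0.0, fraction))
    return max_points * fraction


# Level 01: position range 0.1-0.24 graded for 10 points.
print(linear_points(0.17, zero_at=0.1, full_at=0.24, max_points=10))

# Log-likelihood score: range 0.5 down to 0.1 graded for 50 points.
print(linear_points(0.3, zero_at=0.5, full_at=0.1, max_points=50))
```

Under these assumptions, reaching the midpoint of a range earns about half of the points for that item.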

Note

You may find that the default loss function makes the network too pessimistic about pressing keys. To address this class imbalance, you can try reweighting the positive and negative classes by their frequencies, a technique we used in Homework 7.
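One common form of frequency-based reweighting is sketched below: each key gets a positive-class weight equal to the ratio of negative to positive examples, and that weight scales the loss term for pressed keys. This is an assumed variant, not necessarily the exact scheme from Homework 7, and the function names are hypothetical. If your training code uses PyTorch, `torch.nn.BCEWithLogitsLoss` accepts a `pos_weight` argument that plays the same role.

```python
import math


def positive_weights(actions):
    """Per-key weight = (#frames with key up) / (#frames with key down).

    Keys that are never pressed in the data fall back to a weight of 1.0.
    """
    num_frames = len(actions)
    num_keys = len(actions[0])
    weights = []
    for k in range(num_keys):
        pos = sum(row[k] for row in actions)
        neg = num_frames - pos
        weights.append(neg / pos if pos > 0 else 1.0)
    return weights


def weighted_nll(logit, target, pos_weight):
    """Binary cross-entropy on a logit, with the pressed-key term scaled by pos_weight."""
    nll_if_up = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))  # -log(1 - sigmoid)
    nll_if_down = nll_if_up - logit                                   # -log(sigmoid)
    return pos_weight * target * nll_if_down + (1 - target) * nll_if_up


# Toy data: 4 frames, 2 keys. Key 0 is pressed often, key 1 rarely.
actions = [[1, 0], [1, 0], [1, 1], [0, 0]]
weights = positive_weights(actions)
print(weights)  # key 0 is down-weighted, key 1 is up-weighted

# Missing a rarely pressed key now costs more than a false press.
print(weighted_nll(0.0, 1, weights[1]), weighted_nll(0.0, 0, weights[1]))
```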

Important Tips

You can still train remotely, but the grader and the test modules won’t run over ssh because pytux does not support playing Supertux over ssh. You therefore need to use either your own machine or the lab machines to run these modules. The provided binary and source work best on Ubuntu systems. You can try compiling the source on Mac OS, but it will not work on Windows. The binary might not work if different versions of its dependencies are installed on your system; in that case, compile from source following the instructions here.

For compiling Supertux on Ubuntu, use the following command to install all the dependencies required for building it.

sudo apt-get install build-essential cmake libcurl4-openssl-dev libglew-dev libsdl2-image-dev libsdl2-dev libboost-all-dev

Contact the TAs if you face any issues setting up Supertux on your system. We advise setting it up early to avoid last-minute problems.

Relevant operations

Conv Layers
Recurrent Layers
Operations from prior assignments