In this homework we will use imitation learning to train an agent to play SuperTux. You’ll use a new tux dataset that contains human players’ trajectories and design a model to predict what actions to take given the current observation.
Note: The data is about 5GB once decompressed. It will likely not fit on lab machines. We’re working on a solution.
You will design a network similar to the one in the earlier assignment to predict which action (keyboard input) to execute in a SuperTux game. The input to your network is a sequence of observations, where each observation is a 64x64 RGB image. Given the observations, you have to predict the action to take, where an action is a 6-dimensional binary vector of key states (0: up, 1: down), as in the previous homework. We recommend using ideas from the last homework to exploit temporal dependencies between observations.
In this homework we will measure your model’s performance both against the expert dataset in an offline setting and in the actual game.
Once you have specified the network, train it
Logits of predicted actions:
-5.1 -1 0.6 0.2 -0.1 0.1
which is equivalent to the key states:
0 0 1 1 0 1
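The example above is consistent with thresholding each logit at 0, which is the same as thresholding the sigmoid probability at 0.5. The following is a minimal sketch under that assumption; `logits_to_keys` is a hypothetical helper and not part of the starter code.

```python
import math


def logits_to_keys(logits, threshold=0.5):
    """Map per-key logits to binary key states (0: up, 1: down).

    Assumes each key is an independent Bernoulli variable whose
    probability is sigmoid(logit).
    """
    keys = []
    for z in logits:
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid of the logit
        keys.append(1 if p > threshold else 0)
    return keys


# Logits taken from the example above.
example_logits = [-5.1, -1, 0.6, 0.2, -0.1, 0.1]
example_keys = logits_to_keys(example_logits)
print(example_keys)  # [0, 0, 1, 1, 0, 1]
```

If the policy presses keys too rarely in the game, lowering `threshold` is one knob you could experiment with.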
Setting up Supertux and Dataset
action_img_trainval.tar.gz file contains two folders
val. Extract them in the same directory which contains the
- This homework requires you to set up Pytux to perform the online evaluation by playing the actual game. Instructions to set up Supertux can be found here.
- Once you have either downloaded the binary or compiled the Supertux source, create the symlinks for the pytux and data folders using the following commands
cd path/to/homework_10
ln -s path/to/pytux pytux
ln -s path/to/data data
- Make sure the folder structure looks like this:
We provide you with starter code that loads the dataset from a training and validation set. We also provide an optional tensorboard interface.
- Define your model in
models.py and modify the training code in
- Train your model.
python3 -m homework.train
- Optionally, you can use tensorboard to visualize your training loss and accuracy.
python3 -m homework.train -l myRun
and in another terminal
tensorboard --logdir myRun, where
myRun is the log directory. Pro tip: You can run tensorboard on the parent directory of many logs to visualize them all.
- Test your model by measuring the log-likelihood
python3 -m homework.test
- Test your policy performance in a real Tux game
python3 -m homework.play
- To evaluate your code against the grader, execute:
python3 -m grader homework
Note that the grader can take a long time because it has two parts: offline and online evaluation. Make sure your model performs well before running the grader. You can use
test.py to measure the offline performance and use
play.py to measure online performance.
- Create the submission file
python3 -m homework.bundle
The grading will depend on the log-likelihood scores of your model as well as how well the trained policy actually plays the Supertux game. The grading scheme is as follows:
- Linear grading of Log-likelihood scores between 0.5 and 0.1: 50 points.
- Grading based on position reached by tux on 5 levels of Supertux
- For level 01 - Welcome to Antarctica.stl, position range 0.1-0.24 will be graded linearly for 10 points.
- For level 02 - The Journey Begins.stl, position range 0.03-0.18 will be graded linearly for 10 points.
- For level 03 - Via Nostalgica.stl, position range 0.01-0.16 will be graded linearly for 10 points.
- For level 04 - Tobgle Road.stl, position range 0.04-0.14 will be graded linearly for 10 points.
- For level 05 - The Somewhat Smaller Bath.stl, position range 0.05-0.1 will be graded linearly for 10 points.
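To make "graded linearly" concrete, here is a rough sketch of how a score inside a range could map to points. It is an illustration, not the grader's code. It assumes that scores outside the range are clamped, that for the level positions the larger end of the range earns full points, and that for the log-likelihood score the 0.1 end earns full points (the score is treated like a loss, where lower is better). `linear_score` is a hypothetical helper.

```python
def linear_score(value, worst, best, max_points):
    """Linearly interpolate points between `worst` (0 points) and `best` (max_points).

    Values outside the range are clamped. `worst` may be larger than `best`,
    which covers the case where a lower score is better.
    """
    fraction = (value - worst) / (best - worst)
    fraction = min(1.0, max(0.0, fraction))
    return max_points * fraction


# Level 01: position range 0.1-0.24, 10 points (assumed: farther is better).
level_points = linear_score(0.17, worst=0.1, best=0.24, max_points=10)

# Offline part: range 0.5 to 0.1, 50 points (assumed: lower is better).
offline_points = linear_score(0.3, worst=0.5, best=0.1, max_points=50)

print(round(level_points, 2), round(offline_points, 2))
```

Under these assumptions, a score halfway through a range earns about half of that part's points.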
You may find that the default loss function makes the network too pessimistic about pressing keys. To address this class imbalance, you can try reweighting the positive and negative classes by their frequencies, a technique we used in Homework 7.
You can still do the training remotely, but the grader and the test modules won’t run over ssh because pytux does not support playing Supertux over ssh. You therefore need to use either your own machine or the lab machines to run these modules. The provided binary and the source work best on Ubuntu systems. You can try compiling the source on Mac OS, but it will not work on Windows. The binary might not work because of different versions of dependencies installed on your system; in that case, compile from source following the instructions here.
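One way to reweight by frequency is to scale the positive term of the binary cross-entropy for each key by the ratio of negative to positive examples. The sketch below is written in plain Python so the arithmetic is visible. The helper names and the toy labels are hypothetical, and the choice of negatives/positives as the weight is an assumption rather than the course's prescribed scheme.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def positive_weights(labels):
    """Per-key weight = (#negatives / #positives) over the dataset."""
    num_keys = len(labels[0])
    weights = []
    for k in range(num_keys):
        positives = sum(row[k] for row in labels)
        negatives = len(labels) - positives
        # Fall back to 1.0 for keys that are never pressed in the data.
        weights.append(negatives / positives if positives > 0 else 1.0)
    return weights


def weighted_bce_with_logits(logits, targets, pos_weight):
    """Weighted binary cross-entropy for one sample, averaged over keys."""
    total = 0.0
    for z, y, w in zip(logits, targets, pos_weight):
        log_p = -softplus(-z)     # log(sigmoid(z))
        log_not_p = -softplus(z)  # log(1 - sigmoid(z))
        total += -(w * y * log_p + (1 - y) * log_not_p)
    return total / len(logits)


# Toy labels: 4 frames, 6 keys (made up for illustration).
toy_labels = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
toy_weights = positive_weights(toy_labels)
toy_loss = weighted_bce_with_logits([0.0] * 6, toy_labels[1], toy_weights)
print(toy_weights, toy_loss)
```

If your training code uses PyTorch, `torch.nn.BCEWithLogitsLoss` accepts a `pos_weight` tensor that applies the same kind of per-key weighting to the positive term.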
For compiling Supertux on Ubuntu, use the following command to install all the dependencies required for building it.
sudo apt-get install build-essential cmake libcurl4-openssl-dev libglew-dev libsdl2-image-dev libsdl2-dev libboost-all-dev
Contact the TAs if you face any issues setting up Supertux on your system. We advise setting this up early to avoid last-minute problems.