In this project, we will implement imitation learning/reinforcement learning to train an agent to drive in SuperTuxKart.
Please refer to the official documentation for installation instructions.
Papers of Choice
You may choose to implement one of the four algorithms:
- Behavior Cloning: ALVINN: An Autonomous Land Vehicle in a Neural Network (include the 1-neuron skip connection)
- DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
- PPO: Proximal Policy Optimization Algorithms
- SAC: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
For the latter two algorithms, it is fine to use an off-the-shelf implementation of the RL algorithm and tailor it to SuperTuxKart. Please implement Behavior Cloning and DAgger from scratch.
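To make the difference between Behavior Cloning and DAgger concrete, here is a minimal sketch of the DAgger loop on a toy problem. Everything in it is an assumption made for illustration: the 1-D lane-keeping task, the linear policy, and the names expert_action, rollout, fit_linear_policy, and dagger are hypothetical and do not use the SuperTuxKart API. The first iteration rolls out the expert, which amounts to plain Behavior Cloning; later iterations roll out the learned policy, relabel the visited states with the expert, and retrain on the aggregated dataset.

```python
import numpy as np


def expert_action(x):
    """Hypothetical expert: steer proportionally back toward the track center."""
    return -2.0 * x


def rollout(policy, rng, horizon=20):
    """Run `policy` in a toy 1-D lane-keeping task and return the visited states."""
    x = rng.uniform(-1.0, 1.0)  # initial lateral offset from the track center
    states = []
    for _ in range(horizon):
        states.append(x)
        # Toy dynamics (an assumption): steering shifts the offset, plus small noise.
        x = x + 0.1 * policy(x) + rng.normal(scale=0.01)
    return np.array(states)


def fit_linear_policy(states, actions):
    """Supervised step: least-squares fit of action = w * state + b."""
    features = np.stack([states, np.ones_like(states)], axis=1)
    (w, b), *_ = np.linalg.lstsq(features, actions, rcond=None)
    return w, b


def dagger(n_iters=5, seed=0):
    rng = np.random.default_rng(seed)
    data_states = np.empty(0)
    data_actions = np.empty(0)
    policy = expert_action  # first rollout uses the expert (Behavior Cloning data)
    for _ in range(n_iters):
        states = rollout(policy, rng)      # states visited by the current policy
        labels = expert_action(states)     # expert relabels those states
        data_states = np.concatenate([data_states, states])    # aggregate dataset
        data_actions = np.concatenate([data_actions, labels])
        w, b = fit_linear_policy(data_states, data_actions)    # retrain on all data
        policy = lambda x, w=w, b=b: w * x + b                  # roll this out next
    return w, b, len(data_states)


w, b, n_samples = dagger()
print(f"w={w:.3f}, b={b:.3f}, samples={n_samples}")
```

In the actual project the state would be an image or other observation from the simulator and the policy a neural network, but the structure of the loop (roll out, relabel with the expert, aggregate, retrain) would stay the same.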
Please explicitly specify which algorithm you chose to implement, and upload the training code in your .zip submission file.
Your agent will be evaluated on the lighthouse track, and performance is measured as the distance traveled down the track within 2 minutes. If your agent finishes the track in under 2 minutes, you will get full credit.
If you choose to implement RL (PPO or SAC), you can optionally use the sampling function we provide in project/utils.py. It depends on the ray library, which you can install with
pip install ray
You can test your solution against the grader using
python -m val_grader project -v