CS 395T - Deep learning seminar - Fall 2019

meets MW 2:00 - 3:30pm in GDC 4.304

instructor Philipp Krähenbühl
email philkr (at) utexas.edu
office hours by appointment (send email)

TA Dian Chen
email dchen (at) cs.utexas.edu
TA hours M 4:00-5:00pm, TA station desk 2

Please use Canvas for assignment questions.

Prerequisites

391L - Intro Machine learning (or equivalent)
311 or 311H - Discrete math for computer science (or equivalent)
proficiency in Python, high level C++ understanding
- All projects are in Python with PyTorch as the recommended deep learning backend. It is also recommended to familiarize yourself with numpy, scipy, scikit-learn and matplotlib as additional libraries.
- NOTE: It is possible to use other languages / deep learning packages, but highly discouraged as the course staff cannot provide support.
Basic deep learning background
- You should be familiar with at least one deep learning package (Caffe, Tensorflow, Torch, Matconvnet, …). You should have trained at least one deep network with one of these packages.

Class overview

The class reads and discusses two recent research papers per class
Coding assignments/projects implement one of 4-6 thematically similar papers (that we read in class).
The best two groups for each implementation present their findings in class (10-20 min each)
For each assignment, you’ll have to work in groups of 2-3 students. No individual submissions. Different team-mates for each project.
Auditing allowed if there is space (no homework or presentation, but participation required)

This year the class focuses on two major themes: “image generation” and “vision and action”. This is meant to be a very interactive class for upper-level students (MS or PhD). For every class we read two recent research papers (most no older than two years), which we then discuss in class.

Goals of the class

After this class you should be able to

Read and understand deep learning papers
Implement and execute a research project in deep learning

Grading

5% attendance (may miss 2 classes)
5% participation
15% per project
(extra) 2% Best performing paper implementation each
(extra) 1% Second best performing paper implementation each
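The weights above add up to 100% across the six projects (5 + 5 + 6 × 15). A minimal Python sketch of that arithmetic follows. The function name, the use of fractional scores in [0, 1], and the assumption that extra credit is simply added on top without a cap are illustrative only and are not stated in this syllabus.

```python
# Weights (in percentage points) taken from the grading list above.
ATTENDANCE_WEIGHT = 5
PARTICIPATION_WEIGHT = 5
PROJECT_WEIGHT = 15
NUM_PROJECTS = 6  # the Projects section lists 6 mini-projects


def final_grade(attendance, participation, project_scores, bonuses=()):
    """Hypothetical helper: combine fractional scores into a percentage.

    attendance, participation and each project score are fractions in [0, 1].
    bonuses holds extra-credit points (2 for a best implementation, 1 for a
    second best); adding them uncapped is an assumption.
    """
    if len(project_scores) != NUM_PROJECTS:
        raise ValueError("expected one score per project")
    base = (
        ATTENDANCE_WEIGHT * attendance
        + PARTICIPATION_WEIGHT * participation
        + PROJECT_WEIGHT * sum(project_scores)
    )
    return base + sum(bonuses)


if __name__ == "__main__":
    # Full marks everywhere, plus one best and one second-best implementation.
    print(final_grade(1, 1, [1] * NUM_PROJECTS, bonuses=(2, 1)))
```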

Schedule

Date		What
Wed	Aug 28	Course introduction
Mon	Sep 02	No class - Labor day
Wed	Sep 04	ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al. 2012; Deep Residual Learning for Image Recognition, He et al. 2016
Mon	Sep 09	ImageNet: A Large-Scale Hierarchical Image Database, Deng et al. 2009; Microsoft COCO: Common Objects in Context, Lin et al. 2014
Wed	Sep 11	Learning Deep Features for Scene Recognition using Places Database, Zhou et al. 2014; LVIS: A Dataset for Large Vocabulary Instance Segmentation, Gupta et al. 2019
Fri	Sep 13	Project 1 due - 11:59pm
Mon	Sep 16	Implementation discussion - best of Project 1
Wed	Sep 18	Generative Adversarial Nets, Goodfellow et al. 2014; Implicit Maximum Likelihood Estimation, Li et al. 2018
Mon	Sep 23	Perceptual Losses for Real-Time Style Transfer and Super-Resolution, Johnson et al. 2016; Glow: Generative Flow with Invertible 1x1 Convolutions, Kingma et al. 2018
Wed	Sep 25	Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Zhu et al. 2017; Large Scale GAN Training for High Fidelity Natural Image Synthesis, Brock et al. 2018
Fri	Sep 27	Project 2 due - 11:59pm
Mon	Sep 30	Implementation discussion - best of Project 2
Wed	Oct 02	Full Resolution Image Compression with Recurrent Neural Networks, Toderici et al. 2016; Real-Time Adaptive Image Compression, Rippel and Bourdev 2017
Mon	Oct 07	Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations, Agustsson et al. 2017; End-to-end Optimized Image Compression, Ballé et al. 2016
Wed	Oct 09	Learned Video Compression, Rippel et al. 2018; DVC: An End-To-End Deep Video Compression Framework, Lu et al. 2019
Fri	Oct 11	Project 3 due - 11:59pm
Mon	Oct 14	Implementation discussion - best of Project 3
Wed	Oct 16	Fully Convolutional Networks for Semantic Segmentation, Long et al. 2015; Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Chen et al. 2018
Mon	Oct 21	Deep Layer Aggregation, Yu et al. 2017; U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al. 2015
Wed	Oct 23	Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Ren et al. 2015; Mask R-CNN, He et al. 2017
Mon	Oct 28	Stacked Hourglass Networks for Human Pose Estimation, Newell et al. 2016; Simple Baselines for Human Pose Estimation and Tracking, Xiao et al. 2018
Wed	Oct 30	Proximal Policy Optimization Algorithms, Schulman et al. 2017; Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al. 2018
Fri	Nov 01	Project 4 due - 11:59pm
Mon	Nov 04	Implementation discussion - best of Project 4
Wed	Nov 06	ALVINN: An Autonomous Land Vehicle in a Neural Network, Pomerleau 1989; A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, Ross et al. 2011
Mon	Nov 11	Human-level control through deep reinforcement learning, Mnih et al. 2015; Mastering the game of Go without human knowledge
Wed	Nov 13	Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, Carreira et al. 2017; Non-local Neural Networks, Wang et al. 2018
Fri	Nov 15	Project 5 due - 11:59pm
Mon	Nov 18	Implementation discussion - best of Project 5
Wed	Nov 20	Habitat: A Platform for Embodied AI Research, Savva et al. 2019; Learning by Cheating, 2019
Mon	Nov 25	Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, 2019; Momentum Contrast for Unsupervised Visual Representation Learning, He et al. 2019
Wed	Nov 27	No class - Thanksgiving
Mon	Dec 02	Research presentations I
Wed	Dec 04	Research presentations II
Fri	Dec 06	Project 6 due - 11:59pm
Mon	Dec 09	Implementation discussion - best of Project 6

Projects

There are 6 mini-projects in this class. Project 1 implements an auto-encoder and should be a warmup. Projects 2-5 implement specific papers. You’ll have a choice of four papers to implement per project. The best two implementations per project receive extra credit. It is ok to fail at implementing a project, but clearly highlight and document why you failed in a writeup. If you manage to re-implement the project, your writeup can just be a few sentences (e.g. it worked as advertised).

Project 6 is a bit more open-ended and uses a very recent simulation environment.

Expected workload

Estimates of required effort to pass the class are:

2-4 hours per week reading
3 hours per week discussions in class
2-10 hours per week of programming

General tips

Start the projects early
- most deep neural networks take 1 day to train on a GPU
- let us know early (first or second week) if you don’t have GPU access; Colab or Google Cloud might be options
Read the assigned papers early, write down questions and discussion topics

Tips for reading a paper

Do more than just read the paper
- No paper is trivial
- Think about which essential tricks make the paper work
Question any decision and claim made by the authors
- It is the authors’ responsibility to convince you that their approach works better than prior (or simpler) alternatives
- If a claim is not backed by experiments or a citation (or backed by a wrong citation), you may assume it’s wrong
Think about how this fits with other people’s findings
- Is there a larger theme across a series of papers?
- Does it contradict other papers you know?
Use colored markers
- Mark important things in one color
- Mark things you disagree with (you think are wrong) in a different color

Notes

Syllabus subject to change.