Improving Language Understanding by Generative Pre-Training, Radford, Narasimhan, Salimans, Sutskever; 2018 - Summary
author: chengchunhsu
score: 7 / 10

What is the core idea?

Problem: labeled data for specific NLP tasks is scarce, while large unlabeled text corpora are abundant.

Challenge: it is unclear which training objectives learn text representations that transfer well, and there is no consensus on how to transfer learned representations to a target task.

Solution: introduce the Generative Pre-Training (GPT) model, a semi-supervised strategy (unsupervised pre-training followed by supervised fine-tuning) for improving performance on multiple NLP tasks.

How is it realized (technically)?

Model architecture: a multi-layer Transformer decoder (the paper uses 12 layers)
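
For concreteness, here is a minimal sketch (class and variable names are illustrative, not from the paper) of one block of a decoder-only Transformer of the kind GPT stacks: masked self-attention plus a position-wise feed-forward network, each followed by a residual connection and layer normalization:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One GPT-style Transformer decoder block (post-layer-norm variant)."""

    def __init__(self, d_model: int = 768, n_heads: int = 12, d_ff: int = 3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: position i may only attend to positions <= i.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.ln1(x + attn_out)       # residual + layer norm
        return self.ln2(x + self.ff(x))  # feed-forward + residual + layer norm
```

The hidden size of 768, 12 attention heads, and 3072-dimensional feed-forward layer match the configuration reported in the paper.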

Step #1: Unsupervised pre-training
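
This step trains a standard language model on a large unlabeled corpus (the paper uses BooksCorpus), maximizing the likelihood of each token given its preceding context window:

$$
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)
$$

where $\mathcal{U} = \{u_1, \ldots, u_n\}$ is the corpus of unlabeled tokens, $k$ is the context window size, and $\Theta$ are the parameters of the Transformer.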

Step #2: Supervised fine-tuning
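
This step adapts the pre-trained model to a labeled dataset $\mathcal{C}$ by adding a linear output layer and maximizing the likelihood of each label given its input token sequence; the paper also keeps language modeling as an auxiliary objective:

$$
L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m), \qquad
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
$$

where $(x^1, \ldots, x^m)$ is the input token sequence with label $y$ and $\lambda$ weights the auxiliary language-modeling loss.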

Problem: some tasks have structured inputs (sentence pairs, or question, document, and answer triples) rather than the single contiguous text sequence the pre-trained model expects.

Solution: traversal-style input transformations that concatenate the structured components into a single ordered token sequence with special delimiter tokens (see the sketch below).
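
A minimal sketch of these input transformations (the literal token strings below are placeholders; the paper learns randomly initialized embeddings for the start, delimiter, and extract tokens):

```python
# Placeholder special tokens (the paper learns embeddings for these).
START, DELIM, EXTRACT = "<s>", "<delim>", "<extract>"

def entailment_input(premise: str, hypothesis: str) -> str:
    # Entailment: premise and hypothesis joined by the delimiter token.
    return f"{START} {premise} {DELIM} {hypothesis} {EXTRACT}"

def similarity_inputs(text_a: str, text_b: str) -> list[str]:
    # Similarity: no inherent ordering, so both orderings are encoded and
    # their final representations are combined before the classifier.
    return [
        f"{START} {text_a} {DELIM} {text_b} {EXTRACT}",
        f"{START} {text_b} {DELIM} {text_a} {EXTRACT}",
    ]

def multiple_choice_inputs(context: str, candidates: list[str]) -> list[str]:
    # QA / commonsense reasoning: one sequence per candidate answer; each is
    # scored independently and the scores are normalized with a softmax.
    return [f"{START} {context} {DELIM} {c} {EXTRACT}" for c in candidates]
```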

How well does the paper perform?

Impact of number of layers transferred

Transferring more pre-trained layers yields further gains on the target tasks.

→ each layer of the pre-trained model contains functionality that is useful for solving the target tasks.

Zero-shot Behaviors

Even without fine-tuning, the generative model's performance on downstream tasks improves steadily with the number of pre-training updates.

→ pre-training supports the learning of a wide range of task-relevant functionality.

Ablation studies

Removing pre-training entirely causes the largest performance drop; replacing the Transformer with an LSTM also lowers the average score; and the auxiliary language-modeling objective during fine-tuning helps mainly on the larger datasets.

What interesting variants are explored?

TL;DR

GPT pre-trains a Transformer decoder as a language model on a large unlabeled corpus and then fine-tunes it on each target task with simple input transformations, improving performance across a wide range of NLP benchmarks.