End-To-End Memory Networks, Sukhbaatar, Szlam, Weston, Fergus; 2015 - Summary
author: DartingMelody
score: 7 / 10

What is the core idea?

The paper introduces a neural network with an explicit large memory and a recurrent attention mechanism over it. The model reads from memory multiple times (multiple "hops") before producing its output. It is trained end-to-end and therefore requires comparatively less supervision during training than the original memory networks, which makes it applicable in more realistic settings. The paper also shows experimentally that multiple hops over the long-term memory lead to better performance.

How is it realized (technically)?

The model takes as input a set \(x_{1},...,x_{n}\) (to be stored in memory) and a query \(q\), and outputs an answer \(a\). For the question-answering task, each \(x_{i}\) is a sentence. The model writes all \(x_{i}\) to the memory up to a fixed buffer size and forms continuous representations of the \(x_{i}\) and of \(q\), which are then processed over multiple hops to produce \(a\).

Question-Answering problem example
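An illustrative bAbI-style instance (constructed here as an example, not quoted from the paper): the story sentences are the inputs \(x_{i}\), the question is \(q\), and the single-word answer is \(a\).

```
x1: Mary moved to the bathroom.
x2: John went to the hallway.
x3: Mary travelled to the office.
q:  Where is Mary?
a:  office
```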

Single layer:
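A sketch of the single-layer computation, with notation following the paper (\(A\), \(B\), \(C\) are embedding matrices and \(W\) is the final weight matrix): each \(x_{i}\) is embedded into a memory vector \(m_{i} = Ax_{i}\) and an output vector \(c_{i} = Cx_{i}\), the query is embedded as \(u = Bq\), and the memory is read with soft attention:

\[
p_{i} = \text{softmax}(u^{\top} m_{i}), \qquad o = \sum_{i} p_{i}\, c_{i}, \qquad \hat{a} = \text{softmax}\big(W(o + u)\big).
\]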

Multiple layers:
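With \(K\) stacked memory layers (hops), each hop \(k\) has its own embeddings \(A^{k}\), \(C^{k}\) (subject to the tying constraints below); the internal state is updated between hops and the answer is predicted from the last hop:

\[
u^{k+1} = u^{k} + o^{k}, \qquad \hat{a} = \text{softmax}\big(W(o^{K} + u^{K})\big).
\]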

Figure (from the paper): the model with a single layer and with multiple layers.
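A minimal NumPy sketch of the forward pass above, assuming bag-of-words vectors for the sentences and the query (the paper also describes a position-encoding variant); the matrix names follow the paper, while the dimensions and random inputs are placeholders:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def memn2n_forward(x_bow, q_bow, A_list, C_list, B, W):
    """Multi-hop forward pass of the end-to-end memory network.
    x_bow: (n_sentences, vocab) bag-of-words vectors stored in memory.
    q_bow: (vocab,) bag-of-words vector for the query.
    A_list, C_list: per-hop input/output embedding matrices, each (d, vocab).
    B: (d, vocab) query embedding matrix, W: (vocab, d) answer matrix.
    """
    u = B @ q_bow                      # internal state u = Bq
    for A, C in zip(A_list, C_list):   # one iteration per hop
        m = x_bow @ A.T                # memory vectors m_i = A x_i
        c = x_bow @ C.T                # output vectors  c_i = C x_i
        p = softmax(m @ u)             # attention p_i = softmax(u^T m_i)
        o = p @ c                      # response o = sum_i p_i c_i
        u = u + o                      # u^{k+1} = u^k + o^k
    return softmax(W @ u)              # predicted answer distribution

# Toy usage with random data: 20-word vocabulary, 5 memory sentences, 3 hops.
rng = np.random.default_rng(0)
vocab, d, n, hops = 20, 8, 5, 3
x = rng.integers(0, 2, size=(n, vocab)).astype(float)
q = rng.integers(0, 2, size=vocab).astype(float)
A_list = [rng.normal(0, 0.1, (d, vocab)) for _ in range(hops)]
C_list = [rng.normal(0, 0.1, (d, vocab)) for _ in range(hops)]
B = rng.normal(0, 0.1, (d, vocab))
W = rng.normal(0, 0.1, (vocab, d))
print(memn2n_forward(x, q, A_list, C_list, B, W).sum())  # ~1.0, a distribution over answers
```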

Constraints on the embedding matrices (weight tying)

Adjacent:
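In the adjacent scheme, the output embedding of one hop is reused as the input embedding of the next, and the question and answer matrices are tied to the first and last hops:

\[
A^{k+1} = C^{k}, \qquad W^{\top} = C^{K}, \qquad B = A^{1}.
\]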

Layer-wise (RNN-like):
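In the layer-wise (RNN-like) scheme, the embeddings are shared across all hops and a linear mapping \(H\) is added to the state update:

\[
A^{1} = A^{2} = \dots = A^{K}, \qquad C^{1} = C^{2} = \dots = C^{K}, \qquad u^{k+1} = Hu^{k} + o^{k}.
\]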

Training details
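Roughly, the paper trains with plain stochastic gradient descent, an annealed learning rate and no momentum or weight decay; the overall gradient is renormalized whenever its \(\ell_{2}\) norm exceeds 40, and a "linear start" variant removes the softmax in the memory layers early in training and reinserts it once the validation loss stops falling. A minimal sketch of the clipping step (the threshold of 40 follows the paper; everything else is illustrative):

```python
import numpy as np

def clip_gradients(grads, max_norm=40.0):
    """Rescale the whole gradient if its L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads
```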

How well does the paper perform?

What interesting variants are explored?

The model, now operating at the word level rather than the sentence level, was applied to a language modeling task with the following changes (sketched below):
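Briefly: each memory slot holds a single previous word, there is no question so the internal state \(u\) starts from a constant vector (0.1 in every component), layer-wise (RNN-like) weight tying is used with a ReLU applied to half of the units in each layer, and the final softmax predicts the next word over the whole vocabulary. A self-contained NumPy sketch of this variant (temporal encoding is omitted and all dimensions are placeholders):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def memn2n_lm_forward(prev_word_ids, A, C, W, H, hops=6):
    """Language-model variant: memory slots are the previous N words,
    the query is a constant vector, and the output is a distribution
    over the vocabulary for the next word (layer-wise weight tying)."""
    d = A.shape[0]
    m = A[:, prev_word_ids].T          # memory vectors, one per previous word, shape (N, d)
    c = C[:, prev_word_ids].T          # output vectors, shape (N, d)
    u = np.full(d, 0.1)                # no question: constant internal state
    for _ in range(hops):
        p = softmax(m @ u)             # attention over the previous words
        o = p @ c
        u = H @ u + o                  # layer-wise (RNN-like) update
        u[: d // 2] = np.maximum(u[: d // 2], 0.0)  # ReLU on half of the units
    return softmax(W @ u)              # distribution over the next word

# Toy usage: vocabulary of 50 words, memory of the 10 previous words.
rng = np.random.default_rng(0)
vocab, d, N = 50, 16, 10
A = rng.normal(0, 0.1, (d, vocab))
C = rng.normal(0, 0.1, (d, vocab))
W = rng.normal(0, 0.1, (vocab, d))
H = rng.normal(0, 0.1, (d, d))
prev = rng.integers(0, vocab, size=N)
print(memn2n_lm_forward(prev, A, C, W, H).argmax())  # index of the predicted next word
```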

Though it performs reasonably well compared to the baselines, the end-to-end memory network still lags behind memory networks trained with strong supervision.

TL;DR
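A memory network with soft attention over an external memory, trainable end-to-end with answer supervision only; reading the memory over multiple hops improves question answering and language modeling, though it still trails strongly supervised memory networks.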