Title | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Authors | Shaojie Bai, J. Zico Kolter, Vladlen Koltun
Year | 2018
URL | https://arxiv.org/abs/1803.01271
Since the deep learning revolution in NLP, sequence modelling tasks have typically been addressed with recurrent networks such as LSTMs. Bai et al. argue that this default choice should be reconsidered, as convolutional architectures often outperform recurrent ones.
They investigate a so-called Temporal Convolutional Network (TCN): a deep convolutional network (sketched in code after this list) that
- is causal, i.e. only looks at the past (the left context) of each position,
- uses dilated convolutions to grow the receptive field exponentially with depth, well beyond the size of a single filter, and
- incorporates residual connections to avoid vanishing or exploding gradients in very deep networks.
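To make these three ingredients concrete, here is a minimal PyTorch-style sketch of one TCN residual block. It follows the paper's general recipe (two dilated causal convolutions plus a skip connection), but the layer sizes, dropout rate, and the omission of weight normalisation are simplifying assumptions of this sketch, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class CausalConv1d(nn.Module):
    """1D convolution that only sees the past (left context) of each time step."""

    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        # Left-pad by (k - 1) * d so the output at time t depends only on inputs <= t.
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                             # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.left_pad, 0))  # pad only on the left
        return self.conv(x)


class TCNBlock(nn.Module):
    """Residual block: two dilated causal convolutions + identity/1x1 skip."""

    def __init__(self, in_ch, out_ch, kernel_size, dilation, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(in_ch, out_ch, kernel_size, dilation),
            nn.ReLU(),
            nn.Dropout(dropout),
            CausalConv1d(out_ch, out_ch, kernel_size, dilation),
            nn.ReLU(),
            nn.Dropout(dropout),
        )
        # 1x1 convolution matches channel counts so the residual can be added.
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.relu(self.net(x) + self.skip(x))


# Stacking blocks with dilations 1, 2, 4, 8 gives an exponentially growing
# receptive field; the channel widths here are arbitrary, for illustration only.
tcn = nn.Sequential(*[
    TCNBlock(in_ch=32 if i == 0 else 64, out_ch=64, kernel_size=3, dilation=2 ** i)
    for i in range(4)
])
y = tcn(torch.randn(8, 32, 100))   # output shape: (8, 64, 100), same length as the input
```

The left-only padding is what keeps the network causal: unlike a standard symmetrically padded convolution, the output at position t never receives information from positions t+1 onward.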
The benefits of this architecture are its parallelism, flexible receptive field size, stable gradients, low memory requirements during training, and its ability to handle variable-length inputs. Moreover, Bai et al.'s experiments show that it tends to outperform canonical recurrent models across a wide range of tasks, including NLP tasks such as word-level and character-level language modelling.
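To make the "flexible receptive field size" point concrete, the receptive field of such a stack can be computed directly. The helper below assumes two dilated causal convolutions per residual block with the dilation doubling at every block, matching the sketch above; with kernel size 3, eight blocks already cover more than a thousand time steps.

```python
def receptive_field(kernel_size: int, num_blocks: int) -> int:
    """Receptive field of a TCN stack, assuming two dilated causal convolutions
    per residual block and dilation 2**i in block i (an assumption of this sketch)."""
    field = 1
    for i in range(num_blocks):
        field += 2 * (kernel_size - 1) * 2 ** i   # each conv adds (k - 1) * dilation
    return field


print(receptive_field(kernel_size=3, num_blocks=8))   # -> 1021
```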