The "tl;dr" on a few notable papers on Transformers and modern NLP.
This is a living repo to keep tabs on different research threads.
Last Updated: September 20th, 2021.
Models: GPT-*, *BERT*, Adapter-*, *T5, Megatron, DALL-E, Codex, etc.
Topics: Transformer architectures + training; adversarial attacks; scaling laws; alignment; memorization; few labels; causality.
Each set of notes includes links to the paper, the original code implementation (if available), and the Hugging Face 🤗 implementation.
Here are some examples: T5, ByT5, and deduplicating Transformer training sets.
This repo also includes a single table quantifying the differences across Transformer papers.
The Transformer papers are presented roughly chronologically below. Go to the "👉 Notes 👈" column to find the notes for each paper.
- Quick Note
- Motivation
- Papers::Transformer Papers
- Papers::1 Table To Rule Them All
- Papers::Adversarial Attack Papers
- Papers::Fine-tuning Papers
- Papers::Alignment Papers
- Papers::Causality Papers
- Papers::Scaling Law Papers
- Papers::LM Memorization Papers
- Papers::Limited Label Learning Papers
- How To Contribute
- How To Point Out Errors
- Citation
- License
This is not an intro to deep learning in NLP. If you are looking for that, I recommend one of the following: Fast AI's course, one of the Coursera courses, or maybe this old thing. Come here after that.
With the explosion in papers on all things Transformers over the past few years, it seems useful to catalog the salient features/results/insights of each paper in a digestible format. Hence this repo.
All of the table summaries found above, collapsed into one big table here.
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
Gradient-based Adversarial Attacks against Text Transformers | 2021 | Facebook AI Research | Gradient-based attack notes | None |
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning | 2021 | Facebook AI | SCL notes | None |
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
Fine-Tuning Language Models from Human Preferences | 2019 | OpenAI | Human pref notes | None |
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
Scaling Laws for Neural Language Models | 2020 | OpenAI | Scaling laws notes | None |
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
Extracting Training Data from Large Language Models | 2021 | Google et al. | To-Do | None |
Deduplicating Training Data Makes Language Models Better | 2021 | Google et al. | Dedup notes | None |
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP | 2021 | Georgia Tech / UNC | To-Do | None |
Learning with fewer labeled examples | 2021 | Kevin Murphy & Colin Raffel (Preprint: "Probabilistic Machine Learning", Chapter 19) | Worth a read, won't summarize here. | None |
If you are interested in contributing to this repo, feel free to do the following:
- Fork the repo.
- Create a Draft PR naming the paper of interest (to avoid colliding with work that is already "in flight").
- Use the suggested template to write your "tl;dr". If it's an architecture paper, you may also want to add to the larger table here.
- Submit your PR.
Undoubtedly there is information that is incorrect here. Please open an Issue and point it out.
@misc{cliff-notes-transformers,
  title  = {cliff-notes-transformers},
  author = {Thompson, Will},
  url    = {https://github.com/will-thompson-k/cliff-notes-transformers},
  year   = {2021}
}
For the notes above, I've linked to the original papers.
MIT