This is a list of (mostly ML) papers where the description of the method is buried under fluff and equation theatre, and could be shortened significantly and explained much better.
This does not mean that the idea in a paper is bad or that its results are worthless. It just means that, in my opinion, they could be presented in a much better fashion.
There is no importance sampling! Nothing. Zilch! The proposed optimizer always updates the embedding and LM head, and selects transformer blocks uniformly at random. And they call this importance sampling because the first and the last layer have a "higher importance"? At least the results look promising.
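A minimal sketch of what the paper actually describes, stripped of the "importance sampling" framing. The function name, the block count, and `k` are hypothetical, not from the paper:

```python
import random

def select_trainable_layers(n_blocks, k):
    # Hypothetical sketch: the embedding and LM head are always updated,
    # and k transformer blocks are chosen uniformly at random.
    # This is plain uniform sampling, not importance sampling.
    blocks = sorted(random.sample(range(n_blocks), k))
    return ["embedding"] + [f"block_{i}" for i in blocks] + ["lm_head"]
```

In other words, the entire "sampling distribution" is: two layers with probability 1, the rest uniform.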
The idea is to replace (PyTorch pseudocode follows):

```python
Conv2d(in_ch, out_ch, kernel_size)
```

with:

```python
Sequential(
    Conv2d(in_ch, small, kernel_size),
    Conv2d(small, out_ch, kernel_size2, groups=small),
)
```

I.e., yet another factorized convolution: a smaller convolution followed by a grouped (depthwise-style) convolution.
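To see why the factorization pays off, here is a parameter-count sketch. The channel sizes (256 in/out, bottleneck of 64) are made-up numbers for illustration, not from the paper:

```python
def conv2d_params(in_ch, out_ch, k, groups=1):
    # A Conv2d weight has shape (out_ch, in_ch // groups, k, k),
    # plus one bias per output channel.
    return out_ch * (in_ch // groups) * k * k + out_ch

full = conv2d_params(256, 256, 3)              # standard 3x3 convolution
factored = (conv2d_params(256, 64, 3)          # shrink to 64 channels first
            + conv2d_params(64, 256, 3, groups=64))  # grouped 3x3 back to 256
print(full, factored)  # the factorized version is roughly 4x smaller
```

Same input/output shapes, a fraction of the weights; that is the whole trick.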
Instead of a bad figure and an important piece of the algorithm hidden in the middle of the page:
We could have a much better figure (parts taken from ShuffleNet):
With this, the paper could be understood in seconds instead of hours.
30 pages of proofs, lingo, etc., could be condensed to one sentence:
I.e., sample words whose log-probability is close to the entropy of the distribution.
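That one-sentence version can even be sketched in a few lines of code. This is my own hedged reading of the criterion, not the paper's reference implementation; the function name and the cutoff parameter `tau` are invented for illustration:

```python
import math

def entropy_close_tokens(probs, tau):
    # Keep the tokens whose surprisal (-log p) is closest to the
    # entropy of the distribution, up to cumulative mass tau.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    by_closeness = sorted(
        (i for i, p in enumerate(probs) if p > 0),
        key=lambda i: abs(-math.log(probs[i]) - entropy),
    )
    kept, mass = [], 0.0
    for i in by_closeness:
        kept.append(i)
        mass += probs[i]
        if mass >= tau:
            break
    return kept  # token indices to sample from
```

Sampling then proceeds from the kept tokens with renormalized probabilities; everything else in the 30 pages is justification for this filter.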