Self Learned Explanations for Transformer Language Models

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

How to make it faster:

  • No beam search

  • Better work stack building

  • Modify generation lengths

  • We tokenizer-encode and then decode for essentially no reason; skip the redundant round trip (see the sketch after this list)

  • Multiple levels at once

  • [Not urgent] If we want to do multi-GPU, the code for the pred_fn of self-learned scratchpads needs to be fixed.

  • [Is something stupid] Fix the double thing, maybe. Why would the model EVER predict ? It should never learn to do that.
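A rough sketch of the first few items above (no beam search, capped generation lengths, keeping token ids around instead of decode/re-encode round trips), assuming a HuggingFace seq2seq model and tokenizer; the checkpoint name and the `fast_generate` helper are only illustrative, not this repo's actual pred_fn:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative checkpoint; swap in whatever model this repo actually trains.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base").eval()

def fast_generate(prompts, max_new_tokens=64):
    # Encode once and keep token ids between levels instead of decoding
    # and re-encoding the intermediate text.
    batch = tokenizer(prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model.generate(
            **batch,
            num_beams=1,                    # greedy decoding, no beam search
            do_sample=False,
            max_new_tokens=max_new_tokens,  # cap generation length explicitly
        )
    return out  # token ids; decode only when the text is actually needed

ids = fast_generate(["explain: 2 + 2 = 4"])
print(tokenizer.batch_decode(ids, skip_special_tokens=True))
```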

FUTURE:

  • Find a new dataset (some DeepChem thing, maybe)
  • Use pre-trained BART (a loading sketch follows below)
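For the pre-trained BART item, a minimal loading sketch (the `facebook/bart-base` checkpoint is an assumption; any BART size would slot in the same way):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Assumed checkpoint; only the initialization changes from random weights
# to the pre-trained ones, the rest of the training/generation code stays put.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
```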