LoRA is a method for fine-tuning large language models by training low-rank adapter matrices that approximate the update to the original weight matrices, rather than updating the weights directly. Because gradients and optimizer state are only kept for the small adapter parameters, LoRA significantly reduces the memory footprint of the fine-tuning job.
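As a rough illustration, here is a minimal sketch of a LoRA-style linear layer in PyTorch. The class name `LoRALinear` and the rank/alpha defaults are my own choices for the example, not values from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 16, alpha: int = 32):
        super().__init__()
        # Frozen pretrained projection: receives no gradient updates.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # Trainable low-rank factors: the learned update is (alpha / r) * B @ A.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init -> no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen output plus the low-rank correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```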
This paper studies the extent to which LoRA can match a full fine-tune in two settings: continued pre-training and instruction fine-tuning.
- Continued Pre-Training (CPT): the text is unlabelled and the model simply performs next-token prediction over it
- Instruction Fine-Tuning (IFT): the text is domain specific and pairs user requests with a specific way of responding to them (see the tokenization sketch after this list)
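The practical difference shows up in how examples are tokenized and how the loss is masked. Below is a hedged sketch of both regimes for a causal LM; the function names and the response-only loss masking for IFT are common conventions I'm assuming, not details taken from the paper.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss ignores

def build_cpt_example(tokenizer, text: str):
    # CPT: plain next-token prediction over raw text, so every token is a label.
    ids = tokenizer(text, truncation=True, max_length=2048)["input_ids"]
    return {"input_ids": ids, "labels": list(ids)}

def build_ift_example(tokenizer, prompt: str, response: str):
    # IFT: the model sees prompt + response, but the loss is typically applied
    # only to the response tokens, so the prompt positions are masked out.
    prompt_ids = tokenizer(prompt)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"] + [tokenizer.eos_token_id]
    return {
        "input_ids": prompt_ids + response_ids,
        "labels": [IGNORE_INDEX] * len(prompt_ids) + response_ids,
    }
```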
Notably, they find that LoRA requires more epochs to reach the same performance as a full fine-tune, but it retains more of the model's original performance on the source domain.
- Coding CPT - StarCoder-Python: the Python subset of a corpus of permissively licensed GitHub repositories spanning 80+ programming languages
- Math CPT - OpenWebMath: mathematical web pages extracted from Common Crawl
- Coding IFT - Magicoder-Evol-Instruct-110K: roughly 72.97M tokens of programming question-answer pairs, where an LLM is iteratively prompted to increase the difficulty of the pairs
- Math IFT - MetaMathQA: roughly 103M tokens, built by using the training sets of GSM8K and MATH to generate additional synthetic examples with GPT-3.5
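For reference, all four datasets appear to be available on the Hugging Face Hub. The hub IDs below are my best guess at the public versions and are not taken from the paper.

```python
from datasets import load_dataset

starcoder_python = load_dataset("bigcode/starcoderdata", data_dir="python", split="train")  # Coding CPT
openwebmath = load_dataset("open-web-math/open-web-math", split="train")                    # Math CPT
magicoder = load_dataset("ise-uiuc/Magicoder-Evol-Instruct-110K", split="train")            # Coding IFT
metamathqa = load_dataset("meta-math/MetaMathQA", split="train")                            # Math IFT
```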
- Improvement in target domain: HumanEval (coding) and GSM8K (math) measure how much the model has learned in the new domain
- Original source metrics: HellaSwag, WinoGrande and ARC-Challenge test how much of the model's general reasoning ability is retained
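Both axes can be scored with off-the-shelf benchmarks. As a hedged sketch, something like EleutherAI's lm-evaluation-harness covers all five tasks; the snippet assumes the lm_eval >= 0.4 Python API, and the model path is a placeholder.

```python
import lm_eval

# Target-domain learning: humaneval, gsm8k; source-domain forgetting: the rest.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/path/to/finetuned-model",  # placeholder path
    tasks=["humaneval", "gsm8k", "hellaswag", "winogrande", "arc_challenge"],
    batch_size=8,
)
print(results["results"])
```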
We can see that, for the same number of training tokens, LoRA achieves significantly lower performance than the full fine-tune on both HumanEval and GSM8K.
Interestingly, on math, LoRA initially outperforms the full fine-tune on GSM8K before being overtaken at around 0.27 billion tokens.
Comparing LoRA against a full fine-tune, LoRA retains more of the model's general reasoning ability, given its better performance on HellaSwag, WinoGrande and ARC-Challenge.