Fine-tuning LLM DistilBERT for 😄😍 Emotion Classification 😔😡 using Parameter Efficient Fine-Tuning (PEFT)
The project's aim was to fine-tune the LLM DistilBERT to specialize in classifying emotions in texts into six categories:
0: sadness, 1: joy, 2: love, 3: anger, 4: fear, 5: surprise
This was achieved using supervised learning and a PEFT method, namely Low-Rank Adaptation (LoRA). The advantage of this training approach is that it drastically reduces storage requirements and computation costs, as it adjusts only a small number of additional parameters while leaving the majority of the LLM parameters unchanged from their initial state. (1) In particular, LoRA adds a compact trainable submodule to the transformer architecture: the pre-trained model weights are kept frozen, and trainable rank decomposition matrices are injected into every layer. (2)
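As a rough illustration, the sketch below shows how such a LoRA adapter can be attached to DistilBERT with the Hugging Face `peft` library; the rank, scaling, dropout and target-module choices are assumptions for illustration, not necessarily the exact settings used in the notebook.

```python
# Minimal LoRA sketch using the Hugging Face peft library.
# The rank, scaling, dropout and target-module names are illustrative
# assumptions, not necessarily the exact settings used in this project.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=6,  # sadness, joy, love, anger, fear, surprise
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,         # keeps the classification head trainable
    r=8,                                # rank of the decomposition matrices (assumed)
    lora_alpha=16,                      # scaling factor (assumed)
    lora_dropout=0.1,                   # dropout on the LoRA layers (assumed)
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections (assumed)
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of all weights is trainable
```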
- Python 3.11
- Jupyter Notebook: `pip install notebook`
- Required dependencies: `pip install -r requirements.txt`
- 'emotion' dataset from Hugging Face (3)
- GPU is recommended, e.g. using Google Colab or Kaggle Notebooks
The dataset was split into a training set (80%, 16,000 records), a validation set (10%, 2,000 records), and a test set (10%, 2,000 records).
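For reference, the dataset can be loaded with these splits directly from the Hugging Face Hub; the Hub identifier `dair-ai/emotion` is assumed here.

```python
# Minimal sketch: load the emotion dataset from the Hugging Face Hub.
# The Hub identifier "dair-ai/emotion" is assumed; the dataset already ships
# with the 80/10/10 train/validation/test split described above.
from datasets import load_dataset

emotions = load_dataset("dair-ai/emotion")

for split in ("train", "validation", "test"):
    print(split, len(emotions[split]))  # expected: 16000 / 2000 / 2000

print(emotions["train"][0])  # e.g. {'text': '...', 'label': 0}
```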
Furthermore, the dataset was tokenized, so that the texts are represented as sequences of integer token IDs and can therefore be fed into the model.
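A minimal tokenization sketch, assuming the `distilbert-base-uncased` tokenizer and an illustrative fixed maximum sequence length:

```python
# Minimal tokenization sketch: the checkpoint name and max_length are
# illustrative assumptions, not necessarily the values used in the notebook.
from datasets import load_dataset
from transformers import AutoTokenizer

emotions = load_dataset("dair-ai/emotion")  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate/pad every text to a fixed length so batches can be turned into tensors.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = emotions.map(tokenize, batched=True)
print(tokenized["train"].column_names)  # ['text', 'label', 'input_ids', 'attention_mask']
```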
DistilBERT is a smaller and lighter version of BERT with reduced computational costs. It was pretrained on the same corpus as BERT in a self-supervised fashion, using the BERT base model as a teacher. This means it was trained on raw texts only, without human annotation, which makes it possible to use vast amounts of publicly available data; inputs and labels are generated automatically from the texts by the BERT base model. (4)
Accuracy is a good general performance metric when all classes are equally important.
❗This model reached an evaluation accuracy of 0.89
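The snippet below sketches how this evaluation accuracy could be computed with the Hugging Face `Trainer`; it assumes the `model` and `tokenized` objects from the sketches above, and the training hyperparameters and output path are illustrative assumptions.

```python
# Sketch of training and evaluation with accuracy as the metric.
# Assumes `model` (the LoRA-wrapped DistilBERT) and `tokenized` (the tokenized
# dataset) from the sketches above; the hyperparameters are illustrative.
import numpy as np
from transformers import Trainer, TrainingArguments

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # Accuracy = share of texts whose predicted emotion matches the gold label.
    return {"accuracy": (predictions == labels).mean()}

training_args = TrainingArguments(
    output_dir="distilbert-lora-emotion",  # assumed output path
    per_device_train_batch_size=16,        # assumed
    num_train_epochs=3,                    # assumed
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate())  # reports eval_accuracy on the validation split
```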
(1) Xu et al., "Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment," arXiv:2312.12148 [cs.CL], https://doi.org/10.48550/arXiv.2312.12148
(2) Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," in Proc. Int. Conf. Learn. Representations (ICLR), 2022.
(3) DAIR.AI, "emotion" dataset, retrieved 3/2024 from Hugging Face.
(4) Sanh et al., "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter," arXiv:1910.01108 [cs.CL], https://doi.org/10.48550/arXiv.1910.01108