title | author | source |
---|---|---|
Foundation model | Bob | All citations for this article can be fetched here: https://www.youtube.com/watch?v=YzyBSDn3OQU |
Models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks.[^1]
Foundation models are sometimes referred to as "self-supervised models" or "pre-trained models".[^1]

Types of models
- Text Models
- Vision models
- Audio models
- Multimodal models
Organization | Models |
---|---|
OpenAI | GPT-3, Codex, DALL-E, CLIP |
Microsoft | Turing NLG |
Open-source | T5, RoBERTa |
Google | LaMDA, BERT, PaLM, MUM |
Meta | OPT |
AI21labs | Jurassic |
Hugging Face + BigScience | BLOOM |
Nvidia + Microsoft | MT-NLG |
Stability AI | Stable Diffusion |
BAAI | Wu Dao 2.0 |
EleutherAI | GPT-NeoX |
Google DeepMind | Gopher, Chinchilla |
Huawei | PanGu-Alpha |
Naver | HyperCLOVA |
Metric | Definition |
---|---|
Model size and Parameters | Larger models generally perform better but require more computing power |
Training data size | More comprehensive and diverse training data leads to better language understanding |
Training time | Faster training is better |
Inference Speed | Faster inference means faster response generation |
Contextual understanding | How well the models understand context and provide coherent responses |
Generative Diversity | A model that can produce varied outputs while maintaining relevance is valuable |
Few-shot and Zero-shot learning | Tests how well models can perform with limited examples (few-shot) or no examples at all (zero-shot) |
Fine-tuning can be done using:
- the Transformers Trainer
- TensorFlow and Keras
- native PyTorch
Why fine-tune? It can decrease the cost of use.
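As a minimal sketch of the Transformers `Trainer` route (the `distilbert-base-uncased` checkpoint and the `imdb` dataset are illustrative assumptions, not choices made in these notes):

```python
# Minimal sketch: supervised fine-tuning with the Hugging Face Trainer API.
# The model checkpoint and dataset below are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"   # assumed small checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")           # assumed example dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="finetuned-model",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()
```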
- Fine-tune LLM
  - to make a model behave in a certain way
- Knowledge-base embedding
  - to make a model gain domain knowledge (see the retrieval sketch below)
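A minimal sketch of the knowledge-base-embedding idea, assuming the `sentence-transformers` library and a toy in-memory document list (both are illustrative choices, not specified in these notes): embed the domain documents once, then retrieve the closest ones for a query and feed them to the model as context.

```python
# Sketch: embed a small "knowledge base" and retrieve relevant passages for a query.
# Library choice (sentence-transformers) and the documents are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model

documents = [
    "Foundation models are trained on broad data with self-supervision.",
    "PEFT methods fine-tune a small number of parameters.",
    "Llama 2 has a 4096-token context window.",
]
doc_embeddings = encoder.encode(documents, convert_to_tensor=True)

query = "How large is the Llama 2 context window?"
query_embedding = encoder.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every document; pick the best match.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best = scores.argmax().item()
print(f"Most relevant context: {documents[best]} (score={scores[best].item():.2f})")
```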
- Question answering
- Sentiment classification
- Translation
- Coreference resolution
- Parsing
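These tasks can be tried directly with off-the-shelf Transformers pipelines; a small sketch (the default checkpoints and `t5-small` are assumptions, and the outputs are illustrative only):

```python
# Sketch: running some of the listed tasks with Hugging Face pipelines.
# Default checkpoints are downloaded automatically; results are illustrative only.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
print(sentiment("Foundation models adapt to many downstream tasks surprisingly well."))

qa = pipeline("question-answering")
print(qa(question="What are foundation models trained on?",
         context="Foundation models are trained on broad data using self-supervision at scale."))

translator = pipeline("translation_en_to_de", model="t5-small")
print(translator("Foundation models can be adapted to a wide range of tasks."))
```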
- Generate offensive content
- Generate untruthful content
- Enable disinformation / malicious use
Foundation models can be adapted to downstream tasks (or used directly, without further fine-tuning) via:
- Linear probing
📓 Simple and efficient, but the pretrained model must already be very good (see the linear-probing sketch after this list)
- (Full) Fine-tuning
📓 Best method when we have lots of data and lots of memory
- Prefix-Tuning/Prompt-Tuning (consume less memory than Fine-tuning)
📓 Good for mid-sized datasets, memory-efficient
- Zero-shot Prompting
📓 Open-ended tasks (no dataset collection); prompts need to be engineered; accuracy can be low
- Few-shot Prompting
- In-context Learning
📓 Open-ended tasks (minimal dataset collection); accuracy can be lower than with tuning methods
- Chain-of-Thought
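As an illustration of the first option in the list above, a minimal linear-probing sketch in PyTorch: the pretrained backbone is frozen and only a small linear head is trained (the DistilBERT checkpoint and the two-example toy batch are assumptions for the example).

```python
# Sketch: linear probing — freeze a pretrained backbone, train only a linear head.
# The backbone (DistilBERT) and the tiny toy batch are illustrative assumptions.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

backbone = AutoModel.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

for param in backbone.parameters():          # freeze every backbone weight
    param.requires_grad = False

head = nn.Linear(backbone.config.hidden_size, 2)   # only these weights are trained
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

texts = ["great movie", "terrible movie"]
labels = torch.tensor([1, 0])
inputs = tokenizer(texts, return_tensors="pt", padding=True)

with torch.no_grad():                        # features come from the frozen backbone
    features = backbone(**inputs).last_hidden_state[:, 0]   # first-token embedding

logits = head(features)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
print(f"probe loss: {loss.item():.3f}")
```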
- Training method 1: gpt-llm-trainer, through Google Colab
  - Prepare data / dataset generation: using GPT-4, gpt-llm-trainer will generate a variety of prompts and responses based on the provided use case
  - System message generation: gpt-llm-trainer will generate an effective system prompt for your model
  - Fine-tuning: after your dataset has been generated, the system will automatically split it into training and validation sets, fine-tune a model for you, and get it ready for inference
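gpt-llm-trainer's own code is not reproduced here; the sketch below only illustrates the dataset-generation idea using the OpenAI Python client (the prompt wording, output format, and example use case are assumptions, not gpt-llm-trainer internals).

```python
# Illustrative only: generating synthetic training pairs with GPT-4,
# roughly the idea behind gpt-llm-trainer's dataset-generation step.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
use_case = "a model that answers questions about foundation models"  # assumed example

pairs = []
for _ in range(3):  # gpt-llm-trainer generates many more examples in practice
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0.9,
        messages=[
            {"role": "system",
             "content": "You create training data. Return one line 'PROMPT: ...' and one line 'RESPONSE: ...'."},
            {"role": "user", "content": f"Generate a training example for: {use_case}"},
        ],
    )
    pairs.append(response.choices[0].message.content)

for p in pairs:
    print(p, "\n---")
```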
Terms | Definition |
---|---|
temperature | Controls sampling randomness: higher = more creative/varied, lower = more precise/deterministic |
number_of_examples | minimum ~100; the more examples, the higher-quality the model |
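A quick illustration of the temperature effect with Transformers text generation (`gpt2` is used only because it is small; it is not part of the gpt-llm-trainer workflow):

```python
# Sketch: the effect of sampling temperature on generation (gpt2 as a small example model).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Foundation models are", return_tensors="pt")

for temperature in (0.2, 1.2):   # low = more deterministic, high = more varied
    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=temperature,
        max_new_tokens=20,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(temperature, tokenizer.decode(output[0], skip_special_tokens=True))
```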
- Training method 2: autotrain, through Google Colab
  - uses 4-bit precision (int4); the full command is below
```bash
# Fine-tune Llama 2 with huggingface/autotrain-advanced (older `autotrain llm` CLI syntax).
# --model: a sharded model helps when low on VRAM; it is not loaded all at once
# --data_path: Hugging Face repo id or local path; each trainer expects a certain dataset format
# --text_column: name of the dataset column containing the text
# --use_peft / --use_int4: Parameter-Efficient Fine-Tuning at 4-bit precision
# --learning_rate: lower values converge more slowly but often more reliably
# --train_batch_size: raise for smaller datasets (e.g. 4); depends on GPU count and VRAM
# --num_train_epochs: higher values generally give better quality
# --trainer sft: supervised fine-tuning; works with input-output data formats
# --model_max_length: Llama 2 has a 4096-token context window; 2048 here speeds up training
# --push_to_hub / --repo_id: push the result to the given Hugging Face Hub repo
autotrain llm --train \
  --project_name 'llama2-openassistant' \
  --model TinyPixel/Llama-2-7B-bf16-sharded \
  --data_path timdettmers/openassistant-guanaco \
  --text_column text \
  --use_peft \
  --use_int4 \
  --learning_rate 2e-4 \
  --train_batch_size 2 \
  --num_train_epochs 3 \
  --trainer sft \
  --model_max_length 2048 \
  --push_to_hub \
  --repo_id Promptengineering/llama2-openassistant \
  --block_size 2048 > training.log &
```
- After the tokenizing process is done, a config.json is created. The model can then be loaded with the Transformers library (see the loading sketch below).
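A minimal sketch of that loading step, assuming a PEFT/LoRA adapter was written to a directory named after `--project_name` (the exact output path and prompt format depend on the autotrain run):

```python
# Sketch: load the base model plus the trained PEFT adapter for inference.
# The adapter path is an assumption based on the --project_name used above.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TinyPixel/Llama-2-7B-bf16-sharded"
adapter_path = "llama2-openassistant"        # assumed autotrain output directory

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_path)

prompt = "### Human: What is a foundation model?### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```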
- source of explanation
Terms | Definition |
---|---|
autotrain | the huggingface/autotrain-advanced library |
- Training method 3: tutorial
Footnotes

[^1]: "Introducing the Center for Research on Foundation Models (CRFM)". Stanford HAI. Retrieved 11 June 2022.