This model was developed by the LLM research consortium of (주)미디어그룹사람과숲 and (주)마커.
Sakura-SOLAR project
This repository covers almost everything about the Sakura-SOLAR models, which reached Rank 1 on the global Open LLM Leaderboard in December 2023.
I hope open-source keeps developing more and more! 😄😄
- 🌸kyujinpy/Sakura-SOLAR-Instruct
- 🌸kyujinpy/Sakura-SOLAR-Instruct-DPO-v1(private)
- 🌸kyujinpy/Sakura-SOLAR-Instruct-DPO-v2
- 🌸kyujinpy/Sakura-SOLRCA-Instruct-DPO🐋
- 🌸🐋kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1📐
- 🌸🐋kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2📐
- I created the 🌸kyujinpy/Sakura-SOLAR-Instruct LLM, which reached Rank 1 on the Open LLM Leaderboard.
- I love open-source, so I want to share everything about the model that won first place.
- I hope this GitHub repository helps a lot of people. 😎😎
- 2023.12.28
- Rank 1 (Open LLM Leaderboard): 🌸kyujinpy/Sakura-SOLAR-Instruct
Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
---|---|---|---|---|---|---|---|
🌸kyujinpy/Sakura-SOLAR-Instruct | 74.40 | 70.99 | 88.42 | 66.33 | 71.79 | 83.66 | 65.20 |
🌸🐋kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2 | 74.17 | 71.25 | 88.52 | 66.13 | 72.16 | 83.03 | 63.91 |
🌸kyujinpy/Sakura-SOLAR-Instruct-DPO-v2 | 74.14 | 70.90 | 88.41 | 66.48 | 71.86 | 83.43 | 63.76 |
🌸🐋kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1 | 74.13 | 71.25 | 88.48 | 66.21 | 72.12 | 82.87 | 63.84 |
🌸🐋kyujinpy/Sakura-SOLRCA-Instruct-DPO | 74.05 | 71.16 | 88.49 | 66.17 | 72.10 | 82.95 | 63.46 |
SOLAR-10.7B-Instruct-v1.0 | 74.20 | 71.08 | 88.16 | 66.21 | 71.43 | 83.58 | 64.75 |
Mixtral-8x7B-Instruct-v0.1 | 72.62 | 70.22 | 87.63 | 71.16 | 64.58 | 81.37 | 60.73 |
You can follow the full results on the Open LLM Leaderboard.
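The scores above are Open LLM Leaderboard metrics (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K). If you want to sanity-check a single score locally, here is a hedged sketch using EleutherAI's lm-evaluation-harness; the exact harness revision, task configs, and few-shot settings used by the leaderboard may differ.

```bash
# Illustrative only: evaluate ARC-Challenge (the leaderboard uses 25-shot for ARC).
pip install lm-eval
lm_eval --model hf \
    --model_args pretrained=kyujinpy/Sakura-SOLAR-Instruct,dtype=float16 \
    --tasks arc_challenge \
    --num_fewshot 25 \
    --batch_size auto
# Other leaderboard tasks: hellaswag (10-shot), mmlu (5-shot),
# truthfulqa (0-shot), winogrande (5-shot), gsm8k (5-shot).
```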
- First, download mergekit (an example install is sketched below).
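A minimal install sketch; the repository URL is an assumption based on the public mergekit project, and the exact version used for Sakura-SOLAR may differ:

```bash
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .
```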
- Run the command below to perform the merge.
```bash
# Example)
mergekit-yaml ./config.yml ./Sakura-SOLAR [--cuda]
```
- Run the command below for DPO training.
```bash
# Example)
python DPO.py \
    --base_model kyujinpy/Sakura-SOLAR-Instruct \
    --data-path kyujinpy/orca_math_dpo \
    --output_dir [...output_dir...] \
    --num_epochs [...epoch...] \
    --batch_size [...batch_size...] \
    --micro_batch_size [...micro_batch...] \
    --learning_rate [...learning_rate...] \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules [...target_modules...] \
    --lr_scheduler 'linear' \
    --warmup_ratio 0.1 \
    --cutoff_len 4096
```
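For reference, here is a minimal sketch of what a DPO.py-style script can look like with Hugging Face trl and peft (older trl API where beta/max_length are trainer arguments; newer trl moves them into DPOConfig). This is not the author's actual script: model loading details, dataset column names, and the exact trainer arguments are assumptions, and kyujinpy/orca_math_dpo may need its columns mapped to prompt/chosen/rejected first.

```python
# Hypothetical DPO training sketch (trl + peft), not the repository's actual DPO.py.
import torch
from datasets import load_dataset
from peft import LoraConfig, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "kyujinpy/Sakura-SOLAR-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# load_in_8bit=True, matching the hyperparameter tables below.
model = AutoModelForCausalLM.from_pretrained(
    base_model, load_in_8bit=True, torch_dtype=torch.bfloat16, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA settings mirroring the reported hyperparameters (r=16, alpha=16, dropout=0.05).
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# DPOTrainer expects "prompt", "chosen", "rejected" columns; map/rename if needed.
train_dataset = load_dataset("kyujinpy/orca_math_dpo", split="train")

# batch size 32 with micro batch size 2 -> gradient accumulation of 16.
training_args = TrainingArguments(
    output_dir="./sakura-solar-dpo",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=5e-7,            # per-model; the tables below list 1e-6 / 1e-5 / 5e-7
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    weight_decay=0.0,
    optim="paged_adamw_32bit",
    bf16=True,
    remove_unused_columns=False,   # often required for DPOTrainer with older trl versions
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                # with a peft_config, trl builds the reference model internally
    args=training_args,
    beta=0.1,                      # DPO beta from the tables below
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=4096,               # cutoff length
    max_prompt_length=2048,        # assumption; not reported in the tables
)
trainer.train()
trainer.save_model("./sakura-solar-dpo")   # saves the LoRA adapter
```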
- Merge the base model with the trained LoRA layers.
```bash
python merge.py \
    --base_model_name_or_path kyujinpy/Sakura-SOLAR-Instruct \
    --peft_model_path [...output_dir...] \
    --output_dir [...output_final_dir...]
```
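Similarly, a minimal sketch of what a merge.py-style script can do with peft's merge_and_unload; this is not the author's actual script, and the paths are placeholders:

```python
# Hypothetical LoRA-merge sketch, not the repository's actual merge.py.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "kyujinpy/Sakura-SOLAR-Instruct"
adapter_dir = "./sakura-solar-dpo"            # the DPO output_dir (LoRA adapter)
output_dir = "./Sakura-SOLAR-Instruct-DPO"    # final merged model

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_dir)
model = model.merge_and_unload()              # fold the LoRA weights into the base weights

model.save_pretrained(output_dir)
AutoTokenizer.from_pretrained(base_model).save_pretrained(output_dir)
```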
😎kyujinpy/Sakura-SOLAR-Instruct
```yaml
slices:
  - sources:
      - model: VAGOsolutions/SauerkrautLM-SOLAR-Instruct
        layer_range: [0, 48]
      - model: upstage/SOLAR-10.7B-Instruct-v1.0
        layer_range: [0, 48]
merge_method: slerp
base_model: upstage/SOLAR-10.7B-Instruct-v1.0
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
tokenizer_source: union
dtype: float16
```
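For intuition, merge_method: slerp spherically interpolates each pair of corresponding weight tensors, and the t values above ramp the interpolation factor across layers (separately for self_attn and mlp weights, with 0.5 as the fallback). Below is a toy SLERP sketch between two tensors; it is purely illustrative and not mergekit's actual implementation.

```python
# Toy SLERP sketch (illustrative; not mergekit's implementation).
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors with factor t in [0, 1]."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp(torch.dot(a_unit, b_unit), -1.0, 1.0)
    omega = torch.acos(dot)                  # angle between the two tensors
    if omega.abs().item() < 1e-4:            # nearly parallel: fall back to linear interpolation
        out = (1 - t) * a_flat + t * b_flat
    else:
        so = torch.sin(omega)
        out = (torch.sin((1 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape).to(a.dtype)

# t = 0 keeps the base model's tensor, t = 1 keeps the other model's tensor.
merged = slerp(0.5, torch.randn(4, 4), torch.randn(4, 4))
```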
😎kyujinpy/Sakura-SOLAR-Instruct-DPO-v1
Hyperparameter | kyujinpy/Sakura-SOLAR-Instruct-DPO-v1 |
---|---|
LoRA method | LoRA |
load_in_8bit | True |
learning rate | 1e-6 |
batch size | 32 |
micro batch size | 2 |
warmup ratio | 0.1 |
epochs | 1 |
weight decay | 0. |
lr scheduler | linear |
lora alpha | 16 |
lora rank | 16 |
lora dropout | 0.05 |
beta | 0.1 |
optim | adamw_torch |
bf16 | True |
lora target modules | embed_tokens, q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
cutoff length | 4096 |
Datasets | argilla/distilabel-math-preference-dpo |
Base Model | kyujinpy/Sakura-SOLAR-Instruct |
Prompting
```
### User:

### Assistant:
```
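A minimal generation sketch using this prompt template; the model id, question, and generation settings are just examples, and the same template applies to the other models below:

```python
# Example inference with the ### User / ### Assistant template (illustrative settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kyujinpy/Sakura-SOLAR-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

question = "What is the derivative of x^2?"
prompt = f"### User:\n{question}\n\n### Assistant:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```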
😎kyujinpy/Sakura-SOLAR-Instruct-DPO-v2
Hyperparameter | kyujinpy/Sakura-SOLAR-Instruct-DPO-v2 |
---|---|
LoRA method | LoRA |
load_in_8bit | True |
learning rate | 1e-5 |
batch size | 32 |
micro batch size | 2 |
warmup ratio | 0.1 |
epochs | 1 |
weight decay | 0. |
lr scheduler | linear |
lora alpha | 16 |
lora rank | 16 |
lora dropout | 0.05 |
beta | 0.1 |
optim | paged_adamw_32bit |
bf16 | True |
lora target modules | embed_tokens, q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
cutoff length | 4096 |
Datasets | argilla/distilabel-math-preference-dpo |
Base Model | kyujinpy/Sakura-SOLAR-Instruct |
```
### User:

### Assistant:
```
😎kyujinpy/Sakura-SOLRCA-Instruct-DPO
Hyperparameter | kyujinpy/Sakura-SOLRCA-Instruct-DPO |
---|---|
LoRA method | LoRA |
load_in_8bit | True |
learning rate | 5e-7 |
batch size | 32 |
micro batch size | 1 |
warmup ratio | 0.1 |
epochs | 1 |
weight decay | 0. |
lr scheduler | linear |
lora alpha | 16 |
lora rank | 16 |
lora dropout | 0.05 |
beta | 0.1 |
optim | paged_adamw_32bit |
bf16 | True |
lora target modules | embed_tokens, q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
cutoff length | 4096 |
Datasets | Intel/orca_dpo_pairs |
Base Model | kyujinpy/Sakura-SOLAR-Instruct |
```
### User:

### Assistant:
```
😎kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1
Hyperparameter | kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1 |
---|---|
LoRA method | LoRA |
load_in_8bit | True |
learning rate | 5e-7 |
batch size | 32 |
micro batch size | 2 |
warmup ratio | 0.1 |
epochs | 1 |
weight decay | 0. |
lr scheduler | linear |
lora alpha | 16 |
lora rank | 16 |
lora dropout | 0.05 |
beta | 0.1 |
optim | paged_adamw_32bit |
bf16 | True |
lora target modules | embed_tokens, q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
cutoff length | 4096 |
Datasets | kyujinpy/orca_math_dpo |
Base Model | kyujinpy/Sakura-SOLAR-Instruct |
```
### User:

### Assistant:
```
😎kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2
Hyperparameter | kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2 |
---|---|
LoRA method | LoRA |
load_in_8bit | True |
learning rate | 5e-7 |
batch size | 32 |
micro batch size | 2 |
warmup ratio | 0.1 |
epochs | 1 |
weight decay | 0. |
lr scheduler | linear |
lora alpha | 16 |
lora rank | 16 |
lora dropout | 0.05 |
beta | 0.1 |
optim | paged_adamw_32bit |
bf16 | True |
lora target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
cutoff length | 4096 |
Datasets | kyujinpy/orca_math_dpo |
Base Model | kyujinpy/Sakura-SOLAR-Instruct |
Prompting
```
### User:

### Assistant:
```
- Share code
- Share hyperparameters
- Share datasets