Dataset({
features: ['chosen', 'rejected', 'prompt'],
num_rows: 62135
})
For example:
{'prompt': 'Is the milk produced by a hippopotamus pink in color?',
 'chosen': 'No, the milk produced by a hippopotamus is not pink. It is typically white or beige in color. The misconception arises due to the hipposudoric acid, a red pigment found in hippo skin secretions, which people mistakenly assume affects the color of their milk.',
 'rejected': 'No, hippopotamus milk is not pink in color. It is actually white or grayish-white.'}
After using this code:
`#method2
def format_function(example):
    # Format 'chosen' text
    messages_chosen = [
        {"role": "user", "content": str(example["chosen"])}  # convert list to string
    ]
    formatted_chosen = tokenizer.apply_chat_template(
        messages_chosen,
        tokenize=False,
        add_generation_prompt=False
    )
`
I get this output:
[Image: screenshot of the output]
Is this the right way to prepare data before fine-tuning?
I mean, should we convert our data this way:
Conversation with template:
`<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you! How can I assist you today?<|im_end|>
`
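To make the templated conversation above concrete, here is a minimal sketch of how a ChatML-style layout is assembled from a list of role/content messages. `render_chatml` is a hypothetical hand-written helper, not the tokenizer's own template. In practice `tokenizer.apply_chat_template` uses whatever template ships with the model, which may differ (for example by adding a default system prompt).

```python
def render_chatml(messages, add_generation_prompt=False):
    """Hand-rolled ChatML rendering for illustration only (hypothetical helper)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so a model would continue from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
    {
        "role": "assistant",
        "content": "I'm doing well, thank you! How can I assist you today?",
    },
]

text = render_chatml(messages)
print(text)
```

Note that each message carries its own role: the user's question is a `user` turn and the reply is an `assistant` turn.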
Do I need to prepare my data this way? I mean, do I need the 'text' column?
Please tell me how I can prepare my data for DPO fine-tuning.
Please help me out.
Manas.
This is the Colab notebook link: https://colab.research.google.com/github/huggingface/smol-course/blob/main/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb
https://github.com/pratim808/smol-course/blob/main/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb
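For the question above, one possible way to shape a `prompt` / `chosen` / `rejected` dataset for DPO is sketched below. It assumes a recent TRL release in which `DPOTrainer` accepts these three columns in conversational form and applies the chat template itself, so no single `text` column is needed. `to_conversational` is a hypothetical helper name; check the dataset format described in the TRL version you have installed before relying on this.

```python
def to_conversational(example):
    """Wrap plain strings as chat messages (hypothetical helper).

    The prompt becomes a user turn; chosen and rejected become assistant turns.
    """
    return {
        "prompt": [{"role": "user", "content": example["prompt"]}],
        "chosen": [{"role": "assistant", "content": example["chosen"]}],
        "rejected": [{"role": "assistant", "content": example["rejected"]}],
    }


# One row in the same shape as the dataset shown in this post
example = {
    "prompt": "Is the milk produced by a hippopotamus pink in color?",
    "chosen": (
        "No, the milk produced by a hippopotamus is not pink. "
        "It is typically white or beige in color."
    ),
    "rejected": (
        "No, hippopotamus milk is not pink in color. "
        "It is actually white or grayish-white."
    ),
}

converted = to_conversational(example)
print(converted["prompt"])

# Assumption: with a datasets.Dataset this would be applied row by row, e.g.
# dataset = dataset.map(to_conversational)
```

The difference from the `format_function` above is that the chosen answer is marked as an `assistant` turn rather than a `user` turn, and the rejected answer and the prompt are kept alongside it.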