I am trying to fine-tune the model on my own dataset, but I ran into a problem. The zero-width non-joiner (U+200C) and zero-width joiner (U+200D) characters are missing from the text after fine-tuning with the LoRA script provided here:
https://github.com/AI4Bharat/IndicTrans2/blob/8de6eca588cfcd7648464084199c4881c41f58ab/huggingface_interface/train_lora.py
In particular, the lines below in load_and_process_translation_dataset remove these characters:

Also, I've noticed that for sentence_TGT, the src_lang and tgt_lang arguments are reversed. Is this intended? After that line, the characters are no longer present in the string.
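For anyone who wants to reproduce the check, here is a minimal sketch of how I count the joiners before and after a processing step. It is not code from the repository: `count_joiners`, `joiners_preserved` and `strip_joiners` are hypothetical helper names, and `strip_joiners` is only a stand-in for whatever preprocessing call is under test.

```python
ZWNJ = "\u200c"  # zero-width non-joiner
ZWJ = "\u200d"   # zero-width joiner


def count_joiners(text: str) -> dict:
    """Count ZWNJ and ZWJ occurrences in a string."""
    return {"ZWNJ": text.count(ZWNJ), "ZWJ": text.count(ZWJ)}


def joiners_preserved(step, text: str) -> bool:
    """Return True if `step` keeps the same number of ZWNJ/ZWJ characters."""
    return count_joiners(step(text)) == count_joiners(text)


def strip_joiners(text: str) -> str:
    # Hypothetical stand-in (assumption): mimics a normalization step
    # that silently drops both joiner characters.
    return text.replace(ZWNJ, "").replace(ZWJ, "")


# Devanagari sample: one conjunct written with ZWJ, one with ZWNJ.
sample = "\u0915\u094d\u200d\u0937 \u0915\u094d\u200c\u0937"

print(count_joiners(sample))
print(joiners_preserved(lambda s: s, sample))
print(joiners_preserved(strip_joiners, sample))
```

Passing the real preprocessing function in place of `strip_joiners` should show which step drops the characters.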