This is the design of a cross-language LLM adapter for epilepsy-care instructions, supporting both Mandarin and English.
In this step, we hand-write some of the tasks in seed_task, then ask the LLM to generate more seed_task records in both Mandarin and English. Each seed_task record is an epilepsy-care instruction. We then review the generated seed_task records and remove the bad ones.
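One way to ask the LLM for more seed_task records is to show it a few hand-written ones as examples. The sketch below only builds such a prompt and does not call any model. The record fields (`instruction`, `lang`), the function name `build_generation_prompt`, and the prompt wording are all assumptions for illustration, not the format used in this repository.

```python
import random

# Hypothetical record layout: {"instruction": str, "lang": "zh" | "en"}
LANGUAGE_NAMES = {"zh": "Mandarin", "en": "English"}


def build_generation_prompt(seed_tasks, lang, k=3, seed=0):
    """Build a few-shot prompt from up to k hand-written seed tasks in one language."""
    rng = random.Random(seed)  # fixed seed so the prompt is reproducible
    pool = [t for t in seed_tasks if t["lang"] == lang]
    examples = rng.sample(pool, min(k, len(pool)))

    lines = [
        f"Write one new epilepsy-care instruction in {LANGUAGE_NAMES[lang]}.",
        "Examples:",
    ]
    for i, task in enumerate(examples, 1):
        lines.append(f"{i}. {task['instruction']}")
    # Leave the next list number open for the model to complete.
    lines.append(f"{len(examples) + 1}.")
    return "\n".join(lines)
```

Building one prompt per language keeps the Mandarin and English examples separate, so the model is less likely to mix languages in a single generated record.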
In this step, we ask the LLM to generate more synthetic data in both Mandarin and English. Hand-written filter rules are applied to remove bad generated records. We also upload the Epilepsy_Synthetics dataset, for research purposes only.
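As an illustration of what hand-written filter rules can look like, here is a minimal sketch. The record fields (`instruction`, `output`, `lang`), the thresholds, and the specific rules (empty fields, very short instructions, language mismatch, exact duplicates) are assumptions chosen for the example; the actual rules used for Epilepsy_Synthetics may differ.

```python
import re

# Hypothetical record layout: {"instruction": str, "output": str, "lang": "zh" | "en"}
CJK = re.compile(r"[\u4e00-\u9fff]")  # common CJK ideographs


def cjk_ratio(text):
    """Fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if CJK.match(c)) / len(chars)


def keep_record(rec):
    """Return True if the record passes every rule."""
    instruction = rec.get("instruction", "").strip()
    output = rec.get("output", "").strip()
    if not instruction or not output:
        return False  # rule 1: both fields must be non-empty
    if len(instruction) < 5:
        return False  # rule 2: instruction is too short to be useful
    ratio = cjk_ratio(instruction)
    if rec.get("lang") == "zh" and ratio < 0.3:
        return False  # rule 3a: labeled Mandarin but mostly non-Chinese text
    if rec.get("lang") == "en" and ratio > 0.0:
        return False  # rule 3b: labeled English but contains Chinese text
    return True


def filter_records(records):
    """Apply the rules and drop exact duplicate instructions (case-insensitive)."""
    seen = set()
    kept = []
    for rec in records:
        key = rec.get("instruction", "").strip().lower()
        if key in seen or not keep_record(rec):
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Rules like these are cheap to run over every generated record, which makes them a reasonable first pass before any manual review.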
In this step, we fine-tune the LLM on the seed_task and synthetic data.
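Before fine-tuning, the seed_task and synthetic records need to be merged into a single training set. The sketch below shows one possible way to do that. The prompt template, the record fields (`instruction`, `output`), and the function name `build_training_set` are assumptions for illustration; the training framework and template used by this project are not specified here.

```python
import random

# Hypothetical prompt template; the real template may differ.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def build_training_set(seed_tasks, synthetic, seed=0):
    """Merge seed and synthetic records into shuffled prompt/completion pairs."""
    examples = [
        {
            "prompt": PROMPT_TEMPLATE.format(instruction=rec["instruction"]),
            "completion": rec["output"],
        }
        for rec in list(seed_tasks) + list(synthetic)
    ]
    # Shuffle with a fixed seed so Mandarin/English and seed/synthetic
    # records are interleaved reproducibly.
    random.Random(seed).shuffle(examples)
    return examples
```

The resulting list of prompt/completion pairs can then be passed to whichever fine-tuning tool is in use.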