The code implementation of Epipaca

This is a cross-language LLM adapter designed for epilepsy-care instruction, with support for both Mandarin and English.

Epipaca Checkpoint
Synthetics Dataset

Training steps

Generate the seed_task (~200 records)

In this step, we handwrite some of the tasks in seed_task, then ask the LLM to generate more seed_task records in both Mandarin and English. Each seed_task record is an epilepsy-care instruction. After that, we review the generated records and remove the bad ones.
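The review of generated seed records can be preceded by a small automated pre-pass. The sketch below is a minimal illustration, not the repository's actual script: it assumes the records are stored as JSON Lines with hypothetical `instruction` and `output` fields, and it only removes records with empty fields or repeated instructions. It does not replace the manual check.

```python
import json

# Hypothetical schema: the real seed_task field names may differ.
REQUIRED_FIELDS = ("instruction", "output")


def load_seed_tasks(lines):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def clean_seed_tasks(records):
    """Drop records with missing or empty fields and repeated instructions."""
    seen = set()
    kept = []
    for rec in records:
        # Reject records where any required field is absent or blank.
        if any(not str(rec.get(field, "")).strip() for field in REQUIRED_FIELDS):
            continue
        # Compare instructions case-insensitively, ignoring outer whitespace.
        key = rec["instruction"].strip().lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Placeholder records for illustration only; outputs are intentionally dummy text.
raw_lines = [
    '{"instruction": "Describe what to do during a seizure.", "output": "placeholder answer"}',
    '{"instruction": "癫痫发作时应该怎么做？", "output": "占位回答"}',
    '{"instruction": "  describe what to do during a seizure. ", "output": "placeholder answer"}',
    '{"instruction": "Explain medication adherence.", "output": ""}',
]

cleaned = clean_seed_tasks(load_seed_tasks(raw_lines))
print(len(cleaned), "records kept for manual review")
```

The records that survive this pass would still go through the manual review described above.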

Generate the synthetic data (2k records)

In this step, we ask the LLM to generate more synthetic data in both Mandarin and English. Hand-written filter rules are applied to remove bad generated records. We also upload the Epilepsy_Synthetics dataset, for research purposes only.
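One way to organize hand-written filter rules is as a list of small predicate functions, each of which accepts or rejects a record. The rules below are hypothetical examples (a minimum instruction length, a non-empty output, and a check that instruction and output are in the same language); the actual rules used for Epilepsy_Synthetics are not described here, and the `instruction` and `output` field names are assumptions.

```python
import re

# Matches common CJK ideographs; used as a rough Mandarin detector.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def has_chinese(text):
    """Return True if the text contains at least one CJK ideograph."""
    return bool(CJK_PATTERN.search(text))


# Each rule returns True when the record passes. All rules are illustrative.
def rule_min_length(rec):
    return len(rec["instruction"].strip()) >= 5


def rule_has_output(rec):
    return bool(rec["output"].strip())


def rule_same_language(rec):
    return has_chinese(rec["instruction"]) == has_chinese(rec["output"])


RULES = [rule_min_length, rule_has_output, rule_same_language]


def apply_rules(records):
    """Split records into kept ones and (record, failed rule names) pairs."""
    kept, dropped = [], []
    for rec in records:
        failed = [rule.__name__ for rule in RULES if not rule(rec)]
        if failed:
            dropped.append((rec, failed))
        else:
            kept.append(rec)
    return kept, dropped


# Placeholder records for illustration only.
samples = [
    {"instruction": "Describe what to do during a seizure.", "output": "placeholder answer"},
    {"instruction": "癫痫发作时应该怎么做？", "output": "占位回答"},
    {"instruction": "Describe what to do during a seizure.", "output": "占位回答"},
    {"instruction": "Explain medication adherence.", "output": " "},
]

kept, dropped = apply_rules(samples)
for rec, failed in dropped:
    print("dropped:", rec["instruction"], "failed", failed)
```

Recording which rule rejected each record makes it easier to spot a rule that is too strict before it silently removes good data.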

Finetune the LLM

In this step, we finetune the LLM with the seed_task and synthetic data.
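Before fine-tuning, the seed_task records and the synthetic records need to be merged and rendered as training text. The sketch below shows one possible way to do that. The prompt template, the `instruction` and `output` field names, and the function name are all assumptions made for illustration; the template actually used to train Epipaca may differ.

```python
# Hypothetical instruction-style template; not confirmed by the repository.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{output}"


def build_training_texts(seed_records, synthetic_records):
    """Merge both record sets and render each record as one training string."""
    merged = list(seed_records) + list(synthetic_records)
    return [
        PROMPT_TEMPLATE.format(
            instruction=rec["instruction"].strip(),
            output=rec["output"].strip(),
        )
        for rec in merged
    ]


# Placeholder records for illustration only.
seed = [{"instruction": "Describe what to do during a seizure.", "output": "placeholder answer"}]
synthetic = [{"instruction": "癫痫发作时应该怎么做？", "output": "占位回答"}]

texts = build_training_texts(seed, synthetic)
print(texts[0])
```

The resulting list of strings can then be tokenized and passed to whichever fine-tuning framework is used.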
