feat: use promptsource templates #62

Open
tianjianjiang wants to merge 1 commit into main from feat-use_promptsource_templates

tianjianjiang
Contributor

A simple proposal: use promptsource directly so that we don't have to implement prompt templates from scratch.

@tianjianjiang tianjianjiang force-pushed the feat-use_promptsource_templates branch from 4207819 to 5cd09aa Compare August 30, 2021 15:34
@@ -4,3 +4,4 @@ tensorflow==2.5.0
torch==1.9.0
tqdm==4.62.0
transformers==4.9.1
promptsource @ git+https://[email protected]/bigscience-workshop/promptsource.git@main

A side note: the SSH form of this URL will fail, so HTTPS is used here.


def test_promptsource_template():
    ds_key, sub_key = "tydiqa", "secondary_task"
    tydiqa_sec_vld_ds = load_dataset(ds_key, sub_key, split="validation", streaming=True)

promptsource also has a helper for loading datasets, but I'd really like to use streaming=True wherever possible (whether it works depends on each dataset's compression format).
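
To illustrate why streaming pairs well with a plain `filter`, here is a minimal sketch that uses a generator as a stand-in for a streamed split. The `fake_stream` records and the `is_english` helper are hypothetical; only the `"<language>-<rest>"` id layout is taken from the test in this PR. Nothing is read from the stream until `next()` asks for an item.

```python
from typing import Dict, Iterator, List

consumed: List[str] = []  # records which ids the stream has actually produced


def fake_stream() -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for a streamed split; yields records lazily."""
    for example_id in ["arabic-0001", "english-0002", "finnish-0003"]:
        consumed.append(example_id)
        yield {"id": example_id}


def is_english(example: Dict[str, str]) -> bool:
    # The language name is the part of the id before the first hyphen.
    return example["id"].split("-")[0] == "english"


english_only = filter(is_english, fake_stream())  # lazy: nothing consumed yet
first = next(english_only)  # pulls records only until the first match
```

Because both the stream and `filter` are lazy, the third record is never touched, which is the property that makes `streaming=True` attractive for a quick test.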

    tydiqa_sec_vld_ds_en = filter(lambda x: x["id"].split("-")[0] == "english", tydiqa_sec_vld_ds)
    template_collection = TemplateCollection()
    tydiqa_sec_tmpls = template_collection.get_dataset(ds_key, sub_key)
    tmpl = tydiqa_sec_tmpls["simple_question_reading_comp_2"]

This is the same prompt template that evaluation.tasks.tydiqa_secondary.TyDiQADataset uses.

    template_collection = TemplateCollection()
    tydiqa_sec_tmpls = template_collection.get_dataset(ds_key, sub_key)
    tmpl = tydiqa_sec_tmpls["simple_question_reading_comp_2"]
    prompt, _ = tmpl.apply(removeHyphen(next(tydiqa_sec_vld_ds_en)))

@tianjianjiang tianjianjiang Aug 30, 2021

The return value is actually a list. If the template didn't apply, there will be no second element (the expected answer/target), and the two-element unpacking on this line would raise a ValueError.

Only removeHyphen() is applied here, but promptsource does some additional preprocessing for classification; see https://github.com/bigscience-workshop/promptsource/blob/main/promptsource/seqio_tasks/tasks.py
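
One way to guard against the missing second element is to normalize the result before unpacking. This is only a sketch: `split_applied` is a hypothetical helper, not part of promptsource, and it assumes nothing beyond what the comment describes (a list whose second element may be absent).

```python
from typing import List, Optional, Tuple


def split_applied(result: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (prompt, target), using None for any element that is missing."""
    prompt = result[0] if len(result) > 0 else None
    target = result[1] if len(result) > 1 else None
    return prompt, target
```

In the test this could be used as `prompt, target = split_applied(tmpl.apply(example))`, skipping or failing explicitly when `target` is `None` instead of relying on the unpacking to succeed.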

@tianjianjiang tianjianjiang force-pushed the feat-use_promptsource_templates branch from 5cd09aa to b9ae559 Compare August 30, 2021 15:48
@@ -19,6 +19,8 @@ tensorflow = "2.5.0"
torch = "1.9.0"
tqdm = "4.62.0"
transformers = "4.9.1"
promptsource = {git = "https://[email protected]/bigscience-workshop/promptsource.git", rev = "main"}
aiohttp = "^3.7.4"

This is the same dependency that datasets[streaming] pulls in, but we may want to pin the aiohttp version separately, just in case.

@@ -4,3 +4,5 @@ tensorflow==2.5.0
torch==1.9.0
tqdm==4.62.0
transformers==4.9.1
promptsource @ git+https://[email protected]/bigscience-workshop/promptsource.git@main
aiohttp==3.7.4