Add GEM/ToTTo #28

Open. Wants to merge 9 commits into base: master

Conversation

manandey
Member

No description provided.

@manandey requested review from cjlovering and jon-tow April 29, 2022
@manandey requested a review from StellaAthena as a code owner April 30, 2022
@cjlovering
Collaborator

Can you note what models you were able to run this with?

I tried to run this off your fork, but I wasn't able to load any prompts? Are they merged into the eval_hackathon branch on PS?

@manandey
Member Author

manandey commented May 2, 2022

> Can you note what models you were able to run this with?
>
> I tried to run this off your fork, but I wasn't able to load any prompts? Are they merged into the eval_hackathon branch on PS?

Hi @cjlovering, I ran it with GPT-2. No, the prompts are not yet merged into the eval_hackathon branch on PS.

For `python -m main --device cpu --tasks gem_totto --num_fewshot 0 --limit 5 --model gpt2`, the results were:

gpt2 (), limit: 5, provide_description: False, num_fewshot: 0, batch_size: None
|  Task   |          Prompt           |Version|      Metric       | Value |   |Stderr|
|---------|---------------------------|------:|-------------------|------:|---|-----:|
|gem_totto|final_text_describing_table|      0|bleu               |13.7120|±  |7.6106|
|gem_totto|final_text_describing_table|       |rouge1_precision   | 0.1914|±  |0.0706|
|gem_totto|final_text_describing_table|       |rouge1_recall      | 0.5547|±  |0.1688|
|gem_totto|final_text_describing_table|       |rouge1_fmeasure    | 0.2827|±  |0.0991|
|gem_totto|final_text_describing_table|       |rouge2_precision   | 0.1455|±  |0.0721|
|gem_totto|final_text_describing_table|       |rouge2_recall      | 0.4259|±  |0.1743|
|gem_totto|final_text_describing_table|       |rouge2_fmeasure    | 0.2151|±  |0.1019|
|gem_totto|final_text_describing_table|       |rougeL_precision   | 0.1825|±  |0.0695|
|gem_totto|final_text_describing_table|       |rougeL_recall      | 0.5261|±  |0.1613|
|gem_totto|final_text_describing_table|       |rougeL_fmeasure    | 0.2691|±  |0.0969|
|gem_totto|final_text_describing_table|       |rougeLsum_precision| 0.1914|±  |0.0706|
|gem_totto|final_text_describing_table|       |rougeLsum_recall   | 0.5547|±  |0.1688|
|gem_totto|final_text_describing_table|       |rougeLsum_fmeasure | 0.2827|±  |0.0991|
|gem_totto|guess the table page title |      0|bleu               | 0.2054|±  |0.0348|
|gem_totto|guess the table page title |       |rouge1_precision   | 0.0097|±  |0.0059|
|gem_totto|guess the table page title |       |rouge1_recall      | 0.1667|±  |0.1054|
|gem_totto|guess the table page title |       |rouge1_fmeasure    | 0.0182|±  |0.0112|
|gem_totto|guess the table page title |       |rouge2_precision   | 0.0000|±  |0.0000|
|gem_totto|guess the table page title |       |rouge2_recall      | 0.0000|±  |0.0000|
|gem_totto|guess the table page title |       |rouge2_fmeasure    | 0.0000|±  |0.0000|
|gem_totto|guess the table page title |       |rougeL_precision   | 0.0097|±  |0.0059|
|gem_totto|guess the table page title |       |rougeL_recall      | 0.1667|±  |0.1054|
|gem_totto|guess the table page title |       |rougeL_fmeasure    | 0.0182|±  |0.0112|
|gem_totto|guess the table page title |       |rougeLsum_precision| 0.0097|±  |0.0059|
|gem_totto|guess the table page title |       |rougeLsum_recall   | 0.1667|±  |0.1054|
|gem_totto|guess the table page title |       |rougeLsum_fmeasure | 0.0182|±  |0.0112|
|gem_totto|guess the table webpage url|      0|bleu               | 1.8425|±  |0.7780|
|gem_totto|guess the table webpage url|       |rouge1_precision   | 0.0341|±  |0.0122|
|gem_totto|guess the table webpage url|       |rouge1_recall      | 0.1893|±  |0.0648|
|gem_totto|guess the table webpage url|       |rouge1_fmeasure    | 0.0577|±  |0.0206|
|gem_totto|guess the table webpage url|       |rouge2_precision   | 0.0148|±  |0.0099|
|gem_totto|guess the table webpage url|       |rouge2_recall      | 0.0905|±  |0.0585|
|gem_totto|guess the table webpage url|       |rouge2_fmeasure    | 0.0254|±  |0.0170|
|gem_totto|guess the table webpage url|       |rougeL_precision   | 0.0341|±  |0.0122|
|gem_totto|guess the table webpage url|       |rougeL_recall      | 0.1893|±  |0.0648|
|gem_totto|guess the table webpage url|       |rougeL_fmeasure    | 0.0577|±  |0.0206|
|gem_totto|guess the table webpage url|       |rougeLsum_precision| 0.0341|±  |0.0122|
|gem_totto|guess the table webpage url|       |rougeLsum_recall   | 0.1893|±  |0.0648|
|gem_totto|guess the table webpage url|       |rougeLsum_fmeasure | 0.0577|±  |0.0206|

Thanks to @jon-tow for helping me fix the issues I was facing.
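
As a reading aid for the table above, here is a minimal sketch of how the `Value` and `Stderr` columns could be turned into rough intervals. It is not part of the harness: the helper names `parse_row` and `rough_interval` are hypothetical, and the normal-approximation multiplier of 1.96 is an assumption that is shaky at `--limit 5`. Only the three BLEU rows are copied from the table.

```python
# Rows copied verbatim from the results table (gpt2, zero-shot, limit 5).
ROWS = """
|gem_totto|final_text_describing_table|      0|bleu               |13.7120|±  |7.6106|
|gem_totto|guess the table page title |      0|bleu               | 0.2054|±  |0.0348|
|gem_totto|guess the table webpage url|      0|bleu               | 1.8425|±  |0.7780|
"""


def parse_row(line):
    """Split one markdown table row into (prompt, metric, value, stderr)."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    _task, prompt, _version, metric, value, _pm, stderr = cells
    return prompt, metric, float(value), float(stderr)


def rough_interval(value, stderr, z=1.96):
    """Return value +/- z * stderr (normal approximation; an assumption)."""
    half_width = z * stderr
    return value - half_width, value + half_width


results = [parse_row(line) for line in ROWS.strip().splitlines()]

for prompt, metric, value, stderr in results:
    low, high = rough_interval(value, stderr)
    print(f"{prompt:<28} {metric}: {value:.4f}  [{low:.4f}, {high:.4f}]")
```

For the `final_text_describing_table` prompt, the interval computed this way extends below zero, which suggests that five examples are enough for a smoke test but not for comparing prompts.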

@cjlovering
Collaborator

Great -- it's looking good! Let's wait on merging until the prompts get pulled in on the PS side.

@cjlovering
Collaborator

@manandey How is this going? Is GEM/ToTTo merged into promptsource?

@manandey
Member Author

Hi @cjlovering, the promptsource PR for this was opened a long time ago, but the review process has been slow. Further changes to the templates have now been suggested. I hope the PR gets merged soon and will keep you posted.

@StellaAthena added the "awaiting promptsource" label (PRs that need the corresponding PR merged into PromptSource) May 14, 2022