Everyone is welcome to contribute, and we value contributions from the community. One of the best ways to contribute is by adding a data set to the evaluation benchmark!
- Find a task on the issues page. Self-assign or comment to indicate interest.
- Coordinate with the other contributors when more than one person has indicated interest.
- Open a new branch
- Open a pull request "Add <task_name> dataset" when you are ready. Make sure to include
  - which model(s) the task was evaluated on
  - computation time benchmark on GPU (preferred) and/or CPU
- New tasks will be placed under `evaluation/tasks`
- Make a copy of the directory `evaluation/tasks/template` and rename the directory to match your task, i.e. in the root directory, run
  ```bash
  cp -r evaluation/tasks/template evaluation/tasks/{{SOME_NEW_TASK}}
  ```
- Your new task directory will include 4 files:
  - `__init__.py`
  - `english.json`: JSON file for task-specific configurations of English-only data (e.g. `batch_size`); a loading sketch follows this list
  - [For multilingual tasks only] `multilingual.json`: JSON file for task-specific configuration of multilingual data
  - `task_name.py`: the main module (a skeleton sketch follows the references below)
    - Wrap data as a PyTorch Dataset/DataLoader
    - Rename `TemplateTask` (which inherits `AutoTask`) to match your task
    - Implement all abstract methods for your task
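As a minimal sketch of how a task might read its JSON configuration: the `load_task_config` helper below is hypothetical (it is not part of the repo), and only the `batch_size` field is taken from this guide; check the template for how configs are actually consumed.

```python
import json
from pathlib import Path


def load_task_config(task_dir: str, multilingual: bool = False) -> dict:
    """Hypothetical helper: read the task's JSON config, e.g. {"batch_size": 16}."""
    config_name = "multilingual.json" if multilingual else "english.json"
    with open(Path(task_dir) / config_name, encoding="utf-8") as f:
        return json.load(f)


# Example usage once your task directory exists:
#   config = load_task_config("evaluation/tasks/{{SOME_NEW_TASK}}")
#   batch_size = config["batch_size"]
```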
References:
- Template task
- Fully implemented example for TydiQA Secondary
- Feel free to use Hugging Face's GPT2LMHead as the base model
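To tie the points above together, here is a rough sketch of what a task module might look like. It is not the repo's actual `AutoTask` interface: the `MyTask`/`MyTaskDataset` names, the `evaluate` signature, the default `batch_size`, and the `gpt2` checkpoint are illustrative assumptions; follow the template and the TydiQA Secondary example for the authoritative structure.

```python
# Illustrative sketch only -- see evaluation/tasks/template for the real interface.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast


class MyTaskDataset(Dataset):
    """Wraps raw examples as a PyTorch Dataset so a DataLoader can batch them."""

    def __init__(self, examples):
        self.examples = examples  # e.g. a list of prompt strings

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]


class MyTask:  # in the repo this would be TemplateTask renamed, inheriting AutoTask
    def __init__(self, batch_size: int = 16):
        # batch_size would normally come from english.json / multilingual.json
        self.batch_size = batch_size
        self.tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
        self.tokenizer.pad_token = self.tokenizer.eos_token  # GPT-2 has no pad token
        self.model = GPT2LMHeadModel.from_pretrained("gpt2")
        self.model.eval()

    def evaluate(self, examples):
        """Run the model over the task data and collect per-batch logits (illustrative)."""
        loader = DataLoader(MyTaskDataset(examples), batch_size=self.batch_size)
        outputs = []
        with torch.no_grad():
            for batch in loader:  # batch is a list of strings
                encoded = self.tokenizer(batch, return_tensors="pt", padding=True)
                outputs.append(self.model(**encoded).logits)
        return outputs
```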
- Make modifications and commit any changes. It's best to make your commit messages informative to help your reviewer. Below is a list of meta-labels to get you started:
  ```
  # feat (new feature)
  # fix (bug fix)
  # refactor (refactoring production code)
  # style (formatting, missing semicolons, etc.; no code change)
  # docs (changes to documentation)
  # test (adding or refactoring tests; no production code change)
  # chore (updating grunt tasks etc.; no production code change)
  # build (changes that affect the build system or external dependencies)
  # ci (changes to our CI configuration files and scripts)
  # version (version bump/new release; no production code change)
  # debug (changes in debugging code/frameworks; no production code change)
  # license (edits regarding licensing; no production code change)
  # hack (temporary fix to make things move forward; please avoid)
  ```
  For example, one possible commit message would be `feat: implement lambada evaluation`.
- Write prompts to reformat the dataset as an LM task if necessary (e.g. QA tasks); see the sketch after this list
  - Submit prompts to the promptsource repo
  - Prompts are in jinja2 format
  - Try to have at least 3 prompts
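For example, a QA prompt in jinja2 could look like the snippet below. The field names (`context`, `question`, `answers`) are assumptions about the dataset schema, and the `|||` separator between input and target follows the convention commonly used in promptsource templates; verify both against that repo's guidelines.

```python
from jinja2 import Template

# Hypothetical prompt for a QA dataset; field names are assumptions about the schema.
qa_prompt = Template(
    "Answer the question using the passage below.\n\n"
    "Passage: {{ context }}\n"
    "Question: {{ question }}\n"
    "Answer: ||| {{ answers[0] }}"
)

example = {
    "context": "The Nile flows through northeastern Africa.",
    "question": "Which continent does the Nile flow through?",
    "answers": ["Africa"],
}
print(qa_prompt.render(**example))
```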
- Run `make quality` at the root of the repo to check for linting and code styling issues
- Run `make style` at the root of the repo to auto-format the code
- Update the Overleaf Tech Report with information on the task you added
- Add a new GitHub issue requesting your task be made multilingual
  - Label the issue with “multilingual”
  - Specify in the text of the issue which languages the task already supports
  - The multilinguality group is working on recruiting speakers of all the training languages to adapt English prompts to other languages