This repository contains supplementary material for our paper of the same title. Since the paper can only capture the state of affairs at the time of publication, this repository keeps a more up-to-date version of the resources in its appendix and invites the community to collaborate in a transparent manner.
We maintain a version of Table 1 from the original paper, giving an overview of useful resources for the different stages of the research process, namely Data, Codebase & Models, Experiments & Analysis, and Publication.
In CHECKLIST.md, we distil the actionable points at the end of the core paper sections into a reusable and modifiable checklist to ensure replicability.
In CHANGELOG.md, we transparently document changes to the repository and versioning. The current version is v0.1.
If you find the resources helpful or are using the checklist for one of your academic projects, please cite us in the following way:
    @inproceedings{ulmer-etal-2022-experimental,
        title = "Experimental Standards for Deep Learning in Natural Language Processing Research",
        author = {Ulmer, Dennis and
          Bassignana, Elisa and
          M{\"u}ller-Eberstein, Max and
          Varab, Daniel and
          Zhang, Mike and
          van der Goot, Rob and
          Hardmeier, Christian and
          Plank, Barbara},
        booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
        month = dec,
        year = "2022",
        address = "Abu Dhabi, United Arab Emirates",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2022.findings-emnlp.196",
        pages = "2673--2692",
    }
In your paper, you could, for instance, cite our work as follows:

    For our experimental design, we follow many of the guidelines laid out by \citet{ulmer-etal-2022-experimental}.
Contributions can come in two forms: opening an issue to correct mistakes or improve existing content, or opening a pull request to add new content.
When opening an issue, please label the issue accordingly:
- `enhancement-resources` for issues improving or correcting entries in RESOURCES.md.
- `enhancement-standards` for issues improving or correcting entries in CHECKLIST.md.
- `duplicate` for indicating duplicate entries.
- `general` for general questions / issues with the repository.
To contribute new content, please first read the contributing guidelines in CONTRIBUTING.md before opening a pull request. Use the label `enhancement-resources` for pull requests adding new resources and `enhancement-standards` for pull requests adding new points to the checklist.
The pull request template can be found in PULL_REQUEST_TEMPLATE.md.
We split Table 1 from the paper into section-specific resource tables below.

**Data**

Name | Description | Link / Reference |
---|---|---|
Data Version Control (DVC) | Command-line tool to version datasets and models. | Link / Paper |
Hugging Face datasets | Hub to store and share (NLP) datasets. | Link / Paper |
European Language Resources Association | Public institution for language and evaluation resources. | About / Link |
LINDAT/CLARIN | Open access to language resources and other data and services for the support of research in digital humanities and social sciences. | Link / Paper |
Zenodo | General-purpose open-access repository for research papers, datasets, research software, reports, and any other research-related digital artifacts. | Link |
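To illustrate how one of these resources is used in practice, here is a minimal sketch of loading a dataset with Hugging Face datasets; the dataset name (`imdb`) and the pinned revision are illustrative placeholders, not part of the paper.

```python
# Minimal sketch: loading and inspecting a dataset with Hugging Face datasets.
from datasets import load_dataset

# Pinning a revision (e.g., a commit hash instead of "main") helps make the
# exact data version reproducible across reruns.
dataset = load_dataset("imdb", revision="main")
print(dataset["train"][0])
```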
**Codebase & Models**

Name | Description | Link / Reference |
---|---|---|
Anonymous GitHub | Website to double-anonymize a GitHub repository. | Link |
BitBucket | Website and cloud-based service that helps developers store and manage their code, as well as track and control changes to it. | Link |
Conda | Open-source package management and environment management system. | Link |
codecarbon | Python package for estimating and tracking the carbon emissions of computer programs. | Link |
ONNX | Open format built to represent machine learning models. | Link |
Pipenv | Tool that combines package management with isolated virtual environments for Python. | Link |
Releasing Research Code | GitHub repository with tips and templates for releasing research code. | Link |
Virtualenv | Tool to create isolated Python environments. | Link |
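To give a flavor of how such tooling integrates into an experiment, below is a minimal sketch of tracking carbon emissions with codecarbon; the project name and the placeholder training loop are assumptions for illustration.

```python
# Minimal sketch: estimating the carbon footprint of an experiment with codecarbon.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="my-nlp-experiment")  # name is a placeholder
tracker.start()
# ... run training / evaluation code here ...
emissions_kg = tracker.stop()  # estimated emissions in kg of CO2-equivalents
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```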
**Experiments & Analysis**

Name | Description | Link / Reference |
---|---|---|
baycomp | Python implementation of Bayesian tests for the comparison of classifiers. | Link / Paper |
BayesianTestML | Same as baycomp, but also includes Julia and R implementations. | Link / Paper |
confidenceinterval | Python package that computes confidence intervals for common evaluation metrics. | Link |
deep-significance | Python package implementing the ASO test by Dror et al. (2019) and other utilities. | Link |
HyBayes | Python package implementing a variety of frequentist and Bayesian significance tests. | Link |
Hugging Face evaluate | Library that implements standardized versions of evaluation metrics and significance tests. | Link |
pingouin | Python package implementing various parametric and non-parametric statistical tests. | Link / Paper |
Protocol buffers | Language-neutral data serialization format, e.g. for storing model predictions. | Link |
RankingNLPSystems | Python package to create a fair global ranking of models across multiple tasks (and evaluation metrics). | Link / Paper |
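As an example of how these packages are applied, here is a minimal sketch of comparing two models with the ASO test from deep-significance; the score arrays are invented numbers, and the exact function signature should be checked against the package documentation.

```python
# Minimal sketch: comparing two models with the ASO test from deep-significance.
# The scores are invented accuracies over five random seeds per model.
from deepsig import aso

scores_a = [0.84, 0.82, 0.85, 0.83, 0.86]  # model A
scores_b = [0.80, 0.81, 0.79, 0.82, 0.80]  # baseline B

eps_min = aso(scores_a, scores_b, seed=1234)
# eps_min close to 0 indicates (near-)stochastic dominance of model A over B;
# the package documentation suggests rejecting the null hypothesis for eps_min < 0.5.
print(f"eps_min = {eps_min:.3f}")
```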
**Publication**

Name | Description | Link / Reference |
---|---|---|
dblp | Computer science bibliography to find the correct versions of papers. | Link |
impact | Online calculator of carbon emissions based on GPU type. | Link / Paper |
Google Scholar | Scientific publication search engine. | Link |
Semantic Scholar | Scientific publication search engine. | Link |
rebiber | Python tool to check and normalize BibTeX entries to the official published versions of the cited papers. | Link |