PromptSource implements 4 classes to store, manipulate and use prompts and their metadata: Template, Metadata, DatasetTemplates and TemplateCollection. All of them are implemented in templates.py.
Template is a class that wraps a prompt and its associated metadata, and implements the helper functions for using the prompt.
Instances of Template have the following main methods that will come in handy:
- apply(example, truncate=True, highlight_variables=False): Create a prompted example by applying the template to the given example.
  - example (Dict): the dataset example to create a prompt for
  - truncate (Bool, defaults to True): if True, example fields are truncated to TEXT_VAR_LENGTH chars
  - highlight_variables (Bool, defaults to False): highlight the added variables (internal use for the app rendering)
- get_id(): Get the uuid of the prompt
- get_name(): Get the name of the prompt
- get_reference(): Get any additional information about the prompt (such as a bibliographic reference)
- get_answer_choices_list(example): If applicable, return a list of answer choices for a given example.
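To make these methods concrete, here is a minimal, self-contained sketch. It is not PromptSource's implementation: PromptSource renders Jinja templates, whereas this hypothetical ToyTemplate uses Python's str.format. We also assume that a "|||" separator splits the prompt input from its target and separates answer choices, that TEXT_VAR_LENGTH is 2048 (the real constant may differ), and we leave out highlight_variables.

```python
import uuid
from typing import Dict, List, Optional

# Assumed value for this sketch; PromptSource defines its own TEXT_VAR_LENGTH.
TEXT_VAR_LENGTH = 2048


class ToyTemplate:
    """Hypothetical stand-in for Template (not PromptSource's implementation).

    The prompt is a str.format pattern in which "|||" separates the input from
    the target; answer choices are a "|||"-separated string of patterns.
    """

    def __init__(self, name: str, pattern: str,
                 answer_choices: Optional[str] = None, reference: str = ""):
        self.id = str(uuid.uuid4())  # random uuid used as the prompt id
        self.name = name
        self.pattern = pattern
        self.answer_choices = answer_choices
        self.reference = reference

    def get_id(self) -> str:
        return self.id

    def get_name(self) -> str:
        return self.name

    def get_reference(self) -> str:
        return self.reference

    def apply(self, example: Dict, truncate: bool = True) -> List[str]:
        if truncate:
            # Cut every string field to TEXT_VAR_LENGTH characters before rendering.
            example = {k: v[:TEXT_VAR_LENGTH] if isinstance(v, str) else v
                       for k, v in example.items()}
        rendered = self.pattern.format(**example)
        # Split into [input, target] and trim the whitespace around the separator.
        return [part.strip() for part in rendered.split("|||")]

    def get_answer_choices_list(self, example: Dict) -> Optional[List[str]]:
        # No fixed answer choices (e.g. free-form generation): return None.
        if self.answer_choices is None:
            return None
        return [choice.strip().format(**example)
                for choice in self.answer_choices.split("|||")]


template = ToyTemplate(
    name="is_positive",
    pattern="Review: {text}\nIs this review positive? ||| {label_text}",
    answer_choices="Yes ||| No",
)
example = {"text": "Great movie!", "label_text": "Yes"}
print(template.apply(example))  # ['Review: Great movie!\nIs this review positive?', 'Yes']
print(template.get_answer_choices_list(example))  # ['Yes', 'No']
```

Truncating before rendering means an oversized field cannot blow up the prompt length, which is the purpose of the truncate flag described above.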
Each Template also has a metadata attribute, an instance of the class Metadata that encapsulates the following 3 attributes:
- original_task: If True, this prompt asks a model to perform the original task designed for this dataset.
- choices_in_prompt: If True, the answer choices are included in the template so that models see those choices in the input. Only applicable to classification tasks.
- metrics: List of strings denoting the metrics to use for evaluation.
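These attributes are typically used to select prompts. The sketch below mirrors the three attributes with a hypothetical ToyMetadata dataclass (not the library class) and keeps only the prompts that perform the original task; the prompt names and metric strings are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyMetadata:
    """Hypothetical mirror of the three Metadata attributes (not the library class)."""
    original_task: bool = False
    choices_in_prompt: bool = False
    metrics: List[str] = field(default_factory=list)


# Made-up prompt names and metadata for illustration.
prompt_metadata: Dict[str, ToyMetadata] = {
    "is_positive": ToyMetadata(original_task=True, choices_in_prompt=True,
                               metrics=["Accuracy"]),
    "write_a_review": ToyMetadata(original_task=False, metrics=["BLEU"]),
}

# Keep only prompts that ask for the dataset's original task.
original_task_prompts = sorted(
    name for name, meta in prompt_metadata.items() if meta.original_task
)
print(original_task_prompts)  # ['is_positive']
```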
DatasetTemplates is a class that wraps all the prompts (each of which is an instance of Template) for a specific dataset/subset and implements all the helper functions necessary to read from and write to the YAML file in which the prompts are saved.
You will most likely be interested in getting the existing prompts and their names for a given dataset. You can do that with the following instantiation:
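>>> from promptsource.templates import DatasetTemplates
>>> dataset_name, subset_name = "super_glue", "rte"  # example values; set subset_name = None if the dataset has no subset
>>> template_key = f"{dataset_name}/{subset_name}" if subset_name is not None else dataset_name
>>> prompts = DatasetTemplates(template_key)
>>> len(prompts)  # Returns the number of prompts for the given dataset
>>> prompts.all_template_names  # Returns a sorted list of all template names for this dataset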
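The key construction and the two accessors can be mimicked without installing the library. ToyDatasetTemplates is a hypothetical in-memory stand-in: it skips YAML reading and writing entirely and stores template names under made-up ids, whereas the real class holds Template objects.

```python
from typing import Dict, List, Optional


def make_template_key(dataset_name: str, subset_name: Optional[str] = None) -> str:
    # "dataset/subset" when a subset exists, otherwise just the dataset name.
    return f"{dataset_name}/{subset_name}" if subset_name is not None else dataset_name


class ToyDatasetTemplates:
    """Hypothetical in-memory stand-in for DatasetTemplates (no YAML I/O)."""

    def __init__(self, template_key: str, templates: Dict[str, str]):
        self.template_key = template_key
        # Maps a made-up template id to a template name.
        self.templates = templates

    def __len__(self) -> int:
        # Number of prompts for this dataset/subset.
        return len(self.templates)

    @property
    def all_template_names(self) -> List[str]:
        # Template names in sorted order.
        return sorted(self.templates.values())


key = make_template_key("super_glue", "rte")
prompts = ToyDatasetTemplates(key, {"id-1": "prompt_b", "id-2": "prompt_a"})
print(key)  # super_glue/rte
print(len(prompts))  # 2
print(prompts.all_template_names)  # ['prompt_a', 'prompt_b']
```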
TemplateCollection is a class that encapsulates all the prompts available under PromptSource by wrapping the DatasetTemplates class. It initializes a DatasetTemplates object for every existing template folder, gives access to each DatasetTemplates, and provides aggregated counts over all DatasetTemplates.
The main methods are:
- get_dataset(dataset_name, subset_name): Return the DatasetTemplates object corresponding to the dataset name.
  - dataset_name (Str): name of the dataset to get
  - subset_name (Str, defaults to None): name of the subset
- get_templates_count(): Return the template counts over all datasets. NB: datasets are not broken down into subsets for the count, i.e. subset counts are included in their parent dataset's count.
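The subset-folding rule of get_templates_count is easiest to see on a small inventory. This is a hypothetical sketch, not the library's code: ToyTemplateCollection is built from an in-memory dict keyed by (dataset_name, subset_name) instead of scanning template folders, its get_dataset returns a plain list of names rather than a DatasetTemplates object, and we assume the counts come back as a dict from dataset name to count.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (dataset_name, subset_name) pair; subset_name is None for datasets without subsets.
Key = Tuple[str, Optional[str]]


class ToyTemplateCollection:
    """Hypothetical stand-in for TemplateCollection, built from an in-memory dict."""

    def __init__(self, datasets_templates: Dict[Key, List[str]]):
        self.datasets_templates = datasets_templates

    def get_dataset(self, dataset_name: str, subset_name: Optional[str] = None) -> List[str]:
        # Raises KeyError for an unknown (dataset, subset) pair in this sketch.
        return self.datasets_templates[(dataset_name, subset_name)]

    def get_templates_count(self) -> Dict[str, int]:
        # Subset counts are added to their parent dataset's count.
        counts: Dict[str, int] = defaultdict(int)
        for (dataset_name, _subset_name), names in self.datasets_templates.items():
            counts[dataset_name] += len(names)
        return dict(counts)


collection = ToyTemplateCollection({
    ("ag_news", None): ["a", "b", "c"],
    ("super_glue", "rte"): ["d", "e"],
    ("super_glue", "boolq"): ["f"],
})
print(collection.get_dataset("super_glue", "rte"))  # ['d', 'e']
print(collection.get_templates_count())  # {'ag_news': 3, 'super_glue': 3}
```

Because subsets are folded into their parent, super_glue reports 3 here even though no single super_glue subset has 3 templates.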