Set metadata only once in recipe (#1437)
* update metadata just once in a recipe

Signed-off-by: dafnapension <[email protected]>

* pass recipe_metadata through task

Signed-off-by: dafnapension <[email protected]>

---------

Signed-off-by: dafnapension <[email protected]>
Co-authored-by: Elron Bandel <[email protected]>
dafnapension and elronbandel authored Dec 18, 2024
1 parent f80080b commit c46b7fe
Showing 2 changed files with 57 additions and 32 deletions.
88 changes: 56 additions & 32 deletions src/unitxt/standard.py
@@ -203,7 +203,6 @@ def set_pipelines(self):
self.metadata,
self.standardization,
self.processing,
-self.metadata,
self.verbalization,
self.finalize,
]
@@ -213,7 +212,6 @@ def set_pipelines(self):
self.inference_instance.steps = [
self.metadata,
self.processing,
-self.metadata,
]

self.inference_demos = SourceSequentialOperator()
@@ -223,7 +221,6 @@ def set_pipelines(self):
self.metadata,
self.standardization,
self.processing,
-self.metadata,
]

self.inference = SequentialOperator()
@@ -478,39 +475,66 @@ class StandardRecipe(StandardRecipeWithIndexes):
with all necessary steps, refiners and renderers included. It allows to set various
parameters and steps in a sequential manner for preparing the recipe.
-Attributes:
-card (TaskCard): TaskCard object associated with the recipe.
-template (Template, optional): Template object to be used for the recipe.
-system_prompt (SystemPrompt, optional): SystemPrompt object to be used for the recipe.
-loader_limit (int, optional): Specifies the maximum number of instances per stream to be returned from the loader (used to reduce loading time in large datasets)
-format (SystemFormat, optional): SystemFormat object to be used for the recipe.
-metrics (List[str]): list of catalog metrics to use with this recipe.
-postprocessors (List[str]): list of catalog processors to apply at post processing. (Not recommended to use from here)
-group_by (List[Union[str, List[str]]]): list of task_data or metadata keys to group global scores by.
-train_refiner (StreamRefiner, optional): Train refiner to be used in the recipe.
-max_train_instances (int, optional): Maximum training instances for the refiner.
-validation_refiner (StreamRefiner, optional): Validation refiner to be used in the recipe.
-max_validation_instances (int, optional): Maximum validation instances for the refiner.
-test_refiner (StreamRefiner, optional): Test refiner to be used in the recipe.
-max_test_instances (int, optional): Maximum test instances for the refiner.
-demos_pool_size (int, optional): Size of the demos pool.
-num_demos (int, optional): Number of demos to be used.
-demos_pool_name (str, optional): Name of the demos pool. Default is "demos_pool".
-demos_taken_from (str, optional): Specifies from where the demos are taken. Default is "train".
-demos_field (str, optional): Field name for demos. Default is "demos".
-demos_removed_from_data (bool, optional): whether to remove the demos from the source data, Default is True
-sampler (Sampler, optional): The Sampler used to select the demonstrations when num_demos > 0.
-steps (List[StreamingOperator], optional): List of StreamingOperator objects to be used in the recipe.
-augmentor (Augmentor) : Augmentor to be used to pseudo randomly augment the source text
-instruction_card_index (int, optional): Index of instruction card to be used for preparing the recipe.
-template_card_index (int, optional): Index of template card to be used for preparing the recipe.
+Args:
+card (TaskCard):
+TaskCard object associated with the recipe.
+template (Template, optional):
+Template object to be used for the recipe.
+system_prompt (SystemPrompt, optional):
+SystemPrompt object to be used for the recipe.
+loader_limit (int, optional):
+Specifies the maximum number of instances per stream to be returned from the loader (used to reduce loading time in large datasets)
+format (SystemFormat, optional):
+SystemFormat object to be used for the recipe.
+metrics (List[str]):
+list of catalog metrics to use with this recipe.
+postprocessors (List[str]):
+list of catalog processors to apply at post processing. (Not recommended to use from here)
+group_by (List[Union[str, List[str]]]):
+list of task_data or metadata keys to group global scores by.
+train_refiner (StreamRefiner, optional):
+Train refiner to be used in the recipe.
+max_train_instances (int, optional):
+Maximum training instances for the refiner.
+validation_refiner (StreamRefiner, optional):
+Validation refiner to be used in the recipe.
+max_validation_instances (int, optional):
+Maximum validation instances for the refiner.
+test_refiner (StreamRefiner, optional):
+Test refiner to be used in the recipe.
+max_test_instances (int, optional):
+Maximum test instances for the refiner.
+demos_pool_size (int, optional):
+Size of the demos pool.
+num_demos (int, optional):
+Number of demos to be used.
+demos_pool_name (str, optional):
+Name of the demos pool. Default is "demos_pool".
+demos_taken_from (str, optional):
+Specifies from where the demos are taken. Default is "train".
+demos_field (str, optional):
+Field name for demos. Default is "demos".
+demos_removed_from_data (bool, optional):
+whether to remove the demos from the source data, Default is True
+sampler (Sampler, optional):
+The Sampler used to select the demonstrations when num_demos > 0.
+steps (List[StreamingOperator], optional):
+List of StreamingOperator objects to be used in the recipe.
+augmentor (Augmentor) :
+Augmentor to be used to pseudo randomly augment the source text
+instruction_card_index (int, optional):
+Index of instruction card to be used for preparing the recipe.
+template_card_index (int, optional):
+Index of template card to be used for preparing the recipe.
Methods:
-prepare(): This overridden method is used for preparing the recipe
-by arranging all the steps, refiners, and renderers in a sequential manner.
+prepare():
+This overridden method is used for preparing the recipe
+by arranging all the steps, refiners, and renderers in a sequential manner.
Raises:
-AssertionError: If both template and template_card_index are specified at the same time.
+AssertionError:
+If both template and template_card_index are specified at the same time.
"""

pass
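The three set_pipelines hunks in standard.py each remove a second self.metadata entry, so the metadata step runs once per pipeline instead of twice. The sketch below illustrates why the repeat was redundant, under stated assumptions: a pipeline is modeled as a list of plain Python functions applied in order, and the names set_recipe_metadata, standardize, run_pipeline and RECIPE_METADATA are hypothetical stand-ins, not unitxt's operator API. It also assumes the metadata step only writes a fixed recipe_metadata field and that no step between the two occurrences modifies that field.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical types: an instance is a plain dict, a step maps one instance to another.
Instance = Dict[str, object]
Step = Callable[[Instance], Instance]

# Hypothetical recipe metadata; the values are placeholders, not taken from unitxt.
RECIPE_METADATA = {"template": "hypothetical_template", "num_demos": 0}

# Counts how many times the metadata step runs, to make the redundancy visible.
metadata_calls = 0


def set_recipe_metadata(instance: Instance) -> Instance:
    """Hypothetical metadata step: returns a new dict stamped with the recipe metadata."""
    global metadata_calls
    metadata_calls += 1
    return {**instance, "recipe_metadata": dict(RECIPE_METADATA)}


def standardize(instance: Instance) -> Instance:
    """Hypothetical intermediate step: trims the text field, leaves recipe_metadata alone."""
    return {**instance, "text": str(instance["text"]).strip()}


def run_pipeline(steps: List[Step], instances: Iterable[Instance]) -> List[Instance]:
    """Applies every step, in order, to every instance."""
    results = []
    for instance in instances:
        for step in steps:
            instance = step(instance)
        results.append(instance)
    return results


data = [{"text": "  hello "}, {"text": "world  "}]

# Before the commit (modeled): the metadata step appears twice in the step list.
metadata_calls = 0
result_before = run_pipeline([set_recipe_metadata, standardize, set_recipe_metadata], data)
calls_before = metadata_calls

# After the commit (modeled): the metadata step appears once.
metadata_calls = 0
result_after = run_pipeline([set_recipe_metadata, standardize], data)
calls_after = metadata_calls

print(result_before == result_after)  # True
print(calls_before, calls_after)      # 4 2
```

Under those assumptions both step lists produce identical instances, while the duplicated list runs the metadata step twice per instance. In this model, if an intermediate step did rewrite recipe_metadata, the second pass would silently overwrite that change, so dropping it preserves output only when the assumption holds.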
1 change: 1 addition & 0 deletions src/unitxt/task.py
@@ -288,6 +288,7 @@ def process(
"metrics": self.metrics,
"data_classification_policy": data_classification_policy,
"media": instance.get("media", {}),
+"recipe_metadata": instance.get("recipe_metadata", {}),
}

if stream_name == constants.inference_stream:
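The task.py hunk reads recipe_metadata with instance.get and an empty-dict default, the same pattern already used for media, so an instance without that key still yields a well-formed entry. A minimal sketch of the lookup follows; build_task_data_fields is a hypothetical helper that mirrors only the four fields visible in the hunk, not the full process method in task.py.

```python
from typing import Any, Dict, List, Optional


def build_task_data_fields(
    instance: Dict[str, Any],
    metrics: List[str],
    data_classification_policy: Optional[List[str]],
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the four fields visible in the task.py hunk.

    Optional keys are read with dict.get and an empty-dict default, so an instance
    that never received recipe metadata (or media) still yields both keys.
    """
    return {
        "metrics": metrics,
        "data_classification_policy": data_classification_policy,
        "media": instance.get("media", {}),
        "recipe_metadata": instance.get("recipe_metadata", {}),
    }


# Illustrative inputs; the metric name and metadata values are placeholders.
with_metadata: Dict[str, Any] = {"recipe_metadata": {"num_demos": 2}}
without_metadata: Dict[str, Any] = {}

print(build_task_data_fields(with_metadata, ["metrics.accuracy"], None)["recipe_metadata"])
# {'num_demos': 2}
print(build_task_data_fields(without_metadata, ["metrics.accuracy"], None)["recipe_metadata"])
# {}
```

Reading the key with a default rather than instance["recipe_metadata"] means an instance that never received recipe metadata produces an empty dict instead of raising KeyError.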