diff --git a/docs/docs/examples.rst b/docs/docs/examples.rst index 897054568a..26aec01f55 100644 --- a/docs/docs/examples.rst +++ b/docs/docs/examples.rst @@ -47,10 +47,23 @@ Evaluate a custom dataset - with existing predictions These examples demonstrate how to evaluate a datasets of different tasks when predictions are already available and no inference is required. `Example code for QA task `__ + `Example code for classification task `__ Related documentation: :ref:`Evaluating datasets ` +Evaluate a Named Entity Recognition (NER) dataset +=================================================== + +This example demonstrates how to evaluate a named entity recognition task. +The ground truth entities are provided as spans within the provided texts, +and the model is prompted to identify these entities. +Classical f1_micro, f1_macro, and per-entity-type f1 metrics are reported. + +Example code `__ + +Related documentation: :ref:`Add new dataset tutorial `, :ref:`Open NER task in catalog `, :ref:`Inference Engines `. + Evaluation usecases ----------------------- diff --git a/docs/docs/saving_and_loading_from_catalog.rst b/docs/docs/saving_and_loading_from_catalog.rst index 41e912f469..a383753122 100644 --- a/docs/docs/saving_and_loading_from_catalog.rst +++ b/docs/docs/saving_and_loading_from_catalog.rst @@ -42,7 +42,7 @@ It's also possible to add artifacts to the library's default catalog: Using Catalog Assets -------------------- -To use catalog objects, simply specify their name in the Unitxt object that will use them. +To use catalog objects, simply specify their name in the Unitxt object that will use them. .. code-block:: python @@ -56,8 +56,8 @@ To use catalog objects, simply specify their name in the Unitxt object that will Modifying Catalog Assets on the Fly ----------------------------------- -To modify a catalog asset's fields dynamically, upon fetching the asset from the catalog, use the syntax: ``artifact_name[key_to_modify=new_value]``. 
-To assign lists, use: ``artifact_name[key_to_modify=[new_value_0, new_value_1]]``. +To modify a catalog asset's fields dynamically, upon fetching the asset from the catalog, use the syntax: ``artifact_name[key_to_modify=new_value]``. +To assign lists, use: ``artifact_name[key_to_modify=[new_value_0, new_value_1]]``. To assign dictionaries, use: ``artifact_name[key_to_modify={new_key_0=new_value_0,new_key_1=new_value_1}]``. Note that the whole new value of the field has to be specified; not just one item of a list, or one key of the dictionary. For instance, to change the metric specification of a task: @@ -85,20 +85,20 @@ Use ``get_from_catalog`` to directly access catalog assets, and obtain an asset A Catalog Asset Linking to Another Catalog Asset ------------------------------------------------ -A catalog asset can be just a link to another asset. -This feature comes handy when for some reason, we want to change the catalog name -of an existing asset (e.g. ``asset1`` to ``asset2``), while there is already code +A catalog asset can be just a link to another asset. +This feature comes handy when for some reason, we want to change the catalog name +of an existing asset (e.g. ``asset1`` to ``asset2``), while there is already code that uses the old name of the asset and we want to avoid non-backward compatible changes. -In such a case, we can save the asset as ``asset2``, create an asset of type +In such a case, we can save the asset as ``asset2``, create an asset of type :class:`ArtifactLink ` that links to ``asset2``, and save that one as ``asset1``. -When ``asset1`` is accessed from an existing code, Unixt Catalog realizes that the asset fetched from position ``asset1`` -is an ``ArtifactLink``, so it continues and fetches ``asset2`` -- the Artifact linked to by ``asset1``. 
+When ``asset1`` is accessed from existing code, Unitxt Catalog realizes that the asset fetched from position ``asset1`` +is an ``ArtifactLink``, so it continues and fetches ``asset2`` -- the Artifact linked to by ``asset1``. .. code-block:: python - link_to_asset2 = ArtifactLink(artifact_linked_to="asset2") + link_to_asset2 = ArtifactLink(to="asset2") add_to_catalog( link_to_asset2, "asset1", @@ -109,8 +109,8 @@ Deprecated Asset ---------------- Every asset has a special field named ``__deprecated_msg__`` of type ``str``, whose default value is None. -When None, the asset is cocnsidered non-deprecated. When not None, the asset is considered deprecated, and -its ``__deprecated_msg__`` is logged at level WARN upon its instantiation. (Other than this logging, +When None, the asset is considered non-deprecated. When not None, the asset is considered deprecated, and +its ``__deprecated_msg__`` is logged at level WARN upon its instantiation. (Other than this logging, the artifact is instantiated normally.) Example of a deprecated catalog asset: @@ -123,12 +123,12 @@ Example of a deprecated catalog asset: "text": "You are an agent in charge of answering a boolean (yes/no) question. The system presents you with a passage and a question. Read the passage carefully, and then answer yes or no. Think about your answer, and make sure it makes sense. Do not explain the answer. Only say yes or no." } -Combining this feature with ``ArtifactLink`` in the above example, we can also log a warning to the accessing code that -the name ``asset1`` is to be replaced by ``asset2``. +Combining this feature with ``ArtifactLink`` in the above example, we can also log a warning to the accessing code that +the name ``asset1`` is to be replaced by ``asset2``. .. code-block:: python - link_to_asset2 = ArtifactLink(artifact_linked_to="asset2", + link_to_asset2 = ArtifactLink(to="asset2", __deprecated_msg__="'asset1' is going to be deprecated. 
In future uses, please access 'asset2' instead.") add_to_catalog( link_to_asset2, diff --git a/examples/ner_evaluation.py b/examples/ner_evaluation.py new file mode 100644 index 0000000000..62fe3b5ed1 --- /dev/null +++ b/examples/ner_evaluation.py @@ -0,0 +1,70 @@ +import json + +from unitxt import get_logger +from unitxt.api import create_dataset, evaluate +from unitxt.inference import ( + CrossProviderInferenceEngine, +) + +logger = get_logger() +entity_types = ["Person", "Location", "Organization"] + + +test_set = [ + { + "text": "John lives in Texas.", + "entity_types": entity_types, + "spans_starts": [0, 14], + "spans_ends": [5, 19], + "labels": ["Person", "Location"], + }, + { + "text": "Phil works at Apple and eats an apple.", + "entity_types": entity_types, + "spans_starts": [0, 14], + "spans_ends": [5, 19], + "labels": ["Person", "Organization"], + }, +] + + +dataset = create_dataset( + task="tasks.ner.all_entity_types", + test_set=test_set, + split="test", + format="formats.chat_api", +) + +# Infer using Qwen1.5-0.5B-Chat via the HF pipeline API +# model = HFPipelineBasedInferenceEngine( +# model_name="Qwen/Qwen1.5-0.5B-Chat", max_new_tokens=32 +# ) +# Change to this to infer with external APIs: + +model = CrossProviderInferenceEngine(model="llama-3-8b-instruct", provider="watsonx") +# The provider can be one of: ["watsonx", "together-ai", "open-ai", "aws", "ollama", "bam"] + + +predictions = model(dataset) +results = evaluate(predictions=predictions, data=dataset) + +print("Global Results:") +print(results.global_scores.summary) + +print("Example prompt:") + +print(json.dumps(results.instance_scores[0]["source"], indent=4)) + +print("Instance Results:") +print( + results.instance_scores.to_df( + columns=[ + "text", + "prediction", + "processed_prediction", + "processed_references", + "score", + "score_name", + ] + ).to_markdown() +) diff --git a/prepare/cards/atis.py b/prepare/cards/atis.py index 834c47afa7..a186c3fda9 100644 --- a/prepare/cards/atis.py +++ 
b/prepare/cards/atis.py @@ -8,7 +8,7 @@ from unitxt.span_lableing_operators import IobExtractor from unitxt.test_utils.card import test_card -classes = [ +entity_types = [ "aircraft_code", "airline_code", "airline_name", @@ -103,9 +103,9 @@ }, ), IobExtractor( - labels=classes, - begin_labels=["B-" + c for c in classes], - inside_labels=["I-" + c for c in classes], + labels=entity_types, + begin_labels=["B-" + c for c in entity_types], + inside_labels=["I-" + c for c in entity_types], outside_label="O", ), Copy( @@ -117,7 +117,7 @@ get_default=[], not_exist_ok=True, ), - Set(fields={"classes": classes}), + Set(fields={"entity_types": entity_types}), ], task="tasks.span_labeling.extraction", templates="templates.span_labeling.extraction.all", diff --git a/prepare/cards/universal_ner.py b/prepare/cards/universal_ner.py index 4e89d4b57a..bf0b1a092f 100644 --- a/prepare/cards/universal_ner.py +++ b/prepare/cards/universal_ner.py @@ -76,7 +76,7 @@ ), Set( fields={ - "classes": ["Person", "Organization", "Location"], + "entity_types": ["Person", "Organization", "Location"], } ), ], diff --git a/prepare/tasks/ner.py b/prepare/tasks/ner.py index 57cdf8cf5f..fa6f2012e7 100644 --- a/prepare/tasks/ner.py +++ b/prepare/tasks/ner.py @@ -1,7 +1,7 @@ from typing import List, Tuple from unitxt.blocks import Task -from unitxt.catalog import add_to_catalog +from unitxt.catalog import add_link_to_catalog, add_to_catalog add_to_catalog( Task( @@ -20,19 +20,9 @@ overwrite=True, ) -add_to_catalog( - Task( - input_fields={"text": str, "entity_types": List[str]}, - reference_fields={ - "spans_starts": List[int], - "spans_ends": List[int], - "text": str, - "labels": List[str], - }, - prediction_type=List[Tuple[str, str]], - metrics=["metrics.ner"], - augmentable_inputs=["text"], - ), - "tasks.ner.all_entity_types", +add_link_to_catalog( + artifact_linked_to="tasks.span_labeling.extraction", + name="tasks.ner.all_entity_types", + deprecate=False, overwrite=True, ) diff --git 
a/prepare/tasks/span_labeling.py b/prepare/tasks/span_labeling.py index 28d152b123..a591455ae0 100644 --- a/prepare/tasks/span_labeling.py +++ b/prepare/tasks/span_labeling.py @@ -5,11 +5,17 @@ add_to_catalog( Task( + __description__="""This is Entity Extraction task where multiple entity types are to be extracted. +The input is the 'text' and 'entity_types' to extract (e.g. ["Organization", "Location", "Person"]) + +By default, classical f1 metric is used, which expects a list of pairs. +Multiple f1 score are reported, including f1_micro and f1_macro and f1 per per entity_type.". +The template's post processors must convert the model textual predictions into the expected list format. +""", input_fields={ "text": str, "text_type": str, - "class_type": str, - "classes": List[str], + "entity_types": List[str], }, reference_fields={ "text": str, @@ -22,7 +28,8 @@ "metrics.ner", ], augmentable_inputs=["text"], - defaults={"text_type": "text", "class_type": "entity type"}, + defaults={"text_type": "text"}, + default_template="templates.span_labeling.extraction.detailed", ), "tasks.span_labeling.extraction", overwrite=True, diff --git a/prepare/templates/span_labeling/templates.py b/prepare/templates/span_labeling/templates.py index 709e78a8a0..0edd18754c 100644 --- a/prepare/templates/span_labeling/templates.py +++ b/prepare/templates/span_labeling/templates.py @@ -7,7 +7,7 @@ add_to_catalog( SpanLabelingTemplate( input_format="{text_type}: {text}", - instruction="From the following {text_type}, extract the objects for which the {class_type} expressed is one of {classes}.", + instruction="From the following {text_type}, extract the objects for which the entity type expressed is one of {entity_types}.", postprocessors=["processors.to_span_label_pairs"], ), "templates.span_labeling.extraction.extract", @@ -17,7 +17,7 @@ add_to_catalog( SpanLabelingTemplate( input_format="{text_type}: {text}", - instruction="From the following {text_type}, extract spans having a 
{class_type}: {classes}.", + instruction="From the following {text_type}, extract spans having a entity type: {entity_types}.", postprocessors=["processors.to_span_label_pairs"], ), "templates.span_labeling.extraction.having", @@ -26,7 +26,7 @@ add_to_catalog( SpanLabelingTemplate( - input_format="{text_type}: {text}\nFrom this {text_type}, extract entities that carry one of the following types: {classes}.", + input_format="{text_type}: {text}\nFrom this {text_type}, extract entities that carry one of the following types: {entity_types}.", postprocessors=["processors.to_span_label_pairs"], ), "templates.span_labeling.extraction.carry", @@ -36,7 +36,7 @@ add_to_catalog( SpanLabelingTemplate( input_format="{text_type}: {text}", - instruction="From the following {text_type}, identify spans with {class_type}:{classes}.", + instruction="From the following {text_type}, identify spans with entity type:{entity_types}.", postprocessors=["processors.to_span_label_pairs"], ), "templates.span_labeling.extraction.identify", @@ -55,19 +55,34 @@ add_to_catalog( SpanLabelingTemplate( input_format="{text_type}:\n{text}", - instruction="From the following {text_type}, extract the objects for which the {class_type} expressed is one of {classes}.", - target_prefix="{class_type}:\n", + instruction="From the following {text_type}, extract the objects for which the entity type expressed is one of {entity_types}.", + target_prefix="entity type:\n", postprocessors=["processors.to_span_label_pairs"], - title_fields=["text_type", "class_type"], + title_fields=["text_type"], ), "templates.span_labeling.extraction.title", overwrite=True, ) +add_to_catalog( + SpanLabelingTemplate( + instruction="""From the given {text_type}, extract all the entities of the following entity types: {entity_types}. +Return the output in this exact format: +The output should be a comma separated list of pairs of entity and corresponding entity_type. +Use a colon to separate between the entity and entity_type. 
""", + input_format="{text_type}:\n{text}", + postprocessors=["processors.to_span_label_pairs"], + ), + "templates.span_labeling.extraction.detailed", + overwrite=True, +) + + add_to_catalog( TemplatesList( items=[ + "templates.span_labeling.extraction.detailed", "templates.span_labeling.extraction.extract", "templates.span_labeling.extraction.having", "templates.span_labeling.extraction.carry", diff --git a/src/unitxt/artifact.py b/src/unitxt/artifact.py index ed9036225c..df4dbdf405 100644 --- a/src/unitxt/artifact.py +++ b/src/unitxt/artifact.py @@ -282,10 +282,12 @@ def from_dict(cls, d, overwrite_args=None): @classmethod def load(cls, path, artifact_identifier=None, overwrite_args=None): d = artifacts_json_cache(path) - if "artifact_linked_to" in d and d["artifact_linked_to"] is not None: - # d stands for an ArtifactLink - artifact_link = ArtifactLink.from_dict(d) - return artifact_link.load(overwrite_args) + if "__type__" in d and d["__type__"] == "artifact_link": + cls.from_dict(d) # for verifications and warnings + catalog, artifact_rep, _ = get_catalog_name_and_args(name=d["to"]) + return catalog.get_with_overwrite( + artifact_rep, overwrite_args=overwrite_args + ) new_artifact = cls.from_dict(d, overwrite_args=overwrite_args) new_artifact.__id__ = artifact_identifier @@ -466,54 +468,11 @@ def __repr__(self): class ArtifactLink(Artifact): - # the artifact linked to, expressed by its catalog id - artifact_linked_to: str = Field(default=None, required=True) + to: Artifact - @classmethod - def from_dict(cls, d: dict): - assert isinstance(d, dict), f"argument must be a dictionary, got: d = {d}." - assert ( - "artifact_linked_to" in d and d["artifact_linked_to"] is not None - ), f"A non-none field named 'artifact_linked_to' is expected in input argument d, but got: {d}." 
- artifact_linked_to = d["artifact_linked_to"] - # artifact_linked_to is a name of catalog entry - assert isinstance( - artifact_linked_to, str - ), f"'artifact_linked_to' should be a string expressing a name of a catalog entry. Got{artifact_linked_to}." - msg = d["__deprecated_msg__"] if "__deprecated_msg__" in d else None - return ArtifactLink( - artifact_linked_to=artifact_linked_to, __deprecated_msg__=msg - ) - - def load(self, overwrite_args: dict) -> Artifact: - # identify the catalog for the artifact_linked_to - assert ( - self.artifact_linked_to is not None - ), "'artifact_linked_to' must be non-None in order to load it from the catalog. Currently, it is None." - assert isinstance( - self.artifact_linked_to, str - ), f"'artifact_linked_to' should be a string (expressing a name of a catalog entry). Currently, its type is: {type(self.artifact_linked_to)}." - needed_catalog = None - catalogs = list(Catalogs()) - for catalog in catalogs: - if self.artifact_linked_to in catalog: - needed_catalog = catalog - - if needed_catalog is None: - raise UnitxtArtifactNotFoundError(self.artifact_linked_to, catalogs) - - path = needed_catalog.path(self.artifact_linked_to) - d = artifacts_json_cache(path) - # if needed, follow, in a recursive manner, over multiple links, - # passing through instantiating of the ArtifactLink-s on the way, triggering - # deprecatioin warning as needed. 
- if "artifact_linked_to" in d and d["artifact_linked_to"] is not None: - # d stands for an ArtifactLink - artifact_link = ArtifactLink.from_dict(d) - return artifact_link.load(overwrite_args) - new_artifact = Artifact.from_dict(d, overwrite_args=overwrite_args) - new_artifact.__id__ = self.artifact_linked_to - return new_artifact + def verify(self): + if self.to.__id__ is None: + raise UnitxtError("ArtifactLink must link to existing catalog entry.") def get_raw(obj): @@ -577,14 +536,12 @@ def fetch_artifact(artifact_rep) -> Tuple[Artifact, Union[AbstractCatalog, None] """ if isinstance(artifact_rep, Artifact): if isinstance(artifact_rep, ArtifactLink): - return fetch_artifact(artifact_rep.artifact_linked_to) + return fetch_artifact(artifact_rep.to) return artifact_rep, None # If local file if isinstance(artifact_rep, str) and Artifact.is_artifact_file(artifact_rep): artifact_to_return = Artifact.load(artifact_rep) - if isinstance(artifact_rep, ArtifactLink): - artifact_to_return = fetch_artifact(artifact_to_return.artifact_linked_to) return artifact_to_return, None diff --git a/src/unitxt/catalog.py b/src/unitxt/catalog.py index ee4347cfc8..3221c3ee0d 100644 --- a/src/unitxt/catalog.py +++ b/src/unitxt/catalog.py @@ -153,7 +153,7 @@ def add_link_to_catalog( deprecated_msg = None artifact_link = ArtifactLink( - artifact_linked_to=artifact_linked_to, __deprecated_msg__=deprecated_msg + to=artifact_linked_to, __deprecated_msg__=deprecated_msg ) add_to_catalog( diff --git a/src/unitxt/catalog/augmentors/augment_whitespace_prefix_and_suffix_task_input.json b/src/unitxt/catalog/augmentors/augment_whitespace_prefix_and_suffix_task_input.json index 5d97983afb..1857b4d2ee 100644 --- a/src/unitxt/catalog/augmentors/augment_whitespace_prefix_and_suffix_task_input.json +++ b/src/unitxt/catalog/augmentors/augment_whitespace_prefix_and_suffix_task_input.json @@ -1,5 +1,5 @@ { "__type__": "artifact_link", - "artifact_linked_to": "augmentors.text.whitespace_prefix_suffix", + 
"to": "augmentors.text.whitespace_prefix_suffix", "__deprecated_msg__": "Artifact 'augmentors.augment_whitespace_prefix_and_suffix_task_input' is deprecated. Artifact 'augmentors.text.whitespace_prefix_suffix' will be instantiated instead. In future uses, please reference artifact 'augmentors.text.whitespace_prefix_suffix' directly." } diff --git a/src/unitxt/catalog/augmentors/augment_whitespace_task_input.json b/src/unitxt/catalog/augmentors/augment_whitespace_task_input.json index 117304ea7c..c802e52026 100644 --- a/src/unitxt/catalog/augmentors/augment_whitespace_task_input.json +++ b/src/unitxt/catalog/augmentors/augment_whitespace_task_input.json @@ -1,5 +1,5 @@ { "__type__": "artifact_link", - "artifact_linked_to": "augmentors.text.whitespace_prefix_suffix", + "to": "augmentors.text.whitespace_prefix_suffix", "__deprecated_msg__": "Artifact 'augmentors.augment_whitespace_task_input' is deprecated. Artifact 'augmentors.text.whitespace_prefix_suffix' will be instantiated instead. In future uses, please reference artifact 'augmentors.text.whitespace_prefix_suffix' directly." 
} diff --git a/src/unitxt/catalog/cards/atis.json b/src/unitxt/catalog/cards/atis.json index ea9d6e1fe5..6a273c868c 100644 --- a/src/unitxt/catalog/cards/atis.json +++ b/src/unitxt/catalog/cards/atis.json @@ -273,7 +273,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "aircraft_code", "airline_code", "airline_name", diff --git a/src/unitxt/catalog/cards/universal_ner/ceb/gja.json b/src/unitxt/catalog/cards/universal_ner/ceb/gja.json index af72b19e8f..78cd123a1d 100644 --- a/src/unitxt/catalog/cards/universal_ner/ceb/gja.json +++ b/src/unitxt/catalog/cards/universal_ner/ceb/gja.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/da/ddt.json b/src/unitxt/catalog/cards/universal_ner/da/ddt.json index 1e6e226da9..b3401be13f 100644 --- a/src/unitxt/catalog/cards/universal_ner/da/ddt.json +++ b/src/unitxt/catalog/cards/universal_ner/da/ddt.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/de/pud.json b/src/unitxt/catalog/cards/universal_ner/de/pud.json index 54a70fb281..e39b3a06a6 100644 --- a/src/unitxt/catalog/cards/universal_ner/de/pud.json +++ b/src/unitxt/catalog/cards/universal_ner/de/pud.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/en/ewt.json b/src/unitxt/catalog/cards/universal_ner/en/ewt.json index d35cc3a91a..0a4aab8bfa 100644 --- a/src/unitxt/catalog/cards/universal_ner/en/ewt.json +++ b/src/unitxt/catalog/cards/universal_ner/en/ewt.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/en/pud.json 
b/src/unitxt/catalog/cards/universal_ner/en/pud.json index 20fd54cd5c..6e5e93e3bb 100644 --- a/src/unitxt/catalog/cards/universal_ner/en/pud.json +++ b/src/unitxt/catalog/cards/universal_ner/en/pud.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/hr/set.json b/src/unitxt/catalog/cards/universal_ner/hr/set.json index 987c793440..21dc447fc2 100644 --- a/src/unitxt/catalog/cards/universal_ner/hr/set.json +++ b/src/unitxt/catalog/cards/universal_ner/hr/set.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/pt/bosque.json b/src/unitxt/catalog/cards/universal_ner/pt/bosque.json index 215048e251..1ca227b4dd 100644 --- a/src/unitxt/catalog/cards/universal_ner/pt/bosque.json +++ b/src/unitxt/catalog/cards/universal_ner/pt/bosque.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/pt/pud.json b/src/unitxt/catalog/cards/universal_ner/pt/pud.json index 23d53d0127..48ff32e711 100644 --- a/src/unitxt/catalog/cards/universal_ner/pt/pud.json +++ b/src/unitxt/catalog/cards/universal_ner/pt/pud.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/ru/pud.json b/src/unitxt/catalog/cards/universal_ner/ru/pud.json index 38af9c908e..62b86ef085 100644 --- a/src/unitxt/catalog/cards/universal_ner/ru/pud.json +++ b/src/unitxt/catalog/cards/universal_ner/ru/pud.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/sk/snk.json 
b/src/unitxt/catalog/cards/universal_ner/sk/snk.json index 3f62621c97..7f8482e2ff 100644 --- a/src/unitxt/catalog/cards/universal_ner/sk/snk.json +++ b/src/unitxt/catalog/cards/universal_ner/sk/snk.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/sr/set.json b/src/unitxt/catalog/cards/universal_ner/sr/set.json index b9d0a465d6..baed550497 100644 --- a/src/unitxt/catalog/cards/universal_ner/sr/set.json +++ b/src/unitxt/catalog/cards/universal_ner/sr/set.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/sv/pud.json b/src/unitxt/catalog/cards/universal_ner/sv/pud.json index fb3819bea0..b1ce31186f 100644 --- a/src/unitxt/catalog/cards/universal_ner/sv/pud.json +++ b/src/unitxt/catalog/cards/universal_ner/sv/pud.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/sv/talbanken.json b/src/unitxt/catalog/cards/universal_ner/sv/talbanken.json index 379f561968..cb27a76f3a 100644 --- a/src/unitxt/catalog/cards/universal_ner/sv/talbanken.json +++ b/src/unitxt/catalog/cards/universal_ner/sv/talbanken.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/tl/trg.json b/src/unitxt/catalog/cards/universal_ner/tl/trg.json index ebf271614e..6018d73316 100644 --- a/src/unitxt/catalog/cards/universal_ner/tl/trg.json +++ b/src/unitxt/catalog/cards/universal_ner/tl/trg.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/tl/ugnayan.json 
b/src/unitxt/catalog/cards/universal_ner/tl/ugnayan.json index 464370c399..94f631d7bb 100644 --- a/src/unitxt/catalog/cards/universal_ner/tl/ugnayan.json +++ b/src/unitxt/catalog/cards/universal_ner/tl/ugnayan.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/zh/gsd.json b/src/unitxt/catalog/cards/universal_ner/zh/gsd.json index 9fd8dae571..11ae5588c4 100644 --- a/src/unitxt/catalog/cards/universal_ner/zh/gsd.json +++ b/src/unitxt/catalog/cards/universal_ner/zh/gsd.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/zh/gsdsimp.json b/src/unitxt/catalog/cards/universal_ner/zh/gsdsimp.json index f9dff8fd95..f7a0fef724 100644 --- a/src/unitxt/catalog/cards/universal_ner/zh/gsdsimp.json +++ b/src/unitxt/catalog/cards/universal_ner/zh/gsdsimp.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/cards/universal_ner/zh/pud.json b/src/unitxt/catalog/cards/universal_ner/zh/pud.json index 668ee1867b..c129648195 100644 --- a/src/unitxt/catalog/cards/universal_ner/zh/pud.json +++ b/src/unitxt/catalog/cards/universal_ner/zh/pud.json @@ -65,7 +65,7 @@ { "__type__": "set", "fields": { - "classes": [ + "entity_types": [ "Person", "Organization", "Location" diff --git a/src/unitxt/catalog/tasks/ner/all_entity_types.json b/src/unitxt/catalog/tasks/ner/all_entity_types.json index ae88b535eb..59f705c6c7 100644 --- a/src/unitxt/catalog/tasks/ner/all_entity_types.json +++ b/src/unitxt/catalog/tasks/ner/all_entity_types.json @@ -1,20 +1,5 @@ { - "__type__": "task", - "input_fields": { - "text": "str", - "entity_types": "List[str]" - }, - "reference_fields": { - "spans_starts": "List[int]", - "spans_ends": "List[int]", - "text": 
"str", - "labels": "List[str]" - }, - "prediction_type": "List[Tuple[str, str]]", - "metrics": [ - "metrics.ner" - ], - "augmentable_inputs": [ - "text" - ] + "__type__": "artifact_link", + "to": "tasks.span_labeling.extraction", + "__deprecated_msg__": null } diff --git a/src/unitxt/catalog/tasks/qa/with_context/abstractive.json b/src/unitxt/catalog/tasks/qa/with_context/abstractive.json index 6a80f9cb97..7d7861a349 100644 --- a/src/unitxt/catalog/tasks/qa/with_context/abstractive.json +++ b/src/unitxt/catalog/tasks/qa/with_context/abstractive.json @@ -1,5 +1,5 @@ { "__type__": "artifact_link", - "artifact_linked_to": "tasks.qa.with_context", + "to": "tasks.qa.with_context", "__deprecated_msg__": null } diff --git a/src/unitxt/catalog/tasks/qa/with_context/extractive.json b/src/unitxt/catalog/tasks/qa/with_context/extractive.json index 802717b3e8..6ba616fd70 100644 --- a/src/unitxt/catalog/tasks/qa/with_context/extractive.json +++ b/src/unitxt/catalog/tasks/qa/with_context/extractive.json @@ -1,5 +1,5 @@ { "__type__": "artifact_link", - "artifact_linked_to": "tasks.qa.extractive", + "to": "tasks.qa.extractive", "__deprecated_msg__": null } diff --git a/src/unitxt/catalog/tasks/span_labeling/extraction.json b/src/unitxt/catalog/tasks/span_labeling/extraction.json index 345e99c472..a1d1eb9c6b 100644 --- a/src/unitxt/catalog/tasks/span_labeling/extraction.json +++ b/src/unitxt/catalog/tasks/span_labeling/extraction.json @@ -1,10 +1,10 @@ { "__type__": "task", + "__description__": "This is Entity Extraction task where multiple entity types are to be extracted.\nThe input is the 'text' and 'entity_types' to extract (e.g. 
[\"Organization\", \"Location\", \"Person\"])\n\nBy default, classical f1 metric is used, which expects a list of pairs.\nMultiple f1 score are reported, including f1_micro and f1_macro and f1 per per entity_type.\".\nThe template's post processors must convert the model textual predictions into the expected list format.\n", "input_fields": { "text": "str", "text_type": "str", - "class_type": "str", - "classes": "List[str]" + "entity_types": "List[str]" }, "reference_fields": { "text": "str", @@ -20,7 +20,7 @@ "text" ], "defaults": { - "text_type": "text", - "class_type": "entity type" - } + "text_type": "text" + }, + "default_template": "templates.span_labeling.extraction.detailed" } diff --git a/src/unitxt/catalog/templates/span_labeling/extraction/all.json b/src/unitxt/catalog/templates/span_labeling/extraction/all.json index 80be2a7c16..62f58affbc 100644 --- a/src/unitxt/catalog/templates/span_labeling/extraction/all.json +++ b/src/unitxt/catalog/templates/span_labeling/extraction/all.json @@ -1,6 +1,7 @@ { "__type__": "templates_list", "items": [ + "templates.span_labeling.extraction.detailed", "templates.span_labeling.extraction.extract", "templates.span_labeling.extraction.having", "templates.span_labeling.extraction.carry", diff --git a/src/unitxt/catalog/templates/span_labeling/extraction/carry.json b/src/unitxt/catalog/templates/span_labeling/extraction/carry.json index 7cbce653b5..0ef9eea536 100644 --- a/src/unitxt/catalog/templates/span_labeling/extraction/carry.json +++ b/src/unitxt/catalog/templates/span_labeling/extraction/carry.json @@ -1,6 +1,6 @@ { "__type__": "span_labeling_template", - "input_format": "{text_type}: {text}\nFrom this {text_type}, extract entities that carry one of the following types: {classes}.", + "input_format": "{text_type}: {text}\nFrom this {text_type}, extract entities that carry one of the following types: {entity_types}.", "postprocessors": [ "processors.to_span_label_pairs" ] diff --git 
a/src/unitxt/catalog/templates/span_labeling/extraction/detailed.json b/src/unitxt/catalog/templates/span_labeling/extraction/detailed.json
new file mode 100644
index 0000000000..747d0284f2
--- /dev/null
+++ b/src/unitxt/catalog/templates/span_labeling/extraction/detailed.json
@@ -0,0 +1,8 @@
+{
+    "__type__": "span_labeling_template",
+    "instruction": "From the given {text_type}, extract all the entities of the following entity types: {entity_types}.\nReturn the output in this exact format:\nThe output should be a comma separated list of pairs of entity and corresponding entity_type.\nUse a colon to separate between the entity and entity_type. ",
+    "input_format": "{text_type}:\n{text}",
+    "postprocessors": [
+        "processors.to_span_label_pairs"
+    ]
+}
diff --git a/src/unitxt/catalog/templates/span_labeling/extraction/extract.json b/src/unitxt/catalog/templates/span_labeling/extraction/extract.json
index 86c016f536..b2fc40d73e 100644
--- a/src/unitxt/catalog/templates/span_labeling/extraction/extract.json
+++ b/src/unitxt/catalog/templates/span_labeling/extraction/extract.json
@@ -1,7 +1,7 @@
 {
     "__type__": "span_labeling_template",
     "input_format": "{text_type}: {text}",
-    "instruction": "From the following {text_type}, extract the objects for which the {class_type} expressed is one of {classes}.",
+    "instruction": "From the following {text_type}, extract the objects for which the entity type expressed is one of {entity_types}.",
     "postprocessors": [
         "processors.to_span_label_pairs"
     ]
diff --git a/src/unitxt/catalog/templates/span_labeling/extraction/having.json b/src/unitxt/catalog/templates/span_labeling/extraction/having.json
index 29c6c6b200..79b17eae99 100644
--- a/src/unitxt/catalog/templates/span_labeling/extraction/having.json
+++ b/src/unitxt/catalog/templates/span_labeling/extraction/having.json
@@ -1,7 +1,7 @@
 {
     "__type__": "span_labeling_template",
     "input_format": "{text_type}: {text}",
-    "instruction": "From the following {text_type}, extract spans having a {class_type}: {classes}.",
+    "instruction": "From the following {text_type}, extract spans having a entity type: {entity_types}.",
     "postprocessors": [
         "processors.to_span_label_pairs"
     ]
diff --git a/src/unitxt/catalog/templates/span_labeling/extraction/identify.json b/src/unitxt/catalog/templates/span_labeling/extraction/identify.json
index b056edd9ec..a8ff34da95 100644
--- a/src/unitxt/catalog/templates/span_labeling/extraction/identify.json
+++ b/src/unitxt/catalog/templates/span_labeling/extraction/identify.json
@@ -1,7 +1,7 @@
 {
     "__type__": "span_labeling_template",
     "input_format": "{text_type}: {text}",
-    "instruction": "From the following {text_type}, identify spans with {class_type}:{classes}.",
+    "instruction": "From the following {text_type}, identify spans with entity type:{entity_types}.",
     "postprocessors": [
         "processors.to_span_label_pairs"
     ]
diff --git a/src/unitxt/catalog/templates/span_labeling/extraction/title.json b/src/unitxt/catalog/templates/span_labeling/extraction/title.json
index 22d4322687..54c6d82a61 100644
--- a/src/unitxt/catalog/templates/span_labeling/extraction/title.json
+++ b/src/unitxt/catalog/templates/span_labeling/extraction/title.json
@@ -1,13 +1,12 @@
 {
     "__type__": "span_labeling_template",
     "input_format": "{text_type}:\n{text}",
-    "instruction": "From the following {text_type}, extract the objects for which the {class_type} expressed is one of {classes}.",
-    "target_prefix": "{class_type}:\n",
+    "instruction": "From the following {text_type}, extract the objects for which the entity type expressed is one of {entity_types}.",
+    "target_prefix": "entity type:\n",
     "postprocessors": [
         "processors.to_span_label_pairs"
     ],
     "title_fields": [
-        "text_type",
-        "class_type"
+        "text_type"
     ]
 }
diff --git a/tests/library/test_artifact.py b/tests/library/test_artifact.py
index 6f2ff280f0..ea7fabec3c 100644
--- a/tests/library/test_artifact.py
+++ b/tests/library/test_artifact.py
@@ -349,7 +349,7 @@ def test_artifact_link_with_deprecation_warning(self):

         with self.assertWarns(DeprecationWarning):
             rename_fields = ArtifactLink(
-                artifact_linked_to="rename.for.test.artifact.link",
+                to="rename.for.test.artifact.link",
                 __deprecated_msg__="Artifact is deprecated. "
                 "'rename.for.test.artifact.link' is now instantiated instead. "
                 "\nIn the future, please use 'rename.for.test.artifact.link'.",
@@ -536,7 +536,7 @@ def test_artifact_link_in_recursive_load(self):
         )

         link_to_copy_operator = ArtifactLink(
-            artifact_linked_to="copy.operator",
+            to="copy.operator",
         )
         add_to_catalog(
             link_to_copy_operator,