| copyright | lastupdated | subcollection |
|---|---|---|
| | 2025-02-13 | discovery-data |
{{site.data.keyword.attribute-definition-list}}
{: #migrate-wks}
Use custom models and other resources that you created in {{site.data.keyword.knowledgestudioshort}} by migrating them to {{site.data.keyword.discoveryshort}}. {: shortdesc}
{: #migrate-wks-output}
To start using your {{site.data.keyword.knowledgestudioshort}} model immediately, export the model from {{site.data.keyword.knowledgestudioshort}} and import it to {{site.data.keyword.discoveryshort}} as a machine learning enrichment.
When you import a {{site.data.keyword.knowledgestudioshort}} model to use as is in {{site.data.keyword.discoveryshort}}, root-level entity types that were defined in the model can be recognized when they occur in your documents. Any mentions of entity subtypes that occur are identified as mentions of the parent entity type. The subtype entities themselves are not preserved. If you want the model to continue to distinguish between different subtypes of an entity, you must take extra steps. For more information, see Retaining subtype information.
You cannot continue to update a model that you import as an ML enrichment. {: note}
The following types of models can be imported and used as is:
- Rule-based models created in {{site.data.keyword.knowledgestudioshort}} that find entities in documents based on rules that you define (file format: .pear).
- Machine learning models created in {{site.data.keyword.knowledgestudioshort}} that understand the linguistic nuances, meaning, and relationships specific to your industry (file format: .zip).
The models that you can add depend on your deployment type:
- [IBM Cloud]{: tag-ibm-cloud} You can add models that were created with a {{site.data.keyword.knowledgestudiofull}} instance that is hosted in {{site.data.keyword.cloud_notm}} only.
- [IBM Cloud Pak for Data]{: tag-cp4d} [IBM Software Hub]{: tag-teal} You can add models that were created with an instance of {{site.data.keyword.knowledgestudiofull}} that is hosted on {{site.data.keyword.icp4dfull}} or {{site.data.keyword.cloud_notm}}.
For more information, see Using imported ML models to find custom terms.
{: #migrate-wks-corpus}
{{site.data.keyword.discoveryshort}} has an entity extractor tool that you can use to define a type system. The entity extractor user interface is similar to the {{site.data.keyword.knowledgestudioshort}} user interface that is used to annotate documents that you add to a corpus for a machine learning model. However, in {{site.data.keyword.discoveryshort}}, you define root-level entities only, not subtypes or relationships.
As an alternative to importing a {{site.data.keyword.knowledgestudioshort}} model as is and applying it as an enrichment, you can import a {{site.data.keyword.knowledgestudioshort}} corpus. When you add a {{site.data.keyword.knowledgestudioshort}} corpus to the {{site.data.keyword.discoveryshort}} entity extractor tool, any root-level entities from the corpus are represented as new entities in the {{site.data.keyword.discoveryshort}} entity extractor workspace. Entity subtypes are not recognized, although you can take extra steps to retain subtype information.
Relations and coreferences from the {{site.data.keyword.knowledgestudioshort}} machine learning model are not represented, and neither are any custom dictionaries that are associated with the model.
Things to consider when choosing whether to import a model or import a corpus:
- You can continue to edit the type system when you import the corpus. When you import a trained model, you cannot subsequently edit it in {{site.data.keyword.discoveryshort}}.
- An imported model that you apply to a collection as an enrichment can recognize any entity subtype, relation, and coreference information that the original model was trained to recognize in addition to root-level entities. An entity extractor enrichment can find and tag entities only.
For more information, see Importing a {{site.data.keyword.knowledgestudioshort}} corpus.
{: #migrate-wks-subtypes}
When you import a {{site.data.keyword.knowledgestudioshort}} model to {{site.data.keyword.discoveryshort}}, any subtypes that were defined in the model are identified as mentions of the parent entity type. The subtype entities themselves are not preserved. To retain the subtype information, you must flatten your type system by converting entity subtypes into new root-level entity types.
Follow these steps only if you are sure that the subtype distinctions add significant value to the model. In many use cases, using the root-level entity types is sufficient.
You cannot use this procedure to retain subtypes if any of the documents in your corpus were pre-annotated with the Natural Language Understanding service. Make sure that your flattened type system doesn't surpass the allowed number of entity types for your plan. For more information, see Entity extractor limits. {: important}
For example, your model might have entity types with the following hierarchy:
APPLIANCES
FURNITURE
  PATIO
  LIVING
  DINING
{: codeblock}
A flattened version of the type system looks like this:
APPLIANCES
FURNITURE_NONE
FURNITURE_PATIO
FURNITURE_LIVING
FURNITURE_DINING
{: codeblock}
A useful approach for flattening the type system involves the following changes:
- Add the parent entity type label (`FURNITURE`) as a prefix to the label of each child subtype to produce a new root-level entity that preserves the hierarchical relationship in its label. For example, `FURNITURE_PATIO`, `FURNITURE_LIVING`, and `FURNITURE_DINING`.
- Append the word NONE to the parent root-level entity label to identify it as the parent. For example, `FURNITURE_NONE`.
- Leave the labels of entity types that don't have subtypes unchanged. For example, the label `APPLIANCES` doesn't change.
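The naming convention can be sketched in a few lines of Python. This is a minimal illustration only: the `flatten_labels` function and the `{parent: [subtypes]}` mapping are hypothetical helpers for this example and are not part of either service.

```python
def flatten_labels(type_system):
    """Return flattened root-level labels for a {parent: [subtypes]} mapping.

    The mapping is an assumed, simplified stand-in for a type system.
    """
    flattened = []
    for parent, subtypes in type_system.items():
        if not subtypes:
            # Entity types without subtypes keep their original label.
            flattened.append(parent)
            continue
        # The parent itself becomes PARENT_NONE.
        flattened.append(f"{parent}_NONE")
        # Each subtype becomes PARENT_SUBTYPE.
        flattened.extend(f"{parent}_{subtype}" for subtype in subtypes)
    return flattened


hierarchy = {
    "APPLIANCES": [],
    "FURNITURE": ["PATIO", "LIVING", "DINING"],
}
print(flatten_labels(hierarchy))
# ['APPLIANCES', 'FURNITURE_NONE', 'FURNITURE_PATIO', 'FURNITURE_LIVING', 'FURNITURE_DINING']
```
{: codeblock}

Counting the returned labels is also a quick way to compare the size of the flattened type system against the entity type limit for your plan.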
To retain entity subtype information, complete the following steps:
- Ensure that the annotation and training of the {{site.data.keyword.knowledgestudioshort}} model are completed and that the model is ready to be deployed.
- Export the type system that was used to annotate the documents in your corpus from {{site.data.keyword.knowledgestudioshort}} as a .json file.
Follow the appropriate steps for exporting based on your {{site.data.keyword.knowledgestudioshort}} deployment type:
- [IBM Cloud]{: tag-ibm-cloud} Uploading resources from another workspace{: external}
- [IBM Cloud Pak for Data]{: tag-cp4d} [IBM Software Hub]{: tag-teal} Uploading resources from another workspace{: external}
- Modify the type system JSON file. For each subtype, add a new root-level entity type.
For example, the original type system might contain the following types:
{ "id":"b9d6caa2-90ac-47ff-91f6-2149b8ffcf20", "label":"FURNITURE", "sireProp":{ "mentionType":null, "subtypes":["PATIO","LIVING","DINING"], "roles":["b9d6caa2-90ac-47ff-91f6-2149b8ffcf20","93ba1f27-173f-4714-b31e-77bdd8cb9932"], "clazz":null, "color":"black", "hotkey":"m", "backGroundColor":"#00FFFF", "active":true, "roleOnly":false}, "creationDate":1610611788484, "source":null, "modifiedDate":0, "typeType":null, "typeClass":null, "typeVersion":null, "typeDesc":null, "typeSuperType":null, "typeSuperTypeId":null, "typeCreateDate":null, "typeUpdateDate":null, "typeProvenance":null, "alchemyAPITypes":null, "nluAPITypes":null},
{: codeblock}
To convert the subtypes to new root-level types, make the following change:
{ "id":"b9d6caa2-90ac-47ff-91f6-2149b8ffcf20", "label":"FURNITURE_NONE", "sireProp":{ "mentionType":null, "subtypes":null, "roles":["b9d6caa2-90ac-47ff-91f6-2149b8ffcf20","93ba1f27-173f-4714-b31e-77bdd8cb9932"], "clazz":null, "and so on" } }, { "id":"b9d6caa2-90ac-47ff-91f6-2149b8ffcf20", "label":"FURNITURE_PATIO", "sireProp":{ "mentionType":null, "subtypes":null, "roles":["b9d6caa2-90ac-47ff-91f6-2149b8ffcf20","93ba1f27-173f-4714-b31e-77bdd8cb9932"], "clazz":null, "and so on" } }, { "id":"b9d6caa2-90ac-47ff-91f6-2149b8ffcf20", "label":"FURNITURE_LIVING", "sireProp":{ "mentionType":null, "subtypes":null, "roles":["b9d6caa2-90ac-47ff-91f6-2149b8ffcf20","93ba1f27-173f-4714-b31e-77bdd8cb9932"], "clazz":null, "and so on" } }, { "id":"b9d6caa2-90ac-47ff-91f6-2149b8ffcf20", "label":"FURNITURE_DINING", "sireProp":{ "mentionType":null, "subtypes":null, "roles":["b9d6caa2-90ac-47ff-91f6-2149b8ffcf20","93ba1f27-173f-4714-b31e-77bdd8cb9932"], "clazz":null, "and so on" } },
{: codeblock}
- Assign a unique ID to each new root-level entity type.
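If the type system contains many subtypes, you might script the two previous steps. The following Python sketch operates on a single entity type entry that is shaped like the example above. The `flatten_entity_type` function is a hypothetical helper, and two choices in it are assumptions rather than documented behavior: the renamed parent keeps its original ID while each new type gets a generated UUID, and the `roles` list is copied unchanged, as in the example.

```python
import copy
import uuid


def flatten_entity_type(entry):
    """Split one type-system entry into flattened root-level entries."""
    subtypes = (entry.get("sireProp") or {}).get("subtypes") or []
    if not subtypes:
        # Entity types without subtypes are returned unchanged.
        return [entry]

    parent_label = entry["label"]
    flattened = []
    for suffix in ["NONE", *subtypes]:
        new_entry = copy.deepcopy(entry)
        new_entry["label"] = f"{parent_label}_{suffix}"
        new_entry["sireProp"]["subtypes"] = None
        if suffix != "NONE":
            # Assumption: only the newly added types need a fresh ID.
            new_entry["id"] = str(uuid.uuid4())
        flattened.append(new_entry)
    return flattened


furniture = {
    "id": "b9d6caa2-90ac-47ff-91f6-2149b8ffcf20",
    "label": "FURNITURE",
    "sireProp": {"mentionType": None, "subtypes": ["PATIO", "LIVING", "DINING"]},
}
new_types = flatten_entity_type(furniture)
print([entity_type["label"] for entity_type in new_types])
# ['FURNITURE_NONE', 'FURNITURE_PATIO', 'FURNITURE_LIVING', 'FURNITURE_DINING']
```
{: codeblock}

The function copies each entry before it changes anything, so the original entry stays available for comparison. Review the generated entries before you upload the modified file.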
- Export the corpus for your machine learning model from {{site.data.keyword.knowledgestudioshort}} as a compressed file.
Follow the appropriate steps for exporting based on your {{site.data.keyword.knowledgestudioshort}} deployment type:
- [IBM Cloud]{: tag-ibm-cloud} Uploading resources from another workspace{: external}
- [IBM Cloud Pak for Data]{: tag-cp4d} [IBM Software Hub]{: tag-teal} Uploading resources from another workspace{: external}
- In the downloaded corpus, for all mentions with a subtype defined, update the type information for the mention to specify the new root-level entity type.
For example, the original corpus might include the following mention of the `PATIO` subtype:
{ "id" : "Blogs_shopper.com_dc5cf4764d91f87575b17ac8a5268462.en-M92", "source" : "IMPORT", "properties" : { "SIRE_ENTITY_CLASS" : "SPC", "SIRE_MENTION_CLASS" : "SPC", "SIRE_ENTITY_LEVEL" : "NONE", "SIRE_ENTITY_SUBTYPE" : "PATIO", "SIRE_MENTION_ROLE" : "FURNITURE", "SIRE_MENTION_TYPE" : "NONE" }, "type" : "FURNITURE", "begin" : 3221, "end" : 3234, "inCoref" : false },
{: codeblock}
Replace the value of the `SIRE_MENTION_ROLE` and `type` for the mention with the new root-level entity label, such as `FURNITURE_PATIO`. Specify `NONE` as the `SIRE_ENTITY_SUBTYPE` value.
{ "id" : "Blogs_shopper.com_dc5cf4764d91f87575b17ac8a5268462.en-M92", "source" : "IMPORT", "properties" : { "SIRE_ENTITY_CLASS" : "SPC", "SIRE_MENTION_CLASS" : "SPC", "SIRE_ENTITY_LEVEL" : "NONE", "SIRE_ENTITY_SUBTYPE" : "NONE", "SIRE_MENTION_ROLE" : "FURNITURE_PATIO", "SIRE_MENTION_TYPE" : "NONE" }, "type" : "FURNITURE_PATIO", "begin" : 3221, "end" : 3234, "inCoref" : false },
{: codeblock}
Don't forget to rename the parent mention labels.
For example, find mentions that specify `"SIRE_ENTITY_SUBTYPE" : "OTHER"`, and then change the value from `OTHER` to `NONE`.
Change the value of the `SIRE_MENTION_ROLE` and `type` for the mention to the new parent entity type label.
For example, change the `SIRE_MENTION_ROLE` and `type` values for these mentions from `FURNITURE` to `FURNITURE_NONE`, and the `SIRE_ENTITY_SUBTYPE` to `NONE`.
{ "id" : "Sports_herald.com_be99aca94a7cff5abb74476b844a11b6.en-M75", "source" : "IMPORT", "properties" : { "SIRE_MENTION_CLASS" : "SPC", "SIRE_ENTITY_LEVEL" : "NONE", "SIRE_ENTITY_SUBTYPE" : "NONE", "SIRE_ENTITY_CLASS" : "SPC", "SIRE_MENTION_TYPE" : "NONE", "SIRE_MENTION_ROLE" : "FURNITURE_NONE" }, "type" : "FURNITURE_NONE", "begin" : 2063, "end" : 2071, "inCoref" : false },
{: codeblock}
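The mention updates in this step follow one rule, which the following Python sketch applies to a single mention object. It is an illustration under stated assumptions: `flatten_mention` and its `parents_with_subtypes` parameter are hypothetical, the sketch treats a subtype value of `NONE` or `OTHER` as a parent mention, and it does not cover how mentions are stored in the exported corpus files.

```python
def flatten_mention(mention, parents_with_subtypes):
    """Return a copy of a mention that uses the flattened entity label.

    parents_with_subtypes is an assumed set of the original root-level
    labels that had subtypes, for example {"FURNITURE"}.
    """
    parent = mention["type"]
    if parent not in parents_with_subtypes:
        # Types without subtypes keep their label, so nothing changes.
        return mention

    props = mention["properties"]
    subtype = props.get("SIRE_ENTITY_SUBTYPE", "NONE")
    # Parent mentions (NONE or OTHER) map to PARENT_NONE.
    suffix = "NONE" if subtype in ("NONE", "OTHER") else subtype
    new_label = f"{parent}_{suffix}"

    updated = dict(mention)
    updated["properties"] = {
        **props,
        "SIRE_ENTITY_SUBTYPE": "NONE",
        "SIRE_MENTION_ROLE": new_label,
    }
    updated["type"] = new_label
    return updated


patio = {
    "id": "Blogs_shopper.com_dc5cf4764d91f87575b17ac8a5268462.en-M92",
    "properties": {"SIRE_ENTITY_SUBTYPE": "PATIO", "SIRE_MENTION_ROLE": "FURNITURE"},
    "type": "FURNITURE",
}
print(flatten_mention(patio, {"FURNITURE"})["type"])
# FURNITURE_PATIO
```
{: codeblock}

Because flattened labels such as `FURNITURE_PATIO` are not in `parents_with_subtypes`, running the function on a mention that is already converted leaves it unchanged.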
- Add annotations for relationships that are missing based on the new flattened entity types.
- Create a {{site.data.keyword.knowledgestudioshort}} workspace, and then upload the converted type system.
Follow the appropriate steps for uploading a type system based on your {{site.data.keyword.knowledgestudioshort}} deployment type:
- [IBM Cloud]{: tag-ibm-cloud} Adding a type system to the workspace{: external}
- [IBM Cloud Pak for Data]{: tag-cp4d} [IBM Software Hub]{: tag-teal} Adding a type system to the workspace{: external}
- Upload the annotated documents to the workspace. Retain the original file structure of the exported data. Ensure that the compressed file has the same root-level directory as the original exported file.
Follow the appropriate steps for uploading documents based on your {{site.data.keyword.knowledgestudioshort}} deployment type:
- [IBM Cloud]{: tag-ibm-cloud} Adding documents to a workspace{: external}
- [IBM Cloud Pak for Data]{: tag-cp4d} [IBM Software Hub]{: tag-teal} Adding documents to a workspace{: external}
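Before you upload, you might check that the repackaged archive has the same root-level entries as the original export. The following Python sketch compares the top-level names in two ZIP archives. The `root_entries` and `demo_archive` helpers and the file names in the demo are hypothetical and do not reflect the actual layout of an exported corpus.

```python
import io
import zipfile


def root_entries(archive_file):
    """Return the set of top-level names in a ZIP archive.

    archive_file can be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive_file) as archive:
        # ZIP member names always use "/" as the separator.
        return {name.split("/", 1)[0] for name in archive.namelist() if name}


def demo_archive(names):
    """Build a small in-memory ZIP archive for the demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "{}")
    buffer.seek(0)
    return buffer


# Made-up file names, used only to show the comparison.
original = demo_archive(["export-root/a.json", "export-root/sub/b.json"])
repacked = demo_archive(["export-root/a.json", "export-root/sub/b.json"])
print(root_entries(original) == root_entries(repacked))
# True
```
{: codeblock}

To check real files, pass the paths of the original export and the repackaged archive to `root_entries` instead of the in-memory demo archives.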
- From {{site.data.keyword.knowledgestudioshort}}, click Train to retrain the model.
For more information, see the appropriate topic for your deployment type:
- [IBM Cloud]{: tag-ibm-cloud} Training the machine learning model{: external}
- [IBM Cloud Pak for Data]{: tag-cp4d} [IBM Software Hub]{: tag-teal} Training the machine learning model{: external}
- Now, you're ready to export the model from {{site.data.keyword.knowledgestudioshort}} and import it to {{site.data.keyword.discoveryshort}} to use the model as a machine learning enrichment.
For more information, see Using imported ML models to find custom terms.