From 12427dae197f0ad8049fd7fb8a6bfe5af82785ee Mon Sep 17 00:00:00 2001 From: "Cabir C." <64752006+Cabir40@users.noreply.github.com> Date: Mon, 7 Oct 2024 13:10:40 +0300 Subject: [PATCH] Models hub internal (#1532) --- ...linical_deidentification_docwise_wip_en.md | 138 +++++++++++ ...al_deidentification_nameAugmented_v2_en.md | 139 +++++++++++ ...-03-clinical_deidentification_v2_wip_en.md | 136 +++++++++++ ...al_deidentification_nameAugmented_v2_en.md | 140 +++++++++++ ...9-27-explain_clinical_doc_sdoh_small_en.md | 198 ++++++++++++++++ ...linical_deidentification_docwise_wip_en.md | 138 +++++++++++ ...al_deidentification_nameAugmented_v2_en.md | 139 +++++++++++ ...-03-clinical_deidentification_v2_wip_en.md | 136 +++++++++++ ...al_deidentification_nameAugmented_v2_en.md | 140 +++++++++++ ...-24-clinical_deidentification_v2_wip_en.md | 145 ++++++++++++ ...linical_deidentification_docwise_wip_en.md | 139 +++++++++++ .../akrztrk/2024-09-25-ner_deid_aipii_en.md | 172 ++++++++++++++ ...-27-clinical_deidentification_v2_wip_en.md | 145 ++++++++++++ ...9-27-explain_clinical_doc_sdoh_small_en.md | 199 ++++++++++++++++ ...linical_deidentification_docwise_wip_en.md | 147 ++++++++++++ ...-30-clinical_deidentification_v2_wip_en.md | 145 ++++++++++++ .../akrztrk/2024-10-01-jsl_medm_q8_v1_en.md | 217 +++++++++++++++++ .../2024-10-01-jsl_meds_ner_q4_v2_en.md | 160 +++++++++++++ ...al_deidentification_nameAugmented_v2_en.md | 123 ++++++++++ ...linical_deidentification_docwise_wip_en.md | 137 +++++++++++ ...al_deidentification_nameAugmented_v2_en.md | 134 +++++++++++ ...-03-clinical_deidentification_v2_wip_en.md | 136 +++++++++++ ...al_deidentification_nameAugmented_v2_en.md | 140 +++++++++++ .../akrztrk/2024-10-04-jsl_medm_q4_v1_en.md | 218 ++++++++++++++++++ .../2024-10-04-jsl_meds_ner_q16_v2_en.md | 161 +++++++++++++ .../2024-10-04-jsl_meds_ner_q8_v2_en.md | 161 +++++++++++++ .../2024-10-04-jsl_meds_ner_zs_q16_v1_en.md | 161 +++++++++++++ 
.../2024-10-04-jsl_meds_ner_zs_q4_v1_en.md | 161 +++++++++++++ .../2024-10-04-jsl_meds_ner_zs_q8_v1_en.md | 161 +++++++++++++ .../akrztrk/2024-10-05-jsl_meds_q16_v1_en.md | 140 +++++++++++ .../akrztrk/2024-10-05-jsl_meds_q16_v2_en.md | 132 +++++++++++ .../akrztrk/2024-10-05-jsl_meds_q16_v3_en.md | 132 +++++++++++ .../akrztrk/2024-10-05-jsl_meds_q4_v1_en.md | 140 +++++++++++ .../akrztrk/2024-10-05-jsl_meds_q4_v2_en.md | 122 ++++++++++ .../akrztrk/2024-10-05-jsl_meds_q4_v3_en.md | 140 +++++++++++ .../akrztrk/2024-10-05-jsl_meds_q8_v1_en.md | 140 +++++++++++ .../akrztrk/2024-10-05-jsl_meds_q8_v2_en.md | 140 +++++++++++ .../akrztrk/2024-10-05-jsl_meds_q8_v3_en.md | 132 +++++++++++ .../2024-10-05-jsl_meds_rag_q16_v1_en.md | 143 ++++++++++++ .../2024-10-05-jsl_meds_rag_q4_v1_en.md | 143 ++++++++++++ .../2024-10-05-jsl_meds_rag_q8_v1_en.md | 143 ++++++++++++ .../akrztrk/2024-10-06-jsl_medm_q4_v2_en.md | 134 +++++++++++ .../akrztrk/2024-10-06-jsl_medm_q4_v3_en.md | 139 +++++++++++ .../akrztrk/2024-10-06-jsl_medm_q8_v2_en.md | 134 +++++++++++ ...9-20-ner_deid_subentity_augmented_v2_en.md | 181 +++++++++++++++ ...9-27-explain_clinical_doc_sdoh_small_en.md | 198 ++++++++++++++++ ...-02-icd10cm_chronic_indicator_mapper_en.md | 214 +++++++++++++++++ .../yigitgull/2024-09-29-email_matcher_en.md | 108 +++++++++ 48 files changed, 7221 insertions(+) create mode 100644 docs/_posts/Cabir40/2024-10-03-clinical_deidentification_docwise_wip_en.md create mode 100644 docs/_posts/Cabir40/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md create mode 100644 docs/_posts/Cabir40/2024-10-03-clinical_deidentification_v2_wip_en.md create mode 100644 docs/_posts/Cabir40/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md create mode 100644 docs/_posts/Meryem1425/2024-09-27-explain_clinical_doc_sdoh_small_en.md create mode 100644 docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_docwise_wip_en.md create mode 100644 
docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md create mode 100644 docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_v2_wip_en.md create mode 100644 docs/_posts/Meryem1425/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-09-24-clinical_deidentification_v2_wip_en.md create mode 100644 docs/_posts/akrztrk/2024-09-25-clinical_deidentification_docwise_wip_en.md create mode 100644 docs/_posts/akrztrk/2024-09-25-ner_deid_aipii_en.md create mode 100644 docs/_posts/akrztrk/2024-09-27-clinical_deidentification_v2_wip_en.md create mode 100644 docs/_posts/akrztrk/2024-09-27-explain_clinical_doc_sdoh_small_en.md create mode 100644 docs/_posts/akrztrk/2024-09-30-clinical_deidentification_docwise_wip_en.md create mode 100644 docs/_posts/akrztrk/2024-09-30-clinical_deidentification_v2_wip_en.md create mode 100644 docs/_posts/akrztrk/2024-10-01-jsl_medm_q8_v1_en.md create mode 100644 docs/_posts/akrztrk/2024-10-01-jsl_meds_ner_q4_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-02-clinical_deidentification_nameAugmented_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-03-clinical_deidentification_docwise_wip_en.md create mode 100644 docs/_posts/akrztrk/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-03-clinical_deidentification_v2_wip_en.md create mode 100644 docs/_posts/akrztrk/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-04-jsl_medm_q4_v1_en.md create mode 100644 docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_q16_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_q8_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q16_v1_en.md create mode 100644 docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q4_v1_en.md create mode 100644 docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q8_v1_en.md create mode 100644 
docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v1_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v3_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v1_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v3_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v1_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v3_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q16_v1_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q4_v1_en.md create mode 100644 docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q8_v1_en.md create mode 100644 docs/_posts/akrztrk/2024-10-06-jsl_medm_q4_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-06-jsl_medm_q4_v3_en.md create mode 100644 docs/_posts/akrztrk/2024-10-06-jsl_medm_q8_v2_en.md create mode 100644 docs/_posts/bugeki/2024-09-20-ner_deid_subentity_augmented_v2_en.md create mode 100644 docs/_posts/bugeki/2024-09-27-explain_clinical_doc_sdoh_small_en.md create mode 100644 docs/_posts/bugeki/2024-10-02-icd10cm_chronic_indicator_mapper_en.md create mode 100644 docs/_posts/yigitgull/2024-09-29-email_matcher_en.md diff --git a/docs/_posts/Cabir40/2024-10-03-clinical_deidentification_docwise_wip_en.md b/docs/_posts/Cabir40/2024-10-03-clinical_deidentification_docwise_wip_en.md new file mode 100644 index 0000000000..a1ce567dfc --- /dev/null +++ b/docs/_posts/Cabir40/2024-10-03-clinical_deidentification_docwise_wip_en.md @@ -0,0 +1,138 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (Document Wise) +author: John Snow Labs +name: clinical_deidentification_docwise_wip +date: 2024-10-03 +tags: [deidentification, deid, en, licensed, clinical, pipeline, docwise] +task: [De-identification, 
Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to de-identify protected health information (PHI) in medical texts. Detected PHI is both masked and obfuscated in the resulting text. +The pipeline can mask and obfuscate `LOCATION`, `CONTACT`, `PROFESSION`, `NAME`, `DATE`, `ID`, `AGE`, `MEDICALRECORD`, `ORGANIZATION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, +`LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`, +`SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` entities. + +## Predicted Entities + +`LOCATION`, `CONTACT`, `PROFESSION`, `NAME`, `DATE`, `ID`, `AGE`, `MEDICALRECORD`, `ORGANIZATION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `ZIP`, +`STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_wip_en_5.5.0_3.4_1727967134186.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_wip_en_5.5.0_3.4_1727967134186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_wip", "en", "clinical/models") + +text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +deid_result = deid_pipeline.fullAnnotate(text)[0] + +print(''.join([i.result for i in deid_result['mask_entity']])) +print(''.join([i.result for i in deid_result['obfuscated']])) +``` +```scala +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_wip", "en", "clinical/models") + +val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("mask_entity").collect { case a: Annotation => a.result }.mkString("")) +println(deid_result("obfuscated").collect { case a: Annotation => a.result }.mkString("")) +``` +
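The masked and obfuscated outputs differ only in what replaces each detected PHI span. As a rough, library-independent illustration (not how `DeIdentificationModel` or `LightDeIdentification` are implemented), the sketch below assumes PHI chunks arrive as hypothetical `(begin, end, label)` tuples with exclusive end offsets. Masking swaps each span for `<LABEL>`; obfuscation swaps it for a surrogate taken from a hypothetical per-label lookup.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical chunk format (an assumption, not the library's): (begin, end_exclusive, label)
Chunk = Tuple[int, int, str]


def replace_chunks(text: str, chunks: List[Chunk], replacement: Callable[[Chunk], str]) -> str:
    """Rebuild `text`, swapping every chunk span for replacement(chunk).

    Assumes the chunks do not overlap.
    """
    pieces, last = [], 0
    for chunk in sorted(chunks):  # process spans left to right
        begin, end, _ = chunk
        pieces.append(text[last:begin])  # untouched text before the chunk
        pieces.append(replacement(chunk))
        last = end
    pieces.append(text[last:])  # untouched text after the last chunk
    return "".join(pieces)


def mask(text: str, chunks: List[Chunk]) -> str:
    """Masking: replace each PHI span with its entity label, e.g. <DATE>."""
    return replace_chunks(text, chunks, lambda c: f"<{c[2]}>")


def obfuscate(text: str, chunks: List[Chunk], surrogates: Dict[str, str]) -> str:
    """Obfuscation: replace each PHI span with a fake value for its label.

    Falls back to the masked form when no surrogate is available.
    """
    return replace_chunks(text, chunks, lambda c: surrogates.get(c[2], f"<{c[2]}>"))


text = "Dr. John Lee saw the patient on 11/05/2024."
name_at = text.index("John Lee")
date_at = text.index("11/05/2024")
chunks = [
    (name_at, name_at + len("John Lee"), "DOCTOR"),
    (date_at, date_at + len("11/05/2024"), "DATE"),
]

print(mask(text, chunks))
# Dr. <DOCTOR> saw the patient on <DATE>.
print(obfuscate(text, chunks, {"DOCTOR": "Jane Roe", "DATE": "14/06/2024"}))
# Dr. Jane Roe saw the patient on 14/06/2024.
```

A real pipeline also has to combine chunks coming from several NER and rule-based stages (the `ChunkMergeModel` stages listed under Included Models) and pick realistic, consistent surrogates; this sketch ignores both.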
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is +patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Edwardo Graft, from MCBRIDE ORTHOPEDIC HOSPITAL in CLAMART, attended to the patient on 14/06/2024. +The patient’s medical record number is 78295621. +The patient, Nathaneil Bakes, is 43 years old, her Contact number: 308-657-8469 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_docwise_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- InternalDocumentSplitter +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- MedicalNerModel +- MedicalNerModel +- NerConverterInternalModel +- NerConverterInternalModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- TextMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- LightDeIdentification +- LightDeIdentification diff --git a/docs/_posts/Cabir40/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md b/docs/_posts/Cabir40/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md new file mode 100644 index 0000000000..6bc16848ce --- /dev/null +++ b/docs/_posts/Cabir40/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md @@ -0,0 +1,139 @@ +--- +layout: model +title: Clinical 
Deidentification Pipeline (Sentence Wise) +author: John Snow Labs +name: clinical_deidentification_nameAugmented_v2 +date: 2024-10-03 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to de-identify protected health information (PHI) in medical texts. Detected PHI is both masked and obfuscated in the resulting text. +The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`, +`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, +`IP` entities. + +## Predicted Entities + +`MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`, `COUNTRY`, `STREET`, `PHONE`, +`LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.4_1727968688342.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.4_1727968688342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +deid_result = deid_pipeline.fullAnnotate(text)[0] + +print(''.join([i.metadata['masked'] for i in deid_result['obfuscated']])) +print(''.join([i.result for i in deid_result['obfuscated']])) +``` +```scala +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("obfuscated").collect { case a: Annotation => a.metadata("masked") }.mkString("")) +println(deid_result("obfuscated").collect { case a: Annotation => a.result }.mkString("")) +``` +
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is . +The patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Rhodia Cera, from 252 Mchenry St in UNTERLAND, attended to the patient on 18/06/2024. +The patient’s medical record number is 16109604. +The patient, Eulice Hickory, is 44 years old, her Contact number: 540-981-1914 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_nameAugmented_v2| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.9 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- NerDLModel +- NerConverterInternalModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/Cabir40/2024-10-03-clinical_deidentification_v2_wip_en.md b/docs/_posts/Cabir40/2024-10-03-clinical_deidentification_v2_wip_en.md new file mode 100644 index 0000000000..2d6990ce58 --- /dev/null +++ b/docs/_posts/Cabir40/2024-10-03-clinical_deidentification_v2_wip_en.md @@ -0,0 +1,136 @@ +--- +layout: model +title: Clinical Deidentification 
Pipeline (Sentence Wise) +author: John Snow Labs +name: clinical_deidentification_v2_wip +date: 2024-10-03 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to de-identify protected health information (PHI) in medical texts. Detected PHI is both masked and obfuscated in the resulting text. +The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, +`ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `NAME`, +`ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` entities. + +## Predicted Entities + +`LOCATION`, `CONTACT`, `PROFESSION`, `NAME`, `DATE`, `ID`, `AGE`, `MEDICALRECORD`, `ORGANIZATION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `ZIP`, +`STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.5.0_3.4_1727970664823.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.5.0_3.4_1727970664823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models") + +text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +deid_result = deid_pipeline.fullAnnotate(text)[0] + +print(''.join([i.metadata['masked'] for i in deid_result['obfuscated']])) +print(''.join([i.result for i in deid_result['obfuscated']])) +``` +```scala +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +val deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models") + +val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("obfuscated").collect { case a: Annotation => a.metadata("masked") }.mkString("")) +println(deid_result("obfuscated").collect { case a: Annotation => a.result }.mkString("")) +``` +
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is . +The patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Alissa Irving, from KINDRED HOSPITAL SEATTLE in Geleen, attended to the patient on 22/06/2024. +The patient’s medical record number is 16109604. +The patient, Burnette Carte, is 49 years old, her Contact number: 540-981-1914 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_v2_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/Cabir40/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md b/docs/_posts/Cabir40/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md new file mode 100644 index 0000000000..6ab8511096 --- /dev/null +++ b/docs/_posts/Cabir40/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md @@ -0,0 +1,140 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (Sentence Wise) +author: John Snow 
Labs +name: clinical_deidentification_nameAugmented_v2 +date: 2024-10-04 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to de-identify protected health information (PHI) in medical texts. Detected PHI is both masked and obfuscated in the resulting text. +The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`, +`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, +`IP` entities. + +## Predicted Entities + +`MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`, +`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, +`IP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.4_1728046249043.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.4_1728046249043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +deid_result = deid_pipeline.fullAnnotate(text)[0] + +print(''.join([i.metadata['masked'] for i in deid_result['obfuscated']])) +print(''.join([i.result for i in deid_result['obfuscated']])) +``` +```scala +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("obfuscated").collect { case a: Annotation => a.metadata("masked") }.mkString("")) +println(deid_result("obfuscated").collect { case a: Annotation => a.result }.mkString("")) +``` +
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is . +The patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Rhodia Cera, from 252 Mchenry St in UNTERLAND, attended to the patient on 18/06/2024. +The patient’s medical record number is 16109604. +The patient, Eulice Hickory, is 44 years old, her Contact number: 540-981-1914 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_nameAugmented_v2| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.9 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- NerDLModel +- NerConverterInternalModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/Meryem1425/2024-09-27-explain_clinical_doc_sdoh_small_en.md b/docs/_posts/Meryem1425/2024-09-27-explain_clinical_doc_sdoh_small_en.md new file mode 100644 index 0000000000..c7d71a65f3 --- /dev/null +++ b/docs/_posts/Meryem1425/2024-09-27-explain_clinical_doc_sdoh_small_en.md @@ -0,0 +1,198 @@ +--- +layout: model +title: Explain Clinical Document 
- Social Determinants of Health (SDOH)-Small +author: John Snow Labs +name: explain_clinical_doc_sdoh_small +date: 2024-09-27 +tags: [en, licensed, clinical, pipeline, social_determinants, sdoh, ner, assertion, relation_extraction] +task: Pipeline Healthcare +language: en +edition: Healthcare NLP 5.4.1 +spark_version: 3.2 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline is designed to + +- extract all social determinants of health (SDOH) entities from text, + +- assign assertion status to the extracted entities, + +- establish relations between the extracted entities. + +This pipeline uses the [ner_sdoh](https://nlp.johnsnowlabs.com/2023/06/13/ner_sdoh_en.html) +NER model, the [assertion_sdoh_wip](https://nlp.johnsnowlabs.com/2023/08/13/assertion_sdoh_wip_en.html) assertion model, and the [generic_re](https://nlp.johnsnowlabs.com/2022/12/20/generic_re.html) +relation extraction model for these tasks.
+ +Clinical Entity Labels: + +`Access_To_Care`, `Age`, `Alcohol`, `Chidhood_Event`, `Communicable_Disease`, `Community_Safety`, `Diet`, `Disability`, `Eating_Disorder`, `Education`, `Employment`, `Environmental_Condition`, `Exercise`, `Family_Member`, `Financial_Status`, `Food_Insecurity`, `Gender`, `Geographic_Entity`, `Healthcare_Institution`, `Housing`, `Hyperlipidemia`, `Hypertension`, `Income`, `Insurance_Status`, `Language`, `Legal_Issues`, `Marital_Status`, `Mental_Health`, `Obesity`, `Other_Disease`, `Other_SDoH_Keywords`, `Population_Group`, `Quality_Of_Life`, `Race_Ethnicity`, `Sexual_Activity`, `Sexual_Orientation`, `Smoking`, `Social_Exclusion`, `Social_Support`, `Spiritual_Beliefs`, `Substance_Duration`, `Substance_Frequency`, `Substance_Quantity`, `Substance_Use`, `Transportation`, `Violence_Or_Abuse` + + +Assertion Status Labels: + +`Present`, `Absent`, `Possible`, `Past`, `Hypothetical`, `Someone_Else` + +Relation Extraction Labels: + +`Access_To_Care-Financial_Status`, `Access_To_Care-Income`, `Access_To_Care-Social_Support`, `Access_To_Care-Substance_Use`, `Alcohol-Mental_Health`, `Alcohol-Quality_Of_Life`, `Alcohol-Smoking`, `Alcohol-Substance_Use`, `Alcohol-Violence_Or_Abuse`, `Childhood_Event-Violence_Or_Abuse`, `Community_Safety-Quality_Of_Life`, `Community_Safety-Violence_Or_Abuse`, `Diet-Eating_Disorder`, `Diet-Exercise`, `Diet-Gender`, `Diet-Obesity`, `Disability-Insurance_Status`, `Disability-Mental_Health`, `Disability-Quality_Of_Life`, `Disability-Social_Exclusion`, `Eating_Disorder-Food_Insecurity`, `Eating_Disorder-Mental_Health`, `Eating_Disorder-Obesity`, `Education-Employment`, `Education-Financial_Status`, `Education-Income`, `Education-Legal_Issues`, `Education-Quality_Of_Life`, `Education-Substance_Use`, `Employment-Financial_Status`, `Employment-Income`, `Employment-Insurance_Status`, `Employment-Quality_Of_Life`, `Environmental_Condition-Quality_Of_Life`, `Exercise-Mental_Health`, `Exercise-Obesity`, `Exercise-Quality_Of_Life`,
`Exercise-Smoking`, `Exercise-Substance_Use`, `Financial_Status-Food_Insecurity`, `Financial_Status-Housing`, `Financial_Status-Income`, `Financial_Status-Insurance_Status`, `Financial_Status-Mental_Health`, `Financial_Status-Quality_Of_Life`, `Financial_Status-Social_Support`, `Food_Insecurity-Income`, `Food_Insecurity-Mental_Health`, `Food_Insecurity-Quality_Of_Life`, `Housing-Income`, `Housing-Insurance_Status`, `Housing-Quality_Of_Life`, `Income-Insurance_Status`, `Income-Quality_Of_Life`, `Language-Population_Group`, `Language-Race_Ethnicity`, `Language-Social_Exclusion`, `Legal_Issues-Race_Ethnicity`, `Legal_Issues-Substance_Use`, `Legal_Issues-Violence_Or_Abuse`, `Marital_Status-Mental_Health`, `Marital_Status-Violence_Or_Abuse`, `Mental_Health-Obesity`, `Mental_Health-Quality_Of_Life`, `Mental_Health-Smoking`, `Mental_Health-Social_Exclusion`, `Mental_Health-Social_Support`, `Mental_Health-Substance_Use`, `Mental_Health-Violence_Or_Abuse`, `Obesity-Quality_Of_Life`, `Population_Group-Violence_Or_Abuse`, `Quality_Of_Life-Substance_Use`, `Race_Ethnicity-Social_Exclusion`, `Race_Ethnicity-Social_Support`, `Race_Ethnicity-Violence_Or_Abuse`, `Sexual_Activity-Sexual_Orientation`, `Sexual_Orientation-Social_Exclusion`, `Sexual_Orientation-Substance_Use`, `Sexual_Orientation-Violence_Or_Abuse`, `Smoking-Substance_Use`, `Social_Exclusion-Substance_Use`, `Substance_Duration-Substance_Use`, `Substance_Frequency-Substance_Use`, `Substance_Quantity-Substance_Use`, `Substance_Use-Violence_Or_Abuse`, `Substance_Use-Communicable_Disease`, `Alcohol-Obesity` + +## Predicted Entities +`Access_To_Care`, `Age`, `Alcohol`, `Chidhood_Event`, `Communicable_Disease`, `Community_Safety`, `Diet`, `Disability`, `Eating_Disorder`, `Education`, `Employment`, `Environmental_Condition`, `Exercise`, `Family_Member`, `Financial_Status`, `Food_Insecurity`, `Gender`, `Geographic_Entity`, `Healthcare_Institution`, `Housing`, `Hyperlipidemia`, `Hypertension`, `Income`, `Insurance_Status`,
`Language`, `Legal_Issues`, `Marital_Status`, `Mental_Health`, `Obesity`, `Other_Disease`, `Other_SDoH_Keywords`, `Population_Group`, `Quality_Of_Life`, `Race_Ethnicity`, `Sexual_Activity`, `Sexual_Orientation`, `Smoking`, `Social_Exclusion`, `Social_Support`, `Spiritual_Beliefs`, `Substance_Duration`, `Substance_Frequency`, `Substance_Quantity`, `Substance_Use`, `Transportation`, `Violence_Or_Abuse` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/explain_clinical_doc_sdoh_small_en_5.4.1_3.2_1727459020083.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/explain_clinical_doc_sdoh_small_en_5.4.1_3.2_1727459020083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +sdoh_pipeline = PretrainedPipeline('explain_clinical_doc_sdoh_small', 'en', 'clinical/models') + +result = sdoh_pipeline.fullAnnotate("""The patient reported experiencing symptoms of anxiety and depression, which have been affecting his quality of life. +He reported a history of childhood trauma related to violence and abuse in his household, which has contributed to his smoking, alcohol use and current mental health struggles. +He denied any recent substance use or sexual activity and reported being monogamous in his relationship with his wife. +The patient is an immigrant and speaks English as a second language. +He reported difficulty accessing healthcare due to lack of medical insurance. +He has a herniated disc, hypertension, coronary artery disease (CAD) and diabetes mellitus. +The patient has a manic disorder, is presently psychotic and shows impulsive behavior. He has been disabled since 2001.""") +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val sdoh_pipeline = new PretrainedPipeline("explain_clinical_doc_sdoh_small", "en", "clinical/models") + +val result = sdoh_pipeline.fullAnnotate("""The patient reported experiencing symptoms of anxiety and depression, which have been affecting his quality of life. +He reported a history of childhood trauma related to violence and abuse in his household, which has contributed to his smoking, alcohol use and current mental health struggles. +He denied any recent substance use or sexual activity and reported being monogamous in his relationship with his wife. +The patient is an immigrant and speaks English as a second language. +He reported difficulty accessing healthcare due to lack of medical insurance. +He has a herniated disc, hypertension, coronary artery disease (CAD) and diabetes mellitus. 
+The patient has a manic disorder, is presently psychotic and shows impulsive behavior. He has been disabled since 2001.""") +``` +
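The NER_Result table that follows flattens the chunk annotations returned by `fullAnnotate`. A minimal sketch of that flattening, assuming each chunk annotation exposes `result`, `begin`, `end` and a `metadata` dict with string values under `sentence`, `entity` and `confidence` (as Spark NLP NER chunks usually do). The helper `chunks_to_rows` is hypothetical, and so is the chunk column you would pass to it; check `result[0].keys()` for this pipeline's actual output column names.

```python
from types import SimpleNamespace


def chunks_to_rows(chunks):
    """Flatten chunk annotations into dicts shaped like the NER_Result table.

    Assumes each annotation has .result, .begin, .end and a .metadata dict
    with string values (assumed keys: 'sentence', 'entity', 'confidence').
    """
    rows = []
    for chunk in chunks:
        meta = chunk.metadata
        rows.append({
            "chunks": chunk.result,
            "begin": chunk.begin,
            "end": chunk.end,
            "sentence_id": int(meta.get("sentence", "0")),
            "entities": meta.get("entity", ""),
            "confidence": float(meta.get("confidence", "nan")),
        })
    return rows


# Stand-in annotation so the sketch runs without Spark; with the real pipeline
# you would pass something like result[0]["<chunk column>"] (hypothetical name).
demo = [SimpleNamespace(result="anxiety", begin=47, end=53,
                        metadata={"sentence": "0", "entity": "Mental_Health",
                                  "confidence": "0.9897"})]

for row in chunks_to_rows(demo):
    print(row)
```

Passing the returned list to `pandas.DataFrame` gives a table with the same columns as NER_Result.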
+ +## Results + +```bash +# NER_Result + +| | chunks | begin | end | sentence_id | entities | confidence | +|---:|:--------------------------------|--------:|------:|--------------:|:------------------|-------------:| +| 0 | anxiety | 47 | 53 | 0 | Mental_Health | 0.9897 | +| 1 | depression | 59 | 68 | 0 | Mental_Health | 0.9938 | +| 2 | his | 97 | 99 | 0 | Gender | 0.992 | +| 3 | quality of life | 101 | 115 | 0 | Quality_Of_Life | 0.6252 | +| 4 | He | 118 | 119 | 1 | Gender | 0.9996 | +| 5 | childhood trauma | 143 | 158 | 1 | Chidhood_Event | 0.7466 | +| 6 | violence | 171 | 178 | 1 | Violence_Or_Abuse | 0.5394 | +| 7 | abuse | 184 | 188 | 1 | Violence_Or_Abuse | 0.6209 | +| 8 | his | 193 | 195 | 1 | Gender | 0.9536 | +| 9 | his | 233 | 235 | 1 | Gender | 0.9772 | +| 10 | smoking | 237 | 243 | 1 | Smoking | 0.9858 | +| 11 | alcohol use | 246 | 256 | 1 | Alcohol | 0.68065 | +| 12 | mental health struggles | 270 | 292 | 1 | Mental_Health | 0.248033 | +| 13 | He | 295 | 296 | 2 | Gender | 0.9995 | +| 14 | substance use | 316 | 328 | 2 | Substance_Use | 0.6921 | +| 15 | sexual activity | 333 | 347 | 2 | Sexual_Activity | 0.62915 | +| 16 | monogamous | 368 | 377 | 2 | Sexual_Activity | 0.6915 | +| 17 | his | 382 | 384 | 2 | Gender | 0.9883 | +| 18 | his | 404 | 406 | 2 | Gender | 0.978 | +| 19 | wife | 408 | 411 | 2 | Family_Member | 0.9833 | +| 20 | immigrant | 432 | 440 | 3 | Population_Group | 0.9974 | +| 21 | English | 453 | 459 | 3 | Language | 0.9979 | +| 22 | He | 483 | 484 | 4 | Gender | 0.9996 | +| 23 | difficulty accessing healthcare | 495 | 525 | 4 | Access_To_Care | 0.3998 | +| 24 | medical insurance | 542 | 558 | 4 | Insurance_Status | 0.6721 | +| 25 | He | 561 | 562 | 5 | Gender | 0.9996 | +| 26 | herniated disc | 570 | 583 | 5 | Other_Disease | 0.71515 | +| 27 | hypertension | 586 | 597 | 5 | Hypertension | 0.9984 | +| 28 | coronary artery disease | 600 | 622 | 5 | Other_Disease | 0.847933 | +| 29 | CAD | 625 | 627 | 5 | Other_Disease | 0.9884 | +| 30 | 
diabetes mellitus | 634 | 650 | 5 | Other_Disease | 0.81115 |
+| 31 | manic disorder | 671 | 684 | 6 | Mental_Health | 0.7929 |
+| 32 | psychotic | 700 | 708 | 6 | Mental_Health | 0.9743 |
+| 33 | impulsive behavior | 720 | 737 | 6 | Mental_Health | 0.41135 |
+| 34 | He | 740 | 741 | 7 | Gender | 0.9996 |
+| 35 | disabled | 752 | 759 | 7 | Disability | 0.9999 |
+
+# Assertion_Result:
+
+| | chunks | entities | assertion |
+|---:|:--------------------------------|:------------------|:------------|
+| 0 | anxiety | Mental_Health | Present |
+| 1 | depression | Mental_Health | Present |
+| 2 | quality of life | Quality_Of_Life | Present |
+| 3 | violence | Violence_Or_Abuse | Past |
+| 4 | abuse | Violence_Or_Abuse | Past |
+| 5 | smoking | Smoking | Present |
+| 6 | alcohol use | Alcohol | Present |
+| 7 | mental health struggles | Mental_Health | Present |
+| 8 | substance use | Substance_Use | Absent |
+| 9 | sexual activity | Sexual_Activity | Present |
+| 10 | monogamous | Sexual_Activity | Absent |
+| 11 | difficulty accessing healthcare | Access_To_Care | Absent |
+| 12 | medical insurance | Insurance_Status | Present |
+| 13 | hypertension | Hypertension | Present |
+| 14 | manic disorder | Mental_Health | Present |
+| 15 | psychotic | Mental_Health | Present |
+| 16 | impulsive behavior | Mental_Health | Present |
+
+
+# RE Result
+
+| | sentence | entity1_begin | entity1_end | chunk1 | entity1 | entity2_begin | entity2_end | chunk2 | entity2 | relation | confidence |
+|---:|-----------:|----------------:|--------------:|:------------|:------------------|----------------:|--------------:|:------------------------|:----------------|:--------------------------------|-------------:|
+| 0 | 0 | 47 | 53 | anxiety | Mental_Health | 101 | 115 | quality of life | Quality_Of_Life | Mental_Health-Quality_Of_Life | 1 |
+| 1 | 0 | 59 | 68 | depression | Mental_Health | 101 | 115 | quality of life | Quality_Of_Life | Mental_Health-Quality_Of_Life | 1 |
+| 2 | 1 | 171 |
178 | violence | Violence_Or_Abuse | 246 | 256 | alcohol use | Alcohol | Violence_Or_Abuse-Alcohol | 1 | +| 3 | 1 | 171 | 178 | violence | Violence_Or_Abuse | 270 | 292 | mental health struggles | Mental_Health | Violence_Or_Abuse-Mental_Health | 1 | +| 4 | 1 | 184 | 188 | abuse | Violence_Or_Abuse | 246 | 256 | alcohol use | Alcohol | Violence_Or_Abuse-Alcohol | 1 | +| 5 | 1 | 184 | 188 | abuse | Violence_Or_Abuse | 270 | 292 | mental health struggles | Mental_Health | Violence_Or_Abuse-Mental_Health | 1 | +| 6 | 1 | 237 | 243 | smoking | Smoking | 270 | 292 | mental health struggles | Mental_Health | Smoking-Mental_Health | 1 | +| 7 | 1 | 246 | 256 | alcohol use | Alcohol | 270 | 292 | mental health struggles | Mental_Health | Alcohol-Mental_Health | 1 | +| 8 | 3 | 432 | 440 | immigrant | Population_Group | 453 | 459 | English | Language | Population_Group-Language | 1 | +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|explain_clinical_doc_sdoh_small| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.4.1+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- AssertionDLModel +- PerceptronModel +- DependencyParserModel +- GenericREModel diff --git a/docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_docwise_wip_en.md b/docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_docwise_wip_en.md new file mode 100644 index 0000000000..629cb50a67 --- /dev/null +++ b/docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_docwise_wip_en.md @@ -0,0 +1,138 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (Document Wise) +author: John Snow Labs +name: clinical_deidentification_docwise_wip +date: 2024-10-03 +tags: [deidentification, deid, en, licensed, clinical, 
pipeline, docwise]
+task: [De-identification, Pipeline Healthcare]
+language: en
+edition: Healthcare NLP 5.5.0
+spark_version: 3.2
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This pipeline de-identifies protected health information (PHI) in medical texts. Each detected PHI chunk is masked in one output and obfuscated (replaced with a realistic fake value) in another.
+The pipeline can mask and obfuscate `LOCATION`, `CONTACT`, `PROFESSION`, `NAME`, `DATE`, `ID`, `AGE`, `MEDICALRECORD`, `ORGANIZATION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`,
+`LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`,
+`SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` entities.
+
+## Predicted Entities
+
+`LOCATION`, `CONTACT`, `PROFESSION`, `NAME`, `DATE`, `ID`, `AGE`, `MEDICALRECORD`, `ORGANIZATION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`,
+`ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP`
+
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_wip_en_5.5.0_3.2_1727968140333.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_wip_en_5.5.0_3.2_1727968140333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_wip", "en", "clinical/models")
+
+text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
+The patient’s medical record number is 56467890.
+The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
+
+deid_result = deid_pipeline.fullAnnotate(text)
+
+print(''.join([i.result for i in deid_result[0]['mask_entity']]))
+print(''.join([i.result for i in deid_result[0]['obfuscated']]))
+```
+```scala
+import com.johnsnowlabs.nlp.Annotation
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_wip", "en", "clinical/models")
+
+val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
+The patient’s medical record number is 56467890.
+The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
+
+val deid_result = deid_pipeline.fullAnnotate(text)
+
+println(deid_result("mask_entity").map(_.asInstanceOf[Annotation].result).mkString(""))
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString(""))
+```
+
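To audit which PHI categories were masked, you can count the entity placeholders in the masked text. The sketch below assumes the masked output uses angle-bracket placeholders such as `<DOCTOR>` (the usual entity-label masking style); the helper `count_masked_labels` and the sample sentence are hypothetical, not output copied from this pipeline.

```python
import re
from collections import Counter

# Matches placeholders such as <DOCTOR>, <LOCATION-OTHER> or <LOCATION_OTHER>
PLACEHOLDER = re.compile(r"<([A-Z][A-Z_-]*)>")


def count_masked_labels(masked_text):
    """Return a Counter of entity labels found as <LABEL> placeholders."""
    return Counter(PLACEHOLDER.findall(masked_text))


# Hypothetical masked text; real label assignments depend on the pipeline
sample = ("Dr. <DOCTOR>, from <HOSPITAL> in <CITY>, attended to the patient "
          "on <DATE>. Seen again on <DATE>.")
print(count_masked_labels(sample))
```

Comparing these counts against the entity list above is a quick way to spot categories that never fire on your data.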
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is +patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Edwardo Graft, from MCBRIDE ORTHOPEDIC HOSPITAL in CLAMART, attended to the patient on 14/06/2024. +The patient’s medical record number is 78295621. +The patient, Nathaneil Bakes, is 43 years old, her Contact number: 308-657-8469 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_docwise_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- InternalDocumentSplitter +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- MedicalNerModel +- MedicalNerModel +- NerConverterInternalModel +- NerConverterInternalModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- TextMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- LightDeIdentification +- LightDeIdentification diff --git a/docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md b/docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md new file mode 100644 index 0000000000..a464346831 --- /dev/null +++ b/docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md @@ -0,0 +1,139 @@ +--- +layout: model +title: Clinical 
Deidentification Pipeline (Sentence Wise)
+author: John Snow Labs
+name: clinical_deidentification_nameAugmented_v2
+date: 2024-10-03
+tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise]
+task: [De-identification, Pipeline Healthcare]
+language: en
+edition: Healthcare NLP 5.5.0
+spark_version: 3.2
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This pipeline de-identifies protected health information (PHI) in medical texts. Each detected PHI chunk is masked in one output and obfuscated (replaced with a realistic fake value) in another.
+The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`,
+`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`,
+`IP` entities.
+
+## Predicted Entities
+
+`LOCATION`, `CONTACT`, `PROFESSION`, `NAME`, `DATE`, `ID`, `AGE`, `MEDICALRECORD`, `ORGANIZATION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`,
+`ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.2_1727969743921.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.2_1727969743921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")
+
+text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
+The patient’s medical record number is 56467890.
+The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
+
+deid_result = deid_pipeline.fullAnnotate(text)
+
+print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']]))
+print(''.join([i.result for i in deid_result[0]['obfuscated']]))
+```
+```scala
+import com.johnsnowlabs.nlp.Annotation
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")
+
+val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
+The patient’s medical record number is 56467890.
+The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
+
+val deid_result = deid_pipeline.fullAnnotate(text)
+
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString(""))
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString(""))
+```
+
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is . +The patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Rhodia Cera, from 252 Mchenry St in UNTERLAND, attended to the patient on 18/06/2024. +The patient’s medical record number is 16109604. +The patient, Eulice Hickory, is 44 years old, her Contact number: 540-981-1914 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_nameAugmented_v2| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.9 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- NerDLModel +- NerConverterInternalModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_v2_wip_en.md b/docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_v2_wip_en.md new file mode 100644 index 0000000000..b14088cdfe --- /dev/null +++ b/docs/_posts/Meryem1425/2024-10-03-clinical_deidentification_v2_wip_en.md @@ -0,0 +1,136 @@ +--- +layout: model +title: Clinical 
Deidentification Pipeline (Sentence Wise)
+author: John Snow Labs
+name: clinical_deidentification_v2_wip
+date: 2024-10-03
+tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise]
+task: [De-identification, Pipeline Healthcare]
+language: en
+edition: Healthcare NLP 5.5.0
+spark_version: 3.2
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This pipeline de-identifies protected health information (PHI) in medical texts. Each detected PHI chunk is masked in one output and obfuscated (replaced with a realistic fake value) in another.
+The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`,
+`ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `NAME`,
+`ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` entities.
+
+## Predicted Entities
+
+`LOCATION`, `CONTACT`, `PROFESSION`, `NAME`, `DATE`, `ID`, `AGE`, `MEDICALRECORD`, `ORGANIZATION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`,
+`ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.5.0_3.2_1727971152989.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.5.0_3.2_1727971152989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models")
+
+text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
+The patient’s medical record number is 56467890.
+The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
+
+deid_result = deid_pipeline.fullAnnotate(text)
+
+print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']]))
+print(''.join([i.result for i in deid_result[0]['obfuscated']]))
+```
+```scala
+import com.johnsnowlabs.nlp.Annotation
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models")
+
+val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
+The patient’s medical record number is 56467890.
+The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
+
+val deid_result = deid_pipeline.fullAnnotate(text)
+
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString(""))
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString(""))
+```
+
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is . +The patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Alissa Irving, from KINDRED HOSPITAL SEATTLE in Geleen, attended to the patient on 22/06/2024. +The patient’s medical record number is 16109604. +The patient, Burnette Carte, is 49 years old, her Contact number: 540-981-1914 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_v2_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/Meryem1425/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md b/docs/_posts/Meryem1425/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md new file mode 100644 index 0000000000..9babdf058c --- /dev/null +++ b/docs/_posts/Meryem1425/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md @@ -0,0 +1,140 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (Sentence Wise) +author: 
John Snow Labs
+name: clinical_deidentification_nameAugmented_v2
+date: 2024-10-04
+tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise]
+task: [De-identification, Pipeline Healthcare]
+language: en
+edition: Healthcare NLP 5.5.0
+spark_version: 3.2
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This pipeline de-identifies protected health information (PHI) in medical texts. Each detected PHI chunk is masked in one output and obfuscated (replaced with a realistic fake value) in another.
+The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`,
+`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`,
+`IP` entities.
+
+## Predicted Entities
+
+`MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`,
+`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`,
+`IP`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.2_1728047218925.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.2_1728047218925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")
+
+text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
+The patient’s medical record number is 56467890.
+The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
+
+deid_result = deid_pipeline.fullAnnotate(text)
+
+print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']]))
+print(''.join([i.result for i in deid_result[0]['obfuscated']]))
+```
+```scala
+import com.johnsnowlabs.nlp.Annotation
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")
+
+val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
+The patient’s medical record number is 56467890.
+The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
+
+val deid_result = deid_pipeline.fullAnnotate(text)
+
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString(""))
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString(""))
+```
+
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is . +The patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Rhodia Cera, from 252 Mchenry St in UNTERLAND, attended to the patient on 18/06/2024. +The patient’s medical record number is 16109604. +The patient, Eulice Hickory, is 44 years old, her Contact number: 540-981-1914 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_nameAugmented_v2| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.9 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- NerDLModel +- NerConverterInternalModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/akrztrk/2024-09-24-clinical_deidentification_v2_wip_en.md b/docs/_posts/akrztrk/2024-09-24-clinical_deidentification_v2_wip_en.md new file mode 100644 index 0000000000..c2fb71b059 --- /dev/null +++ b/docs/_posts/akrztrk/2024-09-24-clinical_deidentification_v2_wip_en.md @@ -0,0 +1,145 @@ +--- +layout: model +title: Clinical Deidentification 
Pipeline (English)
+author: John Snow Labs
+name: clinical_deidentification_v2_wip
+date: 2024-09-24
+tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise]
+task: [De-identification, Pipeline Healthcare]
+language: en
+edition: Healthcare NLP 5.4.1
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This pipeline de-identifies protected health information (PHI) in medical texts. Each detected PHI chunk is masked in one output and obfuscated (replaced with a realistic fake value) in another. The pipeline can mask and obfuscate `ACCOUNT`, `AGE`, `BIOID`, `CITY`, `CONTACT`, `COUNTRY`, `DATE`, `DEVICE`, `DLN`, `DOCTOR`, `EMAIL`, `FAX`, `HEALTHPLAN`, `HOSPITAL`, `ID`, `IPADDR`, `LICENSE`, `LOCATION`, `MEDICALRECORD`, `NAME`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PLATE`, `PROFESSION`, `SREET`, `SSN`, `STATE`, `STREET`, `URL`, `USERNAME`, `VIN`, `ZIP` entities.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.4.1_3.4_1727218725743.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.4.1_3.4_1727218725743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models")
+
+text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 87719435.
+ID: #12315112, Dr. John Green, IP 203.120.223.13.
+He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
+Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
+Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94108. E-MAIL: smith@gmail.com."""
+
+deid_result = deid_pipeline.fullAnnotate(text)
+
+print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']]))
+print(''.join([i.result for i in deid_result[0]['obfuscated']]))
+```
+```scala
+import com.johnsnowlabs.nlp.Annotation
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models")
+
+val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 87719435.
+ID: #12315112, Dr. John Green, IP 203.120.223.13.
+He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
+Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
+Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94108. E-MAIL: smith@gmail.com."""
+
+val deid_result = deid_pipeline.fullAnnotate(text)
+
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString(""))
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString(""))
+```
+
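A quick sanity check after obfuscation is to confirm that none of the original PHI strings survive verbatim in the output. A minimal sketch using plain substring matching; the helper `leaked_phi` and the hand-picked PHI list are hypothetical (with the real pipeline you would take the strings from the detected chunks), and the obfuscated text is the one shown in the Results section that follows. Substring matching is crude: very short strings can give false positives, and it cannot catch PHI the pipeline never detected.

```python
def leaked_phi(phi_chunks, deidentified_text):
    """Return the PHI strings that still appear verbatim in the de-identified text."""
    return [chunk for chunk in phi_chunks if chunk and chunk in deidentified_text]


# Hand-picked PHI from the example input (hypothetical list, not pipeline output)
phi = ["Hendrickson, Ora", "87719435", "John Green", "203.120.223.13",
       "1HGBH41JXMN109286", "333-44-6666", "A334455B", "(302) 786-5227",
       "0295 Keats Street", "San Francisco", "smith@gmail.com"]

# Obfuscated text as shown in the Results section
obfuscated = (
    "Name : Axel Bohr, Record date: 2093-02-01, MR: 61443154.\n"
    "ID: #00867619, Dr. Rickard Charles, IP 002.002.002.002.\n"
    "He is a 73-year-old male was admitted to the LOMA LINDA UNIVERSITY MEDICAL "
    "CENTER-MURRIETA for cystectomy on 02/01/93.\n"
    "Patient's VIN : 5KDTO67TIWP809983, SSN #382-50-5397, Driver's license no: Q734193X.\n"
    "Phone (902) 409-7353, 1555 Long Pond Road, Pomeroy, Maryland 29924.\n"
    "E-MAIL: Halit@google.com."
)

print(leaked_phi(phi, obfuscated))  # prints []
```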
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Name : , Record date: , MR: . +ID: , Dr. , IP . +He is a -year-old male was admitted to the for cystectomy on . +Patient's VIN : , SSN , Driver's license no: . +Phone , , , . +E-MAIL: . + +Obfuscated +------------------------------ +Name : Axel Bohr, Record date: 2093-02-01, MR: 61443154. +ID: #00867619, Dr. Rickard Charles, IP 002.002.002.002. +He is a 73-year-old male was admitted to the LOMA LINDA UNIVERSITY MEDICAL CENTER-MURRIETA for cystectomy on 02/01/93. +Patient's VIN : 5KDTO67TIWP809983, SSN #382-50-5397, Driver's license no: Q734193X. +Phone (902) 409-7353, 1555 Long Pond Road, Pomeroy, Maryland 29924. + E-MAIL: Halit@google.com. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_v2_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.4.1+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/akrztrk/2024-09-25-clinical_deidentification_docwise_wip_en.md b/docs/_posts/akrztrk/2024-09-25-clinical_deidentification_docwise_wip_en.md new file mode 100644 index 
0000000000..3237f84d4d
--- /dev/null
+++ b/docs/_posts/akrztrk/2024-09-25-clinical_deidentification_docwise_wip_en.md
@@ -0,0 +1,139 @@
+---
+layout: model
+title: Clinical Deidentification Pipeline (English)
+author: John Snow Labs
+name: clinical_deidentification_docwise_wip
+date: 2024-09-25
+tags: [deidentification, en, deid, licensed, clinical, pipeline, sent_wise]
+task: [De-identification, Pipeline Healthcare]
+language: en
+edition: Healthcare NLP 5.4.1
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This pipeline can be used to deidentify protected health information (PHI) in medical texts. Detected PHI is masked with entity labels and obfuscated with surrogate values in the resulting text. The pipeline can mask and obfuscate the following entities: `ACCOUNT`, `AGE`, `BIOID`, `CITY`, `CONTACT`, `COUNTRY`, `DATE`, `DEVICE`, `DLN`, `DOCTOR`, `EMAIL`, `FAX`, `HEALTHPLAN`, `HOSPITAL`, `ID`, `IPADDR`, `LICENSE`, `LOCATION`, `MEDICALRECORD`, `NAME`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PLATE`, `PROFESSION`, `SSN`, `STATE`, `STREET`, `URL`, `USERNAME`, `VIN`, `ZIP`.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_wip_en_5.4.1_3.4_1727248183384.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_wip_en_5.4.1_3.4_1727248183384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_wip", "en", "clinical/models")
+
+text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 87719435.
+ID: #12315112, Dr. John Green, IP 203.120.223.13.
+He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
+Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
+Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94108. E-MAIL: smith@gmail.com."""
+
+deid_result = deid_pipeline.fullAnnotate(text)
+
+print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']]))
+print(''.join([i.result for i in deid_result[0]['obfuscated']]))
+```
+```scala
+import com.johnsnowlabs.nlp.Annotation
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_wip", "en", "clinical/models")
+
+val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 87719435.
+ID: #12315112, Dr. John Green, IP 203.120.223.13.
+He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
+Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
+Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94108. E-MAIL: smith@gmail.com."""
+
+val deid_result = deid_pipeline.fullAnnotate(text)  // Map keyed by output column name
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString(""))
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString(""))
+```
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Name : , Record date: , MR: . +ID: , Dr. , IP . +He is a -year-old male was admitted to the for cystectomy on . +Patient's VIN : , SSN , Driver's license no: . +Phone , , , . +E-MAIL: . + +Obfuscated +------------------------------ +Name : Axel Bohr, Record date: 2093-02-01, MR: 61443154. +ID: #00867619, Dr. Rickard Charles, IP 002.002.002.002. +He is a 73-year-old male was admitted to the LOMA LINDA UNIVERSITY MEDICAL CENTER-MURRIETA for cystectomy on 02/01/93. +Patient's VIN : 5KDTO67TIWP809983, SSN #382-50-5397, Driver's license no: Q734193X. +Phone (902) 409-7353, 1555 Long Pond Road, Pomeroy, Maryland 29924. + E-MAIL: Halit@google.com. +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_docwise_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.4.1+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- InternalDocumentSplitter +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- MedicalNerModel +- MedicalNerModel +- NerConverterInternalModel +- NerConverterInternalModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- TextMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- LightDeIdentification +- LightDeIdentification diff --git a/docs/_posts/akrztrk/2024-09-25-ner_deid_aipii_en.md b/docs/_posts/akrztrk/2024-09-25-ner_deid_aipii_en.md new file mode 100644 index 
0000000000..503880a869
--- /dev/null
+++ b/docs/_posts/akrztrk/2024-09-25-ner_deid_aipii_en.md
@@ -0,0 +1,172 @@
+---
+layout: model
+title: Detect PHI for Deidentification (ai4privacy/pii-masking-400k)
+author: John Snow Labs
+name: ner_deid_aipii
+date: 2024-09-25
+tags: [deid, clinical, en, licensed, ner]
+task: Named Entity Recognition
+language: en
+edition: Healthcare NLP 5.4.1
+spark_version: 3.0
+supported: true
+annotator: MedicalNerModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This Named Entity Recognition (NER) model is trained on the `ai4privacy/pii-masking-400k` dataset. It uses a deep learning architecture (character-level CNNs, a BiLSTM, a CRF layer, and word embeddings) inspired by the model of Chiu & Nichols in "Named Entity Recognition with Bidirectional LSTM-CNNs". It labels the entities listed below, which makes it useful for detecting protected health information (PHI) that may need to be masked or de-identified.
+
+## Predicted Entities
+
+`LICENSE`, `SSN`, `ZIP`, `NAME`, `PHONE`, `CITY`, `EMAIL`, `DATE`, `IDNUM`, `STREET`, `ACCOUNT`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_deid_aipii_en_5.4.1_3.0_1727266249887.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_deid_aipii_en_5.4.1_3.0_1727266249887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
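The pipeline below ends with `NerConverterInternal`, which groups the token-level tags predicted by `MedicalNerModel` into the entity chunks shown in the results. Here is a simplified, plain-Python sketch of that grouping. It assumes IOB-style `B-`/`I-`/`O` tags, and `iob_to_chunks` is a hypothetical helper. Unlike the real annotator, it joins tokens with spaces and does not track character offsets or confidence scores.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (chunk_text, label) pairs (hypothetical helper)."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        # B- always opens a new entity; a stray I- with a different label does too
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label)
        if starts_new:
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the open entity
        else:
            # "O" closes any open entity
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


tokens = ["Ora", "Hendrickson", ",", "SSN", ":", "333-44-6666"]
tags = ["B-NAME", "I-NAME", "O", "O", "O", "B-SSN"]
print(iob_to_chunks(tokens, tags))
# [('Ora Hendrickson', 'NAME'), ('333-44-6666', 'SSN')]
```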
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+# assumes an active SparkSession named `spark` with Healthcare NLP loaded
+from pyspark.ml import Pipeline
+from pyspark.sql.types import StringType
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
+from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal
+
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer()\
+    .setInputCols(["sentence"])\
+    .setOutputCol("token")
+
+clinical_embeddings = WordEmbeddingsModel.pretrained('embeddings_clinical', "en", "clinical/models")\
+    .setInputCols(["sentence", "token"])\
+    .setOutputCol("embeddings")
+
+ner_model = MedicalNerModel.pretrained('ner_deid_aipii', "en", "clinical/models")\
+    .setInputCols(["sentence", "token", "embeddings"])\
+    .setOutputCol("ner")
+
+ner_converter = NerConverterInternal()\
+    .setInputCols(['sentence', 'token', 'ner'])\
+    .setOutputCol('ner_chunk')
+
+pipeline = Pipeline(stages=[
+    document_assembler,
+    sentence_detector,
+    tokenizer,
+    clinical_embeddings,
+    ner_model,
+    ner_converter
+    ])
+
+sample_texts = ["""
+Ora Hendrickson, is 50 years old, Patient's ID no: 3454362A, SSN: 333-44-6666,
+Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: ora@gmail.com.
+"""] + +data = spark.createDataFrame(sample_texts, StringType()).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val ner_model = MedicalNerModel.pretrained("ner_deid_aipii", "en", "clinical/models") + .setInputCols(Array("sentence", "token","embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + sentenceDetector, + tokenizer, + clinical_embeddings, + ner_model, + ner_converter +)) + +val sample_texts = Seq("""Ora Hendrickson, is 50 years old, Patient's ID no: 3454362A, SSN: 333-44-6666, +Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: ora@gmail.com.""").toDF("text") + +val result = pipeline.fit(sample_texts).transform(sample_texts) +``` +
+ +## Results + +```bash ++-----------------+-----+---+---------+ +|chunk |begin|end|ner_label| ++-----------------+-----+---+---------+ +|Ora Hendrickson |2 |16 |NAME | +|3454362A |54 |61 |IDNUM | +|333-44-6666 |69 |79 |SSN | +|(302) 786-5227 |88 |101|PHONE | +|0295 Keats Street|104 |120|STREET | +|San Francisco |123 |135|CITY | +|ora@gmail.com |146 |158|EMAIL | ++-----------------+-----+---+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_deid_aipii| +|Compatibility:|Healthcare NLP 5.4.1+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|2.9 MB| + +## Benchmarking + +```bash + label precision recall f1-score support + ACCOUNT 0.86 0.59 0.70 867 + CITY 0.95 0.94 0.94 2735 + DATE 0.91 0.66 0.77 1408 + EMAIL 1.00 1.00 1.00 1469 + IDNUM 0.87 0.87 0.87 2763 + LICENSE 0.95 0.93 0.94 691 + NAME 0.96 0.97 0.97 6071 + PHONE 0.99 0.99 0.99 2182 + SSN 0.83 0.90 0.86 914 + STREET 0.93 0.91 0.92 2882 + ZIP 0.91 0.98 0.94 1271 + micro-avg 0.94 0.91 0.92 23253 + macro-avg 0.92 0.89 0.90 23253 +weighted-avg 0.93 0.91 0.92 23253 +``` diff --git a/docs/_posts/akrztrk/2024-09-27-clinical_deidentification_v2_wip_en.md b/docs/_posts/akrztrk/2024-09-27-clinical_deidentification_v2_wip_en.md new file mode 100644 index 0000000000..43afbf404f --- /dev/null +++ b/docs/_posts/akrztrk/2024-09-27-clinical_deidentification_v2_wip_en.md @@ -0,0 +1,145 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (English) +author: John Snow Labs +name: clinical_deidentification_v2_wip +date: 2024-09-27 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.4.1 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can 
be used to deidentify protected health information (PHI) in medical texts. Detected PHI is masked with entity labels and obfuscated with surrogate values in the resulting text. The pipeline can mask and obfuscate the following entities: `ACCOUNT`, `AGE`, `BIOID`, `CITY`, `CONTACT`, `COUNTRY`, `DATE`, `DEVICE`, `DLN`, `DOCTOR`, `EMAIL`, `FAX`, `HEALTHPLAN`, `HOSPITAL`, `ID`, `IPADDR`, `LICENSE`, `LOCATION`, `MEDICALRECORD`, `NAME`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PLATE`, `PROFESSION`, `SSN`, `STATE`, `STREET`, `URL`, `USERNAME`, `VIN`, `ZIP`.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.4.1_3.4_1727442868957.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.4.1_3.4_1727442868957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models")
+
+text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 87719435.
+ID: #12315112, Dr. John Green, IP 203.120.223.13.
+He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
+Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
+Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94108. E-MAIL: smith@gmail.com."""
+
+deid_result = deid_pipeline.fullAnnotate(text)
+
+print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']]))
+print(''.join([i.result for i in deid_result[0]['obfuscated']]))
+
+
+```
+```scala
+import com.johnsnowlabs.nlp.Annotation
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models")
+
+val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 87719435.
+ID: #12315112, Dr. John Green, IP 203.120.223.13.
+He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
+Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
+Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94108. E-MAIL: smith@gmail.com."""
+
+// fullAnnotate on a single String returns a Map keyed by output column name
+val deid_result = deid_pipeline.fullAnnotate(text)
+
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString(""))
+println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString(""))
+
+```
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Name : , Record date: , MR: . +ID: , Dr. , IP . +He is a -year-old male was admitted to the for cystectomy on . +Patient's VIN : , SSN , Driver's license no: . +Phone , , , . +E-MAIL: . + +Obfuscated +------------------------------ +Name : Axel Bohr, Record date: 2093-02-01, MR: 61443154. +ID: #00867619, Dr. Rickard Charles, IP 002.002.002.002. +He is a 73-year-old male was admitted to the LOMA LINDA UNIVERSITY MEDICAL CENTER-MURRIETA for cystectomy on 02/01/93. +Patient's VIN : 5KDTO67TIWP809983, SSN #382-50-5397, Driver's license no: Q734193X. +Phone (902) 409-7353, 1555 Long Pond Road, Pomeroy, Maryland 29924. + E-MAIL: Halit@google.com. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_v2_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.4.1+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/akrztrk/2024-09-27-explain_clinical_doc_sdoh_small_en.md b/docs/_posts/akrztrk/2024-09-27-explain_clinical_doc_sdoh_small_en.md new file mode 100644 index 0000000000..b3e2a4bad1 --- 
/dev/null +++ b/docs/_posts/akrztrk/2024-09-27-explain_clinical_doc_sdoh_small_en.md @@ -0,0 +1,199 @@ +--- +layout: model +title: Explain Clinical Document - Social Determinants of Health (SDOH)-Small +author: John Snow Labs +name: explain_clinical_doc_sdoh_small +date: 2024-09-27 +tags: [en, licensed, clinical, pipeline, social_determinants, sdoh, ner, assertion, relation_extraction] +task: Pipeline Healthcare +language: en +edition: Healthcare NLP 5.4.1 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline is designed to + +- extract all social determinants of health (SDOH) entities from text, + +- assign assertion status to the extracted entities, + +- establish relations between the extracted entities. + +In this pipeline, [ner_sdoh](https://nlp.johnsnowlabs.com/2023/06/13/ner_sdoh_en.html) +NER model, [assertion_sdoh_wip](https://nlp.johnsnowlabs.com/2023/08/13/assertion_sdoh_wip_en.html) assertion model and [generic_re](https://nlp.johnsnowlabs.com/2022/12/20/generic_re.html) +relation extraction model were used to achieve those tasks. 
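The relation labels listed below are pairs of entity labels. In the RE results, every relation connects two chunks from the same sentence whose labels form one of those pairs, in either order. For example, the allowed pair `Alcohol-Violence_Or_Abuse` is reported as `Violence_Or_Abuse-Alcohol`. The plain-Python sketch below illustrates that label-pair filtering. `candidate_relations` is a hypothetical helper, and the sketch does not reproduce how `GenericREModel` scores or selects relations.

```python
from itertools import combinations
from typing import List, Set, Tuple

# (sentence_id, chunk_text, entity_label) for each extracted chunk
Chunk = Tuple[int, str, str]


def candidate_relations(chunks: List[Chunk], allowed_pairs: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (chunk1, chunk2, "Label1-Label2") for allowed same-sentence pairs."""
    relations = []
    for (sent1, text1, label1), (sent2, text2, label2) in combinations(chunks, 2):
        if sent1 != sent2:
            continue  # only pair chunks from the same sentence
        forward, backward = f"{label1}-{label2}", f"{label2}-{label1}"
        if forward in allowed_pairs or backward in allowed_pairs:
            # name the relation in the order the chunks are listed
            relations.append((text1, text2, forward))
    return relations


allowed = {
    "Alcohol-Mental_Health",
    "Alcohol-Violence_Or_Abuse",
    "Mental_Health-Smoking",
    "Mental_Health-Violence_Or_Abuse",
}
chunks = [
    (1, "violence", "Violence_Or_Abuse"),
    (1, "smoking", "Smoking"),
    (1, "alcohol use", "Alcohol"),
    (1, "mental health struggles", "Mental_Health"),
    (3, "English", "Language"),
]
for rel in candidate_relations(chunks, allowed):
    print(rel)
```

A trained relation model would also score each candidate pair. This sketch shows only the pair filter that decides which chunk combinations are considered.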
+
+Clinical Entity Labels:
+
+`Access_To_Care`, `Age`, `Alcohol`, `Chidhood_Event`, `Communicable_Disease`, `Community_Safety`, `Diet`, `Disability`, `Eating_Disorder`, `Education`, `Employment`, `Environmental_Condition`, `Exercise`, `Family_Member`, `Financial_Status`, `Food_Insecurity`, `Gender`, `Geographic_Entity`, `Healthcare_Institution`, `Housing`, `Hyperlipidemia`, `Hypertension`, `Income`, `Insurance_Status`, `Language`, `Legal_Issues`, `Marital_Status`, `Mental_Health`, `Obesity`, `Other_Disease`, `Other_SDoH_Keywords`, `Population_Group`, `Quality_Of_Life`, `Race_Ethnicity`, `Sexual_Activity`, `Sexual_Orientation`, `Smoking`, `Social_Exclusion`, `Social_Support`, `Spiritual_Beliefs`, `Substance_Duration`, `Substance_Frequency`, `Substance_Quantity`, `Substance_Use`, `Transportation`, `Violence_Or_Abuse`
+
+
+Assertion Status Labels:
+
+`Present`, `Absent`, `Possible`, `Past`, `Hypothetical`, `Someone_Else`
+
+Relation Extraction Labels:
+
+`Access_To_Care-Financial_Status`, `Access_To_Care–Income`, `Access_To_Care-Social_Support`, `Access_To_Care-Substance_Use`, `Alcohol-Mental_Health`, `Alcohol-Quality_Of_Life`, `Alcohol–Smoking`, `Alcohol-Substance_Use`, `Alcohol-Violence_Or_Abuse`, `Childhood_Event-Violence_Or_Abuse`, `Community_Safety-Quality_Of_Life`, `Community_Safety-Violence_Or_Abuse`, `Diet-Eating_Disorder`, `Diet–Exercise`, `Diet–Gender`, `Diet–Obesity`, `Disability-Insurance_Status`, `Disability-Mental_Health`, `Disability-Quality_Of_Life`, `Disability-Social_Exclusion`, `Eating_Disorder-Food_Insecurity`, `Eating_Disorder-Mental_Health`, `Eating_Disorder–Obesity`, `Education–Employment`, `Education-Financial_Status`, `Education–Income`, `Education-Legal_Issues`, `Education-Quality_Of_Life`, `Education-Substance_Use`, `Employment-Financial_Status`, `Employment–Income`, `Employment-Insurance_Status`, `Employment-Quality_Of_Life`, `Environmental_Condition-Quality_Of_Life`, `Exercise-Mental_Health`, `Exercise–Obesity`, `Exercise-Quality_Of_Life`,
`Exercise–Smoking`, `Exercise-Substance_Use`, `Financial_Status-Food_Insecurity`, `Financial_Status-Housing`, `Financial_Status-Income`, `Financial_Status-Insurance_Status`, `Financial_Status-Mental_Health`, `Financial_Status-Quality_Of_Life`, `Financial_Status-Social_Support`, `Food_Insecurity-Income`, `Food_Insecurity-Mental_Health`, `Food_Insecurity-Quality_Of_Life`, `Housing-Income`, `Housing-Insurance_Status`, `Housing-Quality_Of_Life`, `Income-Insurance_Status`, `Income-Quality_Of_Life`, `Language-Population_Group`, `Language-Race_Ethnicity`, `Language-Social_Exclusion`, `Legal_Issues-Race_Ethnicity`, `Legal_Issues-Substance_Use`, `Legal_Issues-Violence_Or_Abuse`, `Marital_Status-Mental_Health`, `Marital_Status-Violence_Or_Abuse`, `Mental_Health-Obesity`, `Mental_Health-Quality_Of_Life`, `Mental_Health-Smoking`, `Mental_Health-Social_Exclusion`, `Mental_Health-Social_Support`, `Mental_Health-Substance_Use`, `Mental_Health-Violence_Or_Abuse`, `Obesity-Quality_Of_Life`, `Population_Group-Violence_Or_Abuse`, `Quality_Of_Life-Substance_Use`, `Race_Ethnicity-Social_Exclusion`, `Race_Ethnicity-Social_Support`, `Race_Ethnicity-Violence_Or_Abuse`, `Sexual_Activity-Sexual_Orientation`, `Sexual_Orientation-Social_Exclusion`, `Sexual_Orientation-Substance_Use`, `Sexual_Orientation-Violence_Or_Abuse`, `Smoking-Substance_Use`, `Social_Exclusion-Substance_Use`, `Substance_Duration-Substance_Use`, `Substance_Frequency-Substance_Use`, `Substance_Quantity-Substance_Use`, `Substance_Use-Violence_Or_Abuse`, `Substance_Use-Communicable_Disease`, `Alcohol-Obesity` + + ## Predicted Entities +`Access_To_Care`, `Age`, `Alcohol`, `Chidhood_Event`, `Communicable_Disease`, `Community_Safety`, `Diet`, `Disability`, `Eating_Disorder`, `Education`, `Employment`, `Environmental_Condition`, `Exercise`, `Family_Member`, `Financial_Status`, `Food_Insecurity`, `Gender`, `Geographic_Entity`, `Healthcare_Institution`, `Housing`, `Hyperlipidemia`, `Hypertension`, `Income`, `Insurance_Status`, 
`Language`, `Legal_Issues`, `Marital_Status`, `Mental_Health`, `Obesity`, `Other_Disease`, `Other_SDoH_Keywords`, `Population_Group`, `Quality_Of_Life`, `Race_Ethnicity`, `Sexual_Activity`, `Sexual_Orientation`, `Smoking`, `Social_Exclusion`, `Social_Support`, `Spiritual_Beliefs`, `Substance_Duration`, `Substance_Frequency`, `Substance_Quantity`, `Substance_Use`, `Transportation`, `Violence_Or_Abuse` + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/explain_clinical_doc_sdoh_small_en_5.4.1_3.4_1727457167099.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/explain_clinical_doc_sdoh_small_en_5.4.1_3.4_1727457167099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +sdoh_pipeline = PretrainedPipeline('explain_clinical_doc_sdoh_small', 'en', 'clinical/models') + +result = sdoh_pipeline.fullAnnotate("""The patient reported experiencing symptoms of anxiety and depression, which have been affecting his quality of life. +He reported a history of childhood trauma related to violence and abuse in his household, which has contributed to his smoking, alcohol use and current mental health struggles. +He denied any recent substance use or sexual activity and reported being monogamous in his relationship with his wife. +The patient is an immigrant and speaks English as a second language. +He reported difficulty accessing healthcare due to lack of medical insurance. +He has a herniated disc, hypertension, coronary artery disease (CAD) and diabetes mellitus. +The patient has a manic disorder, is presently psychotic and shows impulsive behavior. He has been disabled since 2001.""") +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val sdoh_pipeline = new PretrainedPipeline("explain_clinical_doc_sdoh_small", "en", "clinical/models") + +val result = sdoh_pipeline.fullAnnotate("""The patient reported experiencing symptoms of anxiety and depression, which have been affecting his quality of life. +He reported a history of childhood trauma related to violence and abuse in his household, which has contributed to his smoking, alcohol use and current mental health struggles. +He denied any recent substance use or sexual activity and reported being monogamous in his relationship with his wife. +The patient is an immigrant and speaks English as a second language. +He reported difficulty accessing healthcare due to lack of medical insurance. +He has a herniated disc, hypertension, coronary artery disease (CAD) and diabetes mellitus. 
+The patient has a manic disorder, is presently psychotic and shows impulsive behavior. He has been disabled since 2001.""") +``` +
+ +## Results + +```bash +# NER_Result + +| | chunks | begin | end | sentence_id | entities | confidence | +|---:|:--------------------------------|--------:|------:|--------------:|:------------------|-------------:| +| 0 | anxiety | 47 | 53 | 0 | Mental_Health | 0.9897 | +| 1 | depression | 59 | 68 | 0 | Mental_Health | 0.9938 | +| 2 | his | 97 | 99 | 0 | Gender | 0.992 | +| 3 | quality of life | 101 | 115 | 0 | Quality_Of_Life | 0.6252 | +| 4 | He | 118 | 119 | 1 | Gender | 0.9996 | +| 5 | childhood trauma | 143 | 158 | 1 | Chidhood_Event | 0.7466 | +| 6 | violence | 171 | 178 | 1 | Violence_Or_Abuse | 0.5394 | +| 7 | abuse | 184 | 188 | 1 | Violence_Or_Abuse | 0.6209 | +| 8 | his | 193 | 195 | 1 | Gender | 0.9536 | +| 9 | his | 233 | 235 | 1 | Gender | 0.9772 | +| 10 | smoking | 237 | 243 | 1 | Smoking | 0.9858 | +| 11 | alcohol use | 246 | 256 | 1 | Alcohol | 0.68065 | +| 12 | mental health struggles | 270 | 292 | 1 | Mental_Health | 0.248033 | +| 13 | He | 295 | 296 | 2 | Gender | 0.9995 | +| 14 | substance use | 316 | 328 | 2 | Substance_Use | 0.6921 | +| 15 | sexual activity | 333 | 347 | 2 | Sexual_Activity | 0.62915 | +| 16 | monogamous | 368 | 377 | 2 | Sexual_Activity | 0.6915 | +| 17 | his | 382 | 384 | 2 | Gender | 0.9883 | +| 18 | his | 404 | 406 | 2 | Gender | 0.978 | +| 19 | wife | 408 | 411 | 2 | Family_Member | 0.9833 | +| 20 | immigrant | 432 | 440 | 3 | Population_Group | 0.9974 | +| 21 | English | 453 | 459 | 3 | Language | 0.9979 | +| 22 | He | 483 | 484 | 4 | Gender | 0.9996 | +| 23 | difficulty accessing healthcare | 495 | 525 | 4 | Access_To_Care | 0.3998 | +| 24 | medical insurance | 542 | 558 | 4 | Insurance_Status | 0.6721 | +| 25 | He | 561 | 562 | 5 | Gender | 0.9996 | +| 26 | herniated disc | 570 | 583 | 5 | Other_Disease | 0.71515 | +| 27 | hypertension | 586 | 597 | 5 | Hypertension | 0.9984 | +| 28 | coronary artery disease | 600 | 622 | 5 | Other_Disease | 0.847933 | +| 29 | CAD | 625 | 627 | 5 | Other_Disease | 0.9884 | +| 30 | 
diabetes mellitus | 634 | 650 | 5 | Other_Disease | 0.81115 |
+| 31 | manic disorder | 671 | 684 | 6 | Mental_Health | 0.7929 |
+| 32 | psychotic | 700 | 708 | 6 | Mental_Health | 0.9743 |
+| 33 | impulsive behavior | 720 | 737 | 6 | Mental_Health | 0.41135 |
+| 34 | He | 740 | 741 | 7 | Gender | 0.9996 |
+| 35 | disabled | 752 | 759 | 7 | Disability | 0.9999 |
+
+# Assertion_Result
+
+| | chunks | entities | assertion |
+|---:|:--------------------------------|:------------------|:------------|
+| 0 | anxiety | Mental_Health | Present |
+| 1 | depression | Mental_Health | Present |
+| 2 | quality of life | Quality_Of_Life | Present |
+| 3 | violence | Violence_Or_Abuse | Past |
+| 4 | abuse | Violence_Or_Abuse | Past |
+| 5 | smoking | Smoking | Present |
+| 6 | alcohol use | Alcohol | Present |
+| 7 | mental health struggles | Mental_Health | Present |
+| 8 | substance use | Substance_Use | Absent |
+| 9 | sexual activity | Sexual_Activity | Present |
+| 10 | monogamous | Sexual_Activity | Absent |
+| 11 | difficulty accessing healthcare | Access_To_Care | Absent |
+| 12 | medical insurance | Insurance_Status | Present |
+| 13 | hypertension | Hypertension | Present |
+| 14 | manic disorder | Mental_Health | Present |
+| 15 | psychotic | Mental_Health | Present |
+| 16 | impulsive behavior | Mental_Health | Present |
+
+
+# RE Result
+
+| | sentence | entity1_begin | entity1_end | chunk1 | entity1 | entity2_begin | entity2_end | chunk2 | entity2 | relation | confidence |
+|---:|-----------:|----------------:|--------------:|:------------|:------------------|----------------:|--------------:|:------------------------|:----------------|:--------------------------------|-------------:|
+| 0 | 0 | 47 | 53 | anxiety | Mental_Health | 101 | 115 | quality of life | Quality_Of_Life | Mental_Health-Quality_Of_Life | 1 |
+| 1 | 0 | 59 | 68 | depression | Mental_Health | 101 | 115 | quality of life | Quality_Of_Life | Mental_Health-Quality_Of_Life | 1 |
+| 2 | 1 | 171 |
178 | violence | Violence_Or_Abuse | 246 | 256 | alcohol use | Alcohol | Violence_Or_Abuse-Alcohol | 1 | +| 3 | 1 | 171 | 178 | violence | Violence_Or_Abuse | 270 | 292 | mental health struggles | Mental_Health | Violence_Or_Abuse-Mental_Health | 1 | +| 4 | 1 | 184 | 188 | abuse | Violence_Or_Abuse | 246 | 256 | alcohol use | Alcohol | Violence_Or_Abuse-Alcohol | 1 | +| 5 | 1 | 184 | 188 | abuse | Violence_Or_Abuse | 270 | 292 | mental health struggles | Mental_Health | Violence_Or_Abuse-Mental_Health | 1 | +| 6 | 1 | 237 | 243 | smoking | Smoking | 270 | 292 | mental health struggles | Mental_Health | Smoking-Mental_Health | 1 | +| 7 | 1 | 246 | 256 | alcohol use | Alcohol | 270 | 292 | mental health struggles | Mental_Health | Alcohol-Mental_Health | 1 | +| 8 | 3 | 432 | 440 | immigrant | Population_Group | 453 | 459 | English | Language | Population_Group-Language | 1 | +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|explain_clinical_doc_sdoh_small| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.4.1+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- AssertionDLModel +- PerceptronModel +- DependencyParserModel +- GenericREModel diff --git a/docs/_posts/akrztrk/2024-09-30-clinical_deidentification_docwise_wip_en.md b/docs/_posts/akrztrk/2024-09-30-clinical_deidentification_docwise_wip_en.md new file mode 100644 index 0000000000..12e095aa0a --- /dev/null +++ b/docs/_posts/akrztrk/2024-09-30-clinical_deidentification_docwise_wip_en.md @@ -0,0 +1,147 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (English) +author: John Snow Labs +name: clinical_deidentification_docwise_wip +date: 2024-09-30 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] 
+task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.4.1 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. The pipeline can mask and obfuscate `ACCOUNT`, `AGE`, `BIOID`, `CITY`, `CONTACT`, `COUNTRY`, `DATE`, `DEVICE`, `DLN`, `DOCTOR`, `EMAIL`, `FAX`, `HEALTHPLAN`, `HOSPITAL`, `ID`, `IPADDR`, `LICENSE`, `LOCATION`, `MEDICALRECORD`, `NAME`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PLATE`, `PROFESSION`, `SREET`, `SSN`, `STATE`, `STREET`, `URL`, `USERNAME`, `VIN`, `ZIP` entities. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_wip_en_5.4.1_3.4_1727724575084.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_wip_en_5.4.1_3.4_1727724575084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_wip", "en", "clinical/models") + +text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 87719435. +ID: #12315112, Dr. John Green, IP 203.120.223.13. +He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93. +Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B. +Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94108. E-MAIL: smith@gmail.com.""" + +deid_result = deid_pipeline.fullAnnotate(text) + +print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']])) +print(''.join([i.result for i in deid_result[0]['obfuscated']])) + + +``` +```scala + +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_wip", "en", "clinical/models") + +val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 87719435. +ID: #12315112, Dr. John Green, IP 203.120.223.13. +He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93. +Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B. +Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94108. E-MAIL: smith@gmail.com.""" + +// fullAnnotate on a single String returns a Map of output column -> annotations +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString("")) +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString("")) + + +```
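The two `print` statements above rebuild the note from per-sentence annotations: judging from the example, each annotation in the `obfuscated` output holds the obfuscated sentence in `result` and the masked sentence in `metadata['masked']`. The sketch below mimics that joining step in plain Python. `FakeAnnotation` and `rebuild_texts` are hypothetical stand-ins for Spark NLP's annotation objects, used only so the logic can run without a licensed Spark session; the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeAnnotation:
    """Hypothetical stand-in for a Spark NLP annotation (only the fields used here)."""
    result: str
    metadata: Dict[str, str] = field(default_factory=dict)


def rebuild_texts(annotations: List[FakeAnnotation]) -> Tuple[str, str]:
    """Join per-sentence annotations into (masked_text, obfuscated_text).

    Mirrors the two print statements in the example: the masked text comes from
    metadata['masked'] and the obfuscated text from result, in document order.
    """
    masked = "".join(a.metadata.get("masked", "") for a in annotations)
    obfuscated = "".join(a.result for a in annotations)
    return masked, obfuscated


# Usage with two hand-made annotations (illustrative values only).
sentences = [
    FakeAnnotation("Dr. Rickard Charles, ", {"masked": "Dr. <DOCTOR>, "}),
    FakeAnnotation("IP 002.002.002.002.", {"masked": "IP <IPADDR>."}),
]
masked_text, obfuscated_text = rebuild_texts(sentences)
print(masked_text)
print(obfuscated_text)
```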
+ +## Results + +```bash + +Masked with entity labels +------------------------------ +Name : , Record date: , MR: . +ID: , Dr. , IP . +He is a -year-old male was admitted to the for cystectomy on . +Patient's VIN : , SSN , Driver's license no: . +Phone , , , . +E-MAIL: . + +Obfuscated +------------------------------ +Name : Axel Bohr, Record date: 2093-02-01, MR: 61443154. +ID: #00867619, Dr. Rickard Charles, IP 002.002.002.002. +He is a 73-year-old male was admitted to the LOMA LINDA UNIVERSITY MEDICAL CENTER-MURRIETA for cystectomy on 02/01/93. +Patient's VIN : 5KDTO67TIWP809983, SSN #382-50-5397, Driver's license no: Q734193X. +Phone (902) 409-7353, 1555 Long Pond Road, Pomeroy, Maryland 29924. + E-MAIL: Halit@google.com. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_docwise_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.4.1+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- InternalDocumentSplitter +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- MedicalNerModel +- MedicalNerModel +- NerConverterInternalModel +- NerConverterInternalModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- TextMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- LightDeIdentification +- LightDeIdentification diff --git a/docs/_posts/akrztrk/2024-09-30-clinical_deidentification_v2_wip_en.md b/docs/_posts/akrztrk/2024-09-30-clinical_deidentification_v2_wip_en.md 
new file mode 100644 index 0000000000..ca4fa6a83c --- /dev/null +++ b/docs/_posts/akrztrk/2024-09-30-clinical_deidentification_v2_wip_en.md @@ -0,0 +1,145 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (English) +author: John Snow Labs +name: clinical_deidentification_v2_wip +date: 2024-09-30 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.4.1 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. The pipeline can mask and obfuscate `ACCOUNT`, `AGE`, `BIOID`, `CITY`, `CONTACT`, `COUNTRY`, `DATE`, `DEVICE`, `DLN`, `DOCTOR`, `EMAIL`, `FAX`, `HEALTHPLAN`, `HOSPITAL`, `ID`, `IPADDR`, `LICENSE`, `LOCATION`, `MEDICALRECORD`, `NAME`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PLATE`, `PROFESSION`, `SREET`, `SSN`, `STATE`, `STREET`, `URL`, `USERNAME`, `VIN`, `ZIP` entities. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.4.1_3.4_1727723330867.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.4.1_3.4_1727723330867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models") + +text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 87719435. +ID: #12315112, Dr. John Green, IP 203.120.223.13. +He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93. +Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B. +Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94108. E-MAIL: smith@gmail.com.""" + +deid_result = deid_pipeline.fullAnnotate(text) + +print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']])) +print(''.join([i.result for i in deid_result[0]['obfuscated']])) + + +``` +```scala + +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models") + +val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 87719435. +ID: #12315112, Dr. John Green, IP 203.120.223.13. +He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93. +Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B. +Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94108. E-MAIL: smith@gmail.com.""" + +// fullAnnotate on a single String returns a Map of output column -> annotations +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString("")) +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString("")) + + +```
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Name : , Record date: , MR: . +ID: , Dr. , IP . +He is a -year-old male was admitted to the for cystectomy on . +Patient's VIN : , SSN , Driver's license no: . +Phone , , , . +E-MAIL: . + +Obfuscated +------------------------------ +Name : Axel Bohr, Record date: 2093-02-01, MR: 61443154. +ID: #00867619, Dr. Rickard Charles, IP 002.002.002.002. +He is a 73-year-old male was admitted to the LOMA LINDA UNIVERSITY MEDICAL CENTER-MURRIETA for cystectomy on 02/01/93. +Patient's VIN : 5KDTO67TIWP809983, SSN #382-50-5397, Driver's license no: Q734193X. +Phone (902) 409-7353, 1555 Long Pond Road, Pomeroy, Maryland 29924. + E-MAIL: Halit@google.com. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_v2_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.4.1+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/akrztrk/2024-10-01-jsl_medm_q8_v1_en.md b/docs/_posts/akrztrk/2024-10-01-jsl_medm_q8_v1_en.md new file mode 100644 index 0000000000..3c7fa991b9 --- /dev/null +++ 
b/docs/_posts/akrztrk/2024-10-01-jsl_medm_q8_v1_en.md @@ -0,0 +1,217 @@ +--- +layout: model +title: JSL_MedM (LLM - q8) +author: John Snow Labs +name: jsl_medm_q8_v1 +date: 2024-10-01 +tags: [licensed, clinical, en, llm, rag, qa, chat, tensorflow] +task: [Summarization, Question Answering] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to perform Q&A, summarization, RAG, and chat tasks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q8_v1_en_5.5.0_3.0_1727809959050.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q8_v1_en_5.5.0_3.0_1727809959050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp_jsl.annotator import MedicalLLM + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_medm_q8_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +medm_prompt = """ +summarize the following content. + + content: + ---------------------------- INDICATIONS AND USAGE --------------------------- + KISUNLA is an amyloid beta-directed antibody indicated for the + treatment of Alzheimer’s disease. Treatment with KISUNLA should be + initiated in patients with mild cognitive impairment or mild dementia + stage of disease, the population in which treatment was initiated in the + clinical trials. (1) + ------------------------DOSAGE AND ADMINISTRATION----------------------- + • Confirm the presence of amyloid beta pathology prior to initiating + treatment. (2.1) + • The recommended dosage of KISUNLA is 700 mg administered as + an intravenous infusion over approximately 30 minutes every four + weeks for the first three doses, followed by 1400 mg every four + weeks. (2.2) + • Consider stopping dosing with KISUNLA based on reduction of + amyloid plaques to minimal levels on amyloid PET imaging. (2.2) + • Obtain a recent baseline brain MRI prior to initiating treatment. + (2.3, 5.1) + • Obtain an MRI prior to the 2nd, 3rd, 4th, and 7th infusions. If + radiographically observed ARIA occurs, treatment + recommendations are based on type, severity, and presence of + symptoms. (2.3, 5.1) + • Dilution to a final concentration of 4 mg/mL to 10 mg/mL with 0.9% + Sodium Chloride Injection, is required prior to administration.
(2.4) + ----------------------DOSAGE FORMS AND STRENGTHS--------------------- + Injection: 350 mg/20 mL (17.5 mg/mL) in a single-dose vial. (3) + ------------------------------- CONTRAINDICATIONS ------------------------------ + KISUNLA is contraindicated in patients with known serious + hypersensitivity to donanemab-azbt or to any of the excipients. (4, 5.2) + ------------------------WARNINGS AND PRECAUTIONS----------------------- + • Amyloid Related Imaging Abnormalities (ARIA): Enhanced clinical + vigilance for ARIA is recommended during the first 24 weeks of + treatment with KISUNLA. Risk of ARIA, including symptomatic + ARIA, was increased in apolipoprotein E ε4 (ApoE ε4) + homozygotes compared to heterozygotes and noncarriers. The risk + of ARIA-E and ARIA-H is increased in KISUNLA-treated patients + with pretreatment microhemorrhages and/or superficial siderosis. If + a patient experiences symptoms suggestive of ARIA, clinical + evaluation should be performed, including MRI scanning if + indicated. (2.3, 5.1) + • Infusion-Related Reactions: The infusion rate may be reduced, or + the infusion may be discontinued, and appropriate therapy initiated + as clinically indicated. Consider pre-treatment with antihistamines, + acetaminophen, or corticosteroids prior to subsequent dosing. (5.3) + -------------------------------ADVERSE REACTIONS------------------------------ + Most common adverse reactions (at least 10% and higher incidence + compared to placebo): ARIA-E, ARIA-H microhemorrhage, ARIA-H + superficial siderosis, and headache. 
(6.1) +""" + +data = spark.createDataFrame([[medm_prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_medm_q8_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val medm_prompt = """ +summarize the following content. + + content: + ---------------------------- INDICATIONS AND USAGE --------------------------- + KISUNLA is an amyloid beta-directed antibody indicated for the + treatment of Alzheimer’s disease. Treatment with KISUNLA should be + initiated in patients with mild cognitive impairment or mild dementia + stage of disease, the population in which treatment was initiated in the + clinical trials. (1) + ------------------------DOSAGE AND ADMINISTRATION----------------------- + • Confirm the presence of amyloid beta pathology prior to initiating + treatment. (2.1) + • The recommended dosage of KISUNLA is 700 mg administered as + an intravenous infusion over approximately 30 minutes every four + weeks for the first three doses, followed by 1400 mg every four + weeks. (2.2) + • Consider stopping dosing with KISUNLA based on reduction of + amyloid plaques to minimal levels on amyloid PET imaging. (2.2) + • Obtain a recent baseline brain MRI prior to initiating treatment. + (2.3, 5.1) + • Obtain an MRI prior to the 2nd, 3rd, 4th, and 7th infusions. If + radiographically observed ARIA occurs, treatment + recommendations are based on type, severity, and presence of + symptoms.
(2.3, 5.1) + • Dilution to a final concentration of 4 mg/mL to 10 mg/mL with 0.9% + Sodium Chloride Injection, is required prior to administration. (2.4) + ----------------------DOSAGE FORMS AND STRENGTHS--------------------- + Injection: 350 mg/20 mL (17.5 mg/mL) in a single-dose vial. (3) + ------------------------------- CONTRAINDICATIONS ------------------------------ + KISUNLA is contraindicated in patients with known serious + hypersensitivity to donanemab-azbt or to any of the excipients. (4, 5.2) + ------------------------WARNINGS AND PRECAUTIONS----------------------- + • Amyloid Related Imaging Abnormalities (ARIA): Enhanced clinical + vigilance for ARIA is recommended during the first 24 weeks of + treatment with KISUNLA. Risk of ARIA, including symptomatic + ARIA, was increased in apolipoprotein E ε4 (ApoE ε4) + homozygotes compared to heterozygotes and noncarriers. The risk + of ARIA-E and ARIA-H is increased in KISUNLA-treated patients + with pretreatment microhemorrhages and/or superficial siderosis. If + a patient experiences symptoms suggestive of ARIA, clinical + evaluation should be performed, including MRI scanning if + indicated. (2.3, 5.1) + • Infusion-Related Reactions: The infusion rate may be reduced, or + the infusion may be discontinued, and appropriate therapy initiated + as clinically indicated. Consider pre-treatment with antihistamines, + acetaminophen, or corticosteroids prior to subsequent dosing. (5.3) + -------------------------------ADVERSE REACTIONS------------------------------ + Most common adverse reactions (at least 10% and higher incidence + compared to placebo): ARIA-E, ARIA-H microhemorrhage, ARIA-H + superficial siderosis, and headache. (6.1) +""" + +val data = Seq(medm_prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +```
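Because `setNPredict(100)` caps the number of generated tokens, a long summary can stop mid-sentence, as the sample output in the Results section does (it ends at "the 2nd,"). A simple post-processing check can flag such completions so the caller can raise the limit or trim the text. The sketch below is plain Python and assumes that a complete answer ends with terminal punctuation; `looks_truncated` and `trim_to_last_sentence` are hypothetical helpers, not part of the `MedicalLLM` API.

```python
import re

# Characters that usually close a complete sentence (a heuristic, not a rule of the model).
_TERMINAL = (".", "!", "?", '"', "'", ")")


def looks_truncated(completion: str) -> bool:
    """Treat a non-empty completion as cut off if it does not end in terminal punctuation."""
    text = completion.rstrip()
    return bool(text) and not text.endswith(_TERMINAL)


def trim_to_last_sentence(completion: str) -> str:
    """Drop a trailing partial sentence, keeping text up to the last '.', '!' or '?'."""
    text = completion.rstrip()
    if not looks_truncated(text):
        return text
    # Sentence terminators followed by whitespace or the end of the string.
    matches = list(re.finditer(r"[.!?](?=\s|$)", text))
    return text[: matches[-1].end()] if matches else text


sample = ("KISUNLA is indicated for Alzheimer's disease. The recommended dosage is 700 mg "
          "every four weeks. Obtain an MRI prior to the 2nd,")
print(looks_truncated(sample))
print(trim_to_last_sentence(sample))
```

Trimming loses the partial sentence, so for summaries that must be complete, increasing `setNPredict` is usually the better fix.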
+ + ## Results + +```bash + +KISUNLA is an amyloid beta-directed antibody indicated for the treatment of Alzheimer's disease. It is recommended to initiate treatment in patients with mild cognitive impairment or mild dementia stage of disease. The recommended dosage is 700 mg administered as an intravenous infusion over approximately 30 minutes every four weeks for the first three doses, followed by 1400 mg every four weeks. Patients should have a recent baseline brain MRI prior to initiating treatment and obtain an MRI prior to the 2nd, + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_medm_q8_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|8.2 GB| \ No newline at end of file diff --git a/docs/_posts/akrztrk/2024-10-01-jsl_meds_ner_q4_v2_en.md b/docs/_posts/akrztrk/2024-10-01-jsl_meds_ner_q4_v2_en.md new file mode 100644 index 0000000000..953f1ac6df --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-01-jsl_meds_ner_q4_v2_en.md @@ -0,0 +1,160 @@ +--- +layout: model +title: JSL_MedS_NER_v2 (LLM - q4) +author: John Snow Labs +name: jsl_meds_ner_q4_v2 +date: 2024-10-01 +tags: [licensed, clinical, en, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract and link entities in a document. Users need to define an input schema, as shown in the example section. `drugs` is defined as a list, which tells the model that the document may mention multiple drugs and that it has to extract all of them. Each drug has properties such as `name` and `reactions`. Since a drug has only one name, `name` is a string, but there can be multiple reactions, so `reactions` is a list.
Similarly, users can define any schema for any type of entity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_q4_v2_en_5.5.0_3.0_1727813187306.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_q4_v2_en_5.5.0_3.0_1727813187306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp_jsl.annotator import MedicalLLM + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_ner_q4_v2", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems . +I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +data = spark.createDataFrame([[med_ner_prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_ner_q4_v2", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems .
+I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +val data = Seq(med_ner_prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +```
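The `### Template:` / `### Text:` prompt in the example can also be assembled programmatically, which keeps the JSON schema well-formed when you change it. The sketch below builds the prompt from a Python dict with `json.dumps` and parses the model's JSON answer back into Python objects. `build_ner_prompt` and `parse_completion` are hypothetical helpers, and the parser assumes the completion starts its answer with a single JSON object, as in the sample output (any trailing text is ignored).

```python
import json
from typing import Any, Dict

# Schema from the example: a list of drugs, each with one name and possibly many reactions.
DRUG_SCHEMA: Dict[str, Any] = {"drugs": [{"name": "", "reactions": []}]}


def build_ner_prompt(schema: Dict[str, Any], text: str) -> str:
    """Assemble a '### Template:' / '### Text:' prompt like the one in the example."""
    template = json.dumps(schema, indent=4)
    return f"\n### Template:\n{template}\n### Text:\n{text.strip()}\n"


def parse_completion(completion: str) -> Dict[str, Any]:
    """Decode the first JSON object in a completion, ignoring any trailing text."""
    start = completion.find("{")
    if start == -1:
        raise ValueError("no JSON object found in completion")
    obj, _ = json.JSONDecoder().raw_decode(completion, start)
    return obj


prompt = build_ner_prompt(
    DRUG_SCHEMA,
    "I feel a bit drowsy & have a little blurred vision , and some gastric problems .",
)
print(prompt)

# Illustrative completion in the same shape as the sample output.
sample_completion = '{"drugs": [{"name": "Arthrotec", "reactions": ["drowsy", "blurred vision"]}]} ...'
for drug in parse_completion(sample_completion)["drugs"]:
    print(drug["name"], drug["reactions"])
```

Because an LLM may occasionally return malformed JSON, wrapping `parse_completion` in a `try`/`except json.JSONDecodeError` is a reasonable precaution in batch jobs.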
+ +## Results + +```bash + +{ + "drugs": [ + { + "name": "Arthrotec", + "reactions": [ + "drowsy", + "blurred vision", + "gastric problems" + ] + } + ] +} + +... + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_ner_q4_v2| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|2.4 GB| \ No newline at end of file diff --git a/docs/_posts/akrztrk/2024-10-02-clinical_deidentification_nameAugmented_v2_en.md b/docs/_posts/akrztrk/2024-10-02-clinical_deidentification_nameAugmented_v2_en.md new file mode 100644 index 0000000000..d42cbdd7f1 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-02-clinical_deidentification_nameAugmented_v2_en.md @@ -0,0 +1,123 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (English) +author: John Snow Labs +name: clinical_deidentification_nameAugmented_v2 +date: 2024-10-02 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.4.1 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. The pipeline can mask and obfuscate `ACCOUNT`, `AGE`, `BIOID`, `CITY`, `CONTACT`, `COUNTRY`, `DATE`, `DEVICE`, `DLN`, `DOCTOR`, `EMAIL`, `FAX`, `HEALTHPLAN`, `HOSPITAL`, `ID`, `IPADDR`, `LICENSE`, `LOCATION`, `MEDICALRECORD`, `NAME`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PLATE`, `PROFESSION`, `SREET`, `SSN`, `STATE`, `STREET`, `URL`, `USERNAME`, `VIN`, `ZIP` entities. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.4.1_3.4_1727897157489.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.4.1_3.4_1727897157489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +text = """Dr. John Taylor, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old male patient.""" + +deid_result = deid_pipeline.fullAnnotate(text) + +print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']])) +print(''.join([i.result for i in deid_result[0]['obfuscated']])) +``` +```scala +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +val text = """Dr. John Taylor, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old male patient.""" + +// fullAnnotate on a single String returns a Map of output column -> annotations +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString("")) +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString("")) +```
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , a at in , was contacted on regarding a -year-old male patient. + +Obfuscated +------------------------------ +Dr. Rolande Cleverly, a Fish farm manager at NORTH COUNTRY HOSPITAL & HEALTH CENTER in BARMOLLOCH, was contacted on 16/10/2023 regarding a 48-year-old male patient. +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_nameAugmented_v2| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.9 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- NerDLModel +- NerConverterInternalModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/akrztrk/2024-10-03-clinical_deidentification_docwise_wip_en.md b/docs/_posts/akrztrk/2024-10-03-clinical_deidentification_docwise_wip_en.md new file mode 100644 index 0000000000..82ac186345 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-03-clinical_deidentification_docwise_wip_en.md @@ -0,0 +1,137 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (Document Wise) +author: John Snow Labs +name: clinical_deidentification_docwise_wip +date: 2024-10-03 +tags: 
[deidentification, deid, en, licensed, clinical, pipeline, docwise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. +The pipeline can mask and obfuscate `LOCATION`, `CONTACT`, `PROFESSION`, `NAME`, `DATE`, `ID`, `AGE`, `MEDICALRECORD`, `ORGANIZATION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, +`LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`, +`SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` entities. + +## Predicted Entities + +`LOCATION`, `CONTACT`, `PROFESSION`, `NAME`, `DATE`, `ID`, `AGE`, `MEDICALRECORD`, `ORGANIZATION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, +`CITY`, `ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_wip_en_5.5.0_3.0_1727968790363.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_wip_en_5.5.0_3.0_1727968790363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_wip", "en", "clinical/models") + +text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +# fullAnnotate returns a list with one result dict per input text +deid_result = deid_pipeline.fullAnnotate(text) + +print(''.join([i.result for i in deid_result[0]['mask_entity']])) +print(''.join([i.result for i in deid_result[0]['obfuscated']])) +``` +```scala +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_wip", "en", "clinical/models") + +val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +// fullAnnotate on a single String returns a Map of output column -> annotations +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("mask_entity").map(_.asInstanceOf[Annotation].result).mkString("")) +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString("")) +```
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is +patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Edwardo Graft, from MCBRIDE ORTHOPEDIC HOSPITAL in CLAMART, attended to the patient on 14/06/2024. +The patient’s medical record number is 78295621. +The patient, Nathaneil Bakes, is 43 years old, her Contact number: 308-657-8469 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_docwise_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- InternalDocumentSplitter +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- MedicalNerModel +- MedicalNerModel +- NerConverterInternalModel +- NerConverterInternalModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- TextMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- LightDeIdentification +- LightDeIdentification diff --git a/docs/_posts/akrztrk/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md b/docs/_posts/akrztrk/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md new file mode 100644 index 0000000000..4d6b0b447f --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-03-clinical_deidentification_nameAugmented_v2_en.md @@ -0,0 +1,134 @@ +--- +layout: model +title: Clinical 
Deidentification Pipeline (Sentence Wise) +author: John Snow Labs +name: clinical_deidentification_nameAugmented_v2 +date: 2024-10-03 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. +The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`, +`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, +`IP` entities. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.0_1727970433058.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.0_1727970433058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
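Masking and obfuscation can be pictured with a small, self-contained Python sketch. This is a conceptual illustration and not this pipeline's implementation. It assumes the PHI chunks have already been detected and are supplied as hypothetical `(begin, end, label)` character spans, with `end` exclusive and no overlaps. It also uses a hypothetical `fake_values` dictionary in place of the pipeline's own surrogate generation.

```python
from typing import Callable, Dict, List, Tuple

# A detected PHI chunk: (begin, end, label), with `end` exclusive.
# The spans below are hand-written for illustration only.
Chunk = Tuple[int, int, str]


def replace_chunks(text: str, chunks: List[Chunk],
                   replacement_for: Callable[[str, str], str]) -> str:
    """Rebuild `text`, swapping each chunk for replacement_for(label, original)."""
    pieces = []
    cursor = 0
    for begin, end, label in sorted(chunks):  # process chunks left to right
        pieces.append(text[cursor:begin])       # untouched text before the chunk
        pieces.append(replacement_for(label, text[begin:end]))
        cursor = end
    pieces.append(text[cursor:])                # untouched tail
    return "".join(pieces)


def mask(text: str, chunks: List[Chunk]) -> str:
    # Masking: replace each chunk with its entity label in angle brackets.
    return replace_chunks(text, chunks, lambda label, _original: f"<{label}>")


def obfuscate(text: str, chunks: List[Chunk], fake_values: Dict[str, str]) -> str:
    # Obfuscation: replace each chunk with a surrogate value of the same type.
    # `fake_values` is a hypothetical stand-in for real surrogate generation;
    # labels without a surrogate fall back to the masked form.
    return replace_chunks(text, chunks,
                          lambda label, _original: fake_values.get(label, f"<{label}>"))


text = "Dr. John Lee attended to the patient on 11/05/2024."
chunks = [(4, 12, "NAME"), (40, 50, "DATE")]

print(mask(text, chunks))
# Dr. <NAME> attended to the patient on <DATE>.
print(obfuscate(text, chunks, {"NAME": "Edwardo Graft", "DATE": "14/06/2024"}))
# Dr. Edwardo Graft attended to the patient on 14/06/2024.
```

In the pipeline itself, the spans come from the NER, contextual-parser, and matcher stages listed under Included Models, and they are combined by the `ChunkMergeModel` stages before any replacement happens.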
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +# fullAnnotate returns one result dict per input text +deid_result = deid_pipeline.fullAnnotate(text)[0] + +print(''.join([i.metadata['masked'] for i in deid_result['obfuscated']])) +print(''.join([i.result for i in deid_result['obfuscated']])) +``` +```scala +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString("")) +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString("")) +``` +
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is . +The patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Rhodia Cera, from 252 Mchenry St in UNTERLAND, attended to the patient on 18/06/2024. +The patient’s medical record number is 16109604. +The patient, Eulice Hickory, is 44 years old, her Contact number: 540-981-1914 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_nameAugmented_v2| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.9 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- NerDLModel +- NerConverterInternalModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/akrztrk/2024-10-03-clinical_deidentification_v2_wip_en.md b/docs/_posts/akrztrk/2024-10-03-clinical_deidentification_v2_wip_en.md new file mode 100644 index 0000000000..3213bda933 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-03-clinical_deidentification_v2_wip_en.md @@ -0,0 +1,136 @@ +--- +layout: model +title: Clinical Deidentification 
Pipeline (Sentence Wise) +author: John Snow Labs +name: clinical_deidentification_v2_wip +date: 2024-10-03 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. +The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, +`ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `NAME`, +`ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` entities. + +## Predicted Entities + +`LOCATION`, `CONTACT`, `PROFESSION`, `NAME`, `DATE`, `ID`, `AGE`, `MEDICALRECORD`, `ORGANIZATION`, `HEALTHPLAN`, `DOCTOR`, `USERNAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, +`CITY`, `ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.5.0_3.0_1727971916774.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_v2_wip_en_5.5.0_3.0_1727971916774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models") + +text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +# fullAnnotate returns one result dict per input text +deid_result = deid_pipeline.fullAnnotate(text)[0] + +print(''.join([i.metadata['masked'] for i in deid_result['obfuscated']])) +print(''.join([i.result for i in deid_result['obfuscated']])) +``` +```scala +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val deid_pipeline = PretrainedPipeline("clinical_deidentification_v2_wip", "en", "clinical/models") + +val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString("")) +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString("")) +``` +
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is . +The patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Alissa Irving, from KINDRED HOSPITAL SEATTLE in Geleen, attended to the patient on 22/06/2024. +The patient’s medical record number is 16109604. +The patient, Burnette Carte, is 49 years old, her Contact number: 540-981-1914 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_v2_wip| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/akrztrk/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md b/docs/_posts/akrztrk/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md new file mode 100644 index 0000000000..ac13d3b280 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-04-clinical_deidentification_nameAugmented_v2_en.md @@ -0,0 +1,140 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (Sentence Wise) +author: John Snow 
Labs +name: clinical_deidentification_nameAugmented_v2 +date: 2024-10-04 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. +The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`, +`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, +`IP` entities. + +## Predicted Entities + +`MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`, +`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, +`IP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.0_1728048118878.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.5.0_3.0_1728048118878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +# fullAnnotate returns one result dict per input text +deid_result = deid_pipeline.fullAnnotate(text)[0] + +print(''.join([i.metadata['masked'] for i in deid_result['obfuscated']])) +print(''.join([i.result for i in deid_result['obfuscated']])) +``` +```scala +import com.johnsnowlabs.nlp.Annotation +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].metadata("masked")).mkString("")) +println(deid_result("obfuscated").map(_.asInstanceOf[Annotation].result).mkString("")) +``` +
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is . +The patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Rhodia Cera, from 252 Mchenry St in UNTERLAND, attended to the patient on 18/06/2024. +The patient’s medical record number is 16109604. +The patient, Eulice Hickory, is 44 years old, her Contact number: 540-981-1914 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_nameAugmented_v2| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.9 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- NerDLModel +- NerConverterInternalModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/akrztrk/2024-10-04-jsl_medm_q4_v1_en.md b/docs/_posts/akrztrk/2024-10-04-jsl_medm_q4_v1_en.md new file mode 100644 index 0000000000..7a253bca17 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-04-jsl_medm_q4_v1_en.md @@ -0,0 +1,218 @@ +--- +layout: model +title: JSL_MedM (LLM - q4) +author: John Snow Labs +name: jsl_medm_q4_v1 +date: 2024-10-04 
+tags: [licensed, clinical, en, llm, rag, qa, chat, tensorflow] +task: [Summarization, Question Answering] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to perform Q&A, summarization, RAG, and chat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q4_v1_en_5.5.0_3.0_1728059101782.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q4_v1_en_5.5.0_3.0_1728059101782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_medm_q4_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +medm_prompt = """ +summarize the following content. + + content: + ---------------------------- INDICATIONS AND USAGE --------------------------- + KISUNLA is an amyloid beta-directed antibody indicated for the + treatment of Alzheimer’s disease. Treatment with KISUNLA should be + initiated in patients with mild cognitive impairment or mild dementia + stage of disease, the population in which treatment was initiated in the + clinical trials. (1) + ------------------------DOSAGE AND ADMINISTRATION----------------------- + • Confirm the presence of amyloid beta pathology prior to initiating + treatment. (2.1) + • The recommended dosage of KISUNLA is 700 mg administered as + an intravenous infusion over approximately 30 minutes every four + weeks for the first three doses, followed by 1400 mg every four + weeks. (2.2) + • Consider stopping dosing with KISUNLA based on reduction of + amyloid plaques to minimal levels on amyloid PET imaging. (2.2) + • Obtain a recent baseline brain MRI prior to initiating treatment. + (2.3, 5.1) + • Obtain an MRI prior to the 2nd, 3rd, 4th, and 7th infusions. If + radiographically observed ARIA occurs, treatment + recommendations are based on type, severity, and presence of + symptoms. (2.3, 5.1) + • Dilution to a final concentration of 4 mg/mL to 10 mg/mL with 0.9% + Sodium Chloride Injection, is required prior to administration. 
(2.4) + ----------------------DOSAGE FORMS AND STRENGTHS--------------------- + Injection: 350 mg/20 mL (17.5 mg/mL) in a single-dose vial. (3) + ------------------------------- CONTRAINDICATIONS ------------------------------ + KISUNLA is contraindicated in patients with known serious + hypersensitivity to donanemab-azbt or to any of the excipients. (4, 5.2) + ------------------------WARNINGS AND PRECAUTIONS----------------------- + • Amyloid Related Imaging Abnormalities (ARIA): Enhanced clinical + vigilance for ARIA is recommended during the first 24 weeks of + treatment with KISUNLA. Risk of ARIA, including symptomatic + ARIA, was increased in apolipoprotein E ε4 (ApoE ε4) + homozygotes compared to heterozygotes and noncarriers. The risk + of ARIA-E and ARIA-H is increased in KISUNLA-treated patients + with pretreatment microhemorrhages and/or superficial siderosis. If + a patient experiences symptoms suggestive of ARIA, clinical + evaluation should be performed, including MRI scanning if + indicated. (2.3, 5.1) + • Infusion-Related Reactions: The infusion rate may be reduced, or + the infusion may be discontinued, and appropriate therapy initiated + as clinically indicated. Consider pre-treatment with antihistamines, + acetaminophen, or corticosteroids prior to subsequent dosing. (5.3) + -------------------------------ADVERSE REACTIONS------------------------------ + Most common adverse reactions (at least 10% and higher incidence + compared to placebo): ARIA-E, ARIA-H microhemorrhage, ARIA-H + superficial siderosis, and headache. 
(6.1) +""" + +data = spark.createDataFrame([[medm_prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_medm_q4_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val medm_prompt = """ +summarize the following content. + + content: + ---------------------------- INDICATIONS AND USAGE --------------------------- + KISUNLA is an amyloid beta-directed antibody indicated for the + treatment of Alzheimer’s disease. Treatment with KISUNLA should be + initiated in patients with mild cognitive impairment or mild dementia + stage of disease, the population in which treatment was initiated in the + clinical trials. (1) + ------------------------DOSAGE AND ADMINISTRATION----------------------- + • Confirm the presence of amyloid beta pathology prior to initiating + treatment. (2.1) + • The recommended dosage of KISUNLA is 700 mg administered as + an intravenous infusion over approximately 30 minutes every four + weeks for the first three doses, followed by 1400 mg every four + weeks. (2.2) + • Consider stopping dosing with KISUNLA based on reduction of + amyloid plaques to minimal levels on amyloid PET imaging. (2.2) + • Obtain a recent baseline brain MRI prior to initiating treatment. + (2.3, 5.1) + • Obtain an MRI prior to the 2nd, 3rd, 4th, and 7th infusions. If + radiographically observed ARIA occurs, treatment + recommendations are based on type, severity, and presence of + symptoms.
(2.3, 5.1) + • Dilution to a final concentration of 4 mg/mL to 10 mg/mL with 0.9% + Sodium Chloride Injection, is required prior to administration. (2.4) + ----------------------DOSAGE FORMS AND STRENGTHS--------------------- + Injection: 350 mg/20 mL (17.5 mg/mL) in a single-dose vial. (3) + ------------------------------- CONTRAINDICATIONS ------------------------------ + KISUNLA is contraindicated in patients with known serious + hypersensitivity to donanemab-azbt or to any of the excipients. (4, 5.2) + ------------------------WARNINGS AND PRECAUTIONS----------------------- + • Amyloid Related Imaging Abnormalities (ARIA): Enhanced clinical + vigilance for ARIA is recommended during the first 24 weeks of + treatment with KISUNLA. Risk of ARIA, including symptomatic + ARIA, was increased in apolipoprotein E ε4 (ApoE ε4) + homozygotes compared to heterozygotes and noncarriers. The risk + of ARIA-E and ARIA-H is increased in KISUNLA-treated patients + with pretreatment microhemorrhages and/or superficial siderosis. If + a patient experiences symptoms suggestive of ARIA, clinical + evaluation should be performed, including MRI scanning if + indicated. (2.3, 5.1) + • Infusion-Related Reactions: The infusion rate may be reduced, or + the infusion may be discontinued, and appropriate therapy initiated + as clinically indicated. Consider pre-treatment with antihistamines, + acetaminophen, or corticosteroids prior to subsequent dosing. (5.3) + -------------------------------ADVERSE REACTIONS------------------------------ + Most common adverse reactions (at least 10% and higher incidence + compared to placebo): ARIA-E, ARIA-H microhemorrhage, ARIA-H + superficial siderosis, and headache. (6.1) +""" + +val data = Seq(medm_prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +KISUNLA is an amyloid beta-directed antibody indicated for the treatment of Alzheimer's disease. It is recommended to initiate treatment in patients with mild cognitive impairment or mild dementia stage of disease. The recommended dosage is 700 mg administered as an intravenous infusion over approximately 30 minutes every four weeks for the first three doses, followed by 1400 mg every four weeks. Patients should have a recent baseline brain MRI prior to initiating treatment and obtain an MRI prior to the 2nd, + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_medm_q4_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|4.8 GB| diff --git a/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_q16_v2_en.md b/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_q16_v2_en.md new file mode 100644 index 0000000000..78a6c36914 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_q16_v2_en.md @@ -0,0 +1,161 @@ +--- +layout: model +title: JSL_MedS_NER_v2 (LLM - q16) +author: John Snow Labs +name: jsl_meds_ner_q16_v2 +date: 2024-10-04 +tags: [licensed, clinical, en, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract and link entities in a document. Users need to define an input schema, as shown in the example section. `drugs` is defined as a list, which tells the model that the document may mention multiple drugs and that it has to extract all of them. Each drug has properties such as `name` and `reactions`. Since a drug has only one name, `name` is a string, but there can be multiple reactions, hence `reactions` is a list.
Similarly, users can define any schema for any type of entity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_q16_v2_en_5.5.0_3.0_1728061998680.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_q16_v2_en_5.5.0_3.0_1728061998680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_ner_q16_v2", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems . +I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +data = spark.createDataFrame([[med_ner_prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_ner_q16_v2", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems .
+I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +val data = Seq(med_ner_prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +{ + "drugs": [ + { + "name": "Arthrotec", + "reactions": [ + "drowsy", + "blurred vision", + "gastric problems" + ] + } + ] +} + +... + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_ner_q16_v2| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|6.1 GB| diff --git a/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_q8_v2_en.md b/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_q8_v2_en.md new file mode 100644 index 0000000000..b78448c4dc --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_q8_v2_en.md @@ -0,0 +1,161 @@ +--- +layout: model +title: JSL_MedS_NER_v2 (LLM - q8) +author: John Snow Labs +name: jsl_meds_ner_q8_v2 +date: 2024-10-04 +tags: [licensed, clinical, en, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract and link entities in a document. Users need to define an input schema, as shown in the example section. `drugs` is defined as a list, which tells the model that the document may mention multiple drugs and that it has to extract all of them. Each drug has properties such as `name` and `reactions`. Since a drug has only one name, `name` is a string, but there can be multiple reactions, hence `reactions` is a list. Similarly, users can define any schema for any type of entity.
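The schema is written directly into the prompt, so it can also be generated from a Python dictionary instead of being typed by hand. The sketch below uses a hypothetical helper, `build_ner_prompt`, which is not part of the Healthcare NLP API. It only reproduces the `### Template:` / `### Text:` layout from the example prompt. Its output would go into the `text` column of the input DataFrame.

```python
import json

# The schema from the example prompt: "drugs" is a list because a document can
# mention several drugs; each drug has one "name" (a string) and possibly
# several "reactions" (a list).
schema = {
    "drugs": [
        {
            "name": "",
            "reactions": [],
        }
    ]
}


def build_ner_prompt(schema: dict, text: str) -> str:
    """Hypothetical helper: render the schema and text in the example's prompt layout."""
    template = json.dumps(schema, indent=4)  # empty lists render as []
    return f"### Template:\n{template}\n### Text:\n{text.strip()}\n"


prompt = build_ner_prompt(schema, "I feel a bit drowsy after taking Arthrotec 50.")
print(prompt)
```

Other entity types follow the same pattern. For example, you could pass a hypothetical `{"problems": [{"name": "", "body_parts": []}]}` schema. How well the model handles any particular custom schema should be checked on your own data.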
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_q8_v2_en_5.5.0_3.0_1728061434766.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_q8_v2_en_5.5.0_3.0_1728061434766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_ner_q8_v2", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems . +I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +data = spark.createDataFrame([[med_ner_prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_ner_q8_v2", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems .
+I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +val data = Seq(med_ner_prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +{ + "drugs": [ + { + "name": "Arthrotec", + "reactions": [ + "drowsy", + "blurred vision", + "gastric problems" + ] + } + ] +} + +... + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_ner_q8_v2| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.9 GB| diff --git a/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q16_v1_en.md b/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q16_v1_en.md new file mode 100644 index 0000000000..a273c2547d --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q16_v1_en.md @@ -0,0 +1,161 @@ +--- +layout: model +title: JSL_MedS_NER (LLM - q16) +author: John Snow Labs +name: jsl_meds_ner_zs_q16_v1 +date: 2024-10-04 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract entities from a document and link related entities to each other. Users need to define an input schema, as shown in the example below. In that schema, `drugs` is a list, which tells the model that the document may mention several drugs and that it should extract all of them. Each drug has properties such as `name` and `reactions`. A drug has only one name, so `name` is a string, but it can have several reactions, so `reactions` is a list. Users can define a schema for any other entity type in the same way.
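The template in the example is plain JSON with empty placeholders: `""` for a single value and `[]` for several values. The following minimal Python sketch generates that prompt from a schema dictionary instead of typing it by hand. It assumes the model expects exactly the `### Template:` / `### Text:` layout of the example prompt. `build_ner_prompt` is a hypothetical helper, not part of Healthcare NLP.

```python
import json


def build_ner_prompt(schema: dict, text: str) -> str:
    """Hypothetical helper: render a schema and input text in the
    '### Template:' / '### Text:' layout used by the example prompt."""
    # indent=4 reproduces the nested layout of the hand-written template
    template = json.dumps(schema, indent=4)
    return f"\n### Template:\n{template}\n### Text:\n{text.strip()}\n"


# Schema from the example: a list of drugs, each with a single name (string)
# and any number of reactions (list).
drug_schema = {"drugs": [{"name": "", "reactions": []}]}

prompt = build_ner_prompt(drug_schema, "I feel a bit drowsy after taking Arthrotec 50.")
print(prompt)
```

The returned string can go into the `text` column in the same way as `med_ner_prompt`. Whether the model handles a schema very different from the drug example is not guaranteed by this sketch.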
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_zs_q16_v1_en_5.5.0_3.0_1728079794196.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_zs_q16_v1_en_5.5.0_3.0_1728079794196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_ner_zs_q16_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems . +I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +data = spark.createDataFrame([[med_ner_prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_ner_zs_q16_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems .
+I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +val data = Seq(med_ner_prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +{ + "drugs": [ + { + "name": "Arthrotec", + "reactions": [ + "drowsy", + "blurred vision", + "gastric problems" + ] + } + ] +} + +... + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_ner_zs_q16_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|6.1 GB| diff --git a/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q4_v1_en.md b/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q4_v1_en.md new file mode 100644 index 0000000000..4265a59c86 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q4_v1_en.md @@ -0,0 +1,161 @@ +--- +layout: model +title: JSL_MedS_NER (LLM - q4) +author: John Snow Labs +name: jsl_meds_ner_zs_q4_v1 +date: 2024-10-04 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract entities from a document and link related entities to each other. Users need to define an input schema, as shown in the example below. In that schema, `drugs` is a list, which tells the model that the document may mention several drugs and that it should extract all of them. Each drug has properties such as `name` and `reactions`. A drug has only one name, so `name` is a string, but it can have several reactions, so `reactions` is a list. Users can define a schema for any other entity type in the same way.
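The sample output in this card's Results section shows the JSON followed by further text (`...`). Calling `json.loads` on the whole completion string may therefore fail. The following minimal sketch decodes only the first JSON object in the completion. It assumes the completion text has already been pulled out of the `completions` column as a plain Python string. `parse_first_json` is a hypothetical helper.

```python
import json


def parse_first_json(completion: str) -> dict:
    """Hypothetical helper: return the first JSON object in a completion,
    ignoring any text that follows it."""
    start = completion.find("{")
    if start == -1:
        raise ValueError("no JSON object found in completion")
    # raw_decode parses one JSON value starting at `start` and reports where it ended,
    # so trailing text after the object does not cause an error
    obj, _end = json.JSONDecoder().raw_decode(completion, start)
    return obj


completion = '{"drugs": [{"name": "Arthrotec", "reactions": ["drowsy", "blurred vision"]}]}\n...'
parsed = parse_first_json(completion)
for drug in parsed["drugs"]:
    print(drug["name"], "->", ", ".join(drug["reactions"]))
```

This sketch takes the first `{` as the start of the answer. If the model writes prose containing braces before the JSON, you would need a stricter search.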
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_zs_q4_v1_en_5.5.0_3.0_1728076514346.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_zs_q4_v1_en_5.5.0_3.0_1728076514346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_ner_zs_q4_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems . +I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +data = spark.createDataFrame([[med_ner_prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_ner_zs_q4_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems .
+I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +val data = Seq(med_ner_prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +{ + "drugs": [ + { + "name": "Arthrotec", + "reactions": [ + "drowsy", + "blurred vision", + "gastric problems" + ] + } + ] +} + +... + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_ner_zs_q4_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|2.4 GB| diff --git a/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q8_v1_en.md b/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q8_v1_en.md new file mode 100644 index 0000000000..d01758b6c3 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-04-jsl_meds_ner_zs_q8_v1_en.md @@ -0,0 +1,161 @@ +--- +layout: model +title: JSL_MedS_NER (LLM - q8) +author: John Snow Labs +name: jsl_meds_ner_zs_q8_v1 +date: 2024-10-04 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract entities from a document and link related entities to each other. Users need to define an input schema, as shown in the example below. In that schema, `drugs` is a list, which tells the model that the document may mention several drugs and that it should extract all of them. Each drug has properties such as `name` and `reactions`. A drug has only one name, so `name` is a string, but it can have several reactions, so `reactions` is a list. Users can define a schema for any other entity type in the same way.
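The description encodes cardinality in the template itself: `""` means one string, `[]` means several, and `[{...}]` means a list of objects. The following minimal sketch uses that convention to check whether a parsed completion has the shape the schema asked for. The reading of `[]` as "list of strings" is an assumption drawn from the `reactions` example. `matches_template` is a hypothetical helper, not a library function.

```python
def matches_template(value, template) -> bool:
    """Hypothetical helper: check that `value` has the same shape as `template`.

    Assumed convention: "" marks a single string field, [] marks a list of
    strings, and a one-element list such as [{...}] marks a list of objects
    that each follow the inner template.
    """
    if isinstance(template, str):
        return isinstance(value, str)
    if isinstance(template, dict):
        return isinstance(value, dict) and all(
            key in value and matches_template(value[key], sub)
            for key, sub in template.items()
        )
    if isinstance(template, list):
        if not isinstance(value, list):
            return False
        if not template:  # [] -> list of plain strings
            return all(isinstance(item, str) for item in value)
        return all(matches_template(item, template[0]) for item in value)
    return False


template = {"drugs": [{"name": "", "reactions": []}]}
good = {"drugs": [{"name": "Arthrotec", "reactions": ["drowsy", "blurred vision", "gastric problems"]}]}
bad = {"drugs": [{"name": ["Arthrotec"], "reactions": "drowsy"}]}

print(matches_template(good, template), matches_template(bad, template))
```

A check like this is a cheap way to catch completions that ignore the requested structure before they reach downstream code.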
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_zs_q8_v1_en_5.5.0_3.0_1728077563420.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_ner_zs_q8_v1_en_5.5.0_3.0_1728077563420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_ner_zs_q8_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems . +I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +data = spark.createDataFrame([[med_ner_prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_ner_zs_q8_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val med_ner_prompt = """ +### Template: +{ + "drugs": [ + { + "name": "", + "reactions": [] + } + ] +} +### Text: +I feel a bit drowsy & have a little blurred vision , and some gastric problems .
+I 've been on Arthrotec 50 for over 10 years on and off , only taking it when I needed it . +Due to my arthritis getting progressively worse , to the point where I am in tears with the agony. +Gp 's started me on 75 twice a day and I have to take it every day for the next month to see how I get on , here goes . +So far its been very good , pains almost gone , but I feel a bit weird , did n't have that when on 50. +""" + +val data = Seq(med_ner_prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +{ + "drugs": [ + { + "name": "Arthrotec", + "reactions": [ + "drowsy", + "blurred vision", + "gastric problems" + ] + } + ] +} + +... + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_ner_zs_q8_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.9 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v1_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v1_en.md new file mode 100644 index 0000000000..3ddf7b1637 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v1_en.md @@ -0,0 +1,140 @@ +--- +layout: model +title: JSL_MedS (LLM - q16) +author: John Snow Labs +name: jsl_meds_q16_v1 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, summarization, qa, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Summarization and Q&A based on a given context. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q16_v1_en_5.5.0_3.0_1728141441935.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q16_v1_en_5.5.0_3.0_1728141441935.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
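The example prompt in this card puts the question first and the supporting passage under a `## Text:` heading. The following minimal Python sketch assembles prompts in that layout from a question and a context string. It assumes the model works best when the layout matches the example exactly. `build_context_qa_prompt` is a hypothetical helper.

```python
def build_context_qa_prompt(question: str, context: str) -> str:
    """Hypothetical helper: question first, then the context under '## Text:',
    mirroring the layout of the example prompt in this card."""
    return f"\n{question.strip()}\n\n## Text:\n{context.strip()}\n"


context = """The exact cause of breast cancer is unknown. However, several risk factors
can increase your likelihood of developing breast cancer, such as:
- Age (most commonly occurring in women over 50)
- Obesity"""

prompt = build_context_qa_prompt(
    "Based on the following text, what age group is most susceptible to breast cancer?",
    context,
)
print(prompt)
```

Keeping prompt assembly in one function makes it easier to reuse the same layout when the question and passage come from a DataFrame or a retrieval step.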
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_q16_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +Based on the following text, what age group is most susceptible to breast cancer? + +## Text: +The exact cause of breast cancer is unknown. However, several risk factors can increase your likelihood of developing breast cancer, such as: +- A personal or family history of breast cancer +- A genetic mutation, such as BRCA1 or BRCA2 +- Exposure to radiation +- Age (most commonly occurring in women over 50) +- Early onset of menstruation or late menopause +- Obesity +- Hormonal factors, such as taking hormone replacement therapy +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_q16_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +Based on the following text, what age group is most susceptible to breast cancer? + +## Text: +The exact cause of breast cancer is unknown.
However, several risk factors can increase your likelihood of developing breast cancer, such as: +- A personal or family history of breast cancer +- A genetic mutation, such as BRCA1 or BRCA2 +- Exposure to radiation +- Age (most commonly occurring in women over 50) +- Early onset of menstruation or late menopause +- Obesity +- Hormonal factors, such as taking hormone replacement therapy +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +The age group most susceptible to breast cancer, as mentioned in the text, is women over the age of 50. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_q16_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|6.1 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v2_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v2_en.md new file mode 100644 index 0000000000..b21e0c2117 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v2_en.md @@ -0,0 +1,132 @@ +--- +layout: model +title: JSL_MedS_v2 (LLM - q16) +author: John Snow Labs +name: jsl_meds_q16_v2 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Q&A, Summarization, RAG, and Chat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q16_v2_en_5.5.0_3.0_1728148344788.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q16_v2_en_5.5.0_3.0_1728148344788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
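The example prompt in this card lists the clinical vignette, the question, and one `Letter: option` line per choice. The following minimal Python sketch, which assumes that layout, builds such prompts and pulls the chosen letter out of a completion. `build_mcq_prompt` and `extract_choice` are hypothetical helpers. `extract_choice` is a simple regex heuristic that returns the first `X:` it finds, so it can pick the wrong letter if the model discusses other options in that same form before naming its answer.

```python
import re
from typing import Dict, Optional


def build_mcq_prompt(case: str, question: str, options: Dict[str, str]) -> str:
    """Hypothetical helper: vignette, question, then one 'X: option' line per
    choice, mirroring the example prompt in this card."""
    lines = [case.strip(), question.strip()]
    lines += [f"{letter}: {text}" for letter, text in options.items()]
    return "\n" + "\n".join(lines) + "\n"


def extract_choice(completion: str, letters: str = "ABCDE") -> Optional[str]:
    """Hypothetical heuristic: return the first option letter written as 'X:'
    in the completion, or None if no such pattern appears."""
    match = re.search(rf"\b([{letters}]):", completion)
    return match.group(1) if match else None


options = {
    "A": "Ampicillin",
    "B": "Ceftriaxone",
    "C": "Ciprofloxacin",
    "D": "Doxycycline",
    "E": "Nitrofurantoin",
}
prompt = build_mcq_prompt(
    "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination.",
    "Which of the following is the best treatment for this patient?",
    options,
)
answer = extract_choice("The best treatment for this patient is E: Nitrofurantoin.")
print(answer)  # E
```

For scoring many questions, a stricter answer format in the prompt (for example, asking for the letter alone) would make extraction more reliable than this heuristic.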
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_q16_v2", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_q16_v2", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination.
She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +The best treatment for this patient is E: Nitrofurantoin. This medication is considered safe during pregnancy and is effective for treating urinary tract infections (UTIs). The other options listed are not recommended during pregnancy due to potential risks to the fetus. Ampicillin (A) and Ceftriaxone (B) are generally safe but may not be the first-line treatment for UTIs. Ciprofloxacin (C) and Doxycycline (D) are contraindicated in pregnancy due to potential adverse effects on fetal development. Nitrofurantoin (E) is a commonly used antibiotic for UTIs during pregnancy and has a good safety profile. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_q16_v2| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|6.1 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v3_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v3_en.md new file mode 100644 index 0000000000..8cdb74fc6e --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q16_v3_en.md @@ -0,0 +1,132 @@ +--- +layout: model +title: JSL_MedS_v3 (LLM - q16) +author: John Snow Labs +name: jsl_meds_q16_v3 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Q&A, Summarization, RAG, and Chat. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q16_v3_en_5.5.0_3.0_1728151859969.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q16_v3_en_5.5.0_3.0_1728151859969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_q16_v3", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_q16_v3", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination.
She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +The best treatment for this patient is E: Nitrofurantoin. This medication is considered safe during pregnancy and is effective for treating urinary tract infections (UTIs). The other options listed are not recommended during pregnancy due to potential risks to the fetus. Ampicillin (A) and Ceftriaxone (B) are generally safe but may not be the first-line treatment for UTIs. Ciprofloxacin (C) and Doxycycline (D) are contraindicated in pregnancy due to potential adverse effects on fetal development. Nitrofurantoin (E) is a commonly used antibiotic for UTIs during pregnancy and has a good safety profile. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_q16_v3| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|6.1 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v1_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v1_en.md new file mode 100644 index 0000000000..dd7896a3a8 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v1_en.md @@ -0,0 +1,140 @@ +--- +layout: model +title: JSL_MedS (LLM - q4) +author: John Snow Labs +name: jsl_meds_q4_v1 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, summarization, qa, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Summarization and Q&A based on a given context. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q4_v1_en_5.5.0_3.0_1728139416476.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q4_v1_en_5.5.0_3.0_1728139416476.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_q4_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +Based on the following text, what age group is most susceptible to breast cancer? + +## Text: +The exact cause of breast cancer is unknown. However, several risk factors can increase your likelihood of developing breast cancer, such as: +- A personal or family history of breast cancer +- A genetic mutation, such as BRCA1 or BRCA2 +- Exposure to radiation +- Age (most commonly occurring in women over 50) +- Early onset of menstruation or late menopause +- Obesity +- Hormonal factors, such as taking hormone replacement therapy +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_q4_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +Based on the following text, what age group is most susceptible to breast cancer? + +## Text: +The exact cause of breast cancer is unknown.
However, several risk factors can increase your likelihood of developing breast cancer, such as: +- A personal or family history of breast cancer +- A genetic mutation, such as BRCA1 or BRCA2 +- Exposure to radiation +- Age (most commonly occurring in women over 50) +- Early onset of menstruation or late menopause +- Obesity +- Hormonal factors, such as taking hormone replacement therapy +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +The age group most susceptible to breast cancer, as mentioned in the text, is women over the age of 50. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_q4_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|2.4 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v2_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v2_en.md new file mode 100644 index 0000000000..a57249fb15 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v2_en.md @@ -0,0 +1,122 @@ +--- +layout: model +title: JSL_MedS_v2 (LLM - q4) +author: John Snow Labs +name: jsl_meds_q4_v2 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Q&A, Summarization, RAG, and Chat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q4_v2_en_5.5.0_3.0_1728145616615.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q4_v2_en_5.5.0_3.0_1728145616615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_q4_v2", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +### Question: +who you are, describe yourself +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_q4_v2", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +### Question: +who you are, describe yourself +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +Hello! I am JSL Medical LLM, an artificial intelligence language model specialized in medical knowledge. I am here to assist you with any medical inquiries, provide information on health conditions, and help you understand medical terminology. Please feel free to ask me any questions related to health and medicine. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_q4_v2| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|2.4 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v3_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v3_en.md new file mode 100644 index 0000000000..0735e0245a --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q4_v3_en.md @@ -0,0 +1,140 @@ +--- +layout: model +title: JSL_MedS_v3 (LLM - q4) +author: John Snow Labs +name: jsl_meds_q4_v3 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Q&A, Summarization, RAG, and Chat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q4_v3_en_5.5.0_3.0_1728149142787.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q4_v3_en_5.5.0_3.0_1728149142787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_q4_v3", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +Based on the following text, what age group is most susceptible to breast cancer? + +## Text: +The exact cause of breast cancer is unknown. However, several risk factors can increase your likelihood of developing breast cancer, such as: +- A personal or family history of breast cancer +- A genetic mutation, such as BRCA1 or BRCA2 +- Exposure to radiation +- Age (most commonly occurring in women over 50) +- Early onset of menstruation or late menopause +- Obesity +- Hormonal factors, such as taking hormone replacement therapy +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_q4_v3", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +Based on the following text, what age group is most susceptible to breast cancer? + +## Text: +The exact cause of breast cancer is unknown.
However, several risk factors can increase your likelihood of developing breast cancer, such as: +- A personal or family history of breast cancer +- A genetic mutation, such as BRCA1 or BRCA2 +- Exposure to radiation +- Age (most commonly occurring in women over 50) +- Early onset of menstruation or late menopause +- Obesity +- Hormonal factors, such as taking hormone replacement therapy +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +Based on the provided text, the age group most susceptible to breast cancer is women over 50 years old. This is explicitly mentioned as the most common occurrence age for breast cancer. While other factors like genetic mutations, family history, and hormonal factors also contribute to the risk, the text specifically highlights age as a significant risk factor. It is important to note that while age is a risk factor, breast cancer can still occur in younger women, and awareness and preventive measures should be considered across all age groups. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_q4_v3| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|2.4 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v1_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v1_en.md new file mode 100644 index 0000000000..9b7d46a164 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v1_en.md @@ -0,0 +1,140 @@ +--- +layout: model +title: JSL_MedS (LLM - q8) +author: John Snow Labs +name: jsl_meds_q8_v1 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, summarization, qa, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Summarization and Q&A based on a given context. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q8_v1_en_5.5.0_3.0_1728139997316.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q8_v1_en_5.5.0_3.0_1728139997316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_q8_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +Based on the following text, what age group is most susceptible to breast cancer? + +## Text: +The exact cause of breast cancer is unknown. However, several risk factors can increase your likelihood of developing breast cancer, such as: +- A personal or family history of breast cancer +- A genetic mutation, such as BRCA1 or BRCA2 +- Exposure to radiation +- Age (most commonly occurring in women over 50) +- Early onset of menstruation or late menopause +- Obesity +- Hormonal factors, such as taking hormone replacement therapy +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_q8_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +Based on the following text, what age group is most susceptible to breast cancer? + +## Text: +The exact cause of breast cancer is unknown.
However, several risk factors can increase your likelihood of developing breast cancer, such as: +- A personal or family history of breast cancer +- A genetic mutation, such as BRCA1 or BRCA2 +- Exposure to radiation +- Age (most commonly occurring in women over 50) +- Early onset of menstruation or late menopause +- Obesity +- Hormonal factors, such as taking hormone replacement therapy +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +The age group most susceptible to breast cancer, as mentioned in the text, is women over the age of 50. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_q8_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.9 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v2_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v2_en.md new file mode 100644 index 0000000000..8d2f1ba49a --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v2_en.md @@ -0,0 +1,140 @@ +--- +layout: model +title: JSL_MedS_v2 (LLM - q8) +author: John Snow Labs +name: jsl_meds_q8_v2 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Q&A, Summarization, RAG, and Chat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q8_v2_en_5.5.0_3.0_1728146317332.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q8_v2_en_5.5.0_3.0_1728146317332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_q8_v2", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +Based on the following text, what age group is most susceptible to breast cancer? + +## Text: +The exact cause of breast cancer is unknown. However, several risk factors can increase your likelihood of developing breast cancer, such as: +- A personal or family history of breast cancer +- A genetic mutation, such as BRCA1 or BRCA2 +- Exposure to radiation +- Age (most commonly occurring in women over 50) +- Early onset of menstruation or late menopause +- Obesity +- Hormonal factors, such as taking hormone replacement therapy +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_q8_v2", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +Based on the following text, what age group is most susceptible to breast cancer? + +## Text: +The exact cause of breast cancer is unknown.
However, several risk factors can increase your likelihood of developing breast cancer, such as: +- A personal or family history of breast cancer +- A genetic mutation, such as BRCA1 or BRCA2 +- Exposure to radiation +- Age (most commonly occurring in women over 50) +- Early onset of menstruation or late menopause +- Obesity +- Hormonal factors, such as taking hormone replacement therapy +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +Based on the provided text, the age group most susceptible to breast cancer is women over the age of 50. This is explicitly mentioned in the text, indicating that breast cancer is most commonly occurring in this age group. It is important to note that while age is a significant risk factor, other factors such as genetic mutations, family history, and hormonal factors also contribute to the likelihood of developing breast cancer. Regular screenings and awareness of risk factors are crucial for early detection and effective management of breast cancer. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_q8_v2| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.9 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v3_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v3_en.md new file mode 100644 index 0000000000..66732af6a9 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_q8_v3_en.md @@ -0,0 +1,132 @@ +--- +layout: model +title: JSL_MedS_v3 (LLM - q8) +author: John Snow Labs +name: jsl_meds_q8_v3 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Q&A, Summarization, RAG, and Chat. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q8_v3_en_5.5.0_3.0_1728150176033.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q8_v3_en_5.5.0_3.0_1728150176033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_q8_v3", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_q8_v3", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination.
She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +The best treatment for this patient is E: Nitrofurantoin. This medication is considered safe during pregnancy and is effective for treating urinary tract infections (UTIs). It is important to avoid antibiotics that are contraindicated during pregnancy, such as tetracyclines (D: Doxycycline) and fluoroquinolones (C: Ciprofloxacin). Ampicillin (A) and Ceftriaxone (B) are also generally safe during pregnancy, but Nitrofurantoin is often preferred due to its specific efficacy for UTIs. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_q8_v3| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.9 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q16_v1_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q16_v1_en.md new file mode 100644 index 0000000000..83fdb78893 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q16_v1_en.md @@ -0,0 +1,143 @@ +--- +layout: model +title: JSL_MedS_Rag_v1 (LLM - q16) +author: John Snow Labs +name: jsl_meds_rag_q16_v1 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Q&A, Summarization, RAG, and Chat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_rag_q16_v1_en_5.5.0_3.0_1728138036245.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_rag_q16_v1_en_5.5.0_3.0_1728138036245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_rag_q16_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +### Template: +Use the following pieces of context to answer the user's question. If you return an answer, end with 'It's my pleasure'. +If you don't know the answer, just say that you don't know, don't try to make up an answer . + + +### Context: +'Background: Diabetes is referred to a group of diseases characterized by high glucose levels in blood. It is caused by a deficiency in the production or function of insulin or both, which can occur because of different reasons, resulting in protein and lipid metabolic disorders. The aim of this study was to systematically review the prevalence and incidence of type 1 diabetes in the world.', +'A higher prevalence of diabetes mellitus was observed in Addis Ababa public health institutions. Factors such as age, alcohol drinking, HDL, triglycerides, and vagarious physical activity were associated with diabetes mellitus. Concerned bodies need to work over the ever-increasing diabetes mellitus in Addis Ababa.', + +### Questions: +relationship between diabetes and obesity? 
+""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_rag_q16_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +### Template: +Use the following pieces of context to answer the user's question. If you return an answer, end with 'It's my pleasure'. +If you don't know the answer, just say that you don't know, don't try to make up an answer . + + +### Context: +'Background: Diabetes is referred to a group of diseases characterized by high glucose levels in blood. It is caused by a deficiency in the production or function of insulin or both, which can occur because of different reasons, resulting in protein and lipid metabolic disorders. The aim of this study was to systematically review the prevalence and incidence of type 1 diabetes in the world.', +'A higher prevalence of diabetes mellitus was observed in Addis Ababa public health institutions. Factors such as age, alcohol drinking, HDL, triglycerides, and vagarious physical activity were associated with diabetes mellitus. Concerned bodies need to work over the ever-increasing diabetes mellitus in Addis Ababa.', + +### Questions: +relationship between diabetes and obesity? +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +Diabetes and obesity are closely related conditions. Obesity is a significant risk factor for the development of type 2 diabetes. +Excess body fat, particularly in the abdominal area, can lead to insulin resistance, where the body's cells do not respond effectively to insulin. +This resistance can result in elevated blood glucose levels, leading to diabetes. +Additionally, obesity can also contribute to the development of type 1 diabetes by triggering an autoimmune response that destines the body's cells to be resistant to insulin + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_rag_q16_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|6.1 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q4_v1_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q4_v1_en.md new file mode 100644 index 0000000000..f4da709a59 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q4_v1_en.md @@ -0,0 +1,143 @@ +--- +layout: model +title: JSL_MedS_Rag_v1 (LLM - q4) +author: John Snow Labs +name: jsl_meds_rag_q4_v1 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Q&A, Summarization, RAG, and Chat. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_rag_q4_v1_en_5.5.0_3.0_1728134952095.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_rag_q4_v1_en_5.5.0_3.0_1728134952095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_rag_q4_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +### Template: +Use the following pieces of context to answer the user's question. If you return an answer, end with 'It's my pleasure'. +If you don't know the answer, just say that you don't know, don't try to make up an answer . + + +### Context: +'Background: Diabetes is referred to a group of diseases characterized by high glucose levels in blood. It is caused by a deficiency in the production or function of insulin or both, which can occur because of different reasons, resulting in protein and lipid metabolic disorders. The aim of this study was to systematically review the prevalence and incidence of type 1 diabetes in the world.', +'A higher prevalence of diabetes mellitus was observed in Addis Ababa public health institutions. Factors such as age, alcohol drinking, HDL, triglycerides, and vagarious physical activity were associated with diabetes mellitus. Concerned bodies need to work over the ever-increasing diabetes mellitus in Addis Ababa.', + +### Questions: +relationship between diabetes and obesity? 
+""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_rag_q4_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +### Template: +Use the following pieces of context to answer the user's question. If you return an answer, end with 'It's my pleasure'. +If you don't know the answer, just say that you don't know, don't try to make up an answer . + + +### Context: +'Background: Diabetes is referred to a group of diseases characterized by high glucose levels in blood. It is caused by a deficiency in the production or function of insulin or both, which can occur because of different reasons, resulting in protein and lipid metabolic disorders. The aim of this study was to systematically review the prevalence and incidence of type 1 diabetes in the world.', +'A higher prevalence of diabetes mellitus was observed in Addis Ababa public health institutions. Factors such as age, alcohol drinking, HDL, triglycerides, and vagarious physical activity were associated with diabetes mellitus. Concerned bodies need to work over the ever-increasing diabetes mellitus in Addis Ababa.', + +### Questions: +relationship between diabetes and obesity? +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +Diabetes and obesity are closely related conditions. Obesity is a significant risk factor for the development of type 2 diabetes. +Excess body fat, particularly in the abdominal area, can lead to insulin resistance, where the body's cells do not respond effectively to insulin. +This resistance can result in elevated blood glucose levels, leading to diabetes. +Additionally, obesity can also contribute to the development of type 1 diabetes by triggering an autoimmune response that destines the body's cells to be resistant to insulin + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_rag_q4_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|2.4 GB| diff --git a/docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q8_v1_en.md b/docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q8_v1_en.md new file mode 100644 index 0000000000..67c177c075 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-05-jsl_meds_rag_q8_v1_en.md @@ -0,0 +1,143 @@ +--- +layout: model +title: JSL_MedS_Rag_v1 (LLM - q8) +author: John Snow Labs +name: jsl_meds_rag_q8_v1 +date: 2024-10-05 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM model is trained to perform Q&A, Summarization, RAG, and Chat. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_rag_q8_v1_en_5.5.0_3.0_1728135664806.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_rag_q8_v1_en_5.5.0_3.0_1728135664806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_meds_rag_q8_v1", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +### Template: +Use the following pieces of context to answer the user's question. If you return an answer, end with 'It's my pleasure'. +If you don't know the answer, just say that you don't know, don't try to make up an answer . + + +### Context: +'Background: Diabetes is referred to a group of diseases characterized by high glucose levels in blood. It is caused by a deficiency in the production or function of insulin or both, which can occur because of different reasons, resulting in protein and lipid metabolic disorders. The aim of this study was to systematically review the prevalence and incidence of type 1 diabetes in the world.', +'A higher prevalence of diabetes mellitus was observed in Addis Ababa public health institutions. Factors such as age, alcohol drinking, HDL, triglycerides, and vagarious physical activity were associated with diabetes mellitus. Concerned bodies need to work over the ever-increasing diabetes mellitus in Addis Ababa.', + +### Questions: +relationship between diabetes and obesity? 
+""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_meds_rag_q8_v1", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +### Template: +Use the following pieces of context to answer the user's question. If you return an answer, end with 'It's my pleasure'. +If you don't know the answer, just say that you don't know, don't try to make up an answer . + + +### Context: +'Background: Diabetes is referred to a group of diseases characterized by high glucose levels in blood. It is caused by a deficiency in the production or function of insulin or both, which can occur because of different reasons, resulting in protein and lipid metabolic disorders. The aim of this study was to systematically review the prevalence and incidence of type 1 diabetes in the world.', +'A higher prevalence of diabetes mellitus was observed in Addis Ababa public health institutions. Factors such as age, alcohol drinking, HDL, triglycerides, and vagarious physical activity were associated with diabetes mellitus. Concerned bodies need to work over the ever-increasing diabetes mellitus in Addis Ababa.', + +### Questions: +relationship between diabetes and obesity? +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
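The RAG example above writes the retrieved passages into the prompt by hand. Below is a minimal sketch of how the same `### Template:` / `### Context:` / `### Questions:` layout could be assembled from a list of passages. The helper `build_rag_prompt` and its default instructions are hypothetical. The sketch only mirrors the layout of the example prompt; it is not a documented prompt format for `jsl_meds_rag_q8_v1`.

```python
from typing import Sequence

# Hypothetical default instructions, adapted from the example prompt above.
DEFAULT_INSTRUCTIONS = (
    "Use the following pieces of context to answer the user's question.\n"
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer."
)


def build_rag_prompt(passages: Sequence[str], question: str,
                     instructions: str = DEFAULT_INSTRUCTIONS) -> str:
    """Assemble passages and a question into the Template/Context/Questions
    layout used by the example prompt (hypothetical helper)."""
    if not passages:
        raise ValueError("at least one context passage is required")
    # One quoted passage per line, as in the example prompt.
    context = "\n".join(f"'{p.strip()}'," for p in passages)
    return (
        f"### Template:\n{instructions}\n\n"
        f"### Context:\n{context}\n\n"
        f"### Questions:\n{question.strip()}\n"
    )


prompt = build_rag_prompt(
    ["A higher prevalence of diabetes mellitus was observed in Addis Ababa "
     "public health institutions."],
    "relationship between diabetes and obesity?",
)
print(prompt)
```

You can then wrap the returned string in a DataFrame with `spark.createDataFrame([[prompt]]).toDF("text")`, as in the example, and run it through the same pipeline.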
+ +## Results + +```bash + +Diabetes and obesity are closely related conditions. Obesity is a significant risk factor for the development of type 2 diabetes. +Excess body fat, particularly in the abdominal area, can lead to insulin resistance, where the body's cells do not respond effectively to insulin. +This resistance can result in elevated blood glucose levels, leading to diabetes. +Additionally, obesity can also contribute to the development of type 1 diabetes by triggering an autoimmune response that destines the body's cells to be resistant to insulin + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_meds_rag_q8_v1| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.9 GB| diff --git a/docs/_posts/akrztrk/2024-10-06-jsl_medm_q4_v2_en.md b/docs/_posts/akrztrk/2024-10-06-jsl_medm_q4_v2_en.md new file mode 100644 index 0000000000..57aa81a1f8 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-06-jsl_medm_q4_v2_en.md @@ -0,0 +1,134 @@ +--- +layout: model +title: JSL_MedM_v2 (LLM - q4) +author: John Snow Labs +name: jsl_medm_q4_v2 +date: 2024-10-06 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract and link entities in a document. Users need to define an input schema, as explained in the example section. Drug is defined as a list, which tells the model that there could be multiple drugs in the document and that it has to extract all of them. Each drug has properties such as name and reaction. Since each drug has only one “name”, it is a string, but there could be multiple reactions, hence it is a list.
Similarly, users can define any schema for any type of entity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q4_v2_en_5.5.0_3.0_1728222920085.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q4_v2_en_5.5.0_3.0_1728222920085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
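As a rough illustration of the schema idea in the description, the sketch below builds a drug schema in which `drug` is a list and each entry has a single `name` string and a `reaction` list. It then serializes the schema to JSON for a prompt. The prompt layout (`### Template:` followed by `### Text:`) and the sample sentence are assumptions for illustration only. This card does not document the exact format the model expects, so verify it before relying on it.

```python
import json

# Hypothetical schema following the description: "drug" is a list because a
# document may mention several drugs; each drug has one "name" (a string) and
# possibly several reactions (a list).
schema = {
    "drug": [
        {
            "name": "",
            "reaction": [],
        }
    ]
}

# Hypothetical input text, used only for illustration.
text = "The patient developed a rash and nausea after starting amoxicillin."

# Assumed prompt layout: the schema first, then the text to extract from.
prompt = (
    "### Template:\n"
    f"{json.dumps(schema, indent=4)}\n"
    "### Text:\n"
    f"{text}\n"
)
print(prompt)
```

You can describe other entity types the same way. Use a list for anything that may occur more than once and a plain string for single-valued properties.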
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_medm_q4_v2", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_medm_q4_v2", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination.
She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
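When many multiple-choice prompts like the one above are scored automatically, the chosen option has to be pulled out of free-text completions such as "The correct answer is E: Nitrofurantoin." The minimal heuristic sketch below assumes that the model names its choice as `<letter>:`, as the sample outputs on these cards do. `extract_choice` is a hypothetical helper and is not part of Healthcare NLP.

```python
import re
from typing import Optional


def extract_choice(completion: str, letters: str = "ABCDE") -> Optional[str]:
    """Return the first option letter written as '<letter>:' in a completion,
    or None when no such pattern occurs (heuristic, not a guaranteed parse)."""
    # \b stops letters at the end of a word such as "DNA:" from matching.
    match = re.search(rf"\b([{re.escape(letters)}]):", completion)
    return match.group(1) if match else None


print(extract_choice("The correct answer is E: Nitrofurantoin."))
```

If a completion repeats the option list before answering, the first match can be the wrong letter. Check the heuristic against real outputs before using it for scoring.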
+ +## Results + +```bash + +The correct answer is E: Nitrofurantoin. + +The patient is presenting with symptoms of urinary tract infection (UTI), which is common during pregnancy. Nitrofurantoin is a first-line antibiotic for uncomplicated UTI during pregnancy. It is safe and effective in treating UTI during pregnancy and has been used for many years without any adverse effects on the fetus. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_medm_q4_v2| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|4.8 GB| diff --git a/docs/_posts/akrztrk/2024-10-06-jsl_medm_q4_v3_en.md b/docs/_posts/akrztrk/2024-10-06-jsl_medm_q4_v3_en.md new file mode 100644 index 0000000000..14696230fa --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-06-jsl_medm_q4_v3_en.md @@ -0,0 +1,139 @@ +--- +layout: model +title: JSL_MedM_v3 (LLM - q4) +author: John Snow Labs +name: jsl_medm_q4_v3 +date: 2024-10-06 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract and link entities in a document. Users need to define an input schema, as explained in the example section. Drug is defined as a list, which tells the model that there could be multiple drugs in the document and that it has to extract all of them. Each drug has properties such as name and reaction. Since each drug has only one “name”, it is a string, but there could be multiple reactions, hence it is a list. Similarly, users can define any schema for any type of entity.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q4_v3_en_5.5.0_3.0_1728230214812.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q4_v3_en_5.5.0_3.0_1728230214812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_medm_q4_v3", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_medm_q4_v3", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination.
She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +The best treatment for a pregnant woman at 22 weeks gestation presenting with symptoms of a urinary tract infection (UTI) is: + +E: Nitrofurantoin + +Here's the rationale: + +- The patient's symptoms of burning upon urination, worsening over a day, and absence of costovertebral angle tenderness suggest a urinary tract infection (UTI). +- The patient is pregnant, which increases the risk of UTIs and their complications, such as pyelonephritis + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_medm_q4_v3| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|8.8 GB| diff --git a/docs/_posts/akrztrk/2024-10-06-jsl_medm_q8_v2_en.md b/docs/_posts/akrztrk/2024-10-06-jsl_medm_q8_v2_en.md new file mode 100644 index 0000000000..ec8edcf0ea --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-06-jsl_medm_q8_v2_en.md @@ -0,0 +1,134 @@ +--- +layout: model +title: JSL_MedM_v2 (LLM - q8) +author: John Snow Labs +name: jsl_medm_q8_v2 +date: 2024-10-06 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract and link entities in a document. Users need to define an input schema, as explained in the example section. Drug is defined as a list, which tells the model that there could be multiple drugs in the document and that it has to extract all of them. Each drug has properties such as name and reaction. Since each drug has only one “name”, it is a string, but there could be multiple reactions, hence it is a list. Similarly, users can define any schema for any type of entity.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q8_v2_en_5.5.0_3.0_1728224117678.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q8_v2_en_5.5.0_3.0_1728224117678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_medm_q8_v2", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_medm_q8_v2", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination.
She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate = false) + +``` +
+ +## Results + +```bash + +The correct answer is E: Nitrofurantoin. + +The patient is presenting with symptoms of urinary tract infection (UTI), which is common during pregnancy. Nitrofurantoin is a first-line antibiotic for uncomplicated UTI during pregnancy. It is safe and effective in treating UTI during pregnancy and has been used for many years without any adverse effects on the fetus. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_medm_q8_v2| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|8.2 GB| diff --git a/docs/_posts/bugeki/2024-09-20-ner_deid_subentity_augmented_v2_en.md b/docs/_posts/bugeki/2024-09-20-ner_deid_subentity_augmented_v2_en.md new file mode 100644 index 0000000000..2cd9d7cb84 --- /dev/null +++ b/docs/_posts/bugeki/2024-09-20-ner_deid_subentity_augmented_v2_en.md @@ -0,0 +1,181 @@ +--- +layout: model +title: Detect PHI for Deidentification (Subentity - Augmented) +author: John Snow Labs +name: ner_deid_subentity_augmented_v2 +date: 2024-09-20 +tags: [licensed, en, ner, deid] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 5.4.1 +spark_version: 3.0 +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The Named Entity Recognition annotator allows a generic model to be trained with a deep learning architecture (Char CNNs - BiLSTM - CRF - word embeddings) +inspired by a former state-of-the-art NER model: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNNs. This deidentification NER model +annotates text to find protected health information (PHI) that may need to be deidentified. The model detects 18 entities.
+ +## Predicted Entities + +`ZIP`, `ORGANIZATION`, `COUNTRY`, `PATIENT`, `PROFESSION`, `STATE`, `IDNUM`, `PHONE`, `STREET`, `HOSPITAL`, `LOCATION_OTHER`, `AGE`, `DOCTOR`, `CITY`, `MEDICALRECORD`, `DEVICE`, `DATE`, `USERNAME` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_deid_subentity_augmented_v2_en_5.4.1_3.0_1726822472566.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_deid_subentity_augmented_v2_en_5.4.1_3.0_1726822472566.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = SentenceDetector()\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +deid_ner = MedicalNerModel.pretrained("ner_deid_subentity_augmented_v2", "en", "clinical/models") \ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk_subentity") + +nlpPipeline = Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + deid_ner, + ner_converter]) + +model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text")) + +text = "A. Record date : 2093-01-13, David Hale, M.D., Name : Hendrickson, Ora MR. # 7194334 Date : 01/13/93 PCP : Oliveira, 25 year old, Record date : 1-11-2000. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 302 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine." 
+ +data = spark.createDataFrame([[text]]).toDF("text") + +results = model.transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = new SentenceDetector() + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val deid_ner = MedicalNerModel.pretrained("ner_deid_subentity_augmented_v2", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk_subentity") + +val nlpPipeline = new Pipeline().setStages(Array( + document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + deid_ner, + ner_converter)) + +val data = Seq("A. Record date : 2093-01-13, David Hale, M.D., Name : Hendrickson, Ora MR. # 7194334 Date : 01/13/93 PCP : Oliveira, 25 year old, Record date : 1-11-2000. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 302 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.").toDS.toDF("text") + +val result = nlpPipeline.fit(data).transform(data) +``` +
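The `macro` and `micro` rows at the end of this card's Benchmarking table summarize the per-label scores in two different ways. The short sketch below applies the standard definitions to the AGE and DATE counts copied from that table. The helper names `prf` and `macro_micro_f1` are hypothetical, and this code is not taken from the script that produced the benchmark.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (tp, fp, fn) for one label


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Equivalent to 2 * P * R / (P + R).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


def macro_micro_f1(counts: Dict[str, Counts]) -> Tuple[float, float]:
    # Macro: unweighted mean of per-label F1, so rare labels count as much as frequent ones.
    macro = sum(prf(*c)[2] for c in counts.values()) / len(counts)
    # Micro: pool tp/fp/fn over all labels first, so frequent labels dominate.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    return macro, prf(tp, fp, fn)[2]


# Counts copied from the AGE and DATE rows of the Benchmarking table.
counts = {"AGE": (727, 46, 35), "DATE": (5531, 69, 111)}
for label, c in counts.items():
    print(label, [round(x, 4) for x in prf(*c)])
print([round(x, 4) for x in macro_micro_f1(counts)])
```

The per-label values round to the AGE and DATE rows of the table. The macro and micro figures here cover only two labels, so they will not match the table's all-label summary rows.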
+ +## Results + +```bash ++---+-----------------------------+-----+---+-------------+----------+ +| |ner_chunk |begin|end|ner_label |confidence| ++---+-----------------------------+-----+---+-------------+----------+ +| 0 |2093-01-13 |17 |26 |DATE |1.0 | +| 1 |David Hale |29 |38 |DOCTOR |0.9998 | +| 2 |Hendrickson, Ora |54 |69 |PATIENT |0.8085334 | +| 3 |7194334 |77 |83 |MEDICALRECORD|0.9971 | +| 4 |01/13/93 |92 |99 |DATE |1.0 | +| 5 |Oliveira |107 |114|DOCTOR |1.0 | +| 6 |25 |117 |118|AGE |0.9995 | +| 7 |1-11-2000 |144 |152|DATE |0.9998 | +| 8 |Cocke County Baptist Hospital|155 |183|HOSPITAL |0.84585 | +| 9 |0295 Keats Street |186 |202|STREET |0.99956673| +| 10|302 786 5227 |215 |226|PHONE |0.9714 | +| 11|Brothers Coal-Mine |293 |310|ORGANIZATION |0.9285 | ++---+-----------------------------+-----+---+-------------+----------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_deid_subentity_augmented_v2| +|Compatibility:|Healthcare NLP 5.4.1+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|34.8 MB| + +## Benchmarking + +```bash + label tp fp fn total precision recall f1 + AGE 727.0 46.0 35.0 762.0 0.9405 0.9541 0.9472 + CITY 269.0 45.0 72.0 341.0 0.8567 0.7889 0.8214 + COUNTRY 96.0 35.0 32.0 128.0 0.7328 0.75 0.7413 + DATE 5531.0 69.0 111.0 5642.0 0.9877 0.9803 0.984 + DEVICE 10.0 0.0 0.0 10.0 1.0 1.0 1.0 + DOCTOR 3368.0 231.0 179.0 3547.0 0.9358 0.9495 0.9426 + HOSPITAL 1377.0 69.0 207.0 1584.0 0.9523 0.8693 0.9089 + IDNUM 161.0 36.0 49.0 210.0 0.8173 0.7667 0.7912 +LOCATION_OTHER 19.0 3.0 2.0 21.0 0.8636 0.9048 0.8837 + MEDICALRECORD 412.0 20.0 32.0 444.0 0.9537 0.9279 0.9406 + ORGANIZATION 101.0 36.0 37.0 138.0 0.7372 0.7319 0.7345 + PATIENT 1468.0 101.0 159.0 1627.0 0.9356 0.9023 0.9186 + PHONE 346.0 32.0 8.0 354.0 0.9153 0.9774 0.9454 + PROFESSION 271.0 72.0 65.0 336.0 0.7901 0.8065 0.7982 + STATE 178.0 28.0 27.0 205.0 0.8641 
0.8683 0.8662 + STREET 408.0 22.0 7.0 415.0 0.9488 0.9831 0.9657 + USERNAME 87.0 4.0 14.0 101.0 0.956 0.8614 0.9063 + ZIP 129.0 3.0 10.0 139.0 0.9773 0.9281 0.952 + macro - - - - - - 0.8916 + micro - - - - - - 0.94 +``` diff --git a/docs/_posts/bugeki/2024-09-27-explain_clinical_doc_sdoh_small_en.md b/docs/_posts/bugeki/2024-09-27-explain_clinical_doc_sdoh_small_en.md new file mode 100644 index 0000000000..8dff0b3289 --- /dev/null +++ b/docs/_posts/bugeki/2024-09-27-explain_clinical_doc_sdoh_small_en.md @@ -0,0 +1,198 @@ +--- +layout: model +title: Explain Clinical Document - Social Determinants of Health (SDOH)-Small +author: John Snow Labs +name: explain_clinical_doc_sdoh_small +date: 2024-09-27 +tags: [licensed, en, clinical, pipeline, social_determinants, sdoh, ner, assertion, relation_extraction] +task: Pipeline Healthcare +language: en +edition: Healthcare NLP 5.4.1 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline is designed to + +- extract all social determinants of health (SDOH) entities from text, + +- assign assertion status to the extracted entities, + +- establish relations between the extracted entities. + +In this pipeline, [ner_sdoh](https://nlp.johnsnowlabs.com/2023/06/13/ner_sdoh_en.html) +NER model, [assertion_sdoh_wip](https://nlp.johnsnowlabs.com/2023/08/13/assertion_sdoh_wip_en.html) assertion model and [generic_re](https://nlp.johnsnowlabs.com/2022/12/20/generic_re.html) +relation extraction model were used to achieve those tasks. 
+ +Clinical Entity Labels: + +`Access_To_Care`, `Age`, `Alcohol`, `Chidhood_Event`, `Communicable_Disease`, `Community_Safety`, `Diet`, `Disability`, `Eating_Disorder`, `Education`, `Employment`, `Environmental_Condition`, `Exercise`, `Family_Member`, `Financial_Status`, `Food_Insecurity`, `Gender`, `Geographic_Entity`, `Healthcare_Institution`, `Housing`, `Hyperlipidemia`, `Hypertension`, `Income`, `Insurance_Status`, `Language`, `Legal_Issues`, `Marital_Status`, `Mental_Health`, `Obesity`, `Other_Disease`, `Other_SDoH_Keywords`, `Population_Group`, `Quality_Of_Life`, `Race_Ethnicity`, `Sexual_Activity`, `Sexual_Orientation`, `Smoking`, `Social_Exclusion`, `Social_Support`, `Spiritual_Beliefs`, `Substance_Duration`, `Substance_Frequency`, `Substance_Quantity`, `Substance_Use`, `Transportation`, `Violence_Or_Abuse` + + +Assertion Status Labels: + +`Present`, `Absent`, `Possible`, `Past`, `Hypothetical`, `Someone_Else` + +Relation Extraction Labels: + +`Access_To_Care-Financial_Status`, `Access_To_Care-Income`, `Access_To_Care-Social_Support`, `Access_To_Care-Substance_Use`, `Alcohol-Mental_Health`, `Alcohol-Quality_Of_Life`, `Alcohol-Smoking`, `Alcohol-Substance_Use`, `Alcohol-Violence_Or_Abuse`, `Childhood_Event-Violence_Or_Abuse`, `Community_Safety-Quality_Of_Life`, `Community_Safety-Violence_Or_Abuse`, `Diet-Eating_Disorder`, `Diet-Exercise`, `Diet-Gender`, `Diet-Obesity`, `Disability-Insurance_Status`, `Disability-Mental_Health`, `Disability-Quality_Of_Life`, `Disability-Social_Exclusion`, `Eating_Disorder-Food_Insecurity`, `Eating_Disorder-Mental_Health`, `Eating_Disorder-Obesity`, `Education-Employment`, `Education-Financial_Status`, `Education-Income`, `Education-Legal_Issues`, `Education-Quality_Of_Life`, `Education-Substance_Use`, `Employment-Financial_Status`, `Employment-Income`, `Employment-Insurance_Status`, `Employment-Quality_Of_Life`, `Environmental_Condition-Quality_Of_Life`, `Exercise-Mental_Health`, `Exercise-Obesity`, `Exercise-Quality_Of_Life`, 
`Exercise-Smoking`, `Exercise-Substance_Use`, `Financial_Status-Food_Insecurity`, `Financial_Status-Housing`, `Financial_Status-Income`, `Financial_Status-Insurance_Status`, `Financial_Status-Mental_Health`, `Financial_Status-Quality_Of_Life`, `Financial_Status-Social_Support`, `Food_Insecurity-Income`, `Food_Insecurity-Mental_Health`, `Food_Insecurity-Quality_Of_Life`, `Housing-Income`, `Housing-Insurance_Status`, `Housing-Quality_Of_Life`, `Income-Insurance_Status`, `Income-Quality_Of_Life`, `Language-Population_Group`, `Language-Race_Ethnicity`, `Language-Social_Exclusion`, `Legal_Issues-Race_Ethnicity`, `Legal_Issues-Substance_Use`, `Legal_Issues-Violence_Or_Abuse`, `Marital_Status-Mental_Health`, `Marital_Status-Violence_Or_Abuse`, `Mental_Health-Obesity`, `Mental_Health-Quality_Of_Life`, `Mental_Health-Smoking`, `Mental_Health-Social_Exclusion`, `Mental_Health-Social_Support`, `Mental_Health-Substance_Use`, `Mental_Health-Violence_Or_Abuse`, `Obesity-Quality_Of_Life`, `Population_Group-Violence_Or_Abuse`, `Quality_Of_Life-Substance_Use`, `Race_Ethnicity-Social_Exclusion`, `Race_Ethnicity-Social_Support`, `Race_Ethnicity-Violence_Or_Abuse`, `Sexual_Activity-Sexual_Orientation`, `Sexual_Orientation-Social_Exclusion`, `Sexual_Orientation-Substance_Use`, `Sexual_Orientation-Violence_Or_Abuse`, `Smoking-Substance_Use`, `Social_Exclusion-Substance_Use`, `Substance_Duration-Substance_Use`, `Substance_Frequency-Substance_Use`, `Substance_Quantity-Substance_Use`, `Substance_Use-Violence_Or_Abuse`, `Substance_Use-Communicable_Disease`, `Alcohol-Obesity` + +## Predicted Entities +`Access_To_Care`, `Age`, `Alcohol`, `Chidhood_Event`, `Communicable_Disease`, `Community_Safety`, `Diet`, `Disability`, `Eating_Disorder`, `Education`, `Employment`, `Environmental_Condition`, `Exercise`, `Family_Member`, `Financial_Status`, `Food_Insecurity`, `Gender`, `Geographic_Entity`, `Healthcare_Institution`, `Housing`, `Hyperlipidemia`, `Hypertension`, `Income`, `Insurance_Status`, 
`Language`, `Legal_Issues`, `Marital_Status`, `Mental_Health`, `Obesity`, `Other_Disease`, `Other_SDoH_Keywords`, `Population_Group`, `Quality_Of_Life`, `Race_Ethnicity`, `Sexual_Activity`, `Sexual_Orientation`, `Smoking`, `Social_Exclusion`, `Social_Support`, `Spiritual_Beliefs`, `Substance_Duration`, `Substance_Frequency`, `Substance_Quantity`, `Substance_Use`, `Transportation`, `Violence_Or_Abuse` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/explain_clinical_doc_sdoh_small_en_5.4.1_3.0_1727459538758.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/explain_clinical_doc_sdoh_small_en_5.4.1_3.0_1727459538758.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+sdoh_pipeline = PretrainedPipeline('explain_clinical_doc_sdoh_small', 'en', 'clinical/models')
+
+result = sdoh_pipeline.fullAnnotate("""The patient reported experiencing symptoms of anxiety and depression, which have been affecting his quality of life.
+He reported a history of childhood trauma related to violence and abuse in his household, which has contributed to his smoking, alcohol use and current mental health struggles.
+He denied any recent substance use or sexual activity and reported being monogamous in his relationship with his wife.
+The patient is an immigrant and speaks English as a second language.
+He reported difficulty accessing healthcare due to lack of medical insurance.
+He has a herniated disc, hypertension, coronary artery disease (CAD) and diabetes mellitus.
+The patient has a manic disorder, is presently psychotic and shows impulsive behavior. He has been disabled since 2001.""")
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val sdoh_pipeline = new PretrainedPipeline("explain_clinical_doc_sdoh_small", "en", "clinical/models")
+
+val result = sdoh_pipeline.fullAnnotate("""The patient reported experiencing symptoms of anxiety and depression, which have been affecting his quality of life.
+He reported a history of childhood trauma related to violence and abuse in his household, which has contributed to his smoking, alcohol use and current mental health struggles.
+He denied any recent substance use or sexual activity and reported being monogamous in his relationship with his wife.
+The patient is an immigrant and speaks English as a second language.
+He reported difficulty accessing healthcare due to lack of medical insurance.
+He has a herniated disc, hypertension, coronary artery disease (CAD) and diabetes mellitus.
+The patient has a manic disorder, is presently psychotic and shows impulsive behavior. He has been disabled since 2001.""")
+```
+
+
+
+## Results
+
+```bash
+# NER_Result
+
+| | chunks | begin | end | sentence_id | entities | confidence |
+|---:|:--------------------------------|--------:|------:|--------------:|:------------------|-------------:|
+| 0 | anxiety | 47 | 53 | 0 | Mental_Health | 0.9897 |
+| 1 | depression | 59 | 68 | 0 | Mental_Health | 0.9938 |
+| 2 | his | 97 | 99 | 0 | Gender | 0.992 |
+| 3 | quality of life | 101 | 115 | 0 | Quality_Of_Life | 0.6252 |
+| 4 | He | 118 | 119 | 1 | Gender | 0.9996 |
+| 5 | childhood trauma | 143 | 158 | 1 | Childhood_Event | 0.7466 |
+| 6 | violence | 171 | 178 | 1 | Violence_Or_Abuse | 0.5394 |
+| 7 | abuse | 184 | 188 | 1 | Violence_Or_Abuse | 0.6209 |
+| 8 | his | 193 | 195 | 1 | Gender | 0.9536 |
+| 9 | his | 233 | 235 | 1 | Gender | 0.9772 |
+| 10 | smoking | 237 | 243 | 1 | Smoking | 0.9858 |
+| 11 | alcohol use | 246 | 256 | 1 | Alcohol | 0.68065 |
+| 12 | mental health struggles | 270 | 292 | 1 | Mental_Health | 0.248033 |
+| 13 | He | 295 | 296 | 2 | Gender | 0.9995 |
+| 14 | substance use | 316 | 328 | 2 | Substance_Use | 0.6921 |
+| 15 | sexual activity | 333 | 347 | 2 | Sexual_Activity | 0.62915 |
+| 16 | monogamous | 368 | 377 | 2 | Sexual_Activity | 0.6915 |
+| 17 | his | 382 | 384 | 2 | Gender | 0.9883 |
+| 18 | his | 404 | 406 | 2 | Gender | 0.978 |
+| 19 | wife | 408 | 411 | 2 | Family_Member | 0.9833 |
+| 20 | immigrant | 432 | 440 | 3 | Population_Group | 0.9974 |
+| 21 | English | 453 | 459 | 3 | Language | 0.9979 |
+| 22 | He | 483 | 484 | 4 | Gender | 0.9996 |
+| 23 | difficulty accessing healthcare | 495 | 525 | 4 | Access_To_Care | 0.3998 |
+| 24 | medical insurance | 542 | 558 | 4 | Insurance_Status | 0.6721 |
+| 25 | He | 561 | 562 | 5 | Gender | 0.9996 |
+| 26 | herniated disc | 570 | 583 | 5 | Other_Disease | 0.71515 |
+| 27 | hypertension | 586 | 597 | 5 | Hypertension | 0.9984 |
+| 28 | coronary artery disease | 600 | 622 | 5 | Other_Disease | 0.847933 |
+| 29 | CAD | 625 | 627 | 5 | Other_Disease | 0.9884 |
+| 30 | diabetes mellitus | 634 | 650 | 5 | Other_Disease | 0.81115 |
+| 31 | manic disorder | 671 | 684 | 6 | Mental_Health | 0.7929 |
+| 32 | psychotic | 700 | 708 | 6 | Mental_Health | 0.9743 |
+| 33 | impulsive behavior | 720 | 737 | 6 | Mental_Health | 0.41135 |
+| 34 | He | 740 | 741 | 7 | Gender | 0.9996 |
+| 35 | disabled | 752 | 759 | 7 | Disability | 0.9999 |
+
+# Assertion_Result
+
+| | chunks | entities | assertion |
+|---:|:--------------------------------|:------------------|:------------|
+| 0 | anxiety | Mental_Health | Present |
+| 1 | depression | Mental_Health | Present |
+| 2 | quality of life | Quality_Of_Life | Present |
+| 3 | violence | Violence_Or_Abuse | Past |
+| 4 | abuse | Violence_Or_Abuse | Past |
+| 5 | smoking | Smoking | Present |
+| 6 | alcohol use | Alcohol | Present |
+| 7 | mental health struggles | Mental_Health | Present |
+| 8 | substance use | Substance_Use | Absent |
+| 9 | sexual activity | Sexual_Activity | Present |
+| 10 | monogamous | Sexual_Activity | Absent |
+| 11 | difficulty accessing healthcare | Access_To_Care | Absent |
+| 12 | medical insurance | Insurance_Status | Present |
+| 13 | hypertension | Hypertension | Present |
+| 14 | manic disorder | Mental_Health | Present |
+| 15 | psychotic | Mental_Health | Present |
+| 16 | impulsive behavior | Mental_Health | Present |
+
+
+# RE Result
+
+| | sentence | entity1_begin | entity1_end | chunk1 | entity1 | entity2_begin | entity2_end | chunk2 | entity2 | relation | confidence |
+|---:|-----------:|----------------:|--------------:|:------------|:------------------|----------------:|--------------:|:------------------------|:----------------|:--------------------------------|-------------:|
+| 0 | 0 | 47 | 53 | anxiety | Mental_Health | 101 | 115 | quality of life | Quality_Of_Life | Mental_Health-Quality_Of_Life | 1 |
+| 1 | 0 | 59 | 68 | depression | Mental_Health | 101 | 115 | quality of life | Quality_Of_Life | Mental_Health-Quality_Of_Life | 1 |
+| 2 | 1 | 171 | 178 | violence | Violence_Or_Abuse | 246 | 256 | alcohol use | Alcohol | Violence_Or_Abuse-Alcohol | 1 |
+| 3 | 1 | 171 | 178 | violence | Violence_Or_Abuse | 270 | 292 | mental health struggles | Mental_Health | Violence_Or_Abuse-Mental_Health | 1 |
+| 4 | 1 | 184 | 188 | abuse | Violence_Or_Abuse | 246 | 256 | alcohol use | Alcohol | Violence_Or_Abuse-Alcohol | 1 |
+| 5 | 1 | 184 | 188 | abuse | Violence_Or_Abuse | 270 | 292 | mental health struggles | Mental_Health | Violence_Or_Abuse-Mental_Health | 1 |
+| 6 | 1 | 237 | 243 | smoking | Smoking | 270 | 292 | mental health struggles | Mental_Health | Smoking-Mental_Health | 1 |
+| 7 | 1 | 246 | 256 | alcohol use | Alcohol | 270 | 292 | mental health struggles | Mental_Health | Alcohol-Mental_Health | 1 |
+| 8 | 3 | 432 | 440 | immigrant | Population_Group | 453 | 459 | English | Language | Population_Group-Language | 1 |
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|explain_clinical_doc_sdoh_small|
+|Type:|pipeline|
+|Compatibility:|Healthcare NLP 5.4.1+|
+|License:|Licensed|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.7 GB|
+
+## Included Models
+
+- DocumentAssembler
+- SentenceDetectorDLModel
+- TokenizerModel
+- WordEmbeddingsModel
+- MedicalNerModel
+- NerConverterInternalModel
+- MedicalNerModel
+- NerConverterInternalModel
+- AssertionDLModel
+- PerceptronModel
+- DependencyParserModel
+- GenericREModel
diff --git a/docs/_posts/bugeki/2024-10-02-icd10cm_chronic_indicator_mapper_en.md b/docs/_posts/bugeki/2024-10-02-icd10cm_chronic_indicator_mapper_en.md
new file mode 100644
index 0000000000..b753d56c9d
--- /dev/null
+++ b/docs/_posts/bugeki/2024-10-02-icd10cm_chronic_indicator_mapper_en.md
@@ -0,0 +1,214 @@
+---
+layout: model
+title: Mapping ICD10CM Codes To Chronic Indicators
+author: John Snow Labs
+name: icd10cm_chronic_indicator_mapper
+date: 2024-10-02
+tags: [licensed, en, clinical, mapping, icd10cm, chronic, chronic_indicator]
+task: Chunk Mapping
+language: en
+edition: Healthcare NLP 5.4.1
+spark_version: 3.0
+supported: true
+annotator: ChunkMapperModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This mapper model links ICD-10-CM codes to their corresponding chronicity indicators.
+The chronic indicator can take one of three values:
+
+- `0`: "not chronic"
+- `1`: "chronic"
+- `9`: "no determination"
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/icd10cm_chronic_indicator_mapper_en_5.4.1_3.0_1727865931204.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/icd10cm_chronic_indicator_mapper_en_5.4.1_3.0_1727865931204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
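You can turn the three chronic indicator values from the description into readable labels with a small lookup table. This is a minimal sketch. It assumes the mapper returns the indicator as a string (`"0"`, `"1"` or `"9"`), as shown in the `chronic_indicator` column of the Results section. The names `CHRONIC_INDICATOR_LABELS` and `describe_chronic_indicator` are hypothetical and are not part of Healthcare NLP.

```python
# Hypothetical lookup for the three chronic indicator values described above.
# Assumption: the mapper emits the indicator as a string ("0", "1" or "9").
CHRONIC_INDICATOR_LABELS = {
    "0": "not chronic",
    "1": "chronic",
    "9": "no determination",
}


def describe_chronic_indicator(value) -> str:
    """Return the readable label for a chronic indicator value.

    Values outside the documented set are reported as "unknown"
    instead of raising a KeyError.
    """
    return CHRONIC_INDICATOR_LABELS.get(str(value).strip(), "unknown")


# ICD-10-CM codes and indicators taken from the Results section of this card.
for code, indicator in [("O24.4", "0"), ("E66.9", "1"), ("Z68.41", "9")]:
    print(code, describe_chronic_indicator(indicator))
```

Converting the input with `str(...)` also accepts integer indicators. Treat this as a convenience for reading the output, not as part of the mapper itself.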
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler, Chunk2Doc, Doc2Chunk
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, BertSentenceEmbeddings
+from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal, SentenceEntityResolverModel, ChunkMapperModel
+
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer()\
+    .setInputCols(["sentence"])\
+    .setOutputCol("token")
+
+word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")\
+    .setInputCols(["sentence","token"])\
+    .setOutputCol("embeddings")
+
+clinical_ner = MedicalNerModel.pretrained("ner_clinical","en","clinical/models")\
+    .setInputCols(["sentence","token","embeddings"])\
+    .setOutputCol("clinical_ner")
+
+# Keep only PROBLEM chunks; these are resolved to ICD-10-CM codes below
+clinical_ner_converter = NerConverterInternal()\
+    .setInputCols(["sentence","token","clinical_ner"])\
+    .setOutputCol("clinical_ner_chunk")\
+    .setWhiteList(['PROBLEM'])
+
+chunk2doc = Chunk2Doc() \
+    .setInputCols("clinical_ner_chunk") \
+    .setOutputCol("doc_chunk")
+
+sbiobert_embeddings = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\
+    .setInputCols(["doc_chunk"])\
+    .setOutputCol("sbert_embeddings")\
+    .setCaseSensitive(False)
+
+icd_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_icd10cm_augmented_billable_hcc","en", "clinical/models") \
+    .setInputCols(["sbert_embeddings"]) \
+    .setOutputCol("icd10cm")\
+    .setDistanceFunction("EUCLIDEAN")
+
+# Convert the resolved ICD-10-CM codes back into chunks for the mapper
+doc2chunk = Doc2Chunk()\
+    .setInputCols(['icd10cm'])\
+    .setOutputCol('chunk')
+
+mapperModel = ChunkMapperModel.pretrained("icd10cm_chronic_indicator_mapper","en", "clinical/models")\
+    .setInputCols(["chunk"])\
+    .setOutputCol("chronic_indicator_mapping")\
+    .setRels(["chronic_indicator"])
+
+pipeline = Pipeline(
+    stages=[
+        document_assembler,
+        sentence_detector,
+        tokenizer,
+        word_embeddings,
+        clinical_ner,
+        clinical_ner_converter,
+        chunk2doc,
+        sbiobert_embeddings,
+        icd_resolver,
+        doc2chunk,
+        mapperModel
+    ])
+
+data = spark.createDataFrame([["""A 42-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus, associated with obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection."""]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+
+```
+```scala
+val document_assembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
+    .setInputCols("document")
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("sentence")
+    .setOutputCol("token")
+
+val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("embeddings")
+
+val clinical_ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
+    .setInputCols(Array("sentence", "token", "embeddings"))
+    .setOutputCol("clinical_ner")
+
+// Keep only PROBLEM chunks; these are resolved to ICD-10-CM codes below
+val clinical_ner_converter = new NerConverterInternal()
+    .setInputCols(Array("sentence", "token", "clinical_ner"))
+    .setOutputCol("clinical_ner_chunk")
+    .setWhiteList(Array("PROBLEM"))
+
+val chunk2doc = new Chunk2Doc()
+    .setInputCols("clinical_ner_chunk")
+    .setOutputCol("doc_chunk")
+
+val sbiobert_embeddings = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")
+    .setInputCols("doc_chunk")
+    .setOutputCol("sbert_embeddings")
+    .setCaseSensitive(false)
+
+val icd_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_icd10cm_augmented_billable_hcc","en", "clinical/models")
+    .setInputCols("sbert_embeddings")
+    .setOutputCol("icd10cm")
+    .setDistanceFunction("EUCLIDEAN")
+
+// Convert the resolved ICD-10-CM codes back into chunks for the mapper
+val doc2chunk = new Doc2Chunk()
+    .setInputCols("icd10cm")
+    .setOutputCol("chunk")
+
+val mapperModel = ChunkMapperModel.pretrained("icd10cm_chronic_indicator_mapper","en", "clinical/models")
+    .setInputCols("chunk")
+    .setOutputCol("chronic_indicator_mapping")
+    .setRels(Array("chronic_indicator"))
+
+val pipeline = new Pipeline().setStages(Array(
+    document_assembler,
+    sentence_detector,
+    tokenizer,
+    word_embeddings,
+    clinical_ner,
+    clinical_ner_converter,
+    chunk2doc,
+    sbiobert_embeddings,
+    icd_resolver,
+    doc2chunk,
+    mapperModel))
+
+val data = Seq("""A 42-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus, associated with obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection.""").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+
+```
+
+
+## Results
+
+```bash
+
++-----------+-----+---+-------------------------------------+-------+-------+------------------------------------------------------------------------------------------+-----------------+
+|sentence_id|begin|end| entity| label|icd10cm| resolution|chronic_indicator|
++-----------+-----+---+-------------------------------------+-------+-------+------------------------------------------------------------------------------------------+-----------------+
+| 0| 39| 67| gestational diabetes mellitus|PROBLEM| O24.4| gestational diabetes mellitus [gestational diabetes mellitus]| 0|
+| 0| 117|153|subsequent type two diabetes mellitus|PROBLEM| O24.11|pre-existing type 2 diabetes mellitus [pre-existing type 2 diabetes mellitus, in pregna...| 1|
+| 0| 172|178| obesity|PROBLEM| E66.9| obesity [obesity, unspecified]| 1|
+| 0| 185|201| a body mass index|PROBLEM| Z68.41| finding of body mass index [body mass index [bmi] 40.0-44.9, adult]| 9|
+| 0| 261|268| polyuria|PROBLEM| R35| polyuria [polyuria]| 0|
+| 0| 271|280| polydipsia|PROBLEM| R63.1| polydipsia [polydipsia]| 0|
+| 0| 283|295| poor appetite|PROBLEM| R63.0| poor appetite [anorexia]| 0|
+| 0| 302|309| vomiting|PROBLEM| R11.1| vomiting [vomiting]| 0|
+| 1| 403|431| a respiratory tract infection|PROBLEM| J98.8| respiratory tract infection [other specified respiratory disorders]| 0|
++-----------+-----+---+-------------------------------------+-------+-------+------------------------------------------------------------------------------------------+-----------------+
+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|icd10cm_chronic_indicator_mapper|
+|Compatibility:|Healthcare NLP 5.4.1+|
+|License:|Licensed|
+|Edition:|Official|
+|Input Labels:|[chunk]|
+|Output Labels:|[mappings]|
+|Language:|en|
+|Size:|1.0 MB|
diff --git a/docs/_posts/yigitgull/2024-09-29-email_matcher_en.md b/docs/_posts/yigitgull/2024-09-29-email_matcher_en.md
new file mode 100644
index 0000000000..a482c64e64
--- /dev/null
+++ b/docs/_posts/yigitgull/2024-09-29-email_matcher_en.md
@@ -0,0 +1,108 @@
+---
+layout: model
+title: Email Regex Matcher
+author: John Snow Labs
+name: email_matcher
+date: 2024-09-29
+tags: [en, licensed, clinical, email, regexmatcher]
+task: Named Entity Recognition
+language: en
+edition: Healthcare NLP 5.4.1
+spark_version: 3.0
+supported: true
+annotator: RegexMatcherInternalModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This model extracts email addresses from clinical notes using the rule-based `RegexMatcherInternal` annotator.
+
+## Predicted Entities
+
+`EMAIL`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/email_matcher_en_5.4.1_3.0_1727618994803.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/email_matcher_en_5.4.1_3.0_1727618994803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
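The following plain-Python approximation shows what a rule-based email matcher does: it scans the text with a regular expression and reports each match with its begin and end offsets. This is a sketch under stated assumptions. `EMAIL_PATTERN` and `find_emails` are hypothetical names, and the actual regex bundled with the pretrained `email_matcher` model is not published in this card. On the sample text from the pipeline example, with a single space before the second `E-mail:`, this sketch yields the same four chunks and offsets as the Results table.

```python
import re
from typing import List, Tuple

# Assumption: a simplified email pattern used only for illustration; it is
# not the pattern bundled with the pretrained email_matcher model.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> List[Tuple[str, int, int, str]]:
    """Return (chunk, begin, end, label) tuples for every email-like match.

    `end` is inclusive (index of the last character), following the
    begin/end convention used in the Results table.
    """
    return [
        (match.group(), match.start(), match.end() - 1, "EMAIL")
        for match in EMAIL_PATTERN.finditer(text)
    ]


# Sample text from the pipeline example.
text = (
    "ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and "
    "info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .\n"
    " E-mail: Mira.Gabriel.Terry@gmail.com."
)

for chunk, begin, end, label in find_emails(text):
    print(f"{chunk:<28} {begin:>3} {end:>3} {label}")
```

Matching is purely pattern-based, so every string that fits the pattern is labeled `EMAIL` regardless of context. The trailing period after the last address is excluded because the top-level domain part of the pattern only accepts letters.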
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp_jsl.annotator import RegexMatcherInternalModel
+
+documentAssembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+email_regex_matcher = RegexMatcherInternalModel.pretrained("email_matcher","en","clinical/models") \
+    .setInputCols(["document"])\
+    .setOutputCol("EMAIL")
+
+email_regex_matcher_pipeline = Pipeline(
+    stages=[
+        documentAssembler,
+        email_regex_matcher
+    ])
+
+data = spark.createDataFrame([["""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
+ E-mail: Mira.Gabriel.Terry@gmail.com."""]]).toDF("text")
+
+
+email_regex_matcher_model = email_regex_matcher_pipeline.fit(data)
+result = email_regex_matcher_model.transform(data)
+
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val email_regex_matcher = RegexMatcherInternalModel.pretrained("email_matcher","en","clinical/models")
+    .setInputCols(Array("document"))
+    .setOutputCol("EMAIL")
+
+val email_regex_pipeline = new Pipeline().setStages(Array(
+    documentAssembler,
+    email_regex_matcher
+))
+
+val data = Seq("""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
+ E-mail: Mira.Gabriel.Terry@gmail.com.""").toDF("text")
+
+val result = email_regex_pipeline.fit(data).transform(data)
+```
+
+
+
+## Results
+
+```bash
++----------------------------+-----+---+-----+
+|chunk                       |begin|end|label|
++----------------------------+-----+---+-----+
+|info@domain.net             |72   |86 |EMAIL|
+|tech@support.org            |95   |110|EMAIL|
+|hale@gmail.com              |121  |134|EMAIL|
+|Mira.Gabriel.Terry@gmail.com|147  |174|EMAIL|
++----------------------------+-----+---+-----+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|email_matcher|
+|Compatibility:|Healthcare NLP 5.4.1+|
+|License:|Licensed|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[EMAIL]|
+|Language:|en|
+|Size:|2.3 KB|
\ No newline at end of file