diff --git a/docs/_posts/akrztrk/2023-12-20-bert_sequence_classifier_sdoh_frailty_vulnerability_en.md b/docs/_posts/akrztrk/2023-12-20-bert_sequence_classifier_sdoh_frailty_vulnerability_en.md new file mode 100644 index 0000000000..cdaa1d1797 --- /dev/null +++ b/docs/_posts/akrztrk/2023-12-20-bert_sequence_classifier_sdoh_frailty_vulnerability_en.md @@ -0,0 +1,151 @@ +--- +layout: model +title: Social Determinants of Healthcare for Frailty and Vulnerability Classifier +author: John Snow Labs +name: bert_sequence_classifier_sdoh_frailty_vulnerability +date: 2023-12-20 +tags: [sdoh, en, clinical, social_determinants_of_heathcare, public_health, frailty, vulnerability, licensed, tensorflow] +task: Text Classification +language: en +edition: Healthcare NLP 5.1.4 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The Frailty classifier employs [MedicalBertForSequenceClassification embeddings](https://sparknlp.org/2022/07/18/biobert_pubmed_base_cased_v1.2_en_3_0.html) within a robust classifier architecture. Trained on a diverse dataset, this model provides accurate label assignments and confidence scores for its predictions. The primary goal of this model is to categorize text into two key labels: `Frailty_Vulnerability` and `No_Or_Unknown`. + +- `Frailty_Vulnerability`: This category includes statements that highlight concerns, signs, or symptoms associated with frailty and/or vulnerability conditions. + +- `No_Or_Unknown`: This category encompasses statements that either do not present any identifiable concerns related to frailty/vulnerability or where the presence or extent of frailty/vulnerability is indeterminate. 
+ +## Predicted Entities + +`Frailty_Vulnerability`, `No_Or_Unknown` + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/SOCIAL_DETERMINANT_SEQUENCE_CLASSIFICATION/){:.button.button-orange} +[Open in Colab](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/SOCIAL_DETERMINANT_CLASSIFICATION.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_sdoh_frailty_vulnerability_en_5.1.4_3.0_1703084816116.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_sdoh_frailty_vulnerability_en_5.1.4_3.0_1703084816116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols(["document"])\ + .setOutputCol("token") + +sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_frailty_vulnerability", "en", "clinical/models")\ + .setInputCols(["document","token"])\ + .setOutputCol("prediction") + +pipeline = Pipeline( + stages=[ + document_assembler, + tokenizer, + sequenceClassifier + ]) + +sample_texts = [ + ["Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy."], + ["Post-chemotherapy, the patient was under regular surveillance for osteosarcoma. Recent imaging showed no signs of local recurrence or distant metastasis. Whereas the recovery was challenging, current evaluation confirms patient is in remission."], + ["The patient was diagnosed with stage II colon cancer and will be undergoing a treatment regimen that includes both chemotherapy and radiation therapy."], + ["Thyroid nodules detected during routine examination; fine-needle aspiration was conducted. Cytology results indicated no malignancy, consistent with a benign thyroid adenoma. However, patient is advised for a follow-up ultrasound in 12 months to monitor nodule size."], + ["The patient's persistent lymphadenopathy led to further tests, which confirmed a diagnosis of AIDS."], + ["Female patient presented with pelvic discomfort. Ovarian cysts were found during ultrasound; however, CA-125 levels are within normal range, and repeat imaging has shown consistent cyst size. 
No features of ovarian cancer were present, and a follow-up is scheduled in six months."] + ] + +sample_data = spark.createDataFrame(sample_texts).toDF("text") + +result = pipeline.fit(sample_data).transform(sample_data) + +result.select("text", "prediction.result").show(truncate=100) +``` +```scala +val documenter = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_frailty_vulnerability", "en", "clinical/models") + .setInputCols(Array("document","token")) + .setOutputCol("prediction") + +val pipeline = new Pipeline().setStages(Array(documenter, tokenizer, sequenceClassifier)) + +val data = Seq(Array("Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy.", + "Post-chemotherapy, the patient was under regular surveillance for osteosarcoma. Recent imaging showed no signs of local recurrence or distant metastasis. Whereas the recovery was challenging, current evaluation confirms patient is in remission.", + "The patient was diagnosed with stage II colon cancer and will be undergoing a treatment regimen that includes both chemotherapy and radiation therapy.", + "Thyroid nodules detected during routine examination; fine-needle aspiration was conducted. Cytology results indicated no malignancy, consistent with a benign thyroid adenoma. However, patient is advised for a follow-up ultrasound in 12 months to monitor nodule size.", + "The patient's persistent lymphadenopathy led to further tests, which confirmed a diagnosis of AIDS.", + "Female patient presented with pelvic discomfort. Ovarian cysts were found during ultrasound; however, CA-125 levels are within normal range, and repeat imaging has shown consistent cyst size. 
No features of ovarian cancer were present, and a follow-up is scheduled in six months." + )).toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +## Results + +```bash ++----------------------------------------------------------------------------------------------------+-----------------------+ +| text| result| ++----------------------------------------------------------------------------------------------------+-----------------------+ +|Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatm...|[Frailty_Vulnerability]| +|Post-chemotherapy, the patient was under regular surveillance for osteosarcoma. Recent imaging sh...| [No_Or_Unknown]| +|The patient was diagnosed with stage II colon cancer and will be undergoing a treatment regimen t...|[Frailty_Vulnerability]| +|Thyroid nodules detected during routine examination; fine-needle aspiration was conducted. Cytolo...| [No_Or_Unknown]| +| The patient's persistent lymphadenopathy led to further tests, which confirmed a diagnosis of AIDS.|[Frailty_Vulnerability]| +|Female patient presented with pelvic discomfort. Ovarian cysts were found during ultrasound; howe...| [No_Or_Unknown]| ++----------------------------------------------------------------------------------------------------+-----------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sequence_classifier_sdoh_frailty_vulnerability| +|Compatibility:|Healthcare NLP 5.1.4+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[prediction]| +|Language:|en| +|Size:|406.4 MB| +|Case sensitive:|false| +|Max sentence length:|512| + +## References + +Trained with the in-house dataset + +## Benchmarking + +```bash + label precision recall f1-score support +Frailty_Vulnerability 0.982014 0.971530 0.976744 281 + No_Or_Unknown 0.960976 0.975248 0.968059 202 + accuracy - - 0.973085 483 + macro-avg 0.971495 0.973389 0.972402 483 + weighted-avg 0.973216 0.973085 0.973112 483 +``` \ No newline at end of file diff --git 
a/docs/_posts/akrztrk/2023-12-20-bert_sequence_classifier_sdoh_mental_health_en.md b/docs/_posts/akrztrk/2023-12-20-bert_sequence_classifier_sdoh_mental_health_en.md new file mode 100644 index 0000000000..e9df549e9c --- /dev/null +++ b/docs/_posts/akrztrk/2023-12-20-bert_sequence_classifier_sdoh_mental_health_en.md @@ -0,0 +1,151 @@ +--- +layout: model +title: Social Determinants of Healthcare for Mental Health Classifier +author: John Snow Labs +name: bert_sequence_classifier_sdoh_mental_health +date: 2023-12-20 +tags: [sdoh, en, clinical, social_determinants_of_heathcare, public_health, mental_health, licensed, tensorflow] +task: Text Classification +language: en +edition: Healthcare NLP 5.1.4 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The Mental Health classifier employs [MedicalBertForSequenceClassification embeddings](https://sparknlp.org/2022/07/18/biobert_pubmed_base_cased_v1.2_en_3_0.html) within a robust classifier architecture. Trained on a diverse dataset, this model provides accurate label assignments and confidence scores for its predictions. The primary goal of this model is to categorize text into two key labels: `Mental_Disorder` and `No_Or_Not_Mentioned`. + +- `Mental_Disorder`: It encompasses a wide range of mental health conditions that affect a person's mood, thinking, behavior, and overall psychological well-being. + +- `No_Or_Not_Mentioned`: The patient doesn’t have mental health problems or it is not mentioned in the clinical notes. 
+ +## Predicted Entities + +`Mental_Disorder`, `No_Or_Not_Mentioned` + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/SOCIAL_DETERMINANT_SEQUENCE_CLASSIFICATION/){:.button.button-orange} +[Open in Colab](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/SOCIAL_DETERMINANT_CLASSIFICATION.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_sdoh_mental_health_en_5.1.4_3.0_1703076463310.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_sdoh_mental_health_en_5.1.4_3.0_1703076463310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols(["document"])\ + .setOutputCol("token") + +sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_mental_health", "en", "clinical/models")\ + .setInputCols(["document", "token"])\ + .setOutputCol("prediction") + +pipeline = Pipeline( + stages=[ + document_assembler, + tokenizer, + sequenceClassifier + ]) + +sample_texts= [ + ["John, a 45-year-old man, was diagnosed with bipolar disorder, a mental disorder characterized by alternating periods of elevated mood (mania) and depression. His treatment plan involved a combination of mood stabilizing medication and regular therapy sessions. With proper management and support, John learned to better understand and cope with his condition, leading to improved stability and overall well-being."], + ["Lisa, a 28-year-old woman, was diagnosed with generalized anxiety disorder (GAD), a mental disorder characterized by excessive worry and persistent anxiety."], + ["Mark, a 35-year-old man, sought medical help for symptoms of attention-deficit/hyperactivity disorder (ADHD), a neurodevelopmental disorder characterized by inattention, hyperactivity, and impulsivity. After a comprehensive evaluation, Mark was diagnosed with ADHD, and his healthcare provider recommended a multimodal treatment approach. "], + ["Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy."], + ["She reported occasional respiratory symptoms, such as wheezing and shortness of breath, but had no signs of a mental disorder. Her healthcare provider assessed her lung function, reviewed her medication regimen, and provided personalized asthma education. 
"], + ["During the appointment, her healthcare provider assessed her joint function, reviewed her medication regimen, and discussed the importance of adherence. They also discussed the benefits of regular exercise, maintaining a healthy weight, and using assistive devices when needed to support Anna's joint health. "], +] + +sample_data = spark.createDataFrame(sample_texts).toDF("text") + +result = pipeline.fit(sample_data).transform(sample_data) + +result.select("text", "prediction.result").show(truncate=100) +``` +```scala +val documenter = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_mental_health", "en", "clinical/models") + .setInputCols(Array("document", "token")) + .setOutputCol("prediction") + +val pipeline = new Pipeline().setStages(Array(documenter, tokenizer, sequenceClassifier)) + +val data = Seq(Array("John, a 45-year-old man, was diagnosed with bipolar disorder, a mental disorder characterized by alternating periods of elevated mood (mania) and depression. His treatment plan involved a combination of mood stabilizing medication and regular therapy sessions. With proper management and support, John learned to better understand and cope with his condition, leading to improved stability and overall well-being.", + "Lisa, a 28-year-old woman, was diagnosed with generalized anxiety disorder (GAD), a mental disorder characterized by excessive worry and persistent anxiety.", + "Mark, a 35-year-old man, sought medical help for symptoms of attention-deficit/hyperactivity disorder (ADHD), a neurodevelopmental disorder characterized by inattention, hyperactivity, and impulsivity. After a comprehensive evaluation, Mark was diagnosed with ADHD, and his healthcare provider recommended a multimodal treatment approach. 
", + "Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy.", + "She reported occasional respiratory symptoms, such as wheezing and shortness of breath, but had no signs of a mental disorder. Her healthcare provider assessed her lung function, reviewed her medication regimen, and provided personalized asthma education. ", + "During the appointment, her healthcare provider assessed her joint function, reviewed her medication regimen, and discussed the importance of adherence. They also discussed the benefits of regular exercise, maintaining a healthy weight, and using assistive devices when needed to support Anna's joint health. ", +)).toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +## Results + +```bash ++----------------------------------------------------------------------------------------------------+---------------------+ +| text| result| ++----------------------------------------------------------------------------------------------------+---------------------+ +|John, a 45-year-old man, was diagnosed with bipolar disorder, a mental disorder characterized by ...| [Mental_Disorder]| +|Lisa, a 28-year-old woman, was diagnosed with generalized anxiety disorder (GAD), a mental disord...| [Mental_Disorder]| +|Mark, a 35-year-old man, sought medical help for symptoms of attention-deficit/hyperactivity diso...| [Mental_Disorder]| +|Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatm...|[No_Or_Not_Mentioned]| +|She reported occasional respiratory symptoms, such as wheezing and shortness of breath, but had n...|[No_Or_Not_Mentioned]| +|During the appointment, her healthcare provider assessed her joint function, reviewed her medicat...|[No_Or_Not_Mentioned]| ++----------------------------------------------------------------------------------------------------+---------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sequence_classifier_sdoh_mental_health| +|Compatibility:|Healthcare NLP 5.1.4+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[prediction]| +|Language:|en| +|Size:|406.4 MB| +|Case sensitive:|false| +|Max sentence length:|512| + +## References + +Trained with the in-house dataset + +## Benchmarking + +```bash + label precision recall f1-score support + Mental_Disorder 0.903226 0.845921 0.873635 331 +No_Or_Not_Mentioned 0.923653 0.953632 0.938403 647 + accuracy - - 0.917178 978 + macro-avg 0.913439 0.899777 0.906019 978 + weighted-avg 0.916739 0.917178 0.916483 978 +``` \ No newline at end of file diff --git 
a/docs/_posts/akrztrk/2023-12-20-bert_sequence_classifier_sdoh_violence_abuse_en.md b/docs/_posts/akrztrk/2023-12-20-bert_sequence_classifier_sdoh_violence_abuse_en.md new file mode 100644 index 0000000000..a079bd39d5 --- /dev/null +++ b/docs/_posts/akrztrk/2023-12-20-bert_sequence_classifier_sdoh_violence_abuse_en.md @@ -0,0 +1,151 @@ +--- +layout: model +title: Social Determinants of Healthcare for Violence and Abuse Classifier +author: John Snow Labs +name: bert_sequence_classifier_sdoh_violence_abuse +date: 2023-12-20 +tags: [sdoh, en, clinical, social_determinants_of_heathcare, public_health, violence, abuse, licensed, tensorflow] +task: Text Classification +language: en +edition: Healthcare NLP 5.1.4 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The Violence and Abuse classifier employs [MedicalBertForSequenceClassification embeddings](https://sparknlp.org/2022/07/18/biobert_pubmed_base_cased_v1.2_en_3_0.html) within a robust classifier architecture. Trained on a diverse dataset, this model provides accurate label assignments and confidence scores for its predictions. The primary goal of this model is to categorize text into four key labels: `Domestic_Violence_Abuse`, `Personal_Violence_Abuse`, `No_Violence_Abuse` and `Unknown`. + +- `Domestic_Violence_Abuse`: This category refers to a pattern of behavior in any relationship that is aimed at gaining or maintaining power and control over an intimate partner or family member. + +- `Personal_Violence_Abuse`: This category encompasses any form of violence or abuse that is directed towards an individual, whether admitted by the perpetrator or recognized by the victim. + +- `No_Violence_Abuse`: This category denotes the complete absence of violence and abuse in any form. 
+ +- `Unknown`: This category covers when the nature or type of violence or abuse within a given text cannot be clearly identified or defined. + +## Predicted Entities + +`Domestic_Violence_Abuse`, `Personal_Violence_Abuse`, `No_Violence_Abuse`, `Unknown` + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/SOCIAL_DETERMINANT_SEQUENCE_CLASSIFICATION/){:.button.button-orange} +[Open in Colab](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/SOCIAL_DETERMINANT_CLASSIFICATION.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_sdoh_violence_abuse_en_5.1.4_3.0_1703086100729.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_sdoh_violence_abuse_en_5.1.4_3.0_1703086100729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols(["document"])\ + .setOutputCol("token") + +sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_violence_abuse", "en", "clinical/models")\ + .setInputCols(["document","token"])\ + .setOutputCol("prediction") + +pipeline = Pipeline( + stages=[ + document_assembler, + tokenizer, + sequenceClassifier + ]) + +sample_texts = [ + ["Repeated visits for fractures, with vague explanations suggesting potential family-related trauma."], + ["Patient presents with multiple bruises in various stages of healing, suggestive of repeated physical abuse."], + ["There are no reported instances or documented episodes indicating the patient poses a risk of violence."] , + ["Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy."] + ] + +sample_data = spark.createDataFrame(sample_texts).toDF("text") + +result = pipeline.fit(sample_data).transform(sample_data) + +result.select("text", "prediction.result").show(truncate=100) +``` +```scala +val documenter = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_violence_abuse", "en", "clinical/models") + .setInputCols(Array("document","token")) + .setOutputCol("prediction") + +val pipeline = new Pipeline().setStages(Array(documenter, tokenizer, sequenceClassifier)) + +val data = Seq(Array("Repeated visits for fractures, with vague explanations suggesting potential family-related trauma.", + "Patient presents with multiple bruises in various stages of healing, 
suggestive of repeated physical abuse.", + "There are no reported instances or documented episodes indicating the patient poses a risk of violence." , + "Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy.", + )).toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +## Results + +```bash ++----------------------------------------------------------------------------------------------------+-------------------------+ +| text| result| ++----------------------------------------------------------------------------------------------------+-------------------------+ +| Repeated visits for fractures, with vague explanations suggesting potential family-related trauma.|[Domestic_Violence_Abuse]| +|Patient presents with multiple bruises in various stages of healing, suggestive of repeated physi...|[Personal_Violence_Abuse]| +|There are no reported instances or documented episodes indicating the patient poses a risk of vio...| [No_Violence_Abuse]| +|Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatm...| [Unknown]| ++----------------------------------------------------------------------------------------------------+-------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sequence_classifier_sdoh_violence_abuse| +|Compatibility:|Healthcare NLP 5.1.4+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[prediction]| +|Language:|en| +|Size:|406.4 MB| +|Case sensitive:|false| +|Max sentence length:|512| + +## References + +Trained with the in-house dataset + +## Benchmarking + +```bash + label precision recall f1-score support +Domestic_Violence_Abuse 0.921687 0.905325 0.913433 169 + No_Violence_Abuse 0.978417 0.860759 0.915825 158 +Personal_Violence_Abuse 0.889908 0.858407 0.873874 226 + Unknown 0.937500 0.975610 0.956175 738 + accuracy - - 0.931836 1291 + macro-avg 0.931878 0.900025 0.914827 1291 + weighted-avg 0.932106 0.931836 0.931234 1291 +``` \ No newline at end of file diff --git a/docs/_posts/akrztrk/2023-12-21-bert_sequence_classifier_clinical_sections_en.md b/docs/_posts/akrztrk/2023-12-21-bert_sequence_classifier_clinical_sections_en.md new file mode 100644 index 
0000000000..4aec107959 --- /dev/null +++ b/docs/_posts/akrztrk/2023-12-21-bert_sequence_classifier_clinical_sections_en.md @@ -0,0 +1,201 @@ +--- +layout: model +title: Bert for Sequence Classification (Clinical Documents Sections) +author: John Snow Labs +name: bert_sequence_classifier_clinical_sections +date: 2023-12-21 +tags: [clinical, section, en, licensed, tensorflow] +task: Text Classification +language: en +edition: Healthcare NLP 5.1.4 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is a BERT-based model for classifying clinical document sections. It performs better when the section header is present in the text, e.g., when the document is split with the `ChunkSentenceSplitter` annotator with parameter `setInsertChunk=True`. + +## Predicted Entities + +`Complications and Risk Factors`, `Consultation and Referral`, `Diagnostic and Laboratory Data`, `Discharge Information`, `Habits`, `History`, `Patient Information`, `Procedures`, `Impression`, `Other` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_clinical_sections_en_5.1.4_3.0_1703164186752.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_clinical_sections_en_5.1.4_3.0_1703164186752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = nlp.Tokenizer() \ + .setInputCols(["document"]) \ + .setOutputCol("token") + +sequenceClassifier = medical.BertForSequenceClassification.pretrained("bert_sequence_classifier_clinical_sections", "en", "clinical/models")\ + .setInputCols(["document", "token"])\ + .setOutputCol("prediction")\ + .setCaseSensitive(False) + +pipeline = nlp.Pipeline(stages=[ + document_assembler, + tokenizer, + sequenceClassifier +]) + +example_df = spark.createDataFrame( + [["""Discharge Instructions: +It was a pleasure taking care of you! You came to us with +stomach pain and worsening distension. While you were here we +did a paracentesis to remove 1.5L of fluid from your belly. We +also placed you on you 40 mg of Lasix and 50 mg of Aldactone to +help you urinate the excess fluid still in your belly. As we +discussed, everyone has a different dose of lasix required to +make them urinate and it's likely that you weren't taking a high +enough dose. Please take these medications daily to keep excess +fluid off and eat a low salt diet. You will follow up with Dr. +___ in liver clinic and from there have your colonoscopy +and EGD scheduled. 
"""]]).toDF("text") + + +result = spark_model.transform(example_df) +result.select("prediction.result").show(truncate=False) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained() + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val seq = BertForSequenceClassification.pretrained("bert_sequence_classifier_clinical_sections", "en", "clinical/models") + .setInputCols(Array("token", "sentence")) + .setOutputCol("label") + .setCaseSensitive(false) + +val pipeline = new Pipeline().setStages(Array( +documentAssembler, +sentenceDetector, +tokenizer, +seq)) + +val test_sentences = """Discharge Instructions: +It was a pleasure taking care of you! You came to us with +stomach pain and worsening distension. While you were here we +did a paracentesis to remove 1.5L of fluid from your belly. We +also placed you on you 40 mg of Lasix and 50 mg of Aldactone to +help you urinate the excess fluid still in your belly. As we +discussed, everyone has a different dose of lasix required to +make them urinate and it's likely that you weren't taking a high +enough dose. Please take these medications daily to keep excess +fluid off and eat a low salt diet. You will follow up with Dr. +___ in liver clinic and from there have your colonoscopy +and EGD scheduled. """" + +val example = Seq(test_sentences).toDF("text") +val result = pipeline.fit(example).transform(example) +``` + +{:.nlu-block} +```python +import nlu + +nlu.load("en.classify.bert_sequence.clinical_sections").predict("""Discharge Instructions: +It was a pleasure taking care of you! You came to us with +stomach pain and worsening distension. While you were here we +did a paracentesis to remove 1.5L of fluid from your belly. 
We +also placed you on you 40 mg of Lasix and 50 mg of Aldactone to +help you urinate the excess fluid still in your belly. As we +discussed, everyone has a different dose of lasix required to +make them urinate and it's likely that you weren't taking a high +enough dose. Please take these medications daily to keep excess +fluid off and eat a low salt diet. You will follow up with Dr. +___ in liver clinic and from there have your colonoscopy +and EGD scheduled. """) +``` +
+ +## Results + +```bash ++-----------------------+ +|result | ++-----------------------+ +|[Discharge Information]| ++-----------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sequence_classifier_clinical_sections| +|Compatibility:|Healthcare NLP 5.1.4+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|406.6 MB| +|Case sensitive:|false| +|Max sentence length:|512| + +## References + +In-house annotation of clinical documents. + +## Sample text from the training dataset + +Discharge Instructions: +It was a pleasure taking care of you! You came to us with +stomach pain and worsening distension. While you were here we +did a paracentesis to remove 1.5L of fluid from your belly. We +also placed you on you 40 mg of Lasix and 50 mg of Aldactone to +help you urinate the excess fluid still in your belly. As we +discussed, everyone has a different dose of lasix required to +make them urinate and it's likely that you weren't taking a high +enough dose. Please take these medications daily to keep excess +fluid off and eat a low salt diet. You will follow up with Dr. +___ in liver clinic and from there have your colonoscopy +and EGD scheduled. 
+ +## Benchmarking + +```bash + label precision recall f1-score support + Consultation_and_Referral 0.981203 0.996183 0.988636 262 + Other 1.000000 1.000000 1.000000 29 + Habits 0.983051 1.000000 0.991453 58 +Complications_and_Risk_Factors 1.000000 1.000000 1.000000 385 +Diagnostic_and_Laboratory_Data 0.987835 0.983051 0.985437 413 + Discharge_Information 0.992386 0.982412 0.987374 398 + History 1.000000 0.990099 0.995025 404 + Impression 0.997706 0.997706 0.997706 436 + Patient_Information 0.994764 0.994764 0.994764 382 + Procedures 0.984456 0.997375 0.990874 381 + accuracy - - 0.992694 3148 + macro-avg 0.992140 0.994159 0.993127 3148 + weighted-avg 0.992730 0.992694 0.992694 3148 +``` diff --git a/docs/_posts/akrztrk/2023-12-21-bert_sequence_classifier_clinical_sections_headless_en.md b/docs/_posts/akrztrk/2023-12-21-bert_sequence_classifier_clinical_sections_headless_en.md new file mode 100644 index 0000000000..276e74419c --- /dev/null +++ b/docs/_posts/akrztrk/2023-12-21-bert_sequence_classifier_clinical_sections_headless_en.md @@ -0,0 +1,194 @@ +--- +layout: model +title: Bert for Sequence Classification (Clinical Documents Sections, Headless) +author: John Snow Labs +name: bert_sequence_classifier_clinical_sections_headless +date: 2023-12-21 +tags: [clinical, sections, en, licensed, tensorflow] +task: Text Classification +language: en +edition: Healthcare NLP 5.1.4 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is a BERT-based model for classifying clinical document sections. It is trained on clinical document sections without the section header in the text, e.g., as produced when the document is split with the `ChunkSentenceSplitter` annotator with parameter `setInsertChunk=False`. 
+
+## Predicted Entities
+
+`Consultation and Referral`, `Habits`, `Complications and Risk Factors`, `Diagnostic and Laboratory Data`, `Discharge Information`, `History`, `Impression`, `Patient Information`, `Procedures`, `Other`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_clinical_sections_headless_en_5.1.4_3.0_1703165706034.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_clinical_sections_headless_en_5.1.4_3.0_1703165706034.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+
+```python
+document_assembler = nlp.DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = nlp.Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = medical.BertForSequenceClassification.pretrained("bert_sequence_classifier_clinical_sections_headless", "en", "clinical/models")\
+    .setInputCols(["document", "token"])\
+    .setOutputCol("prediction")\
+    .setCaseSensitive(False)
+
+pipeline = nlp.Pipeline(stages=[
+    document_assembler,
+    tokenizer,
+    sequenceClassifier
+])
+
+example_df = spark.createDataFrame(
+    [["""It was a pleasure taking care of you! You came to us with
+stomach pain and worsening distension. While you were here we
+did a paracentesis to remove 1.5L of fluid from your belly. We
+also placed you on you 40 mg of Lasix and 50 mg of Aldactone to
+help you urinate the excess fluid still in your belly. As we
+discussed, everyone has a different dose of lasix required to
+make them urinate and it's likely that you weren't taking a high
+enough dose. Please take these medications daily to keep excess
+fluid off and eat a low salt diet. You will follow up with Dr.
+___ in liver clinic and from there have your colonoscopy
+and EGD scheduled. 
"""]]).toDF("text") + + +result = pipeline.fit(example_df).transform(example_df) +result.select("prediction.result").show(truncate=False) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained() + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val seq = BertForSequenceClassification.pretrained("bert_sequence_classifier_clinical_sections_headless", "en", "clinical/models") + .setInputCols(Array("token", "sentence")) + .setOutputCol("label") + .setCaseSensitive(false) + +val pipeline = new Pipeline().setStages(Array( + documentAssembler, + sentenceDetector, + tokenizer, + seq)) + +val test_sentences = """It was a pleasure taking care of you! You came to us with +stomach pain and worsening distension. While you were here we +did a paracentesis to remove 1.5L of fluid from your belly. We +also placed you on you 40 mg of Lasix and 50 mg of Aldactone to +help you urinate the excess fluid still in your belly. As we +discussed, everyone has a different dose of lasix required to +make them urinate and it's likely that you weren't taking a high +enough dose. Please take these medications daily to keep excess +fluid off and eat a low salt diet. You will follow up with Dr. +___ in liver clinic and from there have your colonoscopy +and EGD scheduled. """" + +val example = Seq(test_sentences).toDF("text") +val result = pipeline.fit(example).transform(example) +``` + +{:.nlu-block} +```python +import nlu + +nlu.load("en.classify.bert_sequence.clinical_sections_headless").predict("""It was a pleasure taking care of you! You came to us with +stomach pain and worsening distension. While you were here we +did a paracentesis to remove 1.5L of fluid from your belly. 
We
+also placed you on you 40 mg of Lasix and 50 mg of Aldactone to
+help you urinate the excess fluid still in your belly. As we
+discussed, everyone has a different dose of lasix required to
+make them urinate and it's likely that you weren't taking a high
+enough dose. Please take these medications daily to keep excess
+fluid off and eat a low salt diet. You will follow up with Dr.
+___ in liver clinic and from there have your colonoscopy
+and EGD scheduled. """)
+```
+
+
+## Results
+
+```bash
++-----------------------+
+|result                 |
++-----------------------+
+|[Discharge Information]|
++-----------------------+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sequence_classifier_clinical_sections_headless|
+|Compatibility:|Healthcare NLP 5.1.4+|
+|License:|Licensed|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[class]|
+|Language:|en|
+|Size:|406.6 MB|
+|Case sensitive:|false|
+|Max sentence length:|512|
+
+## References
+
+In-house annotation of clinical documents.
+
+## Sample text from the training dataset
+
+It was a pleasure taking care of you! You came to us with
+stomach pain and worsening distension. While you were here we
+did a paracentesis to remove 1.5L of fluid from your belly. We
+also placed you on you 40 mg of Lasix and 50 mg of Aldactone to
+help you urinate the excess fluid still in your belly. As we
+discussed, everyone has a different dose of lasix required to
+make them urinate and it's likely that you weren't taking a high
+enough dose. Please take these medications daily to keep excess
+fluid off and eat a low salt diet. You will follow up with Dr.
+___ in liver clinic and from there have your colonoscopy
+and EGD scheduled.
+
+## Benchmarking
+
+```bash
+                         label  precision    recall  f1-score  support
+     Consultation_and_Referral   0.655949  0.890830  0.755556      229
+                         Other   0.954545  0.933333  0.943820       45
+                        Habits   0.872727  0.800000  0.834783       60
+Complications_and_Risk_Factors   0.997468  0.989950  0.993695      398
+Diagnostic_and_Laboratory_Data   0.887417  0.676768  0.767908      396
+         Discharge_Information   0.792000  0.763496  0.777487      389
+                       History   0.873810  0.910670  0.891859      403
+                    Impression   0.843537  0.909535  0.875294      409
+           Patient_Information   0.804569  0.786600  0.795483      403
+                    Procedures   0.875912  0.865385  0.870617      416
+```
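
The benchmarking table above lists per-label scores only. As a rough cross-check, macro and support-weighted averages can be derived from those rows. The sketch below does this for the F1 column, using values transcribed from the table. The variable names are illustrative, and the resulting figures are derived here, not reported in the model card.

```python
# (label, f1-score, support) transcribed from the benchmarking table above.
rows = [
    ("Consultation_and_Referral", 0.755556, 229),
    ("Other", 0.943820, 45),
    ("Habits", 0.834783, 60),
    ("Complications_and_Risk_Factors", 0.993695, 398),
    ("Diagnostic_and_Laboratory_Data", 0.767908, 396),
    ("Discharge_Information", 0.777487, 389),
    ("History", 0.891859, 403),
    ("Impression", 0.875294, 409),
    ("Patient_Information", 0.795483, 403),
    ("Procedures", 0.870617, 416),
]

total_support = sum(support for _, _, support in rows)

# Macro average: unweighted mean of the per-label F1 scores.
macro_f1 = sum(f1 for _, f1, _ in rows) / len(rows)

# Weighted average: each label's F1 weighted by its share of the support.
weighted_f1 = sum(f1 * support for _, f1, support in rows) / total_support

print(f"support={total_support} macro_f1={macro_f1:.6f} weighted_f1={weighted_f1:.6f}")
```

The macro average treats every label equally, so small classes such as `Other` and `Habits` influence it as much as the large ones. The weighted average reflects the label distribution of the test set.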