48 new Transformer based models in 9 new languages, including NER for Finance, Industry, Politcal Policies, COVID and Chemical Trials, various clinical and medical domains in Spanish and English and much more in NLU 3.3.1
We are incredibly excited to announce NLU 3.3.1 has been released with 48 new models in 9 languages!
It comes with 2 new types of state-of-the-art models,distilBERT
and BERT for sequence classification
with various pre-trained weights,
state-of-the-art bert based classifiers for problems in the domains of Finance
, Sentiment Classification
, Industry
, News
, and much more.
On the healthcare side, NLU features 22 new models in for English and Spanish with
with entity Resolver Models
for LOINC, MeSH, NDC and SNOMED and UMLS Diseases,
NER models
for Biomarkers
, NIHSS-Guidelines
, COVID Trials
, Chemical Trials
,
Bert based Token Classifier models
for biological
, genetical
,cancer
, cellular
terms,
Bert for Sequence Classification models
for clinical question vs statement classification
and finally Spanish Clinical NER
and Resolver Models
Once again, we would like to thank our community for making another amazing release possible!
New Open Source Models and Features
Integrates the amazing Spark NLP 3.3.3 and 3.3.2 releases, featuring:
- New state-of-the-art fine-tuned
BERT models for Sequence Classification
inEnglish
,French
,German
,Spanish
,Japanese
,Turkish
,Russian
, and multilingual languages. DistilBertForSequenceClassification
models inEnglish
,French
andUrdu
Word2Vec
models.classify.distilbert_sequence.banking77
:Banking NER model
trained on BANKING77 dataset, which provides a very fine-grained set of intents in a banking domain. It comprises 13,083 customer service queries labeled with 77 intents. It focuses on fine-grained single-domain intent detection. Can extract entities like activate_my_card, age_limit, apple_pay_or_google_pay, atm_support, automatic_top_up, balance_not_updated_after_bank_transfer, balance_not_updated_after_cheque_or_cash_deposit, beneficiary_not_allowed, cancel_transfer, card_about_to_expire, card_acceptance, card_arrival, card_delivery_estimate, card_linking, card_not_working, card_payment_fee_charged, card_payment_not_recognised, card_payment_wrong_exchange_rate, card_swallowed, cash_withdrawal_charge, cash_withdrawal_not_recognised, change_pin, compromised_card, contactless_not_working, country_support, declined_card_payment, declined_cash_withdrawal, declined_transfer, direct_debit_payment_not_recognised, disposable_card_limits, edit_personal_details, exchange_charge, exchange_rate, exchange_via_app, extra_charge_on_statement, failed_transfer, fiat_currency_support, get_disposable_virtual_card, get_physical_card, getting_spare_card, getting_virtual_card, lost_or_stolen_card, lost_or_stolen_phone, order_physical_card, passcode_forgotten, pending_card_payment, pending_cash_withdrawal, pending_top_up, pending_transfer, pin_blocked, receiving_money,classify.distilbert_sequence.industry
:Industry NER model
which can extract entities like Advertising, Aerospace & Defense, Apparel Retail, Apparel, Accessories & Luxury Goods, Application Software, Asset Management & Custody Banks, Auto Parts & Equipment, Biotechnology, Building Products, Casinos & Gaming, Commodity Chemicals, Communications Equipment, Construction & Engineering, Construction Machinery & Heavy Trucks, Consumer Finance, Data Processing & Outsourced Services, Diversified Metals & Mining, Diversified Support Services, Electric Utilities, Electrical Components & Equipment, Electronic Equipment & Instruments, Environmental & Facilities Services, Gold, Health Care Equipment, Health Care Facilities, Health Care Services.xx.classify.bert_sequence.sentiment
:Multi-Lingual Sentiment Classifier
This a bert-base-multilingual-uncased model finetuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian. It predicts the sentiment of the review as a number of stars (between 1 and 5). This model is intended for direct use as a sentiment analysis model for product reviews in any of the six languages above, or for further finetuning on related sentiment analysis tasks.distilbert_sequence.policy
:Policy Classifier
This model was trained on 129.669 manually annotated sentences to classify text into one of seven political categories: ‘Economy’, ‘External Relations’, ‘Fabric of Society’, ‘Freedom and Democracy’, ‘Political System’, ‘Welfare and Quality of Life’ or ‘Social Groups’.classify.bert_sequence.dehatebert_mono
:Hate Speech Classifier
This model was trained on 129.669 manually annotated sentences to classify text into one of seven political categories: ‘Economy’, ‘External Relations’, ‘Fabric of Society’, ‘Freedom and Democracy’, ‘Political System’, ‘Welfare and Quality of Life’ or ‘Social Groups’.
Complete List of Open Source Models :
New Healthcare models and Features
Integrates the incredible Spark NLP for Healthcare releases 3.3.4, 3.3.2 and 3.3.1, featuring:
- New Clinical NER Models for protected health information(PHI),
ner_biomarker
for extracting extract biomarkers, therapies, oncological, and other general concepts- Oncogenes, Tumor_Finding, UnspecificTherapy, Ethnicity, Age, ResponseToTreatment, Biomarker, HormonalTherapy, Staging, Drug, CancerDx, Radiotherapy, CancerSurgery, TargetedTherapy, PerformanceStatus, CancerModifier, Radiological_Test_Result, Biomarker_Measurement, Metastasis, Radiological_Test, Chemotherapy, Test, Dosage, Test_Result, Immunotherapy, Date, Gender, Prognostic_Biomarkers, Duration, Predictive_Biomarkers
ner_nihss
: NER model that can identify entities according to NIHSS guidelines for clinical stroke assessment to evaluate neurological status in acute stroke patients- 11_ExtinctionInattention, 6b_RightLeg, 1c_LOCCommands, 10_Dysarthria, NIHSS, 5_Motor, 8_Sensory, 4_FacialPalsy, 6_Motor, 2_BestGaze, Measurement, 6a_LeftLeg, 5b_RightArm, 5a_LeftArm, 1b_LOCQuestions, 3_Visual, 9_BestLanguage, 7_LimbAtaxia, 1a_LOC .
redl_nihss_biobert
: relation extraction model that can relate scale items and their measurements according to NIHSS guidelines.es.med_ner.roberta_ner_diag_proc
: New Spanish Clinical NER Models for extracting the entities DIAGNOSTICO, PROCEDIMIENTOes.resolve.snomed
: New Spanish SNOMED Entity Resolversbert_sequence_classifier_question_statement_clinical
:New Clinical Question vs Statement for BertForSequenceClassification modelmed_ner.covid_trials
: This model is trained to extract covid-specific medical entities in clinical trials. It supports the following entities ranging from virus type to trial design: Stage, Severity, Virus, Trial_Design, Trial_Phase, N_Patients, Institution, Statistical_Indicator, Section_Header, Cell_Type, Cellular_component, Viral_components, Physiological_reaction, Biological_molecules, Admission_Discharge, Age, BMI, Cerebrovascular_Disease, Date, Death_Entity, Diabetes, Disease_Syndrome_Disorder, Dosage, Drug_Ingredient, Employment, Frequency, Gender, Heart_Disease, Hypertension, Obesity, Pulse, Race_Ethnicity, Respiration, Route, Smoking, Time, Total_Cholesterol, Treatment, VS_Finding, Vaccine .med_ner.chemd
: This model extract the names of chemical compounds and drugs in medical texts. The entities that can be detected are as follows : SYSTEMATIC, IDENTIFIERS, FORMULA, TRIVIAL, ABBREVIATION, FAMILY, MULTIPLE . For reference click here . https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331685/bert_token_classifier_ner_bionlp
: This model is BERT-based version of ner_bionlp model and can detect biological and genetics terms in cancer-related texts. (Amino_acid, Anatomical_system, Cancer, Cell, Cellular_component, Developing_anatomical_Structure, Gene_or_gene_product, Immaterial_anatomical_entity, Multi-tissue_structure, Organ, Organism, Organism_subdivision, Simple_chemical, Tissuebert_token_classifier_ner_cellular
: This model is BERT-based version of ner_cellular model and can detect molecular biology-related terms (DNA, Cell_type, Cell_line, RNA, Protein) in medical texts.- We have updated
med_ner.jsl.enriched
model by enriching the training data using clinical trials data to make it more robust. This model is capable of predicting up to 87 different entities and is based on ner_jsl model. Here are the entities this model can detect; Social_History_Header, Oncology_Therapy, Blood_Pressure, Respiration, Performance_Status, Family_History_Header, Dosage, Clinical_Dept, Diet, Procedure, HDL, Weight, Admission_Discharge, LDL, Kidney_Disease, Oncological, Route, Imaging_Technique, Puerperium, Overweight, Temperature, Diabetes, Vaccine, Age, Test_Result, Employment, Time, Obesity, EKG_Findings, Pregnancy, Communicable_Disease, BMI, Strength, Tumor_Finding, Section_Header, RelativeDate, ImagingFindings, Death_Entity, Date, Cerebrovascular_Disease, Treatment, Labour_Delivery, Pregnancy_Delivery_Puerperium, Direction, Internal_organ_or_component, Psychological_Condition, Form, Medical_Device, Test, Symptom, Disease_Syndrome_Disorder, Staging, Birth_Entity, Hyperlipidemia, O2_Saturation, Frequency, External_body_part_or_region, Drug_Ingredient, Vital_Signs_Header, Substance_Quantity, Race_Ethnicity, VS_Finding, Injury_or_Poisoning, Medical_History_Header, Alcohol, Triglycerides, Total_Cholesterol, Sexually_Active_or_Sexual_Orientation, Female_Reproductive_Status, Relationship_Status, Drug_BrandName, RelativeTime, Duration, Hypertension, Metastasis, Gender, Oxygen_Therapy, Pulse, Heart_Disease, Modifier, Allergen, Smoking, Substance, Cancer_Modifier, Fetus_NewBorn, Height classify.bert_sequence.question_statement_clinical
: This model classifies sentences into one of these two classes: question (interrogative sentence) or statement (declarative sentence) and trained with BertForSequenceClassification. This model is at first trained on SQuAD and SPAADIA dataset and then fine tuned on the clinical visit documents and MIMIC-III dataset annotated in-house. Using this model, you can find the question statements and exclude & utilize in the downstream tasks such as NER and relation extraction models.classify.token_bert.ner_chemical
: This model is BERT-based version of ner_chemicals model and can detect chemical compounds (CHEM) in the medical texts.resolve.umls_disease_syndrome
: This model is trained on the Disease or Syndrome category using sbiobert_base_cased_mli embeddings.