Predictions in raw data #5

GuillermoJaca · 2020-12-01T13:34:34Z

Hello, I am wondering how to run predictions on raw data. This is not documented at all, and I think it's the primary use of the model.

jhyuklee · 2020-12-03T02:07:07Z

Hi @GuillermoJaca, what do you mean by raw data? I think the pre-processing will depend on the type of task you want to run.

GuillermoJaca · 2020-12-03T06:06:24Z

I mean ordinary biomedical text. The issue is that there is no .predict function, so the file run_ner.py has to be customized. What is the best way to do that? Which preprocessing should I use to get the best possible performance from the model, given that my task is NER?

mgavish · 2020-12-11T19:26:30Z

Instructions for using the repo for inference are in the README under the NER section: https://github.com/dmis-lab/biobert#user-content-named-entity-recognition-ner:~:text=You%20can%20change%20the%20arguments%20as,using%20%2D%2Ddo_train%3Dfalse%20%2D%2Ddo_predict%3Dtrue%20for%20evaluating%20test.tsv.

The bigger challenge is running inference without the repo, i.e., without repo-specific functions and methods.
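
For raw text, the predict step mentioned above still needs a test.tsv to read. The sketch below is one possible way to produce such a file, assuming the expected layout is one token and one label per line separated by a tab, with a blank line between sentences (check this against the NER datasets shipped with the repo). The function name `raw_text_to_tsv`, the regex tokenization, and the dummy "O" label are illustrative assumptions, not part of the repo.

```python
import re


def raw_text_to_tsv(text, dummy_label="O"):
    """Turn raw text into token<TAB>label lines with a blank line between sentences."""
    # Naive sentence split after ., ! or ? followed by whitespace. This is an
    # assumption; abbreviation-heavy biomedical text needs a real sentence splitter.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    blocks = []
    for sentence in sentences:
        # Tokens are runs of word characters or single punctuation marks.
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        if tokens:
            blocks.append("\n".join(f"{tok}\t{dummy_label}" for tok in tokens))
    return "\n\n".join(blocks) + "\n"


example = raw_text_to_tsv("BRCA1 mutations cause cancer. He is sick.")
print(example)
```

The dummy labels are only placeholders so the file has the same two columns as an annotated one; the predicted labels are what matter for raw text.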

abhibisht89 · 2021-01-06T11:18:34Z

@GuillermoJaca for prediction you can use your fine-tuned model directly in a Hugging Face transformers pipeline. Some sample code for your reference:

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Replace "finetuned_model_path" with the directory of your fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("finetuned_model_path")
model = AutoModelForTokenClassification.from_pretrained("finetuned_model_path")

# grouped_entities merges adjacent tokens that belong to the same entity.
nlp = pipeline(task="ner", model=model, tokenizer=tokenizer, grouped_entities=True, ignore_subwords=True)
text = "he is feeling very sick"
output = nlp(text)

Read more about Hugging Face pipelines here:
https://huggingface.co/transformers/main_classes/pipelines.html

nowhyun · 2021-01-11T07:56:10Z

@abhibisht89
Thank you for your reply.

However, if the tokenizer is specified as 'dmis-lab/biobert-v1.1', the ignore_subwords option cannot be set to True.

Is there any other way?
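
One possible workaround, sketched under assumptions, is to leave ignore_subwords off and merge the word pieces afterwards. The helper below (`merge_wordpieces`, a hypothetical name) assumes token-level predictions arrive as dicts with "word", "entity" and "score" keys and that continuation pieces start with "##". It keeps the first piece's label and the lowest score in the word, which is a design choice rather than anything the library prescribes. Loading a fast tokenizer with `use_fast=True` in `AutoTokenizer.from_pretrained` may also be worth trying.

```python
def merge_wordpieces(token_predictions):
    """Merge '##' continuation pieces into whole words, keeping the first piece's label."""
    words = []
    for pred in token_predictions:
        piece = pred["word"]
        if piece.startswith("##") and words:
            # Continuation piece: glue it onto the previous word without the '##'.
            words[-1]["word"] += piece[2:]
            # Keep the most pessimistic score seen in the word.
            words[-1]["score"] = min(words[-1]["score"], pred["score"])
        else:
            words.append(
                {"word": piece, "entity": pred["entity"], "score": pred["score"]}
            )
    return words


# Hand-written stand-in for token-level predictions (not real model output).
sample = [
    {"word": "hyper", "entity": "B", "score": 0.98},
    {"word": "##tension", "entity": "I", "score": 0.91},
    {"word": "diabetes", "entity": "B", "score": 0.95},
]
merged = merge_wordpieces(sample)
print(merged)
```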

cutejue · 2021-03-29T02:09:49Z

Hello, I wonder why the labels in the NER task are plain BIO tags, whereas in the raw dataset (e.g. NCBI) the labels can be SpecificDisease, Modifier, and so on.
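
To make the difference concrete, here is a minimal sketch of how typed span annotations could be collapsed into plain BIO tags. It assumes the entity type is simply dropped, which is a guess at what a preprocessing step might do and is not confirmed anywhere in this thread. `spans_to_bio` and the example sentence are hypothetical.

```python
def spans_to_bio(tokens, spans):
    """Collapse typed entity spans into untyped B/I/O tags.

    tokens: list of token strings.
    spans: list of (start, end, entity_type) with token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, _entity_type in spans:
        # The entity type is deliberately ignored (assumption: one entity class).
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


tokens = ["Patients", "with", "breast", "cancer", "were", "studied"]
spans = [(2, 4, "SpecificDisease")]
tags = spans_to_bio(tokens, spans)
print(list(zip(tokens, tags)))
```

Keeping the type would instead mean emitting tags such as B-SpecificDisease, which changes the label set the model has to be fine-tuned on.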
