-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Predictions in raw data #5
Comments
Hi @GuillermoJaca, what do you mean by the raw data? I think the pre-processing will depend on the type of task you want. |
I mean a normal biomedical text. The issue is that there is no .predict function, so the file run_ner.py has to be customized. What is the best way to do that? Which preprocessing should I use to get the best possible performance of the model taking into account that my task is NER ? |
Instruction on using the repo for inference is in the README under the NER section: https://github.com/dmis-lab/biobert#user-content-named-entity-recognition-ner:~:text=You%20can%20change%20the%20arguments%20as,using%20%2D%2Ddo_train%3Dfalse%20%2D%2Ddo_predict%3Dtrue%20for%20evaluating%20test.tsv. The bigger challenge is completing inference without using the repo, ie, repo specific functions and methods. |
@GuillermoJaca for prediction you can directly use your fine tune model in huggingface transformer pipeline, some sample code below for you reference:
Read more here on huggingface pipeline: |
@abhibisht89 However, if tokenizer is specified as 'dmis-lab/biobert-v1.1', the ignore_subwords option cannot be specified as True. Is there any other way? |
Hello, I wonder why the labels are the simple BIO in NER task, however, in the raw dataset (e.g. NCBI), the labels could be SpecificDisease, Modifier and so on. |
Hello, I am wondering how predictions on raw data can be done. It is not documented at all for this and I think it's the primary use of the model.
The text was updated successfully, but these errors were encountered: