Spacy LLM NER fails on repeated entities. This is a big problem. #260
Replies: 8 comments 4 replies
-
Hi @innocent-charles, thanks for reporting this! We are aware of the problem and are currently working on improvements to the parsing. I can't give you an exact release date, but my current estimate is that next week we'll release v0.5.0, which should fix this issue.
-
Thanks @rmitsch. There is another problem: when extracting entities from a document, GPT-3.5 sometimes applies its own interpretation and phrases the answer accordingly. In such cases doc.ents fails to return these entities, because their text does not match anything in the original document. Example from the spacy-llm logger's output, showing the results returned by GPT-3.5: Working Start_Date_Org: July 2015 ///// Here GPT-3.5 returned the entity exactly as required, even though that is not how it appears in the document. The outputs after doc.ents :
So the problem is: when the results returned by GPT-3.5 do not match the text as it appears in the document, doc.ents in spaCy fails to show those results to the user. This matters because GPT-3.5 is capable and routinely adds its own interpretation when reading documents, as shown above. A possible solution: if we're building a pipeline that integrates LLM capabilities into the spaCy framework, it would be better to have a way of retrieving results from the LLM other than the doc.ents implementation.
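To make the idea concrete, here is a minimal pure-Python sketch of keeping LLM answers that cannot be aligned with the document instead of dropping them. It is not how spacy-llm parses responses. The function name `split_aligned`, the "Label: value" line format, and the sample document text are all assumptions made for illustration; only the label and the value "July 2015" come from the logger output quoted above.

```python
from typing import Dict, List, Tuple


def split_aligned(
    text: str, llm_lines: List[str]
) -> Tuple[Dict[str, Tuple[int, int]], Dict[str, str]]:
    """Separate LLM answers that occur verbatim in `text` from those that do not.

    Hypothetical helper: assumes each LLM line looks like "Label: value".
    """
    aligned: Dict[str, Tuple[int, int]] = {}  # label -> (start, end) character offsets
    unaligned: Dict[str, str] = {}            # label -> raw LLM value, kept as-is
    for line in llm_lines:
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a "Label: value" shape
        label, value = label.strip(), value.strip()
        start = text.find(value) if value else -1
        if start != -1:
            aligned[label] = (start, start + len(value))
        else:
            unaligned[label] = value
    return aligned, unaligned


# Invented document text; the LLM normalises "07/2015" to "July 2015".
text = "Worked at ACME since 07/2015 as an engineer."
llm_lines = ["Organization Name_Org: ACME", "Working Start_Date_Org: July 2015"]

aligned, unaligned = split_aligned(text, llm_lines)
print(aligned)
print(unaligned)
```

In a spaCy pipeline, the aligned offsets could become entity spans, while the unaligned values would need to live somewhere other than doc.ents, since an entity span must point at tokens that exist in the document.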
-
Thanks a lot, @rmitsch, I appreciate this helpful clarification. Thank you once again. Let me try to work on it too.
-
Short update: this has been fixed in our |
-
Thanks a lot for the update, @rmitsch.
-
Hello @rmitsch, has this been fixed in the newly released version?
-
Now I get this: File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
-
Hello @rmitsch! I have upgraded to version 0.5.1 and changed the task to spacy.NER.v3, but nothing has changed. I still end up with an error like: File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__ Please help!
-
I have used spacy-llm NER for quite some time, and it fails on repeated entities. I enabled logging with spacy_llm.logger and found that OpenAI GPT-3.5 returns the results I expected, even for repeated entities. So the problem lies in transferring those outputs to doc.ents in spaCy: doc.ents does not return the repeated entities.
Example for the case of extracting work experience from resumes:
Let's consider:
Work experience one from the resume: From July 2019 up to August 2019 volunteering at VSO INTERNATIONAL in VIJANA NA AJIRA
Work experience two from the resume: PROJECT ZANZIBAR
from January 2018 up to October 2018 volunteering at Buguruni health center as health secretary
When I log the output, GPT-3.5 does a pretty good job:
///The output from spacy_llm.logger
Working Start_Date_Org_ONE: July 2019
Working End_Date_Org_ONE: August 2019
Working Position_Type_Org_ONE: volunteering
Organization Name_Org_ONE: VSO INTERNATIONAL
Working Start_Date_Org_TWO: January 2018
Working End_Date_Org_TWO: October 2018
Working Position_Type_Org_TWO: volunteering
Organization Name_Org_TWO: Buguruni health center
///The output of spacy-llm NER
But the doc.ents produced by spacy-llm NER does not:
The above shows that doc.ents does not return the Working Position_Type_Org_TWO entity, because the same ent.text has already been returned for Working Position_Type_Org_ONE.
So, how can this problem be solved? The LLM does a good job, but the framework handles its output differently. Any ideas, please?
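For illustration, here is a small pure-Python sketch of one way repeated values could each get their own span: every (label, value) pair consumes the next unused occurrence of the value in the text, scanning left to right. This is an assumption about a possible approach, not a description of spacy-llm's parsing. The helper name `align_repeated` is hypothetical, and the text is a condensed version of the resume example above.

```python
from typing import Dict, List, Tuple


def align_repeated(text: str, pairs: List[Tuple[str, str]]) -> List[Tuple[str, int, int]]:
    """Map each (label, value) pair to its own (label, start, end) character span.

    Hypothetical helper: repeated values are matched to successive occurrences
    in `text` instead of all collapsing onto the first match.
    """
    next_search_start: Dict[str, int] = {}  # value -> offset where the next search begins
    spans: List[Tuple[str, int, int]] = []
    for label, value in pairs:
        start = text.find(value, next_search_start.get(value, 0))
        if start == -1:
            continue  # no (further) occurrence of this value in the text
        end = start + len(value)
        next_search_start[value] = end  # the next pair with this value searches after here
        spans.append((label, start, end))
    return spans


# Condensed from the resume example in this post.
text = (
    "From July 2019 up to August 2019 volunteering at VSO INTERNATIONAL. "
    "From January 2018 up to October 2018 volunteering at Buguruni health center."
)
pairs = [
    ("Working Position_Type_Org_ONE", "volunteering"),
    ("Working Position_Type_Org_TWO", "volunteering"),
]

spans = align_repeated(text, pairs)
for label, start, end in spans:
    print(label, start, end, text[start:end])
```

With character offsets like these, spaCy's `Doc.char_span` could turn each one into a labelled span. That only works when the offsets line up with token boundaries.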