Feature request: Support for English notes? #374
Comments
Hi @ruiouyangVA, thanks for your interest in our library! Your approach makes sense. While these components were originally designed with French in mind, many should work across most Latin languages, including English. Have you tried the

Components like negation, parenthood, and hypothesis detection are adaptations of the NegEx and ConText algorithms, so translating the patterns should yield good results. If you make any pattern adjustments for your language, we'd be happy to integrate them!

We don't yet have a formal multilingual API, but we're open to exploring solutions. One problem I see is that we don't have access to clinical reports in English, so we wouldn't be able to check changes made to the non-French patterns.

Regardless of the package you end up using, how do you plan to validate your extraction pipeline?
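To make the "translate the patterns" idea concrete, here is a minimal, standalone sketch of NegEx-style trigger matching with English terms. It does not use EDS-NLP's API: the trigger lists, the `is_negated` helper, and the character-offset interface are all assumptions made for illustration, and a real adaptation would go through the library's own qualifier components and a much larger, validated pattern set.

```python
import re

# Hypothetical English trigger lists, loosely in the spirit of NegEx.
# These are illustrative assumptions, not EDS-NLP's shipped patterns.
PRECEDING = ["no", "not", "without", "denies", "negative for", "no evidence of"]
FOLLOWING = ["was ruled out", "is ruled out", "not seen"]
TERMINATION = ["but", "however", "although", "except"]


def _alternation(terms):
    # Longest terms first so multi-word triggers win over their prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    return r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"


PRECEDING_RE = re.compile(_alternation(PRECEDING), re.IGNORECASE)
FOLLOWING_RE = re.compile(_alternation(FOLLOWING), re.IGNORECASE)
TERMINATION_RE = re.compile(_alternation(TERMINATION), re.IGNORECASE)


def is_negated(sentence, start, end):
    """Return True if the entity at sentence[start:end] lies in a negation scope."""
    before = sentence[:start]
    after = sentence[end:]

    # A termination term closes the scope: keep only the text between
    # the nearest termination term on each side and the entity.
    left_terms = list(TERMINATION_RE.finditer(before))
    if left_terms:
        before = before[left_terms[-1].end():]
    right_term = TERMINATION_RE.search(after)
    if right_term:
        after = after[:right_term.start()]

    return bool(PRECEDING_RE.search(before) or FOLLOWING_RE.search(after))


def span_of(sentence, entity):
    # Small helper for the example: character offsets of the first occurrence.
    start = sentence.index(entity)
    return start, start + len(entity)
```

For example, in "No evidence of metastasis, but perineural invasion is present.", the sketch flags "metastasis" as negated and leaves "perineural invasion" affirmed, because "but" ends the scope of the trigger. Swapping the three lists is the only language-specific step here, which is why checking the result against labelled English notes matters.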
Hello @percevalw, thanks for the sanity check and encouragement! I work with a subset of pathology data specific to prostate cancer, so I can't test the

Thank you for the clear documentation btw, it's very handy!

We do have ground truth data labelled by clinical collaborators, which will form our validation set. If the EDS team would like English clinical reports to check any changes, I would need to ask how that is usually handled.

Lots of work to be done and not enough people, as usual :') but hopefully I can build on the EDS work and make it faster to add / debug / improve our extractors in the future.
Would it be difficult to adapt EDS-NLP for extracting custom named entities for clinical notes in English?
I would want a component along the lines of https://aphp.github.io/edsnlp/latest/pipes/ner/scores/charlson/.
(If this is not a recommended idea, let me know too. Pointers in the right direction would be appreciated. Presumably I could build something off of medspacy.)
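To illustrate the kind of component I mean, here is a rough, library-free sketch of a score extractor: find a score name, then take the first integer that follows it within a short window on the same line. The `SCORE_RE` pattern, the 30-character window, and the `extract_scores` helper are my own assumptions for illustration, not how the EDS-NLP Charlson component is implemented.

```python
import re

# Hypothetical pattern: the score name, then up to 30 non-digit characters
# on the same line, then a one- or two-digit integer value.
SCORE_RE = re.compile(r"\bcharlson\b[^\d\n]{0,30}?(\d{1,2})\b", re.IGNORECASE)


def extract_scores(text):
    """Return one dict per match with character offsets and the integer value."""
    results = []
    for match in SCORE_RE.finditer(text):
        results.append(
            {
                "start": match.start(),
                "end": match.end(),
                "value": int(match.group(1)),
            }
        )
    return results
```

Called on "Charlson comorbidity index: 5.", this returns a single match with value 5, and it returns an empty list when the score name appears without a number.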
What would need to be changed? I assume the tokenizers at
And potentially the patterns at
As well as the resources at
The code architecture is very clean, and a lot of the modifications (e.g. detecting sentence boundaries with newlines) make a lot of sense. Also, I am one person, and reinventing the wheel seems like a lot of work ...
Thanks!