Feature request: Support for English notes? #374

ruiouyangVA · 2025-02-11T17:53:07Z

Would it be difficult to adapt EDS-NLP for extracting custom named entities for clinical notes in English?
I would want a component along the lines of https://aphp.github.io/edsnlp/latest/pipes/ner/scores/charlson/.

(If this is not a recommended idea, let me know too, and pointers in the right direction would be appreciated. Presumably I could build something on top of medspacy.)

What would need to be changed? I assume the tokenizers at

edsnlp/language.py
edsnlp/conjugator.py

And potentially the patterns at

/pipes/core/normalizer/pollution/patterns.py
/pipes/misc/consultation_dates/patterns.py
pipes/misc/dates/patterns/relative.py
/pipes/misc/dates/patterns/duration.py
/pipes/misc/dates/patterns/current.py
/pipes/misc/dates/patterns/absolute.py

/pipes/misc/quantities/patterns.py
/pipes/misc/reason/patterns.py
/pipes/misc/sections/patterns.py
/pipes/misc/tables/patterns.py 

/pipes/terminations.py

/pipes/qualifiers/negation/patterns.py

edsnlp/pipes/qualifiers/hypothesis/patterns.py ?
scripts/conjugate_verbs.py

As well as the resources at

edsnlp/resources/*(json|csv).gz

The code architecture is very clean and a lot of the modifications (e.g., detecting sentence boundaries with newlines) make a lot of sense. Also, I am one person, and reinventing the wheel seems like a lot of work ...

Thanks!


percevalw · 2025-02-12T15:19:15Z

Hi @ruiouyangVA, thanks for your interest in our library! Your approach makes sense.

While these components were originally designed with French in mind, many should work across most Latin languages, including English. Have you tried the eds.charlson matcher on your documents? Does it work out of the box? Indeed, looking at the patterns file, there’s nothing inherently language-specific about it.

Components like negation, parenthood, and hypothesis detection are adaptations of the NegEx and ConText algorithms, so translating the patterns should yield good results.
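To make the idea of translating trigger patterns concrete, here is a minimal NegEx-style sketch in plain Python. It is not the EDS-NLP implementation and does not use its API: the English trigger and termination lists, the `is_negated` helper, and the five-token window are all assumptions chosen for illustration.

```python
import re

# Hypothetical English trigger list, loosely in the spirit of NegEx.
# These are illustrative only and are not taken from EDS-NLP's pattern files.
PRECEDING_TRIGGERS = ["no evidence of", "negative for", "without", "denies", "no"]

# Hypothetical termination words that close the scope of an earlier trigger.
TERMINATIONS = {"but", "however", "although"}


def is_negated(sentence: str, entity: str, window: int = 5) -> bool:
    """Return True if a trigger occurs in the `window` tokens before `entity`,
    with no termination word between the trigger and the entity."""
    text = sentence.lower()
    pos = text.find(entity.lower())
    if pos == -1:
        raise ValueError(f"entity {entity!r} not found in sentence")

    # Tokens that precede the entity mention.
    preceding = re.findall(r"[a-z']+", text[:pos])

    # Keep only the tokens after the last termination word, if any.
    for i in range(len(preceding) - 1, -1, -1):
        if preceding[i] in TERMINATIONS:
            preceding = preceding[i + 1:]
            break

    scope = " ".join(preceding[-window:])
    return any(
        re.search(rf"\b{re.escape(trigger)}\b", scope)
        for trigger in PRECEDING_TRIGGERS
    )
```

With these assumed lists, `is_negated("No evidence of perineural invasion.", "perineural invasion")` is true, while the same entity after "but" in "No lymph node involvement, but perineural invasion is seen." is not negated. Adapting to another language would then mostly mean swapping the two lists.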

If you make any pattern adjustments for your language, we’d be happy to integrate them! We don’t yet have a formal multilingual API, but we’re open to exploring solutions.

One of the problems I see is that we don't have access to clinical reports in English, so we wouldn't be able to check changes made to the non-French patterns. Regardless of the package you end up using, how do you plan to validate your extraction pipeline?

ruiouyangVA · 2025-02-13T17:40:03Z

Hello @percevalw, thanks for the sanity check and encouragement! I work with a subset of pathology data specific to prostate cancer, so I can't test the eds.charlson matcher. I will try to run a quick test today to see if I can add a pattern and extract some information. I wasn't sure whether NegEx and ConText play into the NER pipelines, so I'll take a closer look at that while I'm at it.

Thank you for the clear documentation btw, it's very handy!

We do have ground truth data labelled by clinical collaborators, which will form our validation set. If the EDS team would like English clinical reports to check any changes, I would need to ask about how that is usually handled.
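As a rough sketch of what that validation could look like, the snippet below scores extracted entities against the labelled set with exact-match precision, recall, and F1. The span format `(doc_id, start, end, label)` and the function name are assumptions for illustration, not something taken from EDS-NLP or from our actual annotation format.

```python
from typing import Dict, Iterable, Tuple

# Assumed span format: (doc_id, start_char, end_char, label).
Span = Tuple[str, int, int, str]


def precision_recall_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Dict[str, float]:
    """Exact-match scoring: a prediction counts only if document,
    offsets, and label all agree with a gold annotation."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)

    return {"precision": precision, "recall": recall, "f1": f1}
```

Exact matching is the strictest choice; a relaxed variant that accepts overlapping offsets may be more appropriate if annotators and the extractor disagree on span boundaries.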

Lots of work to be done and not enough people as usual :') but hopefully I can build on the EDS work and make it faster in the future to add / debug / improve our extractors.
