refacto: fix paths after pipelines to pipes renaming

aphp · Dec 1, 2023 · 634b512
1 parent 0e233b4

Showing 139 changed files with 505 additions and 475 deletions.
diff --git a/contributing.md b/contributing.md
@@ -4,12 +4,12 @@ We welcome contributions ! There are many ways to help. For example, you can:
 
 1. Help us track bugs by filing issues
 2. Suggest and help prioritise new functionalities
-3. Develop a new pipeline ! Fork the project and propose a new functionality through a pull request
+3. Develop a new pipe ! Fork the project and propose a new functionality through a pull request
 4. Help us make the library as straightforward as possible, by simply asking questions on whatever does not seem clear to you.
 
 ## Development installation
 
-To be able to run the test suite, run the example notebooks and develop your own pipeline, you should clone the repo and install it locally.
+To be able to run the test suite, run the example notebooks and develop your own pipeline component, you should clone the repo and install it locally.
 
 <div class="termy">
 
@@ -80,15 +80,15 @@ python -m pytest
 
 Should your contribution propose a bug fix, we require the bug be thoroughly tested.
 
-### Architecture of a pipeline
+### Architecture of a pipeline component
 
-Pipelines should follow the same pattern :
+Pipes should follow the same pattern :
 
 ```
-edsnlp/pipelines/<pipeline>
-   |-- <pipeline>.py                # Defines the component logic
+edsnlp/pipes/<pipe>
+   |-- <pipe>.py                # Defines the component logic
    |-- patterns.py                  # Defines matched patterns
-   |-- factory.py                   # Declares the pipeline to spaCy
+   |-- factory.py                   # Declares the component to spaCy
 ```
 
 ### Style Guide

diff --git a/demo/app.py b/demo/app.py
@@ -48,7 +48,7 @@
 nlp.add_pipe("eds.normalizer")
 nlp.add_pipe("eds.sentences")
 {pipes}
-# Qualifier pipelines
+# Qualifier pipes
 nlp.add_pipe("eds.negation")
 nlp.add_pipe("eds.family")
 nlp.add_pipe("eds.hypothesis")
@@ -109,7 +109,6 @@ def load_model(custom_regex: str, **enabled):
     nlp.add_pipe("eds.sentences")
 
     for title, name in PIPES.items():
-
         if name == "drugs":
             if enabled["drugs"]:
                 if enabled["fuzzy_drugs"]:
@@ -128,7 +127,7 @@ def load_model(custom_regex: str, **enabled):
                 pipes.append(f'nlp.add_pipe("eds.{name}")')
 
     if pipes:
-        pipes.insert(0, "# Entity extraction pipelines")
+        pipes.insert(0, "# Entity extraction pipes")
 
     if custom_regex:
         nlp.add_pipe(
@@ -169,7 +168,7 @@ def load_model(custom_regex: str, **enabled):
     "EDS-NLP is a contributive effort maintained by AP-HP's Data Science team. "
     "Have a look at the "
     "[documentation](https://aphp.github.io/edsnlp/) for "
-    "more information on the available pipelines."
+    "more information on the available components."
 )
 
 st.sidebar.header("Pipeline")
@@ -201,8 +200,8 @@ def load_model(custom_regex: str, **enabled):
         continue
     st_pipes[name] = st.sidebar.checkbox(title, value=True)
 st.sidebar.markdown(
-    "These are just a few of the pipelines provided out-of-the-box by EDS-NLP. "
-    "See the [documentation](https://aphp.github.io/edsnlp/latest/pipelines/) "
+    "These are just a few of the components provided out-of-the-box by EDS-NLP. "
+    "See the [documentation](https://aphp.github.io/edsnlp/latest/pipes/) "
     "for detail."
 )
 

diff --git a/docs/index.md b/docs/index.md
@@ -56,7 +56,7 @@ doc.ents[0]._.negation  # (6)
 
 1. 'eds' is the name of the language, which defines the [tokenizer](/tokenizers).
 2. This example terminology provides a very simple, and by no means exhaustive, list of synonyms for COVID19.
-3. In spaCy, pipelines are added via the [`nlp.add_pipe` method](https://spacy.io/api/language#add_pipe). EDS-NLP pipelines are automatically discovered by spaCy.
+3. Similarly to spaCy, pipes are added via the [`nlp.add_pipe` method](https://spacy.io/api/language#add_pipe).
 4. See the [matching tutorial](tutorials/matching-a-terminology.md) for mode details.
 5. spaCy stores extracted entities in the [`Doc.ents` attribute](https://spacy.io/api/doc#ents).
 6. The `eds.negation` component has adds a `negation` custom attribute.
@@ -71,7 +71,7 @@ To learn more about EDS-NLP, we have prepared a series of tutorials that should
 
 ## Available pipeline components
 
---8<-- "docs/pipelines/index.md:components"
+--8<-- "docs/pipes/index.md:components"
 
 ## Disclaimer
 

diff --git a/docs/pipes/architecture.md b/docs/pipes/architecture.md
@@ -1,40 +1,40 @@
 # Basic Architecture
 
-Most pipelines provided by EDS-NLP aim to qualify pre-extracted entities. To wit, the basic usage of the library:
+Most pipes provided by EDS-NLP aim to qualify pre-extracted entities. To wit, the basic usage of the library:
 
 1. Implement a normaliser (see `eds.normalizer`)
 2. Add an entity recognition component (eg the simple but powerful `eds.matcher`)
 3. Add zero or more entity qualification components, such as `eds.negation`, `eds.family` or `eds.hypothesis`. These qualifiers typically help detect false-positives.
 
 ## Scope
 
-Since the basic usage of EDS-NLP components is to qualify entities, most pipelines can function in two modes:
+Since the basic usage of EDS-NLP components is to qualify entities, most pipes can function in two modes:
 
 1. Annotation of the extracted entities (this is the default). To increase throughput, only pre-extracted entities (found in `doc.ents`) are processed.
 2. Full-text, token-wise annotation. This mode is activated by setting the `on_ents_only` parameter to `False`.
 
-The possibility to do full-text annotation implies that one could use the pipelines the other way around, eg detecting all negations once and for all in an ETL phase, and reusing the results consequently. However, this is not the intended use of the library, which aims to help researchers downstream as a standalone application.
+The possibility to do full-text annotation implies that one could use the pipes the other way around, eg detecting all negations once and for all in an ETL phase, and reusing the results consequently. However, this is not the intended use of the library, which aims to help researchers downstream as a standalone application.
 
 ## Result persistence
 
-Depending on their purpose (entity extraction, qualification, etc), EDS-NLP pipelines write their results to `Doc.ents`, `Doc.spans` or in a custom attribute.
+Depending on their purpose (entity extraction, qualification, etc), EDS-NLP pipes write their results to `Doc.ents`, `Doc.spans` or in a custom attribute.
 
-### Extraction pipelines
+### Extraction pipes
 
-Extraction pipelines (matchers, the date detector or NER pipelines, for instance) keep their results to the `Doc.ents` attribute directly.
+Extraction pipes (matchers, the date detector or NER pipes, for instance) keep their results to the `Doc.ents` attribute directly.
 
 Note that spaCy prohibits overlapping entities within the `Doc.ents` attribute. To circumvent this limitation, we [filter spans][edsnlp.utils.filter.filter_spans], and keep all discarded entities within the `discarded` key of the `Doc.spans` attribute.
 
-Some pipelines write their output to the `Doc.spans` dictionary. We enforce the following doctrine:
+Some pipes write their output to the `Doc.spans` dictionary. We enforce the following doctrine:
 
 - Should the pipe extract entities that are directly informative (typically the output of the `eds.matcher` component), said entities are stashed in the `Doc.ents` attribute.
 - On the other hand, should the entity be useful to another pipe, but less so in itself (eg the output of the `eds.sections` or `eds.dates` component), it will be stashed in a specific key within the `Doc.spans` attribute.
 
 ### Entity tagging
 
-Moreover, most pipelines declare [spaCy extensions](https://spacy.io/usage/processing-pipelines#custom-components-attributes), on the `Doc`, `Span` and/or `Token` objects.
+Moreover, most pipes declare [spaCy extensions](https://spacy.io/usage/processing-pipelines#custom-components-attributes), on the `Doc`, `Span` and/or `Token` objects.
 
-These extensions are especially useful for qualifier pipelines, but can also be used by other pipelines to persist relevant information. For instance, the `eds.dates` pipeline:
+These extensions are especially useful for qualifier pipes, but can also be used by other pipes to persist relevant information. For instance, the `eds.dates` pipeline component:
 
 1. Populates `#!python Doc.spans["dates"]`
 2. For each detected item, keeps the normalised date in `#!python Span._.date`
diff --git a/docs/pipes/core/contextual-matcher.md b/docs/pipes/core/contextual-matcher.md
@@ -1,5 +1,5 @@
 
-# Contextual Matcher {: #edsnlp.pipelines.core.contextual_matcher.factory.create_component }
+# Contextual Matcher {: #edsnlp.pipes.core.contextual_matcher.factory.create_component }
 
 During feature extraction, it may be necessary to search for additional patterns in their neighborhood, namely:
 
@@ -13,7 +13,7 @@ The ContextualMatcher allows to perform this extraction in a clear and concise w
 
 ## The configuration file
 
-The whole ContextualMatcher pipeline is basically defined as a list of **pattern dictionaries**.
+The whole ContextualMatcher pipeline component is basically defined as a list of **pattern dictionaries**.
 Let us see step by step how to build such a list using the example stated just above.
 
 ### a. Finding mentions of cancer
@@ -326,10 +326,10 @@ dict(
 )
 ```
 
-::: edsnlp.pipelines.core.contextual_matcher.factory.create_component
+::: edsnlp.pipes.core.contextual_matcher.factory.create_component
     options:
         only_parameters: true
 
 ## Authors and citation
 
-The `eds.matcher` pipeline was developed by AP-HP's Data Science team.
+The `eds.matcher` pipeline component was developed by AP-HP's Data Science team.
diff --git a/docs/pipes/core/endlines.md b/docs/pipes/core/endlines.md
@@ -1,6 +1,6 @@
-# Endlines {: #edsnlp.pipelines.core.endlines.factory.create_component }
+# Endlines {: #edsnlp.pipes.core.endlines.factory.create_component }
 
-::: edsnlp.pipelines.core.endlines.factory.create_component
+::: edsnlp.pipes.core.endlines.factory.create_component
     options:
         heading_level: 2
         show_bases: false

diff --git a/docs/pipes/core/matcher.md b/docs/pipes/core/matcher.md
@@ -1,6 +1,6 @@
-# Matcher {: #edsnlp.pipelines.core.matcher.factory.create_component }
+# Matcher {: #edsnlp.pipes.core.matcher.factory.create_component }
 
-::: edsnlp.pipelines.core.matcher.factory.create_component
+::: edsnlp.pipes.core.matcher.factory.create_component
     options:
         heading_level: 2
         show_bases: false

diff --git a/docs/pipes/core/normalizer.md b/docs/pipes/core/normalizer.md
@@ -1,4 +1,4 @@
-# Normalisation {: #edsnlp.pipelines.core.normalizer.factory.create_component }
+# Normalisation {: #edsnlp.pipes.core.normalizer.factory.create_component }
 
 The normalisation scheme used by EDS-NLP adheres to the non-destructive doctrine. In other words,
 
@@ -10,8 +10,8 @@ is always true.
 
 To achieve this, the input text is never modified. Instead, our normalisation strategy focuses on two axes:
 
-1. Only the `NORM` and `tag_` attributes are modified by the `normalizer` pipeline ;
-2. Pipelines (eg the [`pollution`](#pollution) pipeline) can mark tokens as _excluded_ by setting the extension `Token.tag_` to `EXCLUDED` or as _space_ by setting the extension `Token.tag_` to `SPACE`.
+1. Only the `NORM` and `tag_` attributes are modified by the `normalizer` pipeline component ;
+2. Pipes (e.g., [`pollution`](#pollution)) can mark tokens as _excluded_ by setting the extension `Token.tag_` to `EXCLUDED` or as _space_ by setting the extension `Token.tag_` to `SPACE`.
    It enables downstream matchers to skip excluded tokens.
 
 The normaliser can act on the input text in five dimensions :
@@ -26,12 +26,12 @@ The normaliser can act on the input text in five dimensions :
 
     We recommend you also **add an end-of-line classifier to remove excess new line characters** (introduced by the PDF layout).
 
-    We provide a `endlines` pipeline, which requires training an unsupervised model.
+    We provide a `endlines` pipeline component, which requires training an unsupervised model.
     Refer to [the dedicated page for more information](./endlines.md).
 
 ## Usage
 
-The normalisation is handled by the single `eds.normalizer` pipeline. The following code snippet is complete, and should run as is.
+The normalisation is handled by the single `eds.normalizer` pipeline component. The following code snippet is complete, and should run as is.
 
 ```python
 import edsnlp
@@ -57,19 +57,19 @@ Moreover, every span exposes a `normalized_variant` extension getter, which comp
 
 ## Configuration
 
-The pipeline can be configured using the following parameters :
+The pipeline component can be configured using the following parameters :
 
-::: edsnlp.pipelines.core.normalizer.factory.create_component
+::: edsnlp.pipes.core.normalizer.factory.create_component
     options:
        only_parameters: true
 
-## Pipelines
+## Pipes
 
 Let's review each subcomponent.
 
 ### Lowercase
 
-The `eds.lowercase` pipeline transforms every token to lowercase. It is not configurable.
+The `eds.lowercase` pipeline component transforms every token to lowercase. It is not configurable.
 
 Consider the following example :
 
@@ -98,7 +98,7 @@ get_text(doc, attr="NORM", ignore_excluded=False)
 
 ### Accents
 
-The `eds.accents` pipeline removes accents. To avoid edge cases,
+The `eds.accents` pipeline component removes accents. To avoid edge cases,
 the component uses a specified list of accentuated characters and their unaccented representation,
 making it more predictable than using a library such as `unidecode`.
 
@@ -189,7 +189,7 @@ doc = nlp("Phrase    avec des espaces \n et un retour à la ligne")
 
 ### Pollution
 
-The pollution pipeline uses a set of regular expressions to detect pollutions (irrelevant non-medical text that hinders text processing). Corresponding tokens are marked as excluded (by setting `Token._.excluded` to `True`), enabling the use of the phrase matcher.
+The pollution pipeline component uses a set of regular expressions to detect pollutions (irrelevant non-medical text that hinders text processing). Corresponding tokens are marked as excluded (by setting `Token._.excluded` to `True`), enabling the use of the phrase matcher.
 
 Consider the following example :
 
@@ -248,7 +248,7 @@ nlp.add_pipe(
 |---------------|---------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|---------------------|
 | `information` | Footnote present in a lot of notes, providing information to the patient about the use of its data                        | "L'AP-HP collecte vos données administratives à des fins ..."                                              | `True`              |
 | `bars`        | Barcodes wrongly parsed as text                                                                                           | "...NBNbWbWbNbWbNBNbNbWbW..."                                                                              | `True`              |
-| `biology`     | Parsed biology results table. It often contains disease names that often leads to *false positives* with NER pipelines.   | "...¦UI/L ¦20 ¦ ¦ ¦20-70 Polyarthrite rhumatoïde Facteur rhumatoide ¦UI/mL ¦ ¦<10 ¦ ¦ ¦ ¦0-14..."          | `False`             |
+| `biology`     | Parsed biology results table. It often contains disease names that often leads to *false positives* with NER pipes.       | "...¦UI/L ¦20 ¦ ¦ ¦20-70 Polyarthrite rhumatoïde Facteur rhumatoide ¦UI/mL ¦ ¦<10 ¦ ¦ ¦ ¦0-14..."          | `False`             |
 | `doctors`     | List of doctor names and specialities, often found in left-side note margins. Also source of potential *false positives*. | "... Dr ABC - Diabète/Endocrino ..."                                                                       | `True`              |
 | `web`         | Webpages URL and email adresses. Also source of potential *false positives*.                                              | "... www.vascularites.fr ..."                                                                              | `True`              |
 | `coding`      | Subsection containing ICD-10 codes along with their description. Also source of potential *false positives*.              | "... (2) E112 + Oeil (2) E113 + Neuro (2) E114 Démence (2) F03 MA (2) F001+G301 DCL G22+G301 Vasc (2) ..." | `False`             |
@@ -275,4 +275,4 @@ nlp.add_pipe(
 
 ## Authors and citation
 
-The `eds.normalizer` pipeline was developed by AP-HP's Data Science team.
+The `eds.normalizer` pipeline component was developed by AP-HP's Data Science team.
diff --git a/docs/pipes/core/sentences.md b/docs/pipes/core/sentences.md
@@ -1,6 +1,6 @@
-# Sentences {: #edsnlp.pipelines.core.sentences.factory.create_component }
+# Sentences {: #edsnlp.pipes.core.sentences.factory.create_component }
 
-::: edsnlp.pipelines.core.sentences.factory.create_component
+::: edsnlp.pipes.core.sentences.factory.create_component
     options:
         heading_level: 2
         show_bases: false

diff --git a/docs/pipes/core/terminology.md b/docs/pipes/core/terminology.md
@@ -1,6 +1,6 @@
-# Terminology {: #edsnlp.pipelines.core.terminology.factory.create_component }
+# Terminology {: #edsnlp.pipes.core.terminology.factory.create_component }
 
-::: edsnlp.pipelines.core.terminology.factory.create_component
+::: edsnlp.pipes.core.terminology.factory.create_component
     options:
         heading_level: 2
         show_bases: false

diff --git a/docs/pipes/index.md b/docs/pipes/index.md
@@ -8,33 +8,33 @@ EDS-NLP provides easy-to-use pipeline components (aka pipes).
 
 === "Core"
 
-    See the [Core components overview](/pipelines/misc/overview/) for more information.
+    See the [Core components overview](/pipes/misc/overview/) for more information.
 
-    --8<-- "docs/pipelines/core/index.md:components"
+    --8<-- "docs/pipes/core/index.md:components"
 
 === "Qualifiers"
 
-    See the [Qualifiers overview](/pipelines/qualifiers/overview/) for more information.
+    See the [Qualifiers overview](/pipes/qualifiers/overview/) for more information.
 
-    --8<-- "docs/pipelines/qualifiers/index.md:components"
+    --8<-- "docs/pipes/qualifiers/index.md:components"
 
 === "Miscellaneous"
 
-    See the [Miscellaneous components overview](/pipelines/misc/overview/) for more information.
+    See the [Miscellaneous components overview](/pipes/misc/overview/) for more information.
 
-    --8<-- "docs/pipelines/misc/index.md:components"
+    --8<-- "docs/pipes/misc/index.md:components"
 
 === "NER"
 
-    See the [NER overview](/pipelines/ner/overview/) for more information.
+    See the [NER overview](/pipes/ner/overview/) for more information.
 
-    --8<-- "docs/pipelines/ner/index.md:components"
+    --8<-- "docs/pipes/ner/index.md:components"
 
 === "Trainable"
 
-    See the [Trainable components overview](/pipelines/trainable/overview/) for more information.
+    See the [Trainable components overview](/pipes/trainable/overview/) for more information.
 
-    --8<-- "docs/pipelines/trainable/index.md:components"
+    --8<-- "docs/pipes/trainable/index.md:components"
 
 <!-- --8<-- [end:components] -->
 

diff --git a/docs/pipes/misc/consultation-dates.md b/docs/pipes/misc/consultation-dates.md
@@ -1,6 +1,6 @@
-# Consultation dates {: #edsnlp.pipelines.misc.consultation_dates.factory.create_component }
+# Consultation dates {: #edsnlp.pipes.misc.consultation_dates.factory.create_component }
 
-::: edsnlp.pipelines.misc.consultation_dates.factory.create_component
+::: edsnlp.pipes.misc.consultation_dates.factory.create_component
     options:
         heading_level: 2
         show_bases: false

diff --git a/docs/pipes/misc/dates.md b/docs/pipes/misc/dates.md
@@ -1,6 +1,6 @@
-# Dates {: #edsnlp.pipelines.misc.dates.factory.create_component }
+# Dates {: #edsnlp.pipes.misc.dates.factory.create_component }
 
-::: edsnlp.pipelines.misc.dates.factory.create_component
+::: edsnlp.pipes.misc.dates.factory.create_component
     options:
         heading_level: 2
         show_bases: false

diff --git a/docs/pipes/misc/measurements.md b/docs/pipes/misc/measurements.md
@@ -1,6 +1,6 @@
-# Measurements {: #edsnlp.pipelines.misc.measurements.factory.create_component }
+# Measurements {: #edsnlp.pipes.misc.measurements.factory.create_component }
 
-::: edsnlp.pipelines.misc.measurements.factory.create_component
+::: edsnlp.pipes.misc.measurements.factory.create_component
     options:
         heading_level: 2
         show_bases: false

diff --git a/docs/pipes/misc/reason.md b/docs/pipes/misc/reason.md
@@ -1,6 +1,6 @@
-# Reasons {: #edsnlp.pipelines.misc.reason.factory.create_component }
+# Reasons {: #edsnlp.pipes.misc.reason.factory.create_component }
 
-::: edsnlp.pipelines.misc.reason.factory.create_component
+::: edsnlp.pipes.misc.reason.factory.create_component
     options:
         heading_level: 2
         show_bases: false