docs: update for 0.10.0 release
percevalw committed Dec 1, 2023
1 parent e3fc882 commit 3c2e910
Showing 8 changed files with 52 additions and 16 deletions.
20 changes: 15 additions & 5 deletions README.md
@@ -9,28 +9,38 @@
EDS-NLP
=======

EDS-NLP is a collaborative NLP framework that aims at extracting information from French clinical notes.
EDS-NLP is a collaborative NLP framework that aims primarily at extracting information from French clinical notes.
At its core, it is a collection of components or pipes, either rule-based functions or
deep learning modules. These components are organized into a novel efficient and modular pipeline system, built for hybrid and multitask models. We use [spaCy](https://spacy.io) to represent documents and their annotations, and [Pytorch](https://pytorch.org/) as a deep-learning backend for trainable components.

EDS-NLP is versatile and can be used on any textual document. The rule-based components are fully compatible with spaCy's pipelines, and vice versa. This library is a product of collaborative effort, and we encourage further contributions to enhance its capabilities.
EDS-NLP is versatile and can be used on any textual document. The rule-based components are fully compatible with spaCy's components, and vice versa. This library is a product of collaborative effort, and we encourage further contributions to enhance its capabilities.

Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/)!

## Features

- [Rule-based components](https://aphp.github.io/edsnlp/latest/pipes/) for French clinical notes
- [Trainable components](https://aphp.github.io/edsnlp/latest/pipes/trainable): NER, Span classification
- Support for trained multitask models with [weights sharing](https://aphp.github.io/edsnlp/latest/concepts/torch-component/#sharing-subcomponents)
- [Fast inference](https://aphp.github.io/edsnlp/latest/concepts/inference/), with multi-GPU support out of the box
- Easy to use, with a spaCy-like API
- Compatible with rule-based spaCy pipelines
- Support for various IO formats like [BRAT](https://aphp.github.io/edsnlp/latest/data/standoff/), [JSON](https://aphp.github.io/edsnlp/latest/data/json/), [Parquet](https://aphp.github.io/edsnlp/latest/data/parquet/), [Pandas](https://aphp.github.io/edsnlp/latest/data/pandas/) or [Spark](https://aphp.github.io/edsnlp/latest/data/spark/)

## Quick start

### Installation

You can install EDS-NLP via `pip`. We recommend pinning the library version in your projects, or using a strict package manager like [Poetry](https://python-poetry.org/).

```shell
pip install edsnlp==0.10.0beta1
pip install edsnlp
```

or, if you want to use the trainable components (powered by PyTorch):

```shell
pip install "edsnlp[ml]==0.10.0beta1"
pip install "edsnlp[ml]"
```

### A first pipeline
@@ -63,7 +73,7 @@ doc.ents[0]._.negation
# Out: True
```
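
The hunk above only shows the tail of the quick-start example, where the first extracted entity is checked for negation. As a toy, pure-Python sketch of what a matcher-plus-negation step computes (illustrative only; the cue list and function names below are assumptions, not edsnlp code):

```python
import re

# Assumed toy list of French negation cues (not edsnlp's actual resources)
NEG_CUES = {"pas", "aucun", "sans", "jamais"}

def detect_negated_entity(text, term="covid"):
    """Find `term` in `text`, then look for a negation cue before it."""
    match = re.search(term, text, flags=re.IGNORECASE)
    if match is None:
        return None
    window = text[: match.start()].lower()
    negated = any(cue in window.split() for cue in NEG_CUES)
    return {"entity": match.group(0), "negation": negated}

detect_negated_entity("Le patient n'est pas atteint de covid.")
# Out: {'entity': 'covid', 'negation': True}
```

Real pipelines use tokenization, context windows, and far richer cue lists; this merely illustrates the kind of attribute (`ent._.negation`) the pipeline attaches to each entity.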

## Documentation
## Documentation & Tutorials

Go to the [documentation](https://aphp.github.io/edsnlp) for more information.

7 changes: 4 additions & 3 deletions changelog.md
@@ -1,18 +1,19 @@
# Changelog

## v0.10.0beta2
## v0.10.0

### Added

- New unified `edsnlp.data` API (json, brat, spark, pandas) and `LazyCollection` object
to efficiently read / write data from / to different formats & sources.
- New unified processing API to select the execution execution backends via `docs.configure(...)`
- New unified processing API to select the execution backend via `data.set_processing(...)`
- The training scripts can now use data from multiple concatenated adapters
- Support quantized transformers (compatible with multiprocessing as well!)

### Changed

- Pipes (in edsnlp/pipelines) are now lazily loaded, which should improve the loading time of the library.
- `edsnlp.pipelines` has been renamed to `edsnlp.pipes`, but the old name is still available for backward compatibility
- Pipes (in `edsnlp/pipes`) are now lazily loaded, which should improve the loading time of the library.
- `to_disk` methods can now return a config to override the initial config of the pipeline (e.g., to load a transformer directly from the path storing its fine-tuned weights)
- The `eds.tokenizer` tokenizer has been added to entry points, making it accessible from the outside
- Deprecated old connectors (e.g. BratDataConnector) in favor of the new `edsnlp.data` API
29 changes: 28 additions & 1 deletion docs/concepts/inference.md
@@ -30,7 +30,7 @@ nlp.to("cuda") # same semantics as pytorch
doc = nlp(text)
```

To leverage multiple GPUs when processing multiple documents, refer to the [multiprocessing backend][edsnlp.processing.multiprocessing.execute_multiprocessing] description below.
To leverage multiple GPUs when processing multiple documents, refer to the [multiprocessing backend][edsnlp.processing.multiprocessing.execute_multiprocessing_backend] description below.

## Inference on multiple documents {: #edsnlp.core.lazy_collection.LazyCollection }

@@ -49,6 +49,33 @@ A lazy collection contains :

All methods (`.map`, `.map_pipeline`, `.set_processing`) of the lazy collection are chainable, meaning that they return a new object (no in-place modification).
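
The chainable, no-in-place-modification behaviour described above can be sketched with a small frozen dataclass (a generic illustration of the pattern, not edsnlp's actual `LazyCollection` implementation; `LazyOps` is a made-up name):

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class LazyOps:
    """Chainable, immutable recipe: each method returns a new object."""
    ops: tuple = ()
    config: dict = field(default_factory=dict)

    def map(self, fn):
        # A NEW object is returned; self is left untouched
        return replace(self, ops=self.ops + (fn,))

    def set_processing(self, **kwargs):
        return replace(self, config={**self.config, **kwargs})

    def execute(self, items):
        # Nothing runs until execution is actually triggered
        for item in items:
            for op in self.ops:
                item = op(item)
            yield item

base = LazyOps()
chained = base.map(str.strip).map(str.upper).set_processing(num_cpu_workers=4)
assert base.ops == ()  # the original collection is untouched
assert list(chained.execute([" covid "])) == ["COVID"]
```

Because each call returns a new object, intermediate configurations can be stored and branched safely without affecting one another.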

For instance, the following code loads a model, reads a folder of JSON files, applies the model to each document and writes the results to a Parquet folder, using 4 CPU workers and 2 GPU workers.

```{ .python .no-check }
import edsnlp
# Load or create a model
nlp = edsnlp.load("path/to/model")
# Read some data (this is lazy, no data will be read until the end of this snippet)
data = edsnlp.data.read_json("path/to/json_folder", converter="...")
# Apply each pipe of the model to our documents
data = data.map_pipeline(nlp)
# or equivalently: data = nlp.pipe(data)
# Configure the execution
data = data.set_processing(
    # 4 CPUs to parallelize rule-based pipes, IO and preprocessing
    num_cpu_workers=4,
    # 2 GPUs to accelerate deep-learning pipes
    num_gpu_workers=2,
)
# Write the result, this will execute the lazy collection
data.write_parquet("path/to/output_folder", converter="...", write_in_worker=True)
```

### Applying operations to a lazy collection

To apply an operation to a lazy collection, you can use the `.map` method. It takes a callable as input and an optional dictionary of keyword arguments. The function will be applied to each element of the collection.
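
A minimal pure-Python sketch of these `.map` semantics (illustrative only; `scale` and `lazy_map` are made-up names, not the edsnlp API):

```python
def scale(x, factor=1):
    """Toy callable; `factor` stands in for the optional keyword arguments."""
    return x * factor

def lazy_map(items, fn, kwargs=None):
    # The callable is applied to each element, with the optional kwargs;
    # the generator keeps the operation lazy.
    kwargs = kwargs or {}
    return (fn(item, **kwargs) for item in items)

assert list(lazy_map([1, 2, 3], scale, {"factor": 10})) == [10, 20, 30]
```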
2 changes: 1 addition & 1 deletion docs/concepts/pipeline.md
Expand Up @@ -58,7 +58,7 @@ arbitrarily chain static components or trained deep learning components. Static

<div style="text-align: center" markdown="1">

![Example of a hybrid pipeline](/assets/images/hybrid-pipeline-example.svg){: style="height:150px" }
![Example of a hybrid pipeline](/assets/images/hybrid-pipeline-example.png){: style="height:150px" }

</div>

4 changes: 2 additions & 2 deletions docs/index.md
@@ -15,13 +15,13 @@ Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/) !
You can install EDS-NLP via `pip`. We recommend pinning the library version in your projects, or using a strict package manager like [Poetry](https://python-poetry.org/).

```{: data-md-color-scheme="slate" }
pip install edsnlp==0.10.0beta1
pip install edsnlp
```

or, if you want to use the trainable components (powered by PyTorch):

```{: data-md-color-scheme="slate" }
pip install "edsnlp[ml]==0.10.0beta1"
pip install "edsnlp[ml]"
```

### A first pipeline
2 changes: 1 addition & 1 deletion edsnlp/__init__.py
@@ -14,7 +14,7 @@
import edsnlp.data # noqa: F401
import edsnlp.pipes

__version__ = "0.10.0beta2"
__version__ = "0.10.0"

BASE_DIR = Path(__file__).parent

2 changes: 0 additions & 2 deletions edsnlp/data/converters.py
Expand Up @@ -2,8 +2,6 @@
Converters are used to convert documents between python dictionaries and Doc objects.
There are two types of converters: readers and writers. Readers convert dictionaries to
Doc objects, and writers convert Doc objects to dictionaries.
Why are these classes instead of functions?
"""
import contextlib
import inspect
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "edsnlp"
description = "A set of spaCy components to extract information from clinical notes written in French"
description = "Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes."
authors = [
{ name = "Data Science - DSN APHP", email = "[email protected]" }
]
