docs: update for 0.10.0 release

aphp · Dec 1, 2023 · 6575720
1 parent 899151b
commit 6575720
Showing 6 changed files with 39 additions and 11 deletions.
diff --git a/README.md b/README.md
@@ -9,28 +9,38 @@
 EDS-NLP
 =======
 
-EDS-NLP is a collaborative NLP framework that aims at extracting information from French clinical notes.
+EDS-NLP is a collaborative NLP framework that aims primarily at extracting information from French clinical notes.
 At its core, it is a collection of components or pipes, either rule-based functions or
 deep learning modules. These components are organized into a novel efficient and modular pipeline system, built for hybrid and multitask models. We use [spaCy](https://spacy.io) to represent documents and their annotations, and [Pytorch](https://pytorch.org/) as a deep-learning backend for trainable components.
 
 EDS-NLP is versatile and can be used on any textual document. The rule-based components are fully compatible with spaCy's pipelines, and vice versa. This library is a product of collaborative effort, and we encourage further contributions to enhance its capabilities.
 
 Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/) !
 
+## Features
+
+- [Rule-based components](https://aphp.github.io/edsnlp/latest/pipelines/) for French clinical notes
+- [Trainable components](https://aphp.github.io/edsnlp/latest/pipelines/trainable): NER, Span classification
+- Support for trained multitask models with [weights sharing](https://aphp.github.io/edsnlp/latest/concepts/torch-component/#sharing-subcomponents)
+- [Fast inference](https://aphp.github.io/edsnlp/latest/concepts/inference/), with multi-GPU support out of the box
+- Easy to use, with a spaCy-like API
+- Compatible with rule-based spaCy pipelines
+- Support for various io formats like [BRAT](https://aphp.github.io/edsnlp/latest/data/standoff/), [JSON](https://aphp.github.io/edsnlp/latest/data/json/), [Parquet](https://aphp.github.io/edsnlp/latest/data/parquet/), [Pandas](https://aphp.github.io/edsnlp/latest/data/pandas/) or [Spark](https://aphp.github.io/edsnlp/latest/data/spark/)
+
 ## Quick start
 
 ### Installation
 
 You can install EDS-NLP via `pip`. We recommend pinning the library version in your projects, or use a strict package manager like [Poetry](https://python-poetry.org/).
 
 ```shell
-pip install edsnlp==0.10.0beta1
+pip install edsnlp
 ```
 
 or if you want to use the trainable components (using pytorch)
 
 ```shell
-pip install "edsnlp[ml]==0.10.0beta1"
+pip install "edsnlp[ml]"
 ```
 
 ### A first pipeline
@@ -63,7 +73,7 @@ doc.ents[0]._.negation
 # Out: True
 ```
 
-## Documentation
+## Documentation & Tutorials
 
 Go to the [documentation](https://aphp.github.io/edsnlp) for more information.
 

diff --git a/docs/concepts/inference.md b/docs/concepts/inference.md
@@ -30,7 +30,7 @@ nlp.to("cuda")  # same semantics as pytorch
 doc = nlp(text)
 ```
 
-To leverage multiple GPUs when processing multiple documents, refer to the [multiprocessing backend][edsnlp.processing.multiprocessing.execute_multiprocessing] description below.
+To leverage multiple GPUs when processing multiple documents, refer to the [multiprocessing backend][edsnlp.processing.multiprocessing.execute_multiprocessing_backend] description below.
 
 ## Inference on multiple documents {: #edsnlp.core.lazy_collection.LazyCollection }
 
@@ -49,6 +49,26 @@ A lazy collection contains :
 
 All methods (`.map`, `.map_model`, `.configure`) of the lazy collection are chainable, meaning that they return a new object (no in-place modification).
 
+For instance, the following code will load a model, read a folder of JSON files, apply the model to each document and write the result in a Parquet folder, using 4 CPUs and 2 GPUs.
+
+```{ .python .no-check }
+import edsnlp
+
+nlp = edsnlp.load("path/to/model")
+# or you can create a model from scratch
+
+data = edsnlp.data.read_json("path/to/json_folder", converter="converter")
+data = data.map_model(nlp)
+# or equivalently : data = nlp.pipe(data)
+data = data.configure(
+    # 4 CPUs to parallelize rule-based pipes, IO and preprocessing
+    num_cpu_workers=4,
+    # 2 GPUs to accelerate deep-learning pipes
+    num_gpu_workers=2,
+)
+data.write_parquet("path/to/output_folder", write_in_worker=True)
+```
+
 ### Applying operations to a lazy collection
 
 To apply an operation to a lazy collection, you can use the `.map` method. It takes a callable as input and an optional dictionary of keyword arguments. The function will be applied to each element of the collection.
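
Note on the hunk above: the chainable, non-mutating behaviour described for `.map` can be illustrated with a small stand-alone sketch. `ToyLazyCollection` and `add_suffix` are hypothetical names invented for this illustration and are not part of EDS-NLP; the `map(fn, kwargs=...)` signature is only assumed from the prose (a callable plus an optional dictionary of keyword arguments).

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, Iterator, Optional, Tuple


@dataclass(frozen=True)
class ToyLazyCollection:
    """Hypothetical stand-in for a lazy collection (not the EDS-NLP class)."""

    source: Tuple[Any, ...]
    ops: Tuple[Tuple[Callable[..., Any], Dict[str, Any]], ...] = ()

    def map(
        self, fn: Callable[..., Any], kwargs: Optional[Dict[str, Any]] = None
    ) -> "ToyLazyCollection":
        # Return a NEW collection with the operation appended; self is unchanged.
        return replace(self, ops=self.ops + ((fn, dict(kwargs or {})),))

    def __iter__(self) -> Iterator[Any]:
        # Operations only run when the collection is iterated (lazy evaluation).
        for item in self.source:
            for fn, kwargs in self.ops:
                item = fn(item, **kwargs)
            yield item


def add_suffix(text: str, suffix: str = "") -> str:
    # Hypothetical per-document operation taking one keyword argument.
    return text + suffix


base = ToyLazyCollection(source=("a", "b"))
mapped = base.map(str.upper).map(add_suffix, kwargs={"suffix": "!"})

print(list(base))    # ['a', 'b']
print(list(mapped))  # ['A!', 'B!']
```

Because each call returns a fresh object, `base` can be reused to build several independent processing chains.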

diff --git a/docs/concepts/pipeline.md b/docs/concepts/pipeline.md
@@ -58,7 +58,7 @@ arbitrarily chain static components or trained deep learning components. Static
 
 <div style="text-align: center" markdown="1">
 
-![Example of a hybrid pipeline](/assets/images/hybrid-pipeline-example.svg){: style="height:150px" }
+![Example of a hybrid pipeline](/assets/images/hybrid-pipeline-example.png){: style="height:150px" }
 
 </div>
 

diff --git a/docs/index.md b/docs/index.md
@@ -15,13 +15,13 @@ Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/) !
 You can install EDS-NLP via `pip`. We recommend pinning the library version in your projects, or use a strict package manager like [Poetry](https://python-poetry.org/).
 
 ```{: data-md-color-scheme="slate" }
-pip install edsnlp==0.10.0beta1
+pip install edsnlp
 ```
 
 or if you want to use the trainable components (using pytorch)
 
 ```{: data-md-color-scheme="slate" }
-pip install "edsnlp[ml]==0.10.0beta1"
+pip install "edsnlp[ml]"
 ```
 
 ### A first pipeline

diff --git a/edsnlp/__init__.py b/edsnlp/__init__.py
@@ -11,6 +11,6 @@
 # from . import language
 import edsnlp.data  # noqa: F401
 
-__version__ = "0.10.0beta2"
+__version__ = "0.10.0"
 
 BASE_DIR = Path(__file__).parent
diff --git a/edsnlp/data/converters.py b/edsnlp/data/converters.py
@@ -2,8 +2,6 @@
 Converters are used to convert documents between python dictionaries and Doc objects.
 There are two types of converters: readers and writers. Readers convert dictionaries to
 Doc objects, and writers convert Doc objects to dictionaries.
-
-Why are these classes instead of functions?
 """
 import contextlib
 import inspect