docs: update for 0.10.0 release
percevalw committed Dec 1, 2023
1 parent 899151b commit 6575720
Showing 6 changed files with 39 additions and 11 deletions.
18 changes: 14 additions & 4 deletions README.md
@@ -9,28 +9,38 @@
EDS-NLP
=======

EDS-NLP is a collaborative NLP framework that aims at extracting information from French clinical notes.
EDS-NLP is a collaborative NLP framework that aims primarily at extracting information from French clinical notes.
At its core, it is a collection of components or pipes, either rule-based functions or
deep learning modules. These components are organized into a novel, efficient and modular pipeline system, built for hybrid and multitask models. We use [spaCy](https://spacy.io) to represent documents and their annotations, and [PyTorch](https://pytorch.org/) as a deep-learning backend for trainable components.

EDS-NLP is versatile and can be used on any textual document. The rule-based components are fully compatible with spaCy's pipelines, and vice versa. This library is a product of collaborative effort, and we encourage further contributions to enhance its capabilities.

Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/) !

## Features

- [Rule-based components](https://aphp.github.io/edsnlp/latest/pipelines/) for French clinical notes
- [Trainable components](https://aphp.github.io/edsnlp/latest/pipelines/trainable): NER, Span classification
- Support for trained multitask models with [weights sharing](https://aphp.github.io/edsnlp/latest/concepts/torch-component/#sharing-subcomponents)
- [Fast inference](https://aphp.github.io/edsnlp/latest/concepts/inference/), with multi-GPU support out of the box
- Easy to use, with a spaCy-like API
- Compatible with rule-based spaCy pipelines
- Support for various I/O formats like [BRAT](https://aphp.github.io/edsnlp/latest/data/standoff/), [JSON](https://aphp.github.io/edsnlp/latest/data/json/), [Parquet](https://aphp.github.io/edsnlp/latest/data/parquet/), [Pandas](https://aphp.github.io/edsnlp/latest/data/pandas/) or [Spark](https://aphp.github.io/edsnlp/latest/data/spark/)

## Quick start

### Installation

You can install EDS-NLP via `pip`. We recommend pinning the library version in your projects, or using a strict package manager like [Poetry](https://python-poetry.org/).

```shell
pip install edsnlp==0.10.0beta1
pip install edsnlp
```

or, if you want to use the trainable components (using PyTorch):

```shell
pip install "edsnlp[ml]==0.10.0beta1"
pip install "edsnlp[ml]"
```

### A first pipeline
@@ -63,7 +73,7 @@ doc.ents[0]._.negation
# Out: True
```

## Documentation
## Documentation & Tutorials

Go to the [documentation](https://aphp.github.io/edsnlp) for more information.

22 changes: 21 additions & 1 deletion docs/concepts/inference.md
@@ -30,7 +30,7 @@ nlp.to("cuda") # same semantics as pytorch
doc = nlp(text)
```

To leverage multiple GPUs when processing multiple documents, refer to the [multiprocessing backend][edsnlp.processing.multiprocessing.execute_multiprocessing] description below.
To leverage multiple GPUs when processing multiple documents, refer to the [multiprocessing backend][edsnlp.processing.multiprocessing.execute_multiprocessing_backend] description below.

## Inference on multiple documents {: #edsnlp.core.lazy_collection.LazyCollection }

@@ -49,6 +49,26 @@ A lazy collection contains :

All methods (`.map`, `.map_model`, `.configure`) of the lazy collection are chainable, meaning that they return a new object (no in-place modification).

For instance, the following code will load a model, read a folder of JSON files, apply the model to each document and write the results to a Parquet folder, using 4 CPUs and 2 GPUs.

```{ .python .no-check }
import edsnlp
nlp = edsnlp.load("path/to/model")
# or you can create a model from scratch
data = edsnlp.data.read_json("path/to/json_folder", converter="converter")
data = data.map_model(nlp)
# or equivalently: data = nlp.pipe(data)
data = data.configure(
    # 4 CPUs to parallelize rule-based pipes, IO and preprocessing
    num_cpu_workers=4,
    # 2 GPUs to accelerate deep-learning pipes
    num_gpu_workers=2,
)
data.write_parquet("path/to/output_folder", write_in_worker=True)
```

### Applying operations to a lazy collection

To apply an operation to a lazy collection, you can use the `.map` method. It takes a callable as input and an optional dictionary of keyword arguments. The function will be applied to each element of the collection.
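
As a rough illustration of the chainable, non-in-place behaviour described above, here is a minimal sketch built on a toy class. `ToyLazyCollection`, `truncate` and the `kwargs` parameter name are hypothetical and chosen for this example only; this is not the EDS-NLP implementation, and the real `.map` signature may differ.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional, Tuple


class ToyLazyCollection:
    """Toy stand-in for a lazy collection (hypothetical, not the EDS-NLP class)."""

    def __init__(
        self,
        items: Iterable[Any],
        ops: Tuple[Tuple[Callable[..., Any], Dict[str, Any]], ...] = (),
    ) -> None:
        self._items = items
        self._ops = ops

    def map(
        self, fn: Callable[..., Any], kwargs: Optional[Dict[str, Any]] = None
    ) -> "ToyLazyCollection":
        # Return a NEW collection with the operation appended; self is untouched.
        return ToyLazyCollection(self._items, self._ops + ((fn, dict(kwargs or {})),))

    def __iter__(self) -> Iterator[Any]:
        # Operations only run here, when the collection is iterated.
        for item in self._items:
            for fn, kwargs in self._ops:
                item = fn(item, **kwargs)
            yield item


def truncate(text: str, max_len: int = 10) -> str:
    """Hypothetical per-document operation taking a keyword argument."""
    return text[:max_len]


base = ToyLazyCollection(["Patient is diabetic.", "No fever."])
# Each call returns a new object, so calls can be chained.
short = base.map(str.lower).map(truncate, kwargs={"max_len": 6})
print(list(short))
```

Because every `map` call returns a new object, `base` can still be iterated afterwards and yields the original, unmodified texts.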
2 changes: 1 addition & 1 deletion docs/concepts/pipeline.md
@@ -58,7 +58,7 @@ arbitrarily chain static components or trained deep learning components. Static

<div style="text-align: center" markdown="1">

![Example of a hybrid pipeline](/assets/images/hybrid-pipeline-example.svg){: style="height:150px" }
![Example of a hybrid pipeline](/assets/images/hybrid-pipeline-example.png){: style="height:150px" }

</div>

4 changes: 2 additions & 2 deletions docs/index.md
@@ -15,13 +15,13 @@ Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/) !
You can install EDS-NLP via `pip`. We recommend pinning the library version in your projects, or using a strict package manager like [Poetry](https://python-poetry.org/).

```{: data-md-color-scheme="slate" }
pip install edsnlp==0.10.0beta1
pip install edsnlp
```

or, if you want to use the trainable components (using PyTorch):

```{: data-md-color-scheme="slate" }
pip install "edsnlp[ml]==0.10.0beta1"
pip install "edsnlp[ml]"
```

### A first pipeline
2 changes: 1 addition & 1 deletion edsnlp/__init__.py
@@ -11,6 +11,6 @@
# from . import language
import edsnlp.data # noqa: F401

__version__ = "0.10.0beta2"
__version__ = "0.10.0"

BASE_DIR = Path(__file__).parent
2 changes: 0 additions & 2 deletions edsnlp/data/converters.py
@@ -2,8 +2,6 @@
Converters are used to convert documents between python dictionaries and Doc objects.
There are two types of converters: readers and writers. Readers convert dictionaries to
Doc objects, and writers convert Doc objects to dictionaries.
Why are these classes instead of functions?
"""
import contextlib
import inspect
