further updates
felixdittrich92 committed Jul 11, 2023
1 parent ccf1583 commit 959bf34
Showing 7 changed files with 162 additions and 149 deletions.
36 changes: 24 additions & 12 deletions README.md
Expand Up @@ -4,11 +4,10 @@

[![Slack Icon](https://img.shields.io/badge/Slack-Community-4A154B?style=flat-square&logo=slack&logoColor=white)](https://slack.mindee.com) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) ![Build Status](https://github.com/mindee/doctr/workflows/builds/badge.svg) [![codecov](https://codecov.io/gh/mindee/doctr/branch/main/graph/badge.svg?token=577MO567NM)](https://codecov.io/gh/mindee/doctr) [![CodeFactor](https://www.codefactor.io/repository/github/mindee/doctr/badge?s=bae07db86bb079ce9d6542315b8c6e70fa708a7e)](https://www.codefactor.io/repository/github/mindee/doctr) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/340a76749b634586a498e1c0ab998f08)](https://app.codacy.com/gh/mindee/doctr?utm_source=github.com&utm_medium=referral&utm_content=mindee/doctr&utm_campaign=Badge_Grade) [![Doc Status](https://github.com/mindee/doctr/workflows/doc-status/badge.svg)](https://mindee.github.io/doctr) [![Pypi](https://img.shields.io/badge/pypi-v0.6.0-blue.svg)](https://pypi.org/project/python-doctr/) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mindee/notebooks/blob/main/doctr/quicktour.ipynb)


**Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch**


What you can expect from this repository:

- efficient ways to parse textual information (localize and identify each word) from your documents
- guidance on how to integrate this in your current architecture

Expand Down Expand Up @@ -44,7 +43,9 @@ multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jp
```

### Putting it together

Let's use the default pretrained model for an example:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor
Expand All @@ -57,6 +58,7 @@ result = model(doc)
```

### Dealing with rotated documents

Should you use docTR on documents that include rotated pages, or pages with multiple box orientations,
you have multiple options to handle it:

Expand All @@ -69,7 +71,6 @@ will be converted to straight boxes), you need to pass `export_as_straight_boxes

If both options are set to False, the predictor will always fit and return rotated boxes.
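
To make the difference between rotated and straight boxes concrete, here is a minimal sketch of how a rotated box can be reduced to the straight box that encloses it. This is not docTR's own implementation: the helper name `to_straight_box` is hypothetical, and the sketch assumes a rotated box is given as four relative `(x, y)` corner points.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_straight_box(polygon: List[Point]) -> Tuple[Point, Point]:
    """Return the axis-aligned box ((xmin, ymin), (xmax, ymax)) enclosing a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys)), (max(xs), max(ys))


# A slightly rotated word box as four corners in relative coordinates (assumed layout)
rotated = [(0.10, 0.20), (0.30, 0.22), (0.29, 0.27), (0.09, 0.25)]
straight = to_straight_box(rotated)
print(straight)
```

The straight box is always at least as large as the rotated one, which is why rotated boxes tend to fit skewed text more tightly.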


To interpret your model's predictions, you can visualize them interactively as follows:

```python
Expand All @@ -89,7 +90,6 @@ plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()

![Synthesis sample](https://github.com/mindee/doctr/releases/download/v0.3.1/synthesized_sample.png)


The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
To get a better understanding of our document model, check our [documentation](https://mindee.github.io/doctr/modules/io.html#document-structure):

Expand All @@ -100,6 +100,7 @@ json_output = result.export()
```
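
As a rough illustration of the nested structure, the sketch below walks a hand-written dictionary shaped like an exported document and collects the word values. The key names (`pages`, `blocks`, `lines`, `words`, `value`) are assumptions based on the structure described above, and `collect_words` is a hypothetical helper; check the documentation for the exact export schema.

```python
from typing import Any, Dict, List


def collect_words(export: Dict[str, Any]) -> List[str]:
    """Collect word values by walking pages -> blocks -> lines -> words."""
    words: List[str] = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    words.append(word["value"])
    return words


# Hand-written stand-in for an exported document (assumed key names)
sample_export = {
    "pages": [
        {
            "page_idx": 0,
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "Hello", "confidence": 0.99},
                                {"value": "world", "confidence": 0.97},
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}
print(" ".join(collect_words(sample_export)))
```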

### Use the KIE predictor

The KIE predictor is more flexible than the OCR predictor because its detection model can detect multiple classes in a document. For example, you can have a detection model that detects only dates and addresses in a document.

The KIE predictor makes it possible to combine a multi-class detector with a recognition model, with the whole pipeline already set up for you.
Expand All @@ -121,10 +122,11 @@ for class_name in predictions.keys():
for prediction in list_predictions:
print(f"Prediction for {class_name}: {prediction}")
```

The KIE predictor returns its results per page as a dictionary: each key is a class name, and its value is the list of predictions for that class.
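
Because the per-page result is a plain dictionary keyed by class name, post-processing can stay simple. The sketch below filters predictions by confidence for each class. `FakePrediction` is a hypothetical stand-in for the real prediction objects, and its `value` and `confidence` fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakePrediction:
    """Hypothetical stand-in for a KIE prediction; field names are assumptions."""

    value: str
    confidence: float


def confident_values(
    predictions: Dict[str, List[FakePrediction]], threshold: float = 0.5
) -> Dict[str, List[str]]:
    """Keep, for each class name, the text of predictions at or above the threshold."""
    return {
        class_name: [p.value for p in preds if p.confidence >= threshold]
        for class_name, preds in predictions.items()
    }


# Hand-written stand-in for one page of KIE results
page_predictions = {
    "date": [FakePrediction("2023-07-11", 0.93)],
    "address": [FakePrediction("1 Main St", 0.88), FakePrediction("???", 0.21)],
}
print(confident_values(page_predictions))
```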

### If you are looking for support from the Mindee team

[![Bad OCR test detection image asking the developer if they need help](https://github.com/mindee/doctr/releases/download/v0.5.1/doctr-need-help.png)](https://mindee.com/product/doctr)

## Installation
Expand All @@ -136,6 +138,7 @@ Python 3.8 (or higher) and [pip](https://pip.pypa.io/en/stable/) are required to
Since we use [weasyprint](https://weasyprint.readthedocs.io/), you will need extra dependencies if you are not running Linux.

For macOS users, you can install them as follows:

```shell
brew install cairo pango gdk-pixbuf libffi
```
Expand All @@ -149,6 +152,7 @@ You can then install the latest release of the package using [pypi](https://pypi
```shell
pip install python-doctr
```

> :warning: Please note that the basic installation is not standalone, as it does not provide a deep learning framework, which is required for the package to run.
We try to keep framework-specific dependencies to a minimum. You can install framework-specific builds as follows:
Expand All @@ -166,6 +170,7 @@ For MacBooks with M1 chip, you will need some additional packages or specific ve
- PyTorch: [version >= 1.12.0](https://pytorch.org/get-started/locally/#start-locally)
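
Since the basic installation ships without a deep learning framework, it can help to check which one is importable before running the package. The snippet below is a small sketch using only the standard library; `available_backends` is a hypothetical helper, not part of docTR.

```python
import importlib.util
from typing import List


def available_backends() -> List[str]:
    """List which of the two supported frameworks can be imported in this environment."""
    candidates = {"tensorflow": "TensorFlow", "torch": "PyTorch"}
    return [
        label
        for module, label in candidates.items()
        if importlib.util.find_spec(module) is not None
    ]


backends = available_backends()
if not backends:
    print("No deep learning framework found: install the [tf] or [torch] build.")
else:
    print("Available backends:", ", ".join(backends))
```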

### Developer mode

Alternatively, you can install it from source, which will require you to install [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
First clone the project repository:

Expand All @@ -175,22 +180,25 @@ pip install -e doctr/.
```

Again, if you prefer to avoid the risk of missing dependencies, you can install the TensorFlow or the PyTorch build:

```shell
# for TensorFlow
pip install -e doctr/.[tf]
# for PyTorch
pip install -e doctr/.[torch]
```


## Model architectures

Credit where it's due: this repository implements, among others, architectures from published research papers.

### Text Detection

- DBNet: [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf).
- LinkNet: [LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation](https://arxiv.org/pdf/1707.03718.pdf)

### Text Recognition

- CRNN: [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf).
- SAR: [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/pdf/1811.00751.pdf).
- MASTER: [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/pdf/1910.02562.pdf).
Expand All @@ -203,7 +211,6 @@ Credits where it's due: this repository is implementing, among others, architect

The full package documentation is available [here](https://mindee.github.io/doctr/) for detailed specifications.


### Demo app

A minimal demo app is provided for you to play with our end-to-end OCR models!
Expand All @@ -220,19 +227,23 @@ Check it out [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%2
If you prefer to use it locally, there is an extra dependency ([Streamlit](https://streamlit.io/)) that is required.

##### TensorFlow version

```shell
pip install -r demo/tf-requirements.txt
```

Then run your app in your default browser with:

```shell
USE_TF=1 streamlit run demo/app.py
```

##### PyTorch version

```shell
pip install -r demo/pt-requirements.txt
```

Then run your app in your default browser with:

```shell
Expand All @@ -246,7 +257,6 @@ Check out our [TensorFlow.js demo](https://github.com/mindee/doctr-tfjs-demo) to

![TFJS demo](https://github.com/mindee/doctr-tfjs-demo/releases/download/v0.1-models/demo_illustration_mini.png)


### Docker container

If you wish to deploy containerized environments, you can use the provided Dockerfile to build a docker image:
Expand All @@ -262,28 +272,32 @@ An example script is provided for a simple documentation analysis of a PDF or im
```shell
python scripts/analyze.py path/to/your/doc.pdf
```

All script arguments can be checked using `python scripts/analyze.py --help`

### Minimal API integration

Looking to integrate docTR into your API? Here is a template to get you started with a fully working API using the wonderful [FastAPI](https://github.com/tiangolo/fastapi) framework.

#### Deploy your API locally

Specific dependencies are required to run the API template, which you can install as follows:

```shell
cd api/
pip install poetry
make lock
pip install -r requirements.txt
```

You can now run your API locally:

```shell
uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app
```

Alternatively, if you prefer, you can run the same server in a Docker container using:

```shell
PORT=8002 docker-compose up -d --build
```
Expand All @@ -300,8 +314,8 @@ response = requests.post("http://localhost:8002/ocr", files={'file': data}).json
```

### Example notebooks

Looking for more illustrations of docTR features? You might want to check the [Jupyter notebooks](https://github.com/mindee/doctr/tree/main/notebooks) designed to give you a broader overview.

## Citation

Expand All @@ -317,14 +331,12 @@ If you wish to cite this project, feel free to use this [BibTeX](http://www.bibt
}
```


## Contributing

If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way?

You're in luck, we compiled a short guide (cf. [`CONTRIBUTING`](CONTRIBUTING.md)) for you to easily do so!


## License

Distributed under the Apache 2.0 License. See [`LICENSE`](LICENSE) for more information.
14 changes: 8 additions & 6 deletions api/README.md
Expand Up @@ -9,18 +9,19 @@ You will only need to install [Git](https://git-scm.com/book/en/v2/Getting-Start
### Starting your web server

You will first need to clone the repository, go into the `api` folder, and start the API:

```shell
git clone https://github.com/mindee/doctr.git
cd doctr/api
make run
```

Once completed, your [FastAPI](https://fastapi.tiangolo.com/) server should be running on port 8080.

### Documentation and swagger

FastAPI comes with many advantages including speed and OpenAPI features. For instance, once your server is running, you can access the automatically built documentation and swagger in your browser at: http://localhost:8080/docs


### Using the routes

You will find detailed instructions in the live documentation when your server is up, but here are some examples of how to use the available API routes:
Expand All @@ -40,12 +41,12 @@ print(requests.post("http://localhost:8080/detection", files={'file': data}).jso
```

should yield

```json
[{'box': [0.826171875, 0.185546875, 0.90234375, 0.201171875]},
{'box': [0.75390625, 0.185546875, 0.8173828125, 0.201171875]}]
```
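
The box values above look like coordinates relative to the image size. Assuming the order is `[xmin, ymin, xmax, ymax]` (an assumption, not confirmed here), the following sketch scales a box back to pixel coordinates. The helper `to_pixels` and the 1024x1024 image size are hypothetical.

```python
from typing import List, Tuple


def to_pixels(box: List[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a relative [xmin, ymin, xmax, ymax] box to integer pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    return (
        round(xmin * width),
        round(ymin * height),
        round(xmax * width),
        round(ymax * height),
    )


# First box from the response above, mapped onto a hypothetical 1024x1024 image
pixel_box = to_pixels([0.826171875, 0.185546875, 0.90234375, 0.201171875], 1024, 1024)
print(pixel_box)
```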


#### Text recognition

Using the following image:
Expand All @@ -61,11 +62,11 @@ print(requests.post("http://localhost:8080/recognition", files={'file': data}).j
```

should yield

```json
{'value': 'invite'}
```


#### End-to-end OCR

Using the following image:
Expand All @@ -81,7 +82,8 @@ print(requests.post("http://localhost:8080/ocr", files={'file': data}).json())
```

should yield

```json
[{'box': [0.75390625, 0.185546875, 0.8173828125, 0.201171875],
'value': 'Hello'},
{'box': [0.826171875, 0.185546875, 0.90234375, 0.201171875],
54 changes: 27 additions & 27 deletions docs/source/using_doctr/using_model_export.rst
Expand Up @@ -12,6 +12,31 @@ Model optimization
This section is meant to help you perform inference with optimized versions of your model.


Half-precision
^^^^^^^^^^^^^^

Half-precision (or FP16) is a binary floating-point format that occupies 16 bits in computer memory.

.. tabs::

    .. tab:: TensorFlow

        .. code:: python3

            import tensorflow as tf
            from tensorflow.keras import mixed_precision
            from doctr.models import ocr_predictor

            # Run computations in float16 while keeping variables in float32
            mixed_precision.set_global_policy('mixed_float16')
            predictor = ocr_predictor(reco_arch="crnn_mobilenet_v3_small", det_arch="linknet_resnet34", pretrained=True)

    .. tab:: PyTorch

        .. code:: python3

            import torch
            from doctr.models import ocr_predictor

            # Move the predictor to the GPU and cast its weights to float16
            predictor = ocr_predictor(reco_arch="crnn_mobilenet_v3_small", det_arch="linknet_resnet34", pretrained=True).cuda().half()
            # `doc` is a document loaded beforehand
            res = predictor(doc)

Export to ONNX
^^^^^^^^^^^^^^

Expand Down Expand Up @@ -52,32 +77,7 @@ It provides optimized performance and supports a wide range of hardware configur
model_path = export_model_to_onnx(model, model_name="vitstr.onnx", dummy_input=dummy_input)
Using your ONNX exported model in docTR
---------------------------------------

**Coming soon**