extend doc with DocumentBuilder options (mindee#1486)
felixdittrich92 authored Feb 28, 2024
1 parent f807e97 commit 2ffc1f5
Showing 1 changed file with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions docs/source/using_doctr/using_models.rst
@@ -279,6 +279,19 @@ For instance, this snippet instantiates an end-to-end ocr_predictor working with
    from doctr.models import ocr_predictor
    model = ocr_predictor('linknet_resnet18', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)

To modify the output structure, pass the following arguments to the predictor; they are handled by the underlying `DocumentBuilder`:

* `resolve_lines`: whether words should be automatically grouped into lines (default: True)
* `resolve_blocks`: whether lines should be automatically grouped into blocks (default: True)
* `paragraph_break`: relative length of the minimum space separating paragraphs (default: 0.035)

For example, to disable the automatic grouping of lines into blocks:

.. code:: python3

    from doctr.models import ocr_predictor
    model = ocr_predictor(pretrained=True, resolve_blocks=False)

What should I do with the output?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -304,6 +317,14 @@ Here is a typical `Document` layout::
)]
)

To get only the text content of the `Document`, you can use the `render` method::

    text_output = result.render()

For reference, here is the output for the `Document` above::

    No. RECEIPT DATE

You can also export the `Document` as a nested dict, which is better suited to JSON serialization::

    json_output = result.export()
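
As a rough illustration of working with the exported dict, the sketch below serializes it to JSON and walks it back down to the individual words. It does not call docTR: `export_sample` is a hand-written stand-in for `result.export()`, and its key names (`pages`, `blocks`, `lines`, `words`, `value`, `confidence`) are assumptions about the exported layout that should be checked against your installed version. The helper `iter_words` is hypothetical and not part of the library.

```python
import json

# Hand-written stand-in for `result.export()`. The key names are assumed
# for illustration and may differ from what your docTR version returns.
export_sample = {
    "pages": [
        {
            "page_idx": 0,
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "No.", "confidence": 0.99},
                                {"value": "RECEIPT", "confidence": 0.98},
                                {"value": "DATE", "confidence": 0.97},
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}


def iter_words(export):
    """Yield every word string by walking pages -> blocks -> lines -> words."""
    for page in export["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    yield word["value"]


# A nested dict of plain Python types round-trips through JSON unchanged.
json_text = json.dumps(export_sample, indent=2)
words = list(iter_words(json.loads(json_text)))
print(" ".join(words))  # prints "No. RECEIPT DATE"
```

If `resolve_blocks=False` or `resolve_lines=False` is passed to the predictor, the nesting levels are presumably still present but group the words differently, so a traversal like this one should not rely on a particular number of blocks or lines per page.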