-
Notifications
You must be signed in to change notification settings - Fork 25
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
200beb0
commit 37c7f6e
Showing
18 changed files
with
672 additions
and
93 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
116 changes: 116 additions & 0 deletions
116
docs/_posts/aymanechilah/2023-07-11-dit_base_finetuned_rvlcdip_en_3_2.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
--- | ||
layout: model | ||
title: DiT model pretrained on IIT-CDIP and finetuned on RVL-CDIP for document classification | ||
author: John Snow Labs | ||
name: dit_base_finetuned_rvlcdip | ||
date: 2023-07-11 | ||
tags: [en, licensed] | ||
task: OCR Document Classification | ||
language: en | ||
nav_key: models | ||
edition: Visual NLP 4.0.0 | ||
spark_version: 3.2.1 | ||
supported: true | ||
annotator: VisualDocumentClassifierv3 | ||
article_header: | ||
type: cover | ||
use_language_switcher: "Python-Scala-Java" | ||
--- | ||
|
||
## Description | ||
|
||
DiT was proposed in DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. DiT applies the self-supervised objective of BEiT (BERT pre-training of Image Transformers) to 42 million document images. This model was trained for document image classification in the RVL-CDIP dataset (a collection of 400,000 images belonging to one of 16 classes). | ||
|
||
The abstract from the paper is the following: Image Transformer has recently achieved significant progress for natural image understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) pre-training techniques. In this paper, we propose DiT, a self-supervised pre-trained Document Image Transformer model using large-scale unlabeled text images for Document AI tasks, which is essential since no supervised counterparts ever exist due to the lack of human labeled document images. We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, as well as table detection. Experiment results have illustrated that the self-supervised pre-trained DiT model achieves new state-of-the-art results on these downstream tasks, e.g. document image classification (91.11 → 92.69), document layout analysis (91.0 → 94.9) and table detection (94.23 → 96.55). | ||
|
||
|
||
## Predicted Entities | ||
|
||
|
||
|
||
{:.btn-box} | ||
[Live Demo](https://demo.johnsnowlabs.com/ocr/IMAGE_CLASSIFIER/){:.button.button-orange.button-orange-trans.co.button-icon} | ||
[Open in Colab](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/master/tutorials/Certification_Trainings/5.2.Visual_Document_Classifier_v3.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} | ||
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/ocr/dit_base_finetuned_rvlcdip_en_3.3.0_3.0_1654798502586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} | ||
|
||
## How to use | ||
|
||
<div class="tabs-box" markdown="1"> | ||
{% include programmingLanguageSelectScalaPythonNLU.html %} | ||
|
||
```python | ||
binary_to_image = BinaryToImage()\ | ||
.setInputCol("content") \ | ||
.setOutputCol("image") \ | ||
.setImageType(ImageType.TYPE_3BYTE_BGR) | ||
|
||
doc_class = VisualDocumentClassifierV3() \ | ||
.pretrained("dit_base_finetuned_rvlcdip", "en", "clinical/ocr") \ | ||
.setInputCols(["image"]) \ | ||
.setOutputCol("label") | ||
|
||
# OCR pipeline | ||
pipeline = PipelineModel(stages=[ | ||
binary_to_image, | ||
doc_class | ||
]) | ||
|
||
test_image_path = pkg_resources.resource_filename('sparkocr', 'resources/ocr/visualdoc/00556614_00556648.tif') | ||
bin_df = spark.read.format("binaryFile").load(test_image_path).limit(50) | ||
|
||
results = pipeline.transform(bin_df).cache() | ||
``` | ||
```scala | ||
val binary_to_image = new BinaryToImage() | ||
.setInputCol("content") | ||
.setOutputCol("image") | ||
.setImageType(ImageType.TYPE_3BYTE_BGR) | ||
|
||
val doc_class = VisualDocumentClassifierV3() | ||
.pretrained("dit_base_finetuned_rvlcdip", "en", "clinical/ocr") | ||
.setInputCols(Array("image")) | ||
.setOutputCol("label") | ||
|
||
# OCR pipeline | ||
val pipeline = new PipelineModel().setStages(Array( | ||
binary_to_image, | ||
doc_class)) | ||
|
||
val test_image_path = pkg_resources.resource_filename("sparkocr", "resources/ocr/visualdoc/00556614_00556648.tif") | ||
val bin_df = spark.read.format("binaryFile").load(test_image_path).limit(50) | ||
|
||
val results = pipeline.transform(bin_df).cache() | ||
``` | ||
</div> | ||
|
||
## Example | ||
|
||
### Input: | ||
![Screenshot](/assets/images/examples_ocr/image1.png) | ||
|
||
## Output text | ||
```bash | ||
+-------+ | ||
|label | | ||
+-------+ | ||
|invoice| | ||
+-------+ | ||
``` | ||
|
||
|
||
{:.model-param} | ||
## Model Information | ||
|
||
{:.table-model} | ||
|---|---| | ||
|Model Name:|dit_base_finetuned_rvlcdip| | ||
|Type:|ocr| | ||
|Compatibility:|Visual NLP 4.0.0+| | ||
|License:|Licensed| | ||
|Edition:|Official| | ||
|Language:|en| | ||
|Size:|319.6 MB| | ||
|
||
## References | ||
|
||
IIT-CDIP, RVL-CDIP |
115 changes: 115 additions & 0 deletions
115
docs/_posts/aymanechilah/2023-07-11-tabform_v1_en_3_2.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
--- | ||
layout: model | ||
title: Table and Form Detection | ||
author: John Snow Labs | ||
name: tabform_v1 | ||
date: 2023-07-11 | ||
tags: [en, licensed] | ||
task: Table and Form Detection | ||
language: en | ||
nav_key: models | ||
edition: Visual NLP 4.3.0 | ||
spark_version: 3.2.1 | ||
supported: true | ||
article_header: | ||
type: cover | ||
use_language_switcher: "Python-Scala-Java" | ||
--- | ||
|
||
## Description | ||
|
||
Model for table and form detection in documents. It is based on text detection model with extra post-processing. | ||
|
||
## Predicted Entities | ||
|
||
{:.btn-box} | ||
[Live Demo](https://demo.johnsnowlabs.com/ocr/IMAGE_TABLE_FORM_DETECTION/){:.button.button-orange.button-orange-trans.co.button-icon} | ||
[Open in Colab](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/master/jupyter/Cards/SparkOcrImageTableAndFormDetection.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} | ||
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/ocr/tabform_v1_en_4.2.5_3.2_1677478327651.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} | ||
|
||
|
||
## How to use | ||
|
||
<div class="tabs-box" markdown="1"> | ||
{% include programmingLanguageSelectScalaPythonNLU.html %} | ||
|
||
```python | ||
binary_to_image = BinaryToImage() \ | ||
.setImageType(ImageType.TYPE_3BYTE_BGR) | ||
|
||
region_detector = ImageDocumentRegionDetector.pretrained("tabform_v1", "en", "clinical/ocr") \ | ||
.setInputCol("image") \ | ||
.setOutputCol("regions") \ | ||
.setScoreThreshold(0.25) | ||
|
||
draw_regions = ImageDrawRegions() \ | ||
.setInputCol("image") \ | ||
.setInputRegionsCol("regions") \ | ||
.setOutputCol("image_with_regions") \ | ||
.setRectColor(Color.red) | ||
|
||
pipeline = PipelineModel(stages=[ | ||
binary_to_image, | ||
region_detector, | ||
draw_regions | ||
]) | ||
|
||
imagePath = "data/tabform_images/irs_sp_1.jpg" | ||
image_df = spark.read.format("binaryFile").load(imagePath) | ||
|
||
result = pipeline.transform(image_df) | ||
``` | ||
```scala | ||
val binary_to_image = new BinaryToImage() | ||
.setImageType(ImageType.TYPE_3BYTE_BGR) | ||
|
||
val region_detector = new ImageDocumentRegionDetector.pretrained("tabform_v1", "en", "clinical/ocr") | ||
.setInputCol("image") | ||
.setOutputCol("regions") | ||
.setScoreThreshold(0.25) | ||
|
||
val draw_regions = new ImageDrawRegions() | ||
.setInputCol("image") | ||
.setInputRegionsCol("regions") | ||
.setOutputCol("image_with_regions") | ||
.setRectColor(Color.red) | ||
|
||
val pipeline = new PipelineModel().setStages(Array( | ||
binary_to_image, | ||
region_detector, | ||
draw_regions)) | ||
|
||
val imagePath = "data/tabform_images/irs_sp_1.jpg" | ||
val image_df = spark.read.format("binaryFile").load(imagePath) | ||
|
||
val result = pipeline.transform(image_df) | ||
``` | ||
</div> | ||
|
||
## Example | ||
|
||
{%- capture input_image -%} | ||
![Screenshot](/assets/images/examples_ocr/tabform_input.png) | ||
{%- endcapture -%} | ||
|
||
{%- capture output_image -%} | ||
![Screenshot](/assets/images/examples_ocr/tabform_output.png) | ||
{%- endcapture -%} | ||
|
||
|
||
{% include templates/input_output_image.md | ||
input_image=input_image | ||
output_image=output_image | ||
%} | ||
|
||
## Model Information | ||
|
||
{:.table-form-model} | ||
|---|---| | ||
|Model Name:|tabform_v1| | ||
|Type:|ocr| | ||
|Compatibility:|Visual NLP 4.3.0+| | ||
|License:|Licensed| | ||
|Edition:|Official| | ||
|Language:|en| | ||
|
Oops, something went wrong.