block segmentation: overlaps and quality of prebuilt models #82

Open
bertsky opened this issue Feb 4, 2021 · 0 comments

bertsky commented Feb 4, 2021

Once I got the block segmentation to actually run, I was puzzled over the extremely bad results of the provided model.

Here's how I gradually worked to isolate the problem.

  • using default 0.9 confidence threshold:
    [screenshots: FILE_0001_REGIONS-ANYOCR_bbox-best_pageviewer, FILE_0002_REGIONS-ANYOCR_bbox-best_pageviewer]
  • using lower 0.5 confidence threshold:
    [screenshots: FILE_0001_REGIONS-ANYOCR_bbox-all_pageviewer, FILE_0002_REGIONS-ANYOCR_bbox-all_pageviewer]
  • using default 0.9 confidence threshold, but annotating a polygon from the mask:
    [screenshots: FILE_0001_REGIONS-ANYOCR_mask-best_pageviewer, FILE_0002_REGIONS-ANYOCR_mask-best_pageviewer]
  • using lower 0.5 confidence threshold, but annotating a polygon from the mask:
    [screenshots: FILE_0001_REGIONS-ANYOCR_mask-all_pageviewer, FILE_0002_REGIONS-ANYOCR_mask-all_pageviewer]
  • using lower 0.5 confidence threshold, but annotating a polygon from the mask, and doing non-maximum suppression and other post-processing (like checking for containment):
    [screenshots: FILE_0001_REGIONS-ANYOCR_mask-all-nms_pageviewer, FILE_0002_REGIONS-ANYOCR_mask-all-nms_pageviewer]
  • using even lower 0.02 confidence threshold, but annotating a polygon from the mask, and suppressing the classes header, footer, footnote, footnote-continued, endnote, keynote (reserving their probability mass):
    [screenshots: FILE_0001_REGIONS-ANYOCR_mask-all-active_pageviewer, FILE_0002_REGIONS-ANYOCR_mask-all-active_pageviewer]
  • using even lower 0.02 confidence threshold, but annotating a polygon from the mask, and suppressing the classes header, footer, footnote, footnote-continued, endnote, keynote (reserving their probability mass), and doing non-maximum suppression and other post-processing (like checking for containment):
    [screenshots: FILE_0001_REGIONS-ANYOCR_mask-all-active-nms_pageviewer, FILE_0002_REGIONS-ANYOCR_mask-all-active-nms_pageviewer]
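
For readers unfamiliar with the post-processing mentioned in the list, here is a minimal sketch of greedy non-maximum suppression with an additional containment check. It is an illustration only, not the code used in this repo: the detection format (dicts with `label`, `score`, `mask`), the representation of masks as sets of pixel coordinates, the helper names, and both thresholds (`max_iou`, `max_contained`) are assumptions.

```python
def iou(a, b):
    """Intersection over union of two masks given as sets of (y, x) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def containment(a, b):
    """Fraction of mask a that lies inside mask b."""
    return len(a & b) / len(a) if a else 0.0


def suppress(detections, max_iou=0.5, max_contained=0.8):
    """Greedy NMS: visit detections by descending score and drop any candidate
    that overlaps too much with, or is mostly contained in, a kept detection.
    Both thresholds are illustrative assumptions."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        mask = det["mask"]
        if any(iou(mask, k["mask"]) > max_iou
               or containment(mask, k["mask"]) > max_contained
               for k in kept):
            continue
        kept.append(det)
    return kept


def rect(y0, x0, y1, x1):
    """Hypothetical helper: a filled rectangle as a set of pixel coordinates."""
    return {(y, x) for y in range(y0, y1) for x in range(x0, x1)}


detections = [
    {"label": "paragraph", "score": 0.95, "mask": rect(0, 0, 10, 10)},
    {"label": "footnote", "score": 0.60, "mask": rect(1, 1, 10, 10)},    # near duplicate
    {"label": "heading", "score": 0.55, "mask": rect(2, 2, 4, 4)},       # fully contained
    {"label": "paragraph", "score": 0.70, "mask": rect(20, 0, 30, 10)},  # separate region
]
kept = suppress(detections)
print([(d["label"], d["score"]) for d in kept])
```

In this toy input only the two non-overlapping paragraph regions survive. Note that plain IoU alone would not remove the small contained region (its IoU with the large one is only 0.04), which is why the containment check is a separate criterion.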

So all these refinements seem crucial.

But it appears that this model was trained on highly overlapping regions, which makes it next to impossible to avoid these overlaps during prediction. An equally serious problem seems to be the nature of the applied classification: footnotes just are not visually differentiable from other text regions (only textually/logically), so they'll just usurp all the energy of their look-alikes. IMHO an adequate model would treat this subclassification as a secondary task.
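
To illustrate the effect, here is a minimal sketch of class suppression under one reading of "reserving their probability mass": the suppressed classes are ignored when picking a label, but the remaining scores are not renormalized. This is presumably why the threshold has to be as low as 0.02 in that setting. The per-class probability dict, the function name `best_active`, and the class names `paragraph` and `heading` are hypothetical; only the suppressed class names come from the list above.

```python
# Classes named in this issue as being suppressed.
SUPPRESSED = {"header", "footer", "footnote", "footnote-continued",
              "endnote", "keynote"}


def best_active(probs, suppressed=SUPPRESSED, min_confidence=0.02):
    """Pick the best non-suppressed class for one detection.

    probs maps class name to probability (assumed format). Scores are NOT
    renormalized, so the mass held by suppressed classes stays reserved and
    the winning active class may have a very low score.
    """
    active = {c: p for c, p in probs.items() if c not in suppressed}
    if not active:
        return None
    label = max(active, key=active.get)
    score = active[label]
    return (label, score) if score >= min_confidence else None


# A region that looks like ordinary text but was scored mostly as footnote.
probs = {"paragraph": 0.08, "footnote": 0.85, "heading": 0.04, "header": 0.03}

print(best_active(probs))                      # region survives with a low score
print(best_active(probs, min_confidence=0.5))  # region is lost at a higher threshold
```

With the footnote class taking most of the mass, the look-alike class only wins at a very low threshold, which in turn lets many more candidates through and makes the overlap post-processing even more important.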

Hence, inevitably, we need to retrain this.

@n00blet @mahmed1995 @khurramHashmi @mjenckel can you please provide details about the training procedure and dataset you used? There's virtually nothing about this in the OCR-D reader, and your final DFG presentation poster only references one paper on page frame detection and one on dewarping. Am I correct in assuming this repo is where your training tools reside?
