tiseg results not usable #80

Open
bertsky opened this issue Feb 1, 2021 · 0 comments

bertsky commented Feb 1, 2021

The way in which the trained pixel classifier for text-image segmentation is integrated here makes these predictions completely unusable:

  • original: [image: FILE_0001_ORIGINAL]
  • results: [images: FILE_0001_BIN-WOLF-DESKEW-CROP-TISEGDEEPML_img (image part), FILE_0001_BIN-WOLF-DESKEW-CROP-TISEGDEEPML_txt (text part)]

The reason for this is actually quite simple:

import numpy as np

# per-pixel class scores: 0 = background, 1 = text, 2 = image
out = model.predict(I)
out = out.reshape((2048, 1600, 3)).argmax(axis=2)
# text mask: 0 where the argmax picked the text class, 1 everywhere else
text_part = np.ones(out.shape)
text_part[np.where(out==1)] = 0
# image mask: 0 where the argmax picked the image class, 1 everywhere else
image_part = np.ones(out.shape)
image_part[np.where(out==2)] = 0

Here, the predictions for the text (1) and image (2) classes compete with the background (0) class. Wherever the argmax favours background over both, all is lost. That would be somewhat expected and acceptable if this method had been trained as a binarization method (on suitable GT and with suitable augmentation). But apparently, it was not.

@mahmed1995 @mjenckel , am I correct in assuming you've used keras_segmentation for this, with 3 classes – 1 for text regions, 2 for image regions and 0 for background? What was the GT?

The obvious fix would be to just compare text vs image scores, and apply the result as an alpha mask on the original image. The result actually does look somewhat better.
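A minimal sketch of that fix, assuming the same 3-channel softmax output as in the snippet above (model, I and original_rgb are placeholders for the surrounding context; file name illustrative):

import numpy as np
from PIL import Image

scores = model.predict(I).reshape((2048, 1600, 3))
# compare only text (1) vs image (2) scores, so background can never win
text_wins = scores[:, :, 1] > scores[:, :, 2]
# use the winning class as a binary alpha channel on the original image
alpha = np.where(text_wins, 255, 0).astype(np.uint8)
rgba = np.dstack([original_rgb, alpha])  # original_rgb: HxWx3 uint8 array
Image.fromarray(rgba, mode='RGBA').save('txt_masked.png')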

  • image vs text as alpha mask: [image: FILE_0001_BIN-WOLF-DESKEW-CROP-TISEGDEEPML2_txt_small]

But does any consuming processor actually make use of the alpha channel? I highly doubt it.

Since the model was obviously trained on raw images, we have to apply it to raw images. But we can still take the binarized image (from a binarization step in the workflow) and apply our resulting mask to it – by filling with white.
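A sketch of that interface, continuing from the text_wins mask above (the binarized file name is hypothetical, modelled on the workflow naming used in this issue):

import numpy as np
from PIL import Image

# binarized image from the preceding workflow step (name hypothetical)
binarized = np.array(Image.open('FILE_0001_BIN-WOLF-DESKEW-CROP.png').convert('L'))
# keep text pixels, fill everything else with white
text_only = binarized.copy()
text_only[~text_wins] = 255
# and vice versa for the image part
image_only = binarized.copy()
image_only[text_wins] = 255
Image.fromarray(text_only).save('txt_part.png')
Image.fromarray(image_only).save('img_part.png')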

That seems like the better OCR-D interface to me. (Of course, contour-finding and annotation via coordinates would still be better than a clipped derived image.) What do you think, @kba?

Also, I think it's not a good idea to just keep the best-scoring pixels independently of each other. This leaves results unnecessarily noisy and flickery, especially where confidence is already low. Smoothing via morphological post-processing (e.g. by closing the argmax results with a suitable kernel) or filtering (e.g. by a Gaussian filter on the scores) etc. should be applied. (Ideally, the model itself would get trained with a fc-CRF top layer, but that's out of scope here.) What's the "right way" to do this?
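For illustration, both options applied to the scores/labels from above (sigma and kernel size are guesses that would need tuning):

import numpy as np
from scipy.ndimage import gaussian_filter, binary_closing

# option 1: smooth each class's score map before taking the argmax
smoothed = gaussian_filter(scores, sigma=(2, 2, 0))  # don't blur across classes
labels = smoothed.argmax(axis=2)

# option 2: morphologically close the hard per-class masks after the argmax
text_mask = binary_closing(labels == 1, structure=np.ones((5, 5), dtype=bool))
image_mask = binary_closing(labels == 2, structure=np.ones((5, 5), dtype=bool))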

Since the result shown above is still unusable, I think we need to add post-processing to the neural segmentation.

Lastly, regarding the legacy text-image segmentation that is also integrated here: it does at least work reliably:

[images: FILE_0003_BIN-WOLF-DESKEW-CROP-TISEGMORPH_img (image part result), FILE_0003_BIN-WOLF-DESKEW-CROP-TISEGMORPH_txt (text part result)]

However, both of these approaches seem to look only for images, not for line-art separators at all. IMHO the latter task is much more needed (considering the tools currently available in OCR-D).
