The way in which the trained pixel classifier for text-image segmentation is integrated here makes these predictions completely unusable. The reason for this is actually quite simple (see `ocrd_anybaseocr/ocrd_anybaseocr/cli/ocrd_anybaseocr_tiseg.py`, lines 130 to 137 at e63f555):

Here, the predictions for the text (1) and image (2) classes compete with the background (0) class. Wherever the argmax favours background over both, all is lost. This would be somewhat expected and acceptable if this method had been trained as a binarization method (on suitable GT and with suitable augmentation). But apparently, it is not.
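To illustrate the problem, here is a minimal numpy sketch (not the actual code from `ocrd_anybaseocr_tiseg.py`; the variable names and the `(H, W, 3)` score layout are assumptions):

```python
import numpy as np

def split_by_argmax(scores):
    """Hypothetical stand-in for the current integration.

    scores: per-pixel class probabilities from the pixel classifier,
    assumed shape (H, W, 3) with channels 0=background, 1=text, 2=image.
    """
    # Three-way argmax: wherever background narrowly beats both text and
    # image, the pixel ends up in neither mask and is lost for segmentation.
    labels = np.argmax(scores, axis=-1)
    text_mask = labels == 1
    image_mask = labels == 2
    return text_mask, image_mask
```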
@mahmed1995 @mjenckel, am I correct in assuming you've used keras_segmentation for this, with 3 classes – 1 for text regions, 2 for image regions and 0 for background? What was the GT?
The obvious fix would be to just compare text vs image scores, and apply the result as an alpha mask on the original image. The result actually does look somewhat better.
(result screenshot: image vs text as alpha mask)
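A minimal sketch of that fix (assuming the same `(H, W, 3)` score array and a PIL image of the page; these names are hypothetical, not the processor's actual API):

```python
import numpy as np
from PIL import Image

def apply_text_alpha(scores, original_img):
    """original_img: PIL RGB image of the raw page (assumed same size as scores)."""
    # Compare only text (1) vs image (2) scores; background (0) is ignored.
    text_wins = scores[..., 1] > scores[..., 2]
    # Use the winner as an alpha mask on the original image.
    alpha = Image.fromarray(np.where(text_wins, 255, 0).astype(np.uint8))
    result = original_img.convert('RGBA')
    result.putalpha(alpha)
    return result
```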
But does any consuming processor actually make use of the alpha channel? I highly doubt it.
Since the model was obviously trained on raw images, we have to apply it to raw images. But we can still take binarized images (from a binarization step in the workflow) and apply our resulting mask to them – by filling with white.
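Applied to an already binarized page, that could look like this (again a hedged sketch with assumed inputs, not the existing processor interface; the binarized page is assumed to be a PIL image in mode 'L'):

```python
import numpy as np
from PIL import Image

def mask_binarized(binarized_img, scores):
    # Pixels where the classifier prefers the image class over the text class.
    image_wins = scores[..., 2] > scores[..., 1]
    mask = Image.fromarray(np.where(image_wins, 255, 0).astype(np.uint8))
    # Fill the predicted image regions of the binarized page with white, so the
    # text-only result stays a plain grayscale image (no alpha channel needed).
    white = Image.new(binarized_img.mode, binarized_img.size, 255)
    return Image.composite(white, binarized_img, mask)
```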
That seems like the better OCR-D interface to me. (Of course, contour-finding and annotation via coordinates would still be better than as clipped derived image.) What do you think, @kba?
Also, I think it's not a good idea to just keep the best-scoring pixels independently of each other. This leaves results unnecessarily noisy and flickery, especially where confidence is already low. Smoothing via morphological post-processing (e.g. by closing the argmax results with a suitable kernel) or filtering (e.g. by a Gaussian filter on the scores) etc. should be applied. (Ideally, the model itself would get trained with a fc-CRF top layer, but that's out of scope here.) What's the "right way" to do this?
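For example (a sketch using scipy, under the same assumed score layout; kernel size and sigma are arbitrary illustrations):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, binary_closing

def smooth_masks(scores, sigma=2, kernel=5):
    # Smooth each class's score map spatially before taking the argmax, so that
    # isolated low-confidence pixels follow their neighbourhood.
    smoothed = gaussian_filter(scores, sigma=(sigma, sigma, 0))
    labels = np.argmax(smoothed, axis=-1)
    # Morphological closing then removes remaining specks/holes in the masks.
    structure = np.ones((kernel, kernel), dtype=bool)
    text_mask = binary_closing(labels == 1, structure=structure)
    image_mask = binary_closing(labels == 2, structure=structure)
    return text_mask, image_mask
```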
Considering that the result shown above is still unusable, I think we need to add post-processing for the neural segmentation.
Lastly, regarding the legacy text-image segmentation that is also integrated here: this one does at least work reliably:
(result screenshots: image part / text part)
However, both of these approaches seem to only look for images, not for line-art separators at all. IMHO the latter task is much more needed (considering the tools currently available in OCR-D).