Primitive text direction feature detector #5
base: master
Changes from 2 commits
3635399
766fd0c
c6d4a62
9771828
f9b0974
@@ -0,0 +1,377 @@
import sys
import json
import os.path
import glob

import cv2
import numpy as np

import Image
import ImageEnhance
import ImageFilter
import ImageDraw
import ImageOps

import base

DEBUG = True

# Minimum number of detected text lines.
# If fewer lines than this are recognized, the result is undefined.
MIN_LINES_FOR_RESULT = 3

# Magic values
Review comment: Too much magic?
Review comment: The previous request still holds: please describe what each of these "magic" values is
# Luminance below which a fully opaque pixel counts as "ink" in the row histogram
MAGIC_COLOR_THRESHOLD = 10
Review comment: Please remove MAGIC_ from here and other constants.
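# Not referenced anywhere in this file
MAGIC_LINE_THRESHOLD = 10
# Minimum row-histogram value for a row to belong to a text section
MAGIC_SECTIONS_THRESHOLD = 64

# Fraction of the average text-section value below which a text section is
# treated as noise and merged into the surrounding spacing (see group_sections)
MAGIN_GROUP_VALUE_THRESHOLD = 0.3
# Gaps shorter than this many rows are merged into the surrounding text (see group_sections)
MAGIC_GROUP_LEN_THRESHOLD = 5
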
class TextFeatures(base.AbstractShredFeature):
    """
    Tries to guess the following features of the shred:

    * text direction (in degrees) relative to the original shred
Review comment: in degrees?
    * number of text lines
    * positions of text lines (after rotation) and their heights

    The algorithm is straightforward but slow:

    1. increase shred contrast
Review comment: shred
    2. rotate the image from -45 to +45 degrees and compute
Review comment: degrees.
       the image line histogram (sum of inverted pixels for every line)
    3. analyze the histogram for every angle to compute the resulting coefficients
Review comment: resulting
    4. sort computed parameters and choose the best match

    Currently, the best match is the angle at which the maximum number
    of text lines is found, with the minimum total line height as a tie-breaker.

    TODO:
    * better way to increase contrast of the image (based on histogram)
    * include and analyze additional parameters of the rotated shred, like
      contrast of horizontal lines histogram, connect with lines detector
      for more accurate results, etc.
    * improve performance by using OpenCV/numpy for computation

    """

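Steps 2 to 4 of the docstring amount to a projection-profile search over angles. The sketch below is not the author's implementation: it assumes the shred has already been reduced to a 2-D numpy array of ink values, uses a hand-written nearest-neighbour rotation instead of PIL's `Image.rotate`, and ranks angles by histogram "sharpness" (the sum of squared central differences that `get_derivative_coef` computes) rather than by section count and heights. All names (`rotate_nearest`, `sharpness`, `best_angle`, `synthetic_page`) are hypothetical.

```python
import numpy as np


def rotate_nearest(img, angle_deg):
    """Rotate a 2-D array about its centre by angle_deg (nearest neighbour).

    The output keeps the input shape; pixels that map from outside the
    input are set to 0, so corners of the shred may be clipped.
    """
    h, w = img.shape
    t = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    dx, dy = xs - cx, ys - cy
    # Inverse mapping: find the source pixel for every output pixel.
    src_x = np.rint(cx + dx * np.cos(t) - dy * np.sin(t)).astype(int)
    src_y = np.rint(cy + dx * np.sin(t) + dy * np.cos(t)).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out


def sharpness(ink):
    """Sum of squared central differences of the row histogram."""
    profile = ink.sum(axis=1).astype(float)
    d = np.gradient(profile)[1:-1]
    return float(np.sum(d * d))


def best_angle(ink, angles=range(-45, 46)):
    """Angle (degrees) whose rotation gives the sharpest row histogram."""
    return max(angles, key=lambda a: sharpness(rotate_nearest(ink, a)))


def synthetic_page(skew_deg, size=120):
    """Hypothetical test shred: four 4-pixel-thick "text lines" with slope
    tan(skew_deg), drawn as 1s on a background of 0s."""
    ink = np.zeros((size, size), dtype=np.int64)
    c = (size - 1) / 2.0
    slope = np.tan(np.deg2rad(skew_deg))
    for y0 in (20, 40, 60, 80):
        for x in range(10, size - 10):
            y = int(round(y0 + (x - c) * slope))
            ink[y:y + 4, x] = 1
    return ink
```

With this sketch's sign convention, `best_angle(synthetic_page(7))` is expected to land at, or within a degree of, 7: the rotation that makes the drawn lines horizontal again. Sharpness is used here only because it needs no section bookkeeping; the diff's own ranking relies on the section counts instead.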
    def enhance(self, image, enhancer_class, value):
        enhancer = enhancer_class(image)
        return enhancer.enhance(value)

    def desaturate(self, image):
        """ Return an RGBA image whose R, G and B channels all hold the green
        component of the PIL image (alpha is kept)
        """
        r, g, b, a = image.split()
        return Image.merge("RGBA", (g, g, g, a))

    def get_derivative(self, values):
        """ Calculate the central-difference derivative of a list of values
        (interior points only, so the result is two elements shorter)
        """
        result = []
        for i in xrange(1, len(values) - 1):
            result.append((values[i + 1] - values[i - 1]) / 2.0)
Review comment: https://ru.wikipedia.org/wiki/Численное_дифференцирование
Review comment: http://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html seems to do what you're after.
Review comment: Cool, agree

        return result

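As the review thread suggests, `numpy.gradient` computes the same central differences in vectorised form. This small comparison uses standalone functions (Python 3 `range` instead of `xrange`; the `_np` names are hypothetical). `numpy.gradient` also returns one-sided differences at both ends, so those are sliced off to match the loop. It needs at least two samples, whereas the loop simply returns an empty list for shorter input.

```python
import numpy as np


def get_derivative(values):
    """Loop version from the diff: central differences at interior points."""
    result = []
    for i in range(1, len(values) - 1):
        result.append((values[i + 1] - values[i - 1]) / 2.0)
    return result


def get_derivative_np(values):
    """Same values via numpy.gradient, which also returns one-sided
    differences at both ends; those two are sliced off to match."""
    return np.gradient(np.asarray(values, dtype=float))[1:-1]


def get_derivative_coef_np(histogram):
    """Vectorised get_derivative_coef: sum of squared derivative values."""
    d = get_derivative_np(histogram)
    return float(np.sum(d * d))


hist = [0, 0, 300, 900, 900, 250, 0]
print(get_derivative(hist))              # [150.0, 450.0, 300.0, -325.0, -450.0]
print(get_derivative_np(hist).tolist())  # [150.0, 450.0, 300.0, -325.0, -450.0]
print(get_derivative_coef_np(hist))      # 623125.0
```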
    def get_sections(self, values):
        """ Analyze the lines histogram and return a list of sections
        (consecutive blocks of data values > threshold)
Review comment: typo: threshold

        Args:
            values: horizontal line histogram

        Returns:

            List of sections. Each section is a dictionary describing an
            uninterrupted part of the histogram, with keys:
Review comment: As Section has a defined set of keys, I suggest replacing dictionary with collections.namedtuple.
Review comment: Thanks, didn't know about them.
Review comment: Update the comment. No dictionaries returned here.

            pos: start position of section
            len: length of section
            value: inverted sum of pixels for that section

            Gaps between text runs are recorded as spacing sections with
            len <= 0 (minus the gap length) and value 0.
        """
        sections = []

        current_section = []
        is_in_section = False
        spacing_len = 0
        position = 0

        for i, value in enumerate(values):

            if value > MAGIC_SECTIONS_THRESHOLD:
Review comment: You only include into histograms for pixel[0]<MAGIC_COLOR_THRESHOLD (10) values 255-pixel[0]. So the minimum value in histogram would be 245, where it's not 0. I suggest replacing this check with "if value != 0:" or just "if value:" to make the code easier to follow
Review comment: Yeah, that is pretty dirty. Planned to remove that code and add image binarization in the beginning...

                if not is_in_section:
                    sections.append({'len': -spacing_len, 'value': 0, 'pos': i - spacing_len})
                    is_in_section = True
                    spacing_len = 0

                current_section.append(value)
            else:

                if is_in_section:
                    is_in_section = False
                    sections.append({'len': len(current_section), 'value': sum(current_section), 'pos': i - len(current_section)})
                    current_section = []

                spacing_len += 1

        # Close a text section that runs up to the last histogram row;
        # otherwise it would be silently dropped.
        if is_in_section:
            sections.append({'len': len(current_section), 'value': sum(current_section), 'pos': len(values) - len(current_section)})

        return sections

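Following the review suggestion, the section dictionaries could become a `collections.namedtuple`. The standalone sketch of `get_sections` below takes that approach. The `Section` type and the `threshold` parameter are hypothetical; the default threshold mirrors `MAGIC_SECTIONS_THRESHOLD`. It keeps the same alternating spacing/text layout, including the zero-length spacing section emitted when the very first row is text, and it also closes a text run that reaches the last row.

```python
from collections import namedtuple

# Hypothetical replacement for the section dictionaries (same fields).
Section = namedtuple("Section", ["pos", "len", "value"])


def get_sections(values, threshold=64):
    """Split a row histogram into alternating spacing and text sections.

    Spacing sections get len <= 0 (minus the gap length) and value 0.
    """
    sections = []
    start = None      # first row of the current text run, or None
    gap_start = 0     # first row of the current gap
    for i, v in enumerate(values):
        if v > threshold:
            if start is None:
                sections.append(Section(gap_start, -(i - gap_start), 0))
                start = i
        elif start is not None:
            sections.append(Section(start, i - start, sum(values[start:i])))
            start = None
            gap_start = i
    if start is not None:  # text run that reaches the last row
        sections.append(Section(start, len(values) - start, sum(values[start:])))
    return sections


print(get_sections([0, 0, 100, 200, 0, 0, 0, 150, 0]))
```

Namedtuples are immutable, so `group_sections` would then have to build merged sections with `Section._replace(...)` or new `Section(...)` instances, instead of updating `len` and `value` in place.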
    def get_histogram_for_angle(self, image, angle):
        """
        Rotates an image by the specified angle and calculates
        the sum of pixel values for every row

        Args:

            image: PIL image object
            angle: rotation angle in degrees (counter-clockwise, as in PIL's Image.rotate)
Review comment: rotation angle in degrees.

        Returns:
            list of values. Each value is the sum of inverted pixel values for the corresponding row
        """

        # Third argument (expand=True): enlarge the output so the whole rotated shred fits
        copy = image.rotate(angle, Image.BILINEAR, True)

        line_histogram = []

        for i in xrange(copy.size[1]):
            line = copy.crop((0, i, copy.size[0], i + 1))
Review comment: Is it any faster? I mean, you can use get_data of the original image, no?
Review comment: Yep (don't know why). It also makes code a bit clearer.
Review comment: btw, crop will not create a copy, it references the same original image
Review comment: What about this interpretation of the method:
It works about 10x faster and I believe yields the same result.

            value = 0

            # pixel is (luminance, alpha): count only dark, fully opaque pixels
            for pixel in line.getdata():
                if pixel[0] < MAGIC_COLOR_THRESHOLD and pixel[1] == 255:
                    value += 255 - pixel[0]

            line_histogram.append(value)

        return line_histogram

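The faster rewrite mentioned in the review thread is not preserved on this page. Separately from that, the per-pixel loop can be vectorised with numpy once the rotated image is an array. This sketch assumes a `(height, width, 2)` uint8 array of luminance and alpha, which is what `numpy.asarray` yields for a PIL image in mode "LA". The rotation can stay with `Image.rotate`; only the row summation changes. `row_histogram` and `COLOR_THRESHOLD` are hypothetical names.

```python
import numpy as np

COLOR_THRESHOLD = 10  # mirrors MAGIC_COLOR_THRESHOLD in the diff


def row_histogram(la):
    """Vectorised version of the per-row loop in get_histogram_for_angle.

    la: uint8 array of shape (height, width, 2) holding luminance and alpha.
    """
    lum = la[..., 0].astype(np.int64)
    alpha = la[..., 1]
    ink = (lum < COLOR_THRESHOLD) & (alpha == 255)
    # Sum 255 - luminance over the dark, fully opaque pixels of each row.
    return np.where(ink, 255 - lum, 0).sum(axis=1).tolist()
```

A typical call would be `row_histogram(np.asarray(copy))` on the rotated "LA" image.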
    def group_section_below_threshold(self, section, group_threshold):
        """ Return True if the section should be merged away: a text section
        whose value is below group_threshold, or a spacing section shorter
        than MAGIC_GROUP_LEN_THRESHOLD rows.
        """
        if section['len'] > 0 and section['value'] < group_threshold:
            return True

        if section['len'] <= 0 and section['len'] > -MAGIC_GROUP_LEN_THRESHOLD:
            return True

        return False

    def group_sections(self, sections):
        """ Groups adjacent sections which are divided by only a few pixels.

        Works in place: short gaps are merged into the surrounding text
        sections and weak text sections into the surrounding spacing.
        """
        finished = False

        section_avg_value = 0
        positive_sections = [s['value'] for s in sections if s['len'] > 0]

        if len(positive_sections) > 0:
Review comment: if positive_sections:
            section_avg_value = sum(positive_sections) / float(len(positive_sections))

        group_threshold = section_avg_value * MAGIN_GROUP_VALUE_THRESHOLD

        while not finished:

            finished = True
            for i in xrange(1, len(sections) - 1):
Review comment: UPD: nevermind.
                if self.group_section_below_threshold(sections[i], group_threshold):
                    # Merge sections i-1, i and i+1 into sections[i-1]. Spacing
                    # lengths are stored as negative numbers, so add the
                    # magnitudes and keep the sign of sections[i-1].
                    merged_len = (abs(sections[i-1]['len']) + abs(sections[i]['len'])
                                  + abs(sections[i+1]['len']))
                    if sections[i-1]['len'] <= 0:
                        merged_len = -merged_len
                    sections[i-1]['len'] = merged_len
                    sections[i-1]['value'] += sections[i+1]['value']

                    sections[i:i+2] = []
                    finished = False
                    break

        if len(sections) == 0:
            return

        if self.group_section_below_threshold(sections[0], group_threshold):
            sections[0:1] = []

        if len(sections) == 0:
Review comment: if not sections:
            return

        if self.group_section_below_threshold(sections[-1], group_threshold):
            sections[-1:] = []

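The short-gap rule in `group_sections` is easier to reason about on plain `(pos, length)` intervals. This simplified sketch applies only the gap rule and leaves out the low-value rule. `merge_close_runs` and `max_gap` are hypothetical names; the default mirrors `MAGIC_GROUP_LEN_THRESHOLD`. It merges text runs whose gap is shorter than `max_gap` rows and counts the gap rows in the merged length.

```python
def merge_close_runs(runs, max_gap=5):
    """Merge (pos, length) text runs separated by fewer than max_gap rows."""
    merged = []
    for pos, length in sorted(runs):
        if merged:
            prev_pos, prev_len = merged[-1]
            gap = pos - (prev_pos + prev_len)
            if gap < max_gap:
                # Extend the previous run over the gap and this run.
                merged[-1] = (prev_pos, max(prev_len, pos + length - prev_pos))
                continue
        merged.append((pos, length))
    return merged


print(merge_close_runs([(2, 3), (7, 4), (20, 2)]))  # [(2, 9), (20, 2)]
```

Working on intervals avoids the signed-length bookkeeping entirely. The low-value rule could be applied afterwards as a separate filter.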
    def log_format_sections(self, sections):
        """ Formats sections in human-readable form for debugging
        """

        data = []
        for section in sections:
            data.append("%s (%s)" % (section['len'], section['value']))

        return ", ".join(data)

    def get_derivative_coef(self, histogram):
        """ Calculates the sum of squares of the histogram derivative.
        This can be used to measure the "sharpness" of the histogram.
        """
        derivative = self.get_derivative(histogram)
        return sum(x * x for x in derivative)

    def get_rotation_info(self, image, angle):
        """
        Rotates the image and computes the resulting coefficients for the specified angle

        Args:
            image: grayscale PIL image with an alpha channel (mode "LA")
            angle: rotation angle in degrees

        Returns:

            dictionary with values for the specified angle

        Coefficients currently computed:
            nsc (Normalized Sections Count) - number of text lines,
            excluding lines that contain very few pixels

            heights - sum of heights of those lines

        Additional values returned:
            angle: the tested angle
            derivative: sum of squared histogram derivative values
                (currently used only for debugging and experiments)
            full_sections: list of text sections with enough data for analysis
        """

        diagram = self.get_histogram_for_angle(image, angle)
        sections = self.get_sections(diagram)

        self.group_sections(sections)


        # Remove all spacing sections
        sections = [s for s in sections if s['len'] > 0]
Review comment: if s['len'] probably
        #positive_sections = [s for s in sections if s['len'] > 0]
Review comment: Don't need it: delete it.

        full_sections = []
        normalized_sections_count = 0
        sections_heights = 0

        if len(sections) > 0:
            # average pixel value of a text section
            section_avg_value = sum([s['value'] for s in sections]) / float(len(sections))
Review comment: micro-optimisation: you can multiply by 0.3 right here and avoid doing that in the loop

            full_sections = [s for s in sections if s['value'] > 0.3 * section_avg_value]
Review comment: Another magic constant?
Review comment: Kind of adaptive filtering, right?
Review comment: Cool, it seems that this code is not useful now. Checking this right now
            normalized_sections_count = len(full_sections)

            sections_heights = sum(s['len'] for s in full_sections)

        return {'angle': angle,
                'nsc': normalized_sections_count,
                'heights': sections_heights,
                'derivative': self.get_derivative_coef(diagram),
                'full_sections': full_sections}

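The adaptive filter in `get_rotation_info` (the "0.3" the reviewers asked about) keeps only text sections carrying more than 30% of the average section value. The standalone sketch below makes the ratio a named parameter and computes the threshold once, as the micro-optimisation comment suggests. `filter_full_sections` and `ratio` are hypothetical names.

```python
def filter_full_sections(sections, ratio=0.3):
    """Keep the text sections whose value exceeds ratio * mean text value.

    sections: dictionaries with 'pos', 'len' and 'value' keys, as produced by
    get_sections/group_sections (spacing sections have len <= 0).
    """
    text = [s for s in sections if s['len'] > 0]
    if not text:
        return []
    # Threshold computed once instead of multiplying inside the comprehension.
    cutoff = ratio * sum(s['value'] for s in text) / float(len(text))
    return [s for s in text if s['value'] > cutoff]


sections = [
    {'pos': 0, 'len': -3, 'value': 0},
    {'pos': 3, 'len': 4, 'value': 1000},
    {'pos': 7, 'len': -2, 'value': 0},
    {'pos': 9, 'len': 2, 'value': 100},
    {'pos': 11, 'len': 5, 'value': 900},
]
full = filter_full_sections(sections)
nsc = len(full)                        # 2
heights = sum(s['len'] for s in full)  # 9
```

Here the mean text value is about 667, so the cutoff is about 200 and the weak 2-row section at position 9 is discarded.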
    def sort_result(self, result):
        """ Sort results by their most important parameters

        Args:
            result: list of dictionaries, one for each tested angle

        Returns:
            None; result is sorted in place with the most accurate result first
        """

        def sort_fun2(a, b):
            if b['nsc'] == a['nsc']:
                return a['heights'] - b['heights']

            return b['nsc'] - a['nsc']

        def sort_fun(a, b):
            if b['nsc'] == a['nsc']:
                return b['derivative'] - a['derivative']
Review comment: return cmp(b['derivative'], a['derivative'])

            return b['nsc'] - a['nsc']
Review comment: return cmp...

        # sort_fun (derivative-based) is currently unused
        result.sort(sort_fun2)

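On Python 3, `list.sort()` no longer accepts a comparison function. The ordering of `sort_fun2` (most text lines first, then smallest total height) can instead be written as a sort key. If the comparison functions are kept, `functools.cmp_to_key` is the standard-library bridge. The sketch below uses standalone functions and shows that both routes give the same order.

```python
from functools import cmp_to_key


def sort_result(result):
    """Sort in place: most text lines first, then smallest total height."""
    result.sort(key=lambda r: (-r['nsc'], r['heights']))


def sort_fun2(a, b):
    """The comparison function from the diff, unchanged."""
    if b['nsc'] == a['nsc']:
        return a['heights'] - b['heights']
    return b['nsc'] - a['nsc']


results = [
    {'angle': 0, 'nsc': 3, 'heights': 40},
    {'angle': 5, 'nsc': 4, 'heights': 50},
    {'angle': -3, 'nsc': 4, 'heights': 45},
]
by_key = list(results)
sort_result(by_key)
by_cmp = sorted(results, key=cmp_to_key(sort_fun2))
print([r['angle'] for r in by_key])  # [-3, 5, 0]
```

The key form is usually preferred because each element's key is computed once, instead of calling the comparison function for every pair.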
    def info_for_angles(self, image):
        """Args:
            image: grayscale PIL image with an alpha channel (mode "LA")
Review comment: not just grayscale: grayscale with alpha channel

        Returns:
            list of dicts with info for every tested angle, sorted with the best match first
Review comment: list of RotationInfo instances
        """

        result = []
        for angle in xrange(-45, 46):  # -45..+45 degrees inclusive

            if DEBUG: sys.stdout.write(".")

            rotation_info = self.get_rotation_info(image, angle)
            result.append(rotation_info) # diagram, derivative
Review comment: This comment doesn't add any useful information

        if DEBUG: sys.stdout.write("\n")

        self.sort_result(result)

        return result


    def get_info(self, shred, contour, name):

        if DEBUG:
            print "Processing file: %s" % (name)

        image = Image.fromarray(cv2.cvtColor(shred, cv2.COLOR_BGRA2RGBA))
        image = image.convert("LA")

        image = self.enhance(image, ImageEnhance.Brightness, 1.5)
        image = self.enhance(image, ImageEnhance.Contrast, 3)

        results = self.info_for_angles(image)

        top_result = results[0]
        resulting_angle = top_result['angle']

        if DEBUG:
            result = image.rotate(resulting_angle, Image.BILINEAR, True)
            result.save("results/%s" % (name))

        if top_result['nsc'] >= MIN_LINES_FOR_RESULT:
            return {'text_angle': resulting_angle,
                    'text_sections': [{'pos': s['pos'], 'length': s['len']} for s in top_result['full_sections']]}
        else:
            return {'text_angle': "undefined"}

if __name__ == '__main__':

    def process_shred(full_name):

        features = TextFeatures(None)
        # -1 (cv2.IMREAD_UNCHANGED) keeps the alpha channel
        cv_image = cv2.imread(full_name, -1)

        file_name = os.path.split(full_name)[1]

        result = features.get_info(cv_image, None, file_name)

        if result is None:
            return

        with open("results/%s.json" % (file_name), "wt") as f_info:
            f_info.write(json.dumps(result, sort_keys=True,
                                    indent=4, separators=(',', ': ')))

    if len(sys.argv) < 2:
        print "Error: Please specify path or file"
        sys.exit(255)

    path = sys.argv[1]

    if os.path.isfile(path):
        process_shred(path)
    else:

        # os.path.join instead of a hard-coded "\\" so this also works outside Windows
        for full_name in glob.glob(os.path.join(path, "*.png")):

            # Skip the *_ctx and *_mask companion files
            if "_ctx" in full_name or "_mask" in full_name:
                continue

            process_shred(full_name)
Review comment: typo