Letter recognition classifier for the English alphabet.
Data source: https://archive.ics.uci.edu/ml/datasets/letter+recognition
Data description: The dataset consists of 20,000 unique examples of letters in the English alphabet, generated by randomly distorting black-and-white pixel maps of the 26 capital letters written in 20 different fonts. Each letter is represented by 16 numerical attributes, each scaled to the 0-15 range:
- The horizontal position, counting pixels from the left edge of the image, of the center of the smallest rectangular box that can be drawn with all "on" pixels inside the box.
- The vertical position, counting pixels from the bottom, of the above box.
- The width, in pixels, of the box.
- The height, in pixels, of the box.
- The total number of "on" pixels in the character image.
- The mean horizontal position of all "on" pixels relative to the center of the box, divided by the width of the box. This feature has a negative value if the image is "left-heavy", as would be the case for the letter L.
- The mean vertical position of all "on" pixels relative to the center of the box, divided by the height of the box.
- The mean squared value of the horizontal pixel distances, measured as in the mean horizontal position attribute (the sixth in this list). This attribute will have a higher value for images whose pixels are more widely separated in the horizontal direction, as would be the case for the letters W or M.
- The mean squared value of the vertical pixel distances, measured as in the mean vertical position attribute (the seventh in this list).
- The mean product of the horizontal and vertical distances for each "on" pixel, measured as in the mean horizontal and mean vertical position attributes (the sixth and seventh in this list). This attribute has a positive value for diagonal lines that run from bottom left to top right and a negative value for diagonal lines that run from top left to bottom right.
- The mean value of the squared horizontal distance times the vertical distance for each "on" pixel. This measures the correlation of the horizontal variance with the vertical position.
- The mean value of the squared vertical distance times the horizontal distance for each "on" pixel. This measures the correlation of the vertical variance with the horizontal position.
- The mean number of edges (an "on" pixel immediately to the right of either an "off" pixel or the image boundary) encountered when making systematic scans from left to right at all vertical positions within the box. This measure distinguishes between letters like "W" or "M" and letters like "T" or "L".
- The sum of the vertical positions of the edges encountered in the left-to-right scans of the preceding attribute. This feature will give a higher value if there are more edges at the top of the box, as in the letter "Y".
- The mean number of edges (an "on" pixel immediately above either an "off" pixel or the image boundary) encountered when making systematic scans of the image from bottom to top over all horizontal positions within the box.
- The sum of the horizontal positions of the edges encountered in the bottom-to-top scans of the preceding attribute.
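
To make the attribute definitions above more concrete, here is a minimal Python sketch that computes several of them from a binary pixel map. It is an illustration under stated assumptions, not the procedure used to build the dataset: the function name raw_letter_features, the dictionary keys, and the convention that row 0 is the top of the image are all hypothetical, and the values returned are raw (the rescaling to the 0-15 range is not described here, so it is left out). The two edge-position sums are also omitted because the origin they are measured from is not stated above.

```python
import numpy as np


def raw_letter_features(img):
    """Compute unscaled features from a 2-D 0/1 pixel map.

    Assumption: row 0 of `img` is the TOP of the image.
    """
    img = np.asarray(img, dtype=bool)
    rows, cols = np.nonzero(img)
    if rows.size == 0:
        raise ValueError("image has no 'on' pixels")

    # Vertical positions are counted from the bottom edge of the image.
    ys = (img.shape[0] - 1) - rows
    xs = cols

    # Smallest box containing every "on" pixel.
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    x_center = (x_min + x_max) / 2
    y_center = (y_min + y_max) / 2

    # Pixel distances from the box center, normalized by the box size.
    dx = (xs - x_center) / width
    dy = (ys - y_center) / height

    # Left-to-right scan: an "on" pixel whose left neighbor is "off"
    # or lies outside the image counts as an edge.
    left = np.pad(img, ((0, 0), (1, 0)))[:, :-1]
    x_edges = img & ~left

    # Bottom-to-top scan: an "on" pixel whose lower neighbor is "off"
    # or lies outside the image counts as an edge.
    below = np.pad(img, ((0, 1), (0, 0)))[1:, :]
    y_edges = img & ~below

    return {
        "x_box": float(x_center),
        "y_box": float(y_center),
        "width": int(width),
        "height": int(height),
        "on_pixels": int(xs.size),
        "x_bar": float(dx.mean()),
        "y_bar": float(dy.mean()),
        "x2_bar": float((dx ** 2).mean()),
        "y2_bar": float((dy ** 2).mean()),
        "xy_bar": float((dx * dy).mean()),
        "x2y_bar": float((dx ** 2 * dy).mean()),
        "xy2_bar": float((dx * dy ** 2).mean()),
        # Mean edge count per scan line inside the box.
        "x_edges_mean": float(x_edges.sum() / height),
        "y_edges_mean": float(y_edges.sum() / width),
    }


# A tiny 3x3 "L": a vertical stroke on the left and a base along the bottom.
letter_l = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
features = raw_letter_features(letter_l)
```

For this small "L", features["x_bar"] comes out negative, which matches the "left-heavy" remark above, and features["xy_bar"] is also negative because the stroke runs from top left to bottom right.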