
Reading Handwritten Numbers on Data Sheet

Posted 9 years ago
I have a lot of these, and the task is relatively simple. I want to read the image into Mathematica, then extract a table of "0" and "1" values where the underlines on the data sheet are. If each underline on the sheet is a cell, then the rules are: assign a "0" wherever the cell is blank (it only has the underline), and assign a "1" if there is a mark that looks like a "1" in the cell. Then export the table to a data file or spreadsheet. Does anyone have experience with this type of task? Thanks
Posted 9 years ago
Dear Sean, thank you so much. There are a few more issues to figure out, but I will see what I can do. You have helped us evaluate the largest database on the planet for Alzheimer's. Greatly appreciated. Rod Shankle
Posted 9 years ago
I'm not an expert in this area by any measure, but let's take a look at one part of your image. Given an image with some horizontal lines, how can we tell which ones have a check over them?

First, let's clean up the image. I like LocalAdaptiveBinarize for this. Finding the right coefficients for the function can be done nicely with Manipulate:

    Manipulate[
     LocalAdaptiveBinarize[image, 10, {a, b, c}],
     {{a, .6}, -2, 2}, {{b, 2}, -2, 2}, {{c, 0}, -1, 1}]

    clean = LocalAdaptiveBinarize[image, 10, {-0.15, 2, 0.05}]

Get rid of the horizontal lines in the image with a Sobel filter (I basically stole this from the documentation). This kernel responds to vertical edges, so the horizontal underlines mostly drop out and only the marks remain. Then clean up the results a bit:

    checks = DeleteSmallComponents@ImageConvolve[clean, {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}}]

"Subtract" the checks from the cleaned image to get an image with just the horizontal lines. I had to make the checks a bit bigger with dilation to make sure they were completely removed. ImageAdd is used because the checks image is a negative (white marks on black), so adding it whitens the check pixels in the black-on-white image:

    lines = ImageAdd[clean, Dilation[checks, 1]]

The "lines" image needs some more cleanup: remove small components, dilate the result so that each line is connected, and then thin the lines out again:

    thinned = Thinning[Dilation[DeleteSmallComponents[ColorNegate@lines, 10], 5]]

We want the ends of each of those lines. Here's how to get a simple bounding box for each one, ordered left to right, top to bottom:

    boxes = ComponentMeasurements[thinned, "BoundingBox"]

Since the lines are supposed to be horizontal, average the two vertical coordinates of each bounding box. This gives a pair of end points for each line:

    boxes2 = Replace[boxes[[All, 2]],
      {{h1_, v1_}, {h2_, v2_}} :> {{h1, Mean[{v1, v2}]}, {h2, Mean[{v1, v2}]}}, {1}]

Here's what the points look like (in red):

    HighlightImage[clean, Partition[Flatten@boxes2, 2]]

Add thirty or so pixels to the height of each box to get a region where you'd expect a check mark:

    regionBoxes = Replace[boxes2, {{x_, y_}, p2_} :> {{x, y + 30}, p2}, {1}]
    HighlightImage[clean, Partition[Flatten@regionBoxes, 2]]

Now cut those regionBoxes out of the image that contains the checks:

    separatedChecks = ImageTrim[checks, #] & /@ regionBoxes

A reasonable rule of thumb is that a cell is checked if more than 100 of its pixels are white:

    checkedQ[img_] := Total@Flatten@ImageData[img] > 100 (* threshold number of white pixels *)

For your actual application, we'd either use the TextRecognize function or use the Classify function to build a digit recognizer; there are examples in the documentation on how to make a digit recognizer with Classify.

Since the components are stored from left to right and top to bottom, we can easily visualize the results and compare them with the original image:

    TableForm@Partition[checkedQ /@ separatedChecks, 5] /. {False -> \[EmptySquare], True -> \[FilledSquare]}

We would probably want to build in some checks to make sure the image is being recognized properly, plus some extra code to handle the cases where it isn't. For example, this wouldn't have worked so easily if the horizontal lines weren't as easy to find. Also, we aren't really using the fact that the lines are in a grid, so there's a lot more we could do for your application. If the images are all very consistent, we might not even have to find where the checks/numbers should be on each image.
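The question asks for the 0/1 table to be exported to a data file or spreadsheet, while the walkthrough stops at a TableForm display. Here is a minimal sketch of that last step, assuming 5 cells per row as in the example; `toBitTable` is a hypothetical helper, not a built-in, and the sample flags are made up for illustration.

```mathematica
(* Hypothetical helper: turn a flat list of True/False cell results
   (e.g. checkedQ /@ separatedChecks) into rows of 0/1 values.
   cellsPerRow is an assumption about the sheet layout. *)
toBitTable[flags_List, cellsPerRow_Integer?Positive] := (
  (* Partition drops an incomplete last row, so warn if the count is off *)
  If[Mod[Length[flags], cellsPerRow] != 0,
   Print["Warning: ", Length[flags], " cells is not a multiple of ", cellsPerRow]];
  (* Boole maps True -> 1 and False -> 0 element by element *)
  Partition[Boole[flags], cellsPerRow])

(* Made-up flags for two rows of five cells *)
toBitTable[{False, True, False, False, True, True, False, False, False, False}, 5]
(* {{0, 1, 0, 0, 1}, {1, 0, 0, 0, 0}} *)
```

With the variables from the walkthrough, something like `Export["datasheet.csv", toBitTable[checkedQ /@ separatedChecks, 5]]` would write the table (the file name is hypothetical); Export picks the format from the extension, so `.xlsx` would produce an Excel file instead.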
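One simple way to act on the suggestion to check that the image was recognized properly is to compare the number of detected underlines with the number of cells the sheet is known to have before trusting the result. This is a minimal sketch under that assumption; `lineCountOK` and the expected count are hypothetical.

```mathematica
(* Hypothetical sanity check: compare the number of detected underline
   bounding boxes with the number of cells the sheet should contain. *)
lineCountOK[boxes_List, expected_Integer] :=
 If[Length[boxes] == expected,
  True,
  Print["Found ", Length[boxes], " lines but expected ", expected,
   "; check the binarization and cleanup steps."];
  False]
```

For a sheet with, say, 25 cells (a hypothetical count), `lineCountOK[boxes, 25]` would be evaluated right after the ComponentMeasurements step, and scans that return False could be set aside for manual review instead of being exported.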
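If the scans really are very consistent, the cell regions could be generated once from a fixed layout instead of being detected on every image. The sketch below assumes such a layout: the corner of the first cell, the cell size, and the spacing are hypothetical values you would measure from one template scan, and `gridBoxes` is not a built-in. It uses image coordinates (origin at the bottom left, as in ComponentMeasurements) and returns boxes top to bottom, left to right, the same order used above.

```mathematica
(* Hypothetical fixed-layout alternative to detecting the underlines.
   {x0, y0}: lower-left corner of the top-left cell (image coordinates)
   {w, h}:   cell width and height
   {dx, dy}: horizontal and vertical pitch between cells
   Rows go downward, so y decreases with each row. *)
gridBoxes[{x0_, y0_}, {w_, h_}, {dx_, dy_}, {rows_Integer, cols_Integer}] :=
 Flatten[
  Table[
   {{x0 + c dx, y0 - r dy}, {x0 + c dx + w, y0 - r dy + h}},
   {r, 0, rows - 1}, {c, 0, cols - 1}],
  1] (* flatten only the row level: one {corner, corner} pair per cell *)
```

Each box is a pair of corner points, so `ImageTrim[checks, #] & /@ gridBoxes[...]` could stand in for `regionBoxes`. The trade-off is that a fixed grid breaks if a sheet is shifted or rotated during scanning, so the scans would likely need to be aligned first (for example with ImageAlign).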