Recognize box letters

GROUPS:
Image Processing, Wolfram Language, Machine Learning, Know How

Can TextRecognize read the images of box letters below? Unfortunately, TextRecognize in version 11.0.1 cannot recognize them correctly on its own, so I preprocess the images with a few Wolfram Language functions until TextRecognize can read them.

1. One-line image (Attachment: img1.jpg)

TextRecognize cannot read the image as it is.

TextRecognize[img1, Language -> "English"]

TextRecognize does a little better once the outer border is removed with ImageCrop, but it misreads the gaps between the box letters as "l".

img3 = ImageCrop[img1, {1170, 140}]

TextRecognize[img3, Language -> "English"]

So I crop each box out of the image individually. First, I use EdgeDetect and ImageLines to find the box boundaries.

lines = ImageLines[EdgeDetect[img1, 9], 0.1];
HighlightImage[img1, Line /@ lines]

I find the coordinates of the points where the lines cross. "buf" is a margin in pixels that moves each crop boundary slightly inside its box, so the borders themselves are left out.

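(* line: number of text rows; char: boxes per row; buf: inset in pixels *)
line = 1; char = 7; buf = 7;
(* the first 2*line detected lines are taken as the horizontal borders *)
row = (Take[lines, 2*line] // Sort);
rowpart = {#[[1]] + buf, #[[2]] - buf} & /@ 
   Partition[row[[All, 1, 2]], 2];
(* the remaining lines are taken as the vertical borders *)
col = (Take[lines, {2*line + 1, Length[lines]}] // Sort);
colpart = {#[[1]] + buf, #[[2]] - buf} & /@ 
   Partition[col[[All, 1, 1]], 2];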
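
To make the index arithmetic concrete, here is a small sketch with made-up edge coordinates. The values in ys are hypothetical and were not measured from img1.jpg. It only illustrates how Sort, Partition, and buf turn a list of detected edge positions into inset crop ranges.

```wolfram
(* Hypothetical y-coordinates of four detected horizontal edges, unsorted *)
ys = {152, 12, 292, 160};
buf = 7;

(* Sort the edges, pair neighbours as {low, high}, then shrink each pair by buf *)
ranges = {#[[1]] + buf, #[[2]] - buf} & /@ Partition[Sort[ys], 2]
(* {{19, 145}, {167, 285}} *)
```

Note that Partition[list, 2] silently drops a trailing unpaired element, so if ImageLines misses or duplicates an edge, the ranges will be wrong without any error message. Checking Length[lines] against the expected count before partitioning is probably worthwhile.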

Now I can crop each box from the image.

imglist = Table[
   ImageTake[img1, rowpart[[i]], colpart[[j]]],
   {i, 1, line}, {j, 1, char}];
ImageResize[#, 70] & /@ imglist[[1]]

I assemble all the box images into a single image.

imgtake = ImageAssemble[Flatten[imglist]]

TextRecognize now recognizes the text perfectly.

TextRecognize[imgtake, Language -> "English"]

2. Two-line image (Attachment: img2.jpg)

TextRecognize cannot read the image as it is.

TextRecognize[img2, Language -> "English"]

After the outer border is removed with ImageCrop, TextRecognize gets the text almost right.

img4 = ImageCrop[img2, {1165, 305}]

TextRecognize[img4, Language -> "English"]

To get a perfect result, I again crop each box out of the image.

lines = ImageLines[EdgeDetect[img2, 9], 0.1];
HighlightImage[img2, Line /@ lines]

I find the coordinates of the points where the lines cross, this time with two rows.

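(* line: number of text rows; char: boxes per row; buf: inset in pixels *)
line = 2; char = 7; buf = 7;
(* the first 2*line detected lines are taken as the horizontal borders *)
row = (Take[lines, 2*line] // Sort);
rowpart = {#[[1]] + buf, #[[2]] - buf} & /@ 
   Partition[row[[All, 1, 2]], 2];
(* the remaining lines are taken as the vertical borders *)
col = (Take[lines, {2*line + 1, Length[lines]}] // Sort);
colpart = {#[[1]] + buf, #[[2]] - buf} & /@ 
   Partition[col[[All, 1, 1]], 2];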
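
One caveat that may matter for multi-row images: as far as I know, ImageLines reports y-coordinates measured from the bottom of the image, while ImageTake counts rows from the top. The ranges above are used directly as row indices, which seems to work here because the box grid is vertically centered in the image. For an off-center grid, a conversion along the following lines might be needed. This is only a sketch: the helper name toRows, the image height, and the ranges are all hypothetical.

```wolfram
(* Hypothetical helper: convert a {yLow, yHigh} range measured from the bottom
   of an image of height h into a {rowTop, rowBottom} range counted from the top *)
toRows[{yLow_, yHigh_}, h_] := {h - yHigh + 1, h - yLow}

(* Made-up example: a 320-pixel-tall image with two text rows *)
h = 320;
yRanges = {{19, 145}, {167, 285}};

(* Reverse so that the topmost text row comes first *)
rowRanges = Reverse[toRows[#, h] & /@ yRanges]
(* {{36, 153}, {176, 301}} *)
```

In real use, h would come from the second element of ImageDimensions[img2].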

Now I can crop each box from the image.

imglist = Table[
   ImageTake[img2, rowpart[[i]], colpart[[j]]],
   {i, 1, line}, {j, 1, char}];
ImageResize[#, 70] & /@ Flatten[imglist]

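I assemble all the box images into a single row. However, ImageAssemble expects all images in a row to have the same height, and the cropped boxes differ slightly in size, so I first crop every box to the smallest width and height found among them.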

dims = Min /@ Transpose[ImageDimensions /@ Flatten[imglist]];
imgtake = 
 ImageAssemble[
  Flatten[imglist] /. 
   x_Image :> ImageCrop[x, dims, Padding -> Automatic]]
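
As a small illustration of how dims is computed, here is the same expression applied to made-up sizes. The {width, height} pairs are hypothetical stand-ins for what ImageDimensions would return for three boxes.

```wolfram
(* Hypothetical {width, height} pairs for three cropped boxes *)
sizes = {{152, 126}, {150, 126}, {151, 118}};

(* Transpose groups all widths and all heights; Min picks the smallest of each *)
dims = Min /@ Transpose[sizes]
(* {150, 118} *)
```

Because dims is the minimum in each direction, ImageCrop only ever removes pixels from a box and never has to pad one.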

TextRecognize now recognizes the text perfectly.

TextRecognize[imgtake, Language -> "English"]

POSTED BY: Kotaro Okazaki
5 days ago
