
[WSS18] Smart image auto-cropping




Many web services that process huge amounts of data face the problem of how to show image content to the end user. For example, when you post something on Twitter and include images in the post, Twitter tries to crop them so that only the meaningful parts of the images are shown to a regular user, in order to make posts more informative:
How Twitter shows preview images.
The goal of my project is to implement smart auto-cropping in the Wolfram Language and to get familiar with different ways of finding the crucial information in images.


1. Saliency map

The first way to find the important parts of an image is to use the ImageSaliencyFilter[] function, which gives us a saliency map of the image:

map = ImageSaliencyFilter[image] // ImageAdjust

Saliency map of a watch image.
On the saliency map we can see the regions of the image whose features stand out as different; a higher saliency value is taken to be more important. In addition to saliency mapping, the EdgeDetect[] and LaplacianFilter[] functions may be used to detect edges in the image. To get the final cropping area, we can binarize the map and find the bounding regions of its white parts. Since binarization does not perform well on dense images, we cannot achieve significant results in detecting the main objects of an image this way. Another disadvantage is that saliency maps tend to change when the image is resized, so the accuracy of cropping would be lower in the Twitter case.
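As a minimal sketch of this pipeline (the ExampleData test image is only an illustration, not from the original post):

```mathematica
(* sketch: saliency map -> binarize -> crop to the white components *)
img = ExampleData[{"TestImage", "House"}];
map = ImageAdjust@ImageSaliencyFilter[img];
bin = Binarize[map];
(* corner points of every white component's bounding box *)
corners = Flatten[ComponentMeasurements[bin, "BoundingBox"][[All, 2]], 1];
ImageTrim[img, corners] (* crops to the box enclosing all corner points *)
```

ImageTrim takes a list of points and trims the image to their axis-aligned bounding box, so all salient components survive the crop.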

2. Image keypoints

Keypoints are the same thing as interest points: spatial locations in the image that define what is interesting or what stands out. What makes keypoints special is that no matter how the image changes, whether it is rotated, shrunk, expanded, translated or distorted, you should be able to find the same keypoints when comparing the modified image with the original. Mathematica has the built-in ImageKeypoints[] function for this (keypoints are shown as yellow dots):
Keypoints detected in an image.
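A quick way to reproduce such a picture (using a built-in test image as a stand-in) is:

```mathematica
(* sketch: highlight the strongest keypoints of a test image *)
img = ExampleData[{"TestImage", "House"}];
HighlightImage[img, ImageKeypoints[img, MaxFeatures -> 20]]
```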
Then we can build a bounding region around them, which will be our crop area. Here are the functions that process an image:

aspectRatio[rect_Rectangle] := 
  Block[{sides = Differences[{rect[[1, All]], rect[[2, All]]}][[1]]},
   Max[sides]/Min[sides]] (* ratio of the longer to the shorter side *)

centerMass[keypoints_] := 
  Round@{Total[#[[1]] & /@ keypoints]/Length@keypoints, 
    Total[#[[2]] & /@ keypoints]/Length@keypoints};

applyMargin[r_Rectangle, imgDimensions_List, percent_Integer] := 
 Block[{rect = r, avgDim = Mean@imgDimensions},
  rect[[1, All]] -= avgDim*percent/100;
  rect[[2, All]] += avgDim*percent/100;
  rect] (* add a margin of percent% to the rectangle; if percent > 0, \
the rectangle expands *)

smartCrop[input_Image] := 
 Block[{img = ImageCrop[input], faces, keypoints, boundReg, cm, 
   imgDimensions, perc = 5},
  faces = FindFaces[img];
  keypoints = ImageKeypoints[img, MaxFeatures -> 10];
  boundReg = BoundingRegion[keypoints, "MinRectangle"];
  cm = centerMass[keypoints];
  imgDimensions = ImageDimensions@img;
  Which[
   Length@faces > 0, (* prefer the largest detected face *)
   HighlightImage[img, 
    applyMargin[Last@SortBy[faces, Area], imgDimensions, perc]],
   aspectRatio[boundReg] <= 16/9, (* keypoint region is compact enough *)
   HighlightImage[img, applyMargin[boundReg, imgDimensions, perc]],
   True, (* fall back to a fixed window around the keypoint center of mass *)
   HighlightImage[img, 
    Rectangle[cm - Mean[imgDimensions]/4, 
     cm + Mean[imgDimensions]/4], "Darken"]]]
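A usage sketch (the built-in test image is a stand-in; on images without faces, FindFaces simply returns an empty list and the keypoint branch is used):

```mathematica
(* sketch: run the keypoint-based cropper on a built-in test image *)
smartCrop[ExampleData[{"TestImage", "Girl"}]]
```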

3. Neural net sensitivity map

Another approach is to use a classification neural network to find the main objects in an image. The easiest way to do this is with the NetModel[] function, which gives access to any pretrained network from the Wolfram Neural Net Repository. We then build a sensitivity map from the image-classification network, which shows which parts of the image affect the classifier's decision the most.

Sensitivity map

imgIdNet = NetModel["Wolfram ImageIdentify Net V1"];
coverImageAt[image_Image, pos_List, r_ : (1/6), 
   mean_ : {0.45, 0.45, 0.45}] :=
  Block[{dims = ImageDimensions[image], R},
   R = Round[Mean[dims] r];
   (* compose a soft gray patch (mean color under a Gaussian mask) at \
relative position pos *)
   ImageCompose[image, 
    {ConstantImage[
      PadRight[mean, ImageChannels[image]], (2 R + 1) {1, 1}, 
      ColorSpace -> ImageColorSpace[image]], 
     Image@Normalize[GaussianMatrix[{R}], Max]}, 
    Round[dims pos]]];
(* ---- *)
networkSensitivityMap[img_Image, network : (_NetChain | _NetGraph), 
   opts___Rule] :=
  Block[{concept = network[img, opts], coverImages, probs, w, sens, 
    dims = ImageDimensions[img], 
    mean = NetExtract[network, {"Input", "MeanImage"}],
    step = 1/6},
   (* cover the image on a regular grid and classify every covered copy *)
   coverImages = Table[
     coverImageAt[img, {x, y}, step, mean],
     {y, 1 - step/2, step/2, -step},
     {x, step/2, 1 - step/2, step}];
   w = First@Dimensions[coverImages];
   (* probability of the original concept for each covered copy *)
   probs = Partition[
     Lookup[network[Flatten[coverImages], "Probabilities", opts], 
      concept], w];
   (* low probability means the covered region mattered; upsample, smooth \
and square to emphasize strong responses *)
   sens = GaussianFilter[ImageResize[Image[1 - probs], dims], 
     dims step/Sqrt[2]];
   ImageAdjust@ImageMultiply[sens, sens]];

After that we binarize the sensitivity map, drop its dark parts and crop the image:

smartCropSensMap[input_Image] := 
  Block[{img = ImageCrop[input], mbin, sensMap, rects, 
    imgNet = NetModel["Wolfram ImageIdentify Net V1"]},
   sensMap = networkSensitivityMap[img, imgNet];
   mbin = MorphologicalBinarize[sensMap];
   rects = 
    Rectangle @@@ ComponentMeasurements[mbin, "BoundingBox"][[All, 2]];
   HighlightImage[img, Last@SortBy[rects, Area]]];
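If an actual crop is wanted rather than a highlighted preview, one possible variant (a sketch of the same pipeline, with ImageTrim in place of HighlightImage; the function name is mine) is:

```mathematica
(* sketch: crop to the largest bright component of the sensitivity map *)
smartCropSensMapTrim[input_Image] := 
  Block[{img = ImageCrop[input], sensMap, mbin, boxes, 
    imgNet = NetModel["Wolfram ImageIdentify Net V1"]},
   sensMap = networkSensitivityMap[img, imgNet];
   mbin = MorphologicalBinarize[sensMap];
   boxes = ComponentMeasurements[mbin, "BoundingBox"][[All, 2]];
   ImageTrim[img, Last@SortBy[boxes, Area[Rectangle @@ #] &]]];
```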


Three approaches for recognizing the most important part of an image were covered: saliency maps, image keypoints and neural network sensitivity maps.

Future work

Future work may consist of building a saliency-prediction neural network from scratch using public datasets (CAT2000, MIT300) and trying to achieve a high score on those benchmarks.

POSTED BY: Mark Seliaev
5 days ago
