How can I train a logo detector?

Posted 10 years ago

My original question was posted on SE: "How can I train a binary classifier to find a logo?" Vitaliy graciously answered it, but he suggested that since I need a more complete answer, I should try here on the Wolfram Community instead.

The Challenge

Part #1: Construct a logo detector that will find the Apple Inc. logo in a photo.

Here are a few clarifying points:

  • The output is whether or not the image contains the Apple logo, with a confidence level.
  • The program should use HOG features and a support vector machine (SVM) classifier.
  • The classifier needs to be invariant to rotations, deformations, distortions, and translations.
  • Here are the training and test sets for Apple; the zipped file contains four folders.
  • I've attached my notebook from the initial Stack Exchange post to get you started.

Note: The images are taken from Twitter, Tumblr, Instagram, and Google image search, so the algorithm will need good sensitivity when running in the wild, since only roughly 5% of these images have logos in them.

Part #2:
Added requirements:

  • Needs to find the bounding box of the logos.
  • Needs to handle images that have multiple logos.
  • Needs to classify fast (within 10 to 100 milliseconds).

Part #3:
Extend this to more brands (e.g. Pepsi). I will assemble and post the training and test data for you, if anyone can get this far...


Questions & Notes

  1. I have read up on the literature on HOG training, but I can't figure out what the best practices are for cropping the positive examples. Do you crop tight or with margins around the "object"? Do you crop all positive images to a standard aspect ratio, or to whatever rectangle fits the appearance of the object in the particular image?

  2. It took me a very long time to cull the datasets for training and testing. This python script helped, but what free tools are there for this, specifically for cropping the logo out?

  3. I have tried OpenCV cascades on HOG features. How different will their performance be from that of a linear or nonlinear SVM classifier?

  4. It's entirely unclear how Mathematica handles sliding windows across the images... this might have to be done with manual code, and if so, the literature seems to suggest that something in the range of 4x4 to 32x32 is best.

  5. The HOG features are not explicitly computed. Wouldn't it be nice if ImageKeypoints had other (free) methods like HOG, BRISK, ORB, GIST, ...?

  6. Mathematica notebooks quickly become unstable when they have too many images in them. Any suggestions for getting around this besides DumpSave[]ing everything?

  7. This really could be the coolest real-world example of image processing + machine learning in Mathematica, and prove to the world that it is a professional toolbox for vision, which unfortunately many academics do not accept yet!
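
Regarding question 5, a HOG-style descriptor can be computed by hand from plain array operations. The following is only a rough sketch, not a built-in and not a full HOG implementation: `hogSketch` is a hypothetical helper name, the 66x66 resize, 8-pixel cells, and 9 unsigned orientation bins are illustrative assumptions, and the usual overlapping block normalization is replaced by a single global L2 normalization.

```wolfram
(* Hypothetical helper, not a built-in: simplified HOG-style descriptor.
   Assumptions: fixed 66x66 resize, unsigned orientations, global L2 norm. *)
hogSketch[img_Image, cell_Integer : 8, bins_Integer : 9] :=
 Module[{gray, gx, gy, mag, ang, idx, hist},
  (* 66x66 input so that the central-difference gradient is 64x64 *)
  gray = N@ImageData[
     ColorConvert[RemoveAlphaChannel[ImageResize[img, {66, 66}]], "Grayscale"]];
  (* central differences on the interior pixels *)
  gx = gray[[2 ;; -2, 3 ;;]] - gray[[2 ;; -2, ;; -3]];
  gy = gray[[3 ;;, 2 ;; -2]] - gray[[;; -3, 2 ;; -2]];
  mag = Sqrt[gx^2 + gy^2];
  (* unsigned gradient orientation in [0, Pi) *)
  ang = Mod[Arg[gx + I gy], Pi];
  idx = Clip[Floor[ang/(Pi/bins)] + 1, {1, bins}];
  (* one magnitude-weighted orientation histogram per cell *)
  hist = MapThread[
    Function[{m, b},
     Table[Total[Pick[Flatten[m], Flatten[b], k]], {k, bins}]],
    {Partition[mag, {cell, cell}], Partition[idx, {cell, cell}]}, 2];
  Normalize[Flatten[hist]]]
```

With the default arguments this returns a vector of length 8 x 8 x 9 = 576. Such vectors could be passed to Classify in place of the raw images, for example as rules of the form `hogSketch[img] -> "Apple"`; whether that improves accuracy on this dataset is untested.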

POSTED BY: Mike Reynolds
12 Replies

I am reposting my answer here for convenience.


This seems logical to me (it works just as well without ConformImages, but I wanted to feature it):

dir =(*path to dir containing unzipped folders*);

ndir = FileNameJoin[{dir, "negative"}];
pdir = FileNameJoin[{dir, "positive"}];

nfiles = Import[ndir <> "/*.png"];
pfiles = Import[pdir <> "/*.png"];

negative = ConformImages[nfiles, 200];
positive = ConformImages[pfiles, 200];

$train = 100;

trainingData = <|"Apple" -> positive[[;;$train]], "None" -> negative[[;;$train]]|>;
testingData = <|"Apple" -> positive[[$train+1;;]], "None" -> negative[[$train+1;;]]|>;

c = Classify[trainingData, 
   Method -> {"SupportVectorMachine", 
     "KernelType" -> "RadialBasisFunction", 
     "MulticlassMethod" -> "OneVersusAll"}, 
   PerformanceGoal -> "Quality"];

Magnify[{#, c[#]} & /@ 
Flatten[{RandomSample[positive[[$train + 1 ;;]], 10], 
  RandomSample[negative[[$train + 1 ;;]], 10]}] // Transpose // Grid, 0.5]

[Image: grid of sample test images with their predicted labels]

cm = ClassifierMeasurements[c, testingData];

cm["Accuracy"]

0.796954

cm["ConfusionMatrixPlot"]

[Image: confusion matrix plot]

Response to Answer

Thanks Vitaliy, great start, and yes, 79% is not terrible! Unfortunately, this is not working for any images that have real backgrounds. For example: [image]

What do we need to do to make the detector more robust to the logo signal? This is the heart of the problem!

POSTED BY: Vitaliy Kaurov
Posted 10 years ago

The accuracy is almost zero for the test data I provided.

POSTED BY: Mike Reynolds

But isn't your testing data (a small apple buried in the background) quite different from your training set (the apple fills the image, with no background influence)?

POSTED BY: Vitaliy Kaurov
Posted 10 years ago

Exactly, it's missing any sort of sliding window analysis.
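
A sliding-window pass of the kind described here can be written by hand with ImageTake. The following is a minimal sketch under stated assumptions: `slidingWindows` and `scoreWindows` are hypothetical helper names, the window size and step are free parameters, and `score` stands for any function that maps an image to a number.

```wolfram
(* Hypothetical helper: all win x win windows, stepping by step pixels.
   Rows are counted from the top of the image, as in ImageTake. *)
slidingWindows[img_Image, win_Integer, step_Integer] :=
 Module[{w, h},
  {w, h} = ImageDimensions[img];
  Flatten[
   Table[
    <|"Rows" -> {r, r + win - 1},
      "Columns" -> {col, col + win - 1},
      "Window" -> ImageTake[img, {r, r + win - 1}, {col, col + win - 1}]|>,
    {r, 1, h - win + 1, step}, {col, 1, w - win + 1, step}], 1]]

(* Hypothetical helper: attach a score to each window position.
   score is any function from an image to a number. *)
scoreWindows[score_, img_Image, win_Integer, step_Integer] :=
 Append[KeyDrop[#, "Window"], "Score" -> score[#["Window"]]] & /@
  slidingWindows[img, win, step]
```

With the classifier `c` from the answer above, the score function could be `c[#, "Probability" -> "Apple"] &`, and the highest-scoring windows then give candidate bounding boxes. The cost grows with the number of windows, and logos of different sizes would need the pass repeated at several window sizes or image scales.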

POSTED BY: Mike Reynolds

A preprocessing possibility might be to use feature extraction and cropping to find candidate subimages that could perhaps match. Whether this would work well would depend on how likely the feature extraction is to get the right subimages, and how many such subimages get produced.
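
One way to read this suggestion is to cluster keypoint positions and crop around each cluster. The following is a rough sketch of that reading, not a method confirmed in this thread: `candidateCrops` is a hypothetical helper name, and the number of clusters, the margin, and the cap of 200 keypoints are arbitrary assumptions.

```wolfram
(* Hypothetical helper: crop candidate regions around clusters of keypoints.
   Assumptions: n clusters, a fixed pixel margin, at most 200 keypoints. *)
candidateCrops[img_Image, n_Integer : 5, margin_Integer : 10] :=
 Module[{pts},
  pts = ImageKeypoints[img, "Position", MaxFeatures -> 200];
  If[Length[pts] < n,
   (* too few keypoints: fall back to the whole image *)
   {img},
   (* one crop per cluster, padded by the margin *)
   ImageTrim[img, #, margin] & /@ FindClusters[pts, n]]]
```

Each crop could then be resized and passed to the classifier. As noted above, the outcome depends on whether the keypoints actually land on the logo.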

POSTED BY: Daniel Lichtblau

From what people are telling me, you are asking about a research-level problem. Some suggest partitioning large images into smaller ones to find the logo, but this would slow down the total time. Orientations, deformations, distortions, and translations of the images also matter. Did you search for any research papers on the subject? I am sure there are some general approaches. Have you seen THIS?

POSTED BY: Vitaliy Kaurov

"you are asking about a research level problem"

Here's the relevant XKCD: http://xkcd.com/1425

[Image: XKCD comic 1425]

POSTED BY: Jan Poeschko

Hi Everyone,

I am not sure whether this helps, but I get rather good results. I am not sure how many images you have used for the training, but with my training sets the results are OK.

withlogo = Import["~/Desktop/Apple/withapple/" <> #] & /@ Import["~/Desktop/Apple/withapple/"];
nologo = Import["~/Desktop/Apple/noapple/" <> #] & /@ Import["~/Desktop/Apple/noapple/"];

I have 313 images with logos and 256 without. I think that these numbers are still too small. This is the classifier:

c = Classify[Flatten[{ImageResize[#, {200, 200}] -> "logo" & /@ withlogo, ImageResize[#, {200, 200}] -> "no logo" & /@ nologo}], 
PerformanceGoal -> "Quality", Method -> "NeuralNetwork"];

Here are some results:

[Image: sample classification results]

I must admit, though, that there are also quite a number of false positives: it often mistakes trees for the logo. I think that this could be mended by using a larger image database. Also, if the logo is only a tiny part of the overall image, that can cause problems, but reasonably small logos seem to work.

Once the classifier is trained, the classification takes only 0.02 seconds on my machine.

Cheers, M.

PS: Occasionally, I get quite abysmal results. For some datasets I get rather good results. So this is by no means "ready-to-use".

POSTED BY: Marco Thiel

I think they say "The tree doesn't fall far from the apple". Or something like that.

POSTED BY: Daniel Lichtblau

Indeed!

:-)

POSTED BY: Marco Thiel
Posted 10 years ago

This is not research-level by any means, and it is really trivial to do with many other vision systems. One could train a cascade on HOG feature points in 3 hours to have a classifier running on a GPU in OpenCV...

POSTED BY: Mike Reynolds
Posted 10 years ago

Can you please post the training and test set you are using? And FYI, I provided my original training and test set... it is hyperlinked in my question!

POSTED BY: Mike Reynolds